How can I make an ambiguous symbol like Y or R?

The IUPAC-IUB nomenclature defines standard codes for ambiguous symbols, such as Y for C or T, R for A or G, and N for any nucleotide. BioRuby stores these symbols in an ordinary Bio::Sequence::NA object, which can easily be converted to a regular expression that matches the component symbols of each ambiguity code. A single Bio::Sequence::NA object can therefore mix unambiguous and ambiguous symbols, as long as they all belong to the nucleic acid alphabet.

In general, you convert ambiguity symbols to a Regexp object by calling the to_re method on the Bio::Sequence::NA object that contains them. You don't need to define the symbol 'Y' yourself, because the ambiguity codes are already built into the Bio::NucleicAcid class as a hash named Bio::NucleicAcid::Names.



#!/usr/bin/env ruby

require 'bio'

# create a Bio::Sequence::NA object containing ambiguity symbols
ambiguous_seq = Bio::Sequence::NA.new("atgcyrwskmbdhvn")

# show the contents and class of the DNA sequence object
p ambiguous_seq # => "atgcyrwskmbdhvn"
p ambiguous_seq.class # => Bio::Sequence::NA

# convert the sequence to a Regexp object
p ambiguous_seq.to_re # => /atgc[tc][ag][at][gc][tg][ac][tgc][atg][atc][agc][atgc]/
p ambiguous_seq.to_re.class # => Regexp

# example: match an ambiguous sequence against an unambiguous one
att_or_atc = Bio::Sequence::NA.new("aty").to_re
puts "match" if att_or_atc.match(Bio::Sequence::NA.new("att"))
if Bio::Sequence::NA.new("atc") =~ att_or_atc
  puts "also match"
end
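
To make the conversion less of a black box, here is a minimal plain-Ruby sketch of the idea behind to_re: each ambiguity code is expanded into a character class listing the nucleotides it stands for. This is not BioRuby's actual implementation, and the names IUPAC_CODES and ambiguous_to_regexp are hypothetical, introduced only for illustration. It needs no gems.

```ruby
# Hypothetical lookup table (not BioRuby's own):
# IUPAC ambiguity code => the nucleotides it can stand for.
IUPAC_CODES = {
  'y' => 'ct',  'r' => 'ag',  'w' => 'at',  's' => 'cg',
  'k' => 'gt',  'm' => 'ac',  'b' => 'cgt', 'd' => 'agt',
  'h' => 'act', 'v' => 'acg', 'n' => 'acgt'
}.freeze

# Build a Regexp in which every ambiguity code becomes a character class.
# Symbols missing from the table (a, t, g, c) are kept as literals.
def ambiguous_to_regexp(seq)
  pattern = seq.downcase.each_char.map { |base|
    IUPAC_CODES.key?(base) ? "[#{IUPAC_CODES[base]}]" : Regexp.escape(base)
  }.join
  Regexp.new(pattern)
end

aty = ambiguous_to_regexp('aty')
p aty               # => /at[ct]/
p aty.match?('att') # => true
p aty.match?('atc') # => true
p aty.match?('atg') # => false
```

In real code you would normally rely on Bio::Sequence::NA#to_re as shown above; the sketch only illustrates why "aty" matches both "att" and "atc" but not "atg".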

