How can I make an ambiguous symbol like Y or R?
The IUB defines standard codes for ambiguous symbols, such as Y for C or T, R for A or G, and N for any nucleotide. BioRuby represents these symbols with the same Bio::Sequence::NA object it uses for ordinary sequences, and that object can easily be converted to a regular expression that matches the component bases of each ambiguous symbol. A Bio::Sequence::NA object can therefore hold ambiguity symbols alongside unambiguous bases, as long as they are all valid members of the nucleic acid alphabet.
Generally, an ambiguity symbol is converted to a Regexp object by calling the to_re method on the Bio::Sequence::NA object that contains it. You don't need to define the symbol 'Y' yourself, because the mapping is already built into the Bio::NucleicAcid class as a hash, Bio::NucleicAcid::NAMES.
#!/usr/bin/env ruby
require 'bio'
# create a Bio::Sequence::NA object containing ambiguity codes
ambiguous_seq = Bio::Sequence::NA.new("atgcyrwskmbdhvn")
# show the contents and class of the DNA sequence object
p ambiguous_seq # => "atgcyrwskmbdhvn"
p ambiguous_seq.class # => Bio::Sequence::NA
# convert the sequence to a Regexp object
p ambiguous_seq.to_re # => /atgc[tc][ag][at][gc][tg][ac][tgc][atg][atc][agc][atgc]/
p ambiguous_seq.to_re.class # => Regexp
# example: match an ambiguous pattern against unambiguous sequences
att_or_atc = Bio::Sequence::NA.new("aty").to_re
puts "match" if att_or_atc.match(Bio::Sequence::NA.new("att"))
if Bio::Sequence::NA.new("atc") =~ att_or_atc
  puts "also match"
end
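If you want to see roughly what this conversion amounts to without installing BioRuby, here is a minimal plain-Ruby sketch. The `IUPAC_DNA` table and the `ambiguity_regexp` helper are hypothetical names made up for this illustration; they are not part of BioRuby, and the real to_re method may produce different character classes depending on the BioRuby version.

```ruby
# Hypothetical stand-in for an ambiguity-code table (not BioRuby's own code).
# Each ambiguity code maps to the bases it can represent.
IUPAC_DNA = {
  'y' => 'tc', 'r' => 'ag', 'w' => 'at', 's' => 'gc', 'k' => 'tg', 'm' => 'ac',
  'b' => 'tgc', 'd' => 'atg', 'h' => 'atc', 'v' => 'agc', 'n' => 'atgc'
}.freeze

# Build a Regexp in which every ambiguity code becomes a character class
# and every other character is kept as a literal.
def ambiguity_regexp(seq)
  pattern = seq.downcase.each_char.map do |base|
    IUPAC_DNA.key?(base) ? "[#{IUPAC_DNA[base]}]" : Regexp.escape(base)
  end.join
  Regexp.new(pattern)
end

re = ambiguity_regexp("aty")
p re                                      # => /at[tc]/
puts "match" if re.match?("att")
puts "also match" if re.match?("atc")
puts "no match" unless re.match?("atg")
```

The idea is the same as in the BioRuby script above: the ambiguous sequence becomes the pattern, and the unambiguous sequences are the strings tested against it.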