How do I make a Sequence from a String or make a Sequence Object back into a String?
A lot of the time we see a sequence represented as a String of characters, e.g. "atgccgtggcatcgaggcatatagc". It's a convenient way to view and succinctly represent a more complex biological polymer. BioRuby uses Ruby's String class to represent these biological polymers as objects. Unlike BioJava's SymbolList, BioRuby's Bio::Sequence inherits from String and provides extra methods for sequence manipulation. BioRuby has no container class like BioJava's Sequence class for storing things such as the name of the sequence and any features it might have; for now, you can use other container classes such as Bio::FastaFormat, Bio::GFF, or Bio::Features. (A general container class, compatible with the Sequence classes in the other Open Bio* projects, is planned.)
The Bio::Sequence class has the same capabilities as Ruby's String class, so it is simple to use. You can represent a DNA sequence with the Bio::Sequence::NA class and a protein sequence with the Bio::Sequence::AA class. You can translate a DNA sequence into a protein sequence with a single method call, and you can concatenate sequences with the same '+' method that String provides.
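To make the "inherits from String" idea concrete, here is a minimal pure-Ruby sketch. The class name SketchNA and its reverse_complement method are hypothetical stand-ins written for this illustration; they are not BioRuby's implementation, and the sketch does not require the bio gem. It only shows how a String subclass keeps the ordinary String behaviour, gains extra methods, and can be turned back into a plain String.

```ruby
# Hypothetical stand-in for illustration only; NOT BioRuby's code.
class SketchNA < String
  # Extra sequence method: reverse the string, then swap each base
  # for its complement (a<->t, g<->c; u is mapped to a).
  def reverse_complement
    self.class.new(reverse.tr('atgcu', 'tacga'))
  end
end

dna = SketchNA.new("atgccg")

puts dna.length              # ordinary String method, prints 6
puts dna.reverse_complement  # extra method, prints cggcat
puts dna + "ttt"             # String concatenation, prints atgccgttt

# Going back to a plain String: to_s on a String subclass
# returns an instance of String itself.
plain = dna.to_s
puts plain.class             # prints String
```

Because the object is already a String, most code that expects a String can use it directly; an explicit conversion is only needed when a plain String instance is required.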
String to Bio::Sequence object
Simply pass the sequence string to the constructor.
#!/usr/bin/env ruby
require 'bio'
# create a DNA sequence object from a String
dna = Bio::Sequence::NA.new("atcggtcggctta")
# create an RNA sequence object from a String
rna = Bio::Sequence::NA.new("auugccuacauaggc")
# create a Protein sequence from a String
aa = Bio::Sequence::AA.new("AGFAVENDSA")
# you can check whether the sequence contains illegal characters,
# i.e. characters that are not accepted IUB codes for that sequence type
# (should prepare a Bio::Sequence::AA#illegal_symbols method also)
puts dna.illegal_bases
# translate the DNA sequence and append the result to the protein sequence
newseq = aa + dna.translate
puts newseq # => "AGFAVENDSAIGRL"
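The result above can be checked by hand. The following pure-Ruby sketch walks through the same translation without the bio gem. The helper sketch_translate and the partial CODONS table are hypothetical names introduced here, covering only the four codons in this example (taken from the standard genetic code); mapping unknown codons to "X" and dropping the incomplete trailing codon are assumptions of the sketch, not a description of how BioRuby is implemented.

```ruby
# Partial codon table: only the codons that occur in this example.
CODONS = {
  "atc" => "I",  # isoleucine
  "ggt" => "G",  # glycine
  "cgg" => "R",  # arginine
  "ctt" => "L"   # leucine
}.freeze

# Hypothetical helper: split the sequence into complete codons and
# look each one up; codons missing from the table become "X".
def sketch_translate(dna)
  dna.scan(/.{3}/).map { |codon| CODONS.fetch(codon, "X") }.join
end

# "atcggtcggctta" splits into atc ggt cgg ctt, plus a leftover "a"
# that does not form a complete codon and is ignored.
protein = sketch_translate("atcggtcggctta")
puts protein                  # prints IGRL
puts "AGFAVENDSA" + protein   # prints AGFAVENDSAIGRL
```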