Showing posts with label Bio-Ruby. Show all posts

How do I make a Sequence from a String or make a Sequence Object back into a String?

A sequence is often represented as a String of characters, e.g. "atgccgtggcatcgaggcatatagc". This is a convenient and succinct way to view a more complex biological polymer. BioRuby uses Ruby's String class to represent these biological polymers as objects. Unlike BioJava's SymbolList, BioRuby's Bio::Sequence inherits from String and provides extra methods for sequence manipulation. BioRuby does not have a container class like BioJava's Sequence class to store things such as the name of the sequence and any features it might have. For now, you can use other container classes such as Bio::FastaFormat, Bio::GFF or Bio::Features. (We plan to prepare a general container class that is compatible with the Sequence class in other Open Bio* projects.)

The Bio::Sequence class has the same capabilities as Ruby's String class, so it is simple and easy to use. You can represent a DNA sequence with the Bio::Sequence::NA class and a protein sequence with the Bio::Sequence::AA class. You can translate a DNA sequence into a protein sequence with a single method call, and you can concatenate sequences with the '+' method, just as with the String class.

String to Bio::Sequence object

Simply pass the sequence string to the constructor.

#!/usr/bin/env ruby

require 'bio'

# create a DNA sequence object from a String
dna = Bio::Sequence::NA.new("atcggtcggctta")

# create an RNA sequence object from a String
rna = Bio::Sequence::NA.new("auugccuacauaggc")

# create a protein sequence object from a String
aa = Bio::Sequence::AA.new("AGFAVENDSA")

# check whether the sequence contains illegal characters, i.e. characters
# that are not accepted IUB codes for nucleotides
# (a corresponding Bio::Sequence::AA#illegal_symbols method is yet to be prepared)
puts dna.illegal_bases

# translate the DNA sequence and append the result to the protein sequence
newseq = aa + dna.translate
puts newseq # => "AGFAVENDSAIGRL"
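
The question above also asks how to get back to a String. Since the sequence classes inherit from String, the usual String behaviour is the starting point. The following is a minimal sketch in plain Ruby that does not need BioRuby installed. ToySeq and gc_count are hypothetical names invented for this illustration, and the lower-casing in the constructor is an assumption of the sketch; none of this is BioRuby's actual implementation.

```ruby
# Toy stand-in for a String-derived sequence class. ToySeq is hypothetical
# and is NOT BioRuby's implementation; it only mimics the inheritance.
class ToySeq < String
  # Normalize to lower case on construction (an assumption for this sketch).
  def initialize(str)
    super(str.to_s.downcase)
  end

  # Example of an extra, sequence-specific method layered on top of String.
  def gc_count
    count('gc')
  end
end

seq = ToySeq.new('ATGCCG')

# Ordinary String methods work directly on the object.
puts seq.length    # => 6
puts seq.gc_count  # => 4

# to_s on a String subclass returns a plain String with the same characters.
plain = seq.to_s
puts plain.class   # => String
puts plain         # => atgccg
```

With a real Bio::Sequence::NA or Bio::Sequence::AA object, calling to_s should likewise give you an ordinary String to hand to code that knows nothing about BioRuby.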



How can I make an ambiguous Symbol like Y or R?

The IUB defines standard codes for ambiguous symbols, such as Y to indicate C or T, R to indicate A or G, and N to indicate any nucleotide. BioRuby stores these symbols in the same Bio::Sequence::NA object as unambiguous bases, and the object can easily be converted to a regular expression that matches the component bases of each ambiguous symbol. A Bio::Sequence::NA object can therefore contain symbols that each stand for one or more bases of the nucleotide alphabet, which is what makes a sequence ambiguous.

Generally, ambiguity symbols are converted to a Regexp object by calling the to_re method on the Bio::Sequence::NA object that contains them. You don't need to define the symbol 'Y' yourself because it is already built into the Bio::NucleicAcid class as a hash named Bio::NucleicAcid::Names.



#!/usr/bin/env ruby

require 'bio'

# create a Bio::Sequence::NA object containing ambiguity codes
ambiguous_seq = Bio::Sequence::NA.new("atgcyrwskmbdhvn")

# show the contents and class of the DNA sequence object
p ambiguous_seq # => "atgcyrwskmbdhvn"
p ambiguous_seq.class # => Bio::Sequence::NA

# convert the sequence to a Regexp object
p ambiguous_seq.to_re # => /atgc[tc][ag][at][gc][tg][ac][tgc][atg][atc][agc][atgc]/
p ambiguous_seq.to_re.class # => Regexp

# example of matching an ambiguous sequence against an unambiguous one
att_or_atc = Bio::Sequence::NA.new("aty").to_re

puts "match" if att_or_atc.match(Bio::Sequence::NA.new("att"))

if Bio::Sequence::NA.new("atc") =~ att_or_atc
  puts "also match"
end
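
To make the conversion less of a black box, here is a rough sketch in plain Ruby of how ambiguity codes can be expanded into a regular expression. The AMBIGUITY table and the ambiguous_to_regexp helper are hypothetical names made up for this illustration. The expansions are copied from the to_re output shown above, but this is not how BioRuby itself is necessarily written.

```ruby
# Hypothetical lookup table for this sketch: ambiguity codes mapped to
# regular-expression character classes (same expansions as the to_re
# output shown in the BioRuby example).
AMBIGUITY = {
  'y' => '[tc]', 'r' => '[ag]', 'w' => '[at]', 's' => '[gc]',
  'k' => '[tg]', 'm' => '[ac]', 'b' => '[tgc]', 'd' => '[atg]',
  'h' => '[atc]', 'v' => '[agc]', 'n' => '[atgc]'
}.freeze

# Build a Regexp from a sequence string, expanding each ambiguity code
# and leaving a, t, g and c untouched.
def ambiguous_to_regexp(seq)
  pattern = seq.downcase.each_char.map { |base| AMBIGUITY.fetch(base, base) }.join
  Regexp.new(pattern)
end

re = ambiguous_to_regexp('aty')
p re                   # => /at[tc]/
puts 'att'.match?(re)  # => true
puts 'atc'.match?(re)  # => true
puts 'atg'.match?(re)  # => false
```

Note that the resulting pattern is not anchored, so it also matches when the motif occurs anywhere inside a longer sequence.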



How do I transcribe a DNA Sequence to a RNA Sequence?

In BioRuby, DNA and RNA sequences are stored in the same Bio::Sequence::NA class, just using different alphabets. You can convert from DNA to RNA, or from RNA to DNA, using the rna or dna methods, respectively.



#!/usr/bin/env ruby
require 'bio'
# make a DNA sequence
dna = Bio::Sequence::NA.new("atgccgaatcgtaa")
# transcribe it to RNA
rna = dna.rna
# just to prove it worked
puts dna # => "atgccgaatcgtaa"
puts rna # => "augccgaaucguaa"
# revert to the DNA again
puts rna.dna # => "atgccgaatcgtaa"
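
Judging by the output above, the conversion is a plain substitution of t with u (and back) rather than a complement. A minimal sketch of that idea on ordinary Ruby Strings follows. The transcribe and reverse_transcribe helpers are hypothetical names for this illustration and assume lower-case input; they are not part of BioRuby.

```ruby
# Hypothetical helpers for this sketch; they work on plain Strings and
# assume lower-case input.

# Replace every t with u (DNA alphabet to RNA alphabet).
def transcribe(dna)
  dna.tr('t', 'u')
end

# Replace every u with t (RNA alphabet back to DNA alphabet).
def reverse_transcribe(rna)
  rna.tr('u', 't')
end

dna = 'atgccgaatcgtaa'
rna = transcribe(dna)

puts rna                      # => augccgaaucguaa
puts reverse_transcribe(rna)  # => atgccgaatcgtaa
```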
