
Moving Towards Open Cloud?

The Cloud Computing Interoperability Forum (CCIF) is drafting an Open Cloud Manifesto. The idea is to embrace the fundamental principles of an open cloud with the help of the worldwide cloud community. The Open Cloud Manifesto will describe principles and guidelines for interoperability in cloud computing. Unfortunately, two major cloud computing players, Amazon and Microsoft, have not made any positive commitment to the Open Cloud Manifesto. On his blog, Steve Martin (Microsoft Azure product manager) writes,
We were admittedly disappointed by the lack of openness in the development of the Cloud Manifesto. What we heard was that there was no desire to discuss, much less implement, enhancements to the document despite the fact that we have learned through direct experience. Very recently we were privately shown a copy of the document, warned that it was a secret, and told that it must be signed "as is," without modifications or additional input. It appears to us that one company, or just a few companies, would prefer to control the evolution of cloud computing, as opposed to reaching a consensus across key stakeholders (including cloud users) through an “open” process. An open Manifesto emerging from a closed process is at least mildly ironic.
Well, all I can say is that the idea is worth trying, but the intentions matter more. Nowadays a lot of self-promotional "open" movements are floating around, and you cannot stake your trust on each and every effort. Suddenly there is a flood of "open" movements, and people are genuinely confused about what is going on. To me it looks like a rat race to win a Nobel Prize for "openness".


Backup and fault tolerance in systems biology: Striking similarity with Cloud computing

The striking similarity between biological systems and computing paradigms is not new, and in the past there have been several attempts to draw an analogy between systems biology and computing systems. Interested readers may want to see my last post, which examines how the systems biology of humans can be described as a grid of supercomputers. Over time, researchers have developed several bio-inspired fault-tolerance methods to support fault detection and removal in both hardware and software systems, such as fault-tolerant hardware inspired by ideas from embryology and immune systems. Fault tolerance is the ability of a system to retain its intended functionality even in the presence of faults; in the case of living cells, fault tolerance stems from the intrinsic robustness of their gene regulatory networks, which can easily be observed in the mutation-insensitive expression of genes with phenotypic features. In a recent issue of the journal Molecular Systems Biology, Anthony Gitter and co-authors suggest that gene regulatory networks also have backup plans, very much like cloud computing networks or the MapReduce framework, where failure of a computing node is managed by re-executing its task on another node. Fault tolerance can be seen as a mechanism that retains the functionality of the master gene in extreme circumstances through a controller mechanism, while the backup plan employs another gene, with reasonable sequence similarity to the master gene, to perform the tasks that are key to the survival of the cell itself. Their findings suggest that
[T]he overwhelming majority of genes bound by a particular transcription factor (TF) are not affected when that factor is knocked out. Here, we show that this surprising result can be partially explained by considering the broader cellular context in which TFs operate. Factors whose functions are not backed up by redundant paralogs show a fourfold increase in the agreement between their bound targets and the expression levels of those targets.
Figure: TF backup in gene regulatory networks. The yellow TF, which has sequence similarity as well as shared interactions with the green TF, can replace the green TF when it is knocked out, and is able to recruit the transcription machinery, leading to only a small overlap between binding and knockout results.
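To make the analogy concrete, here is a toy sketch (my own illustration, not the paper's method) of the backup pattern in code: a "master" handler fails, and a redundant "paralog" with the same interface takes over. All names are invented for illustration.

```python
# Toy illustration of the paralog-backup idea: if the "master" TF is
# knocked out, a redundant paralog with the same interface takes over.
# All names here are invented for illustration.

def master_tf(target_gene):
    # Simulate a knockout of the master transcription factor.
    raise RuntimeError("master TF knocked out")

def paralog_tf(target_gene):
    # A paralog with enough sequence similarity to do the same job.
    return "transcription of %s initiated by the backup paralog" % target_gene

def express(target_gene):
    """Attempt expression via the master TF; fall back to its paralog."""
    try:
        return master_tf(target_gene)
    except RuntimeError:
        return paralog_tf(target_gene)

print(express("target_gene_X"))  # the cell survives the knockout
```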

To understand the systems biology of the robustness provided by redundant TFs, and their role in the broader cellular context, the authors explored how their findings depend on the TFs' homology relationships and the shared protein interaction network. They observed that TFs with the most similar paralogs had no overlap between their binding and knockout data, while protein interaction networks provided physical support for knockout effects.
Gitter further describes the importance of his research:
It's extremely rare in nature that a cell would lose both a master gene and its backup, so for the most part cells are very robust machines. We now have reason to think of cells as robust computational devices, employing redundancy in the same way that enables large computing systems, such as Amazon, to keep operating despite the fact that servers routinely fail
Figure: A simple master/backup mechanism in the MapReduce framework.
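For comparison, here is a minimal sketch of the MapReduce-style counterpart: the master does not keep a hot standby but simply re-executes a failed task on another node. This is a schematic of the general idea, not Google's or Hadoop's actual implementation.

```python
# Minimal sketch of MapReduce-style fault tolerance: when a worker node
# fails mid-task, the master re-executes that task on another node.
import random

def run_on_worker(task):
    # Simulate an unreliable node that crashes ~30% of the time.
    if random.random() < 0.3:
        raise RuntimeError("node running %r crashed" % task)
    return "result(%s)" % task

def run_with_reexecution(tasks, max_attempts=5):
    results = {}
    for task in tasks:
        for attempt in range(max_attempts):
            try:
                results[task] = run_on_worker(task)
                break  # task succeeded; move to the next one
            except RuntimeError:
                continue  # reassign the task to a fresh node
    return results

print(run_with_reexecution(["map:chunk1", "map:chunk2", "reduce:all"]))
```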


Cloud computing: a new standard platform?

Cloud computing is becoming mature enough as a technology for use in genome research experiments. The use of large datasets, highly demanding algorithms, and the need for sudden bursts of computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R (1). However, we have yet to see a rigorous comparison of its performance on a BLAST (2) search, and of its ability to cope with ever-increasing databases and open-source frameworks such as bioperl (3) or bioconductor (4).
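As a rough idea of what such an experiment could look like, here is a hedged sketch of fanning BLAST queries out across parallel workers, much as one would across cloud nodes. It assumes the NCBI blastn binary is installed and that "my_db" is a preformatted BLAST database; the chunk file names are placeholders.

```python
# Hedged sketch: distribute BLAST query chunks across worker processes,
# the way one might across cloud nodes. Assumes the NCBI blastn binary
# is on the PATH and "my_db" is a preformatted BLAST database; the
# file names below are placeholders.
import subprocess
from multiprocessing import Pool

QUERY_CHUNKS = ["queries_00.fa", "queries_01.fa", "queries_02.fa"]

def blast_chunk(chunk):
    out = chunk.replace(".fa", ".hits.txt")
    subprocess.run(["blastn", "-query", chunk, "-db", "my_db", "-out", out],
                   check=True)
    return out

if __name__ == "__main__":
    with Pool(processes=len(QUERY_CHUNKS)) as pool:
        print(pool.map(blast_chunk, QUERY_CHUNKS))
```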

Cloud computing claims to deliver IT power over the Internet as you need it, rather than drawing it from a desktop computer (5), in a fashion seemingly similar to having your own virtual servers available over the Internet (6). Some of the most important aspects of cloud computing are:

* Software as a Service (SaaS): you buy access to software for a determined period of time, rather than the software itself.
* Utility Computing: storage and virtual servers that IT can access on demand (see the sketch after this list).
* Web Services: APIs that expose functionality over the Internet.
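As an illustration of the utility-computing point above, here is a minimal sketch of renting a virtual server on demand and releasing it when done, using Amazon's present-day boto3 library for Python; the AMI id, region, and instance type are placeholders.

```python
# Minimal sketch of utility computing: request a virtual server on
# demand, use it, and stop paying when it is released. Uses the boto3
# AWS library; the AMI id, region, and instance type are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# Rent a server only for as long as the computation needs it.
instance = ec2.create_instances(
    ImageId="ami-xxxxxxxx",   # placeholder machine image
    InstanceType="t2.micro",  # placeholder size
    MinCount=1,
    MaxCount=1,
)[0]
instance.wait_until_running()
print("running:", instance.id)

# ... run the job here, then release the resources:
instance.terminate()
```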

My first exposure to cloud computing came from an email from Matt Wood (7), a newly established group leader at the Sanger Institute (8), announcing the Cloud Computing Group (9) in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences (10), I was surprised to find a very select audience, ranging from the director of IT at Sanger, Phil Butcher (11), to one of the Ensembl (12) software coordinators, Glenn Proctor (13), along with quite a few local start-up companies.

Among the presenters was Simone Brunozzi, from Amazon’s cloud computing division (14). He had an interesting story to tell: how Amazon, a company best known for e-commerce, came to build cloud computing technology and then sell it. Apparently, the technology they sell was developed for Amazon’s own business. One of their main challenges was to cope with the capricious shopping habits of customers, with orders peaking around Christmas and staying fairly flat the rest of the year. These swings required rapid adaptability of computational resources. The idea of cloud computing fitted well with their e-commerce business model: you don’t need to care about where your computation is done; the only thing you care about is that you have the resources you need and do not pay for them when you don’t. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.

When it comes to using cloud computing for genomics research, the costs may add up quickly. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical of the bioinformatics community adopting cloud computing unless open-source ideals are embraced: i) allowing people to develop and contribute to the technology if and when they want to, ii) allowing total openness about its achievements and pitfalls, and iii) making it free for everyone to use. Making it free does not mean there is no margin for profit: think of the profitability of free-to-use technologies such as Java (15) or MySQL (16), both components of Sun Microsystems’ (17) business.

Despite the promise of potential benefits for the bioinformatics community, the way the cloud is currently being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it difficult to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.

References

1. http://www.r-project.org/
2. http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. http://www.bioperl.org/wiki/Main_Page
4. http://www.bioconductor.org/
5. http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
6. http://www.infoworld.com/article/08/04/07/15FE-cloud-computing-reality_1.html
7. http://www.sanger.ac.uk/Users/mw4/
8. http://www.sanger.ac.uk/
9. http://cloudcamb.org/
10. http://www.cms.cam.ac.uk/site/
11. http://www.yourgenome.org/people/phil_butcher.shtml
12. http://www.ensembl.org/index.html
13. http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=299
14. http://aws.amazon.com/ec2/
15. http://www.java.com/en/
16. http://www.mysql.com/
17. http://www.sun.com/

