Sequencing the banana genome has revealed the secrets of its 520 million base pairs, the “letters” of the genetic code. The work, carried out within the framework of the Global Musa Genomics Consortium, is a big step toward understanding banana genetics and improving banana varieties.

The results are published in the July 12 issue of the journal Nature. The banana genome was found to contain more than 36,000 genes, more than the human genome.

Eric Lyons, an assistant professor in the University of Arizona School of Plant Sciences and a member of the iPlant Collaborative, which is based at the UA’s BIO5 Institute, contributed to the project by developing a key part of the cyberinfrastructure needed to handle and analyze the huge amounts of data generated by deciphering the sequence. The tool helps researchers work out the meaning behind the banana's genetic alphabet by comparing it with other plant genomes.

“We are dealing with huge amounts of information,” Lyons said. “Plant genomes are incredibly dynamic, which makes them some of the most fascinating and at the same time most difficult organisms to study.”

Funded through a $50 million grant from the National Science Foundation in 2008, the iPlant Collaborative has since brought together researchers from across the biological and biomedical sciences, both nationwide and overseas. Working with high-performance computing experts, they use iPlant as a platform to gather, store and interpret the immense amounts of data generated by projects such as comparisons among entire genomes.

Lyons has developed a system that does just that: CoGe, which is short for Comparative Genomics. He said CoGe provides the tools allowing any scientist in the world to compare and analyze any genome side by side. Originally developed for plant genomes, the software is designed to accommodate any set of genomes from all domains of life.

CoGe currently contains almost 20,000 genomes from 15,000 organisms, including viruses, bacteria, plants, insects, amphibians and mammals – and, as of now, the banana.

“The number of genomes has exploded,” Lyons said. “The whole reason I designed the system was that we needed ways to compare genomes quickly. However, we also needed to easily manage those data, because no matter where we are today, tomorrow we'll have a new version of our favorite genome and 10 more to which to compare it.”

Of the many varieties of banana, whose scientific name is Musa acuminata, one called DH-Pahang is known for its susceptibility to disease, making it a poor crop choice. Shunned by the agriculture industry, DH-Pahang rose to stardom when the sequencing team, led by two French research organizations, CIRAD and CEA-Genoscope, chose the variety for its project.

The DH-Pahang banana differs from its relatives in that it has what geneticists call a homozygous genome.

“It means both copies of each chromosome are identical,” Lyons explained. “Working with a homozygous genome makes it easier to solve the jigsaw puzzle of the genome and correctly assemble all the pieces. You don't get confused by having slightly different puzzle pieces, or sequences, for gene alleles across a genome.”