Genome Sequence of Kaohsiung’s Emblematic Tree
Bombax ceiba L., the red silk cotton or kapok tree, now has a sequenced genome. The tree is famous and loved for its magnificent red flowers and was chosen as the city flower of both Kaohsiung and Guangzhou. It is also economically important as a source of fiber, food, and timber. The new genomic resource, published recently in the journal GigaScience, will also help scientists better understand how the tree is adapted to its original dry valley habitat.
In spring, the red silk cotton tree (Bombax ceiba L.) transforms numerous parks and open spaces with a burst of fleshy red petals. No wonder, then, that the citizens of Kaohsiung voted the iconic plant their emblematic city flower. Apart from its beauty, the tree, which can grow up to 40 meters tall, has a number of uses: the kapok-like fiber inside the capsule is used for pillows and mattresses, and the dried flowers are a popular base for tea or soup. Traditional Chinese Medicine even attributes health benefits to the plant. Although B. ceiba is widely cultivated in humid Southern China, a city in a hot, dry part of the upper Yangtze River is named after the plant. This wide range may reflect its tolerance of extreme drought and heat.
A team of Chinese scientists led by Lizhou Tang and Bin Tian of Qujing Normal University and Southwest Forestry University has now sequenced and assembled the genome of this remarkable tree. They combined different genomic techniques to produce a resource of outstanding technical quality. The final genome assembly is 895 million bases (DNA letters) long and contains roughly 52,000 genes.
Professor Tang explains the challenges of the project: “The high genome heterozygosity of Bombax ceiba caused us some problems during assembly of the genome. Fortunately, we have achieved a relatively intact genome draft with the help of third generation sequencing technologies.”
When scientists determine the genome sequence of an organism, they cannot just “read” the DNA letters of an entire chromosome from end to end. Instead, bioinformaticians reconstruct the genome from a large number of small pieces. Until a couple of years ago, the dominant technique for producing these sequence fragments, called “short read” sequencing, was a cost-effective method with one major drawback: the individual fragments were very small, no more than a couple of hundred DNA letters each. The resulting genome assemblies had a large number of gaps. More recently, a newer technique, Single Molecule Real-Time (SMRT) sequencing, has become more widely available. It produces longer stretches of continuous sequence, and the researchers used this additional source of high-quality data to reduce the number of gaps in their reconstruction of the Bombax ceiba genome, achieving contig and scaffold N50 sizes of 1.0 Mb and 2.06 Mb, respectively.
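For readers unfamiliar with the N50 figures quoted above: N50 is a common measure of assembly continuity. It is the length of the shortest piece in the smallest set of longest pieces that together cover at least half of the total assembled length. The following Python sketch illustrates the calculation under that standard definition. The function name and the toy lengths are hypothetical and are not taken from the study or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk through the pieces from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first piece that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy contig lengths, NOT data from the Bombax ceiba assembly.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

In the toy example the pieces add up to 300 bases, so the two longest pieces (100 + 80) already cover half of the total, and the N50 is 80. A larger N50 generally means the assembly is made up of fewer, longer pieces.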
To improve the quality and usefulness of the genome data even further, they then combined the sequence data with information from so-called “optical maps”. This method works by labelling DNA fragments of different sizes with fluorescent markers, making it possible to sort and orient the pieces and so provide an optimal representation of the actual chromosomes. These techniques have been tried and tested by other researchers on a number of plant and animal species, and have now helped to produce a high-quality genome assembly of the red silk cotton tree.
“We intend to use the new genome resource to help with our future breeding projects of the red silk cotton tree”, Professor Tang points out. The data will also be used to better understand how the tree has adapted to its natural habitat in dry valleys, and the authors have already identified a couple of potential leads for genes that may be involved in this adaptation to quite extreme habitats.
GigaScience
GigaScience is co-published by BGI and Oxford University Press. Winner of the 2018 PROSE award for Innovation in Journal Publishing (Multidisciplinary), the journal covers research that uses or produces ‘big data’ from the full spectrum of the life sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life sciences. The journal has a completely novel publication format, one that integrates manuscript publication with complete data hosting and analysis tool incorporation. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB, as well as in publicly available repositories. GigaScience will provide users access to associated online tools and workflows, and has integrated a data analysis platform, maximizing the potential utility and re-use of data.
The genome data are provided in the GigaDB repository and at NCBI (BioProject ID PRJNA429932), alongside the open access article in GigaScience.
In keeping with the journal’s goals of making the data underlying the analyses used in published research fully and freely available, all data from this project are available under a CC0 waiver in the GigaScience database, GigaDB, in a citable format as follows:
Gao, Y; Wang, H; Liu, C; Chu, H; Dai, D; Song, S; Yu, L; Han, L; Fu, Y; Tian, B; Tang, L (2018): Supporting data for “De novo genome assembly of the red silk cotton tree (Bombax ceiba)”. GigaScience Database. http://dx.doi.org/10.5524/100445
Reference
Gao, Y et al. (2018): De novo genome assembly of the red silk cotton tree (Bombax ceiba). GigaScience. doi.org/10.1093/gigascience/giy051