Current TIGR Rice Genome Pseudomolecules Release
 

We are pleased to announce release 5 of the TIGR Rice Pseudomolecules and Genome Annotation. The official release date for this version is January 24, 2007. The sections below give detailed information on the release.





Description of TIGR Rice Genome Pseudomolecules

As part of our National Science Foundation-funded Rice Genome Annotation Project, we constructed pseudomolecules (virtual contigs) for each of the 12 rice chromosomes. In release 5, we retained our release 4 pseudomolecules because no significant amount of Oryza sativa (japonica cultivar-group) genomic sequence was deposited in GenBank/EMBL/DDBJ in the past year.

The pseudomolecules were constructed by resolving discrepancies between overlapping BAC/PAC clones, trimming the overlap regions at junction points (with the phase 3 BAC/PAC sequence preferred in each overlap), and linking the unique sequences to form a contiguous sequence. A list of the ordered BAC/PAC clones for each of the 12 chromosomes was obtained from the IRGSP. Although we used all of the IRGSP BAC/PAC sequences available in GenBank/EMBL/DDBJ, our pseudomolecules do not represent the official pseudomolecules generated by the IRGSP, which are available here.
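The trim-and-link step can be illustrated with a minimal Python sketch. It is not the TIGR pipeline: the function name, the toy sequences, and the assumption that the overlap length between each pair of adjacent clones is already known are all hypothetical, and the sketch always keeps the upstream clone's copy of an overlap rather than preferring phase 3 sequence.

```python
def build_pseudomolecule(clones, overlaps):
    """Join ordered clone sequences, keeping each overlap region only once.

    clones:   clone sequences in chromosome order (hypothetical input)
    overlaps: overlaps[i] is the number of bases shared by the end of
              clones[i] and the start of clones[i + 1] (assumed known)
    """
    if len(overlaps) != len(clones) - 1:
        raise ValueError("need exactly one overlap length per adjacent clone pair")
    parts = [clones[0]]
    for seq, overlap in zip(clones[1:], overlaps):
        # Simplification: always trim the downstream clone's copy of the overlap.
        parts.append(seq[overlap:])
    return "".join(parts)


# Toy example: three short "clones" that overlap by 3 bases each.
toy_clones = ["ACGTAC", "TACGGA", "GGATTT"]
toy_pseudomolecule = build_pseudomolecule(toy_clones, [3, 3])
```

A real implementation would also need to resolve sequence discrepancies inside each overlap, which this sketch does not attempt.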

A total of 3,450 rice BAC/PAC clones were included in the pseudomolecules. At the time these pseudomolecules were constructed, 3,408 BAC/PAC clones (98.8%) were finished and 42 BAC/PAC clones (1.2%) were unfinished (phase 2) as defined by GenBank. Gaps between clones (i.e., physical gaps) are denoted with 1000 Ns, and the locations of these gaps can be seen in the graphical views of each of the chromosomes below. Centromeres were identified using the CentO centromeric sequence (AY101510; Cheng et al., 2002). The centromeres are adjacent to these clones on each of the 12 rice chromosomes. Please be aware that unfinished BACs may contain other gaps, which may also be denoted with a string of Ns. In total, there are 38 physical gaps within the 12 pseudomolecules, in addition to gaps at 10 centromeres and 10 telomeres.
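Because physical gaps are written as runs of 1000 Ns, candidate gap positions can be located by scanning a pseudomolecule sequence for long N runs. The sketch below is one possible way to do this in Python; the function name and the toy sequence are hypothetical, and N content alone cannot distinguish a physical gap from a gap inside an unfinished BAC.

```python
import re


def find_n_runs(sequence, min_len=1000):
    """Return 0-based, half-open (start, end) coordinates of N runs.

    Only runs of at least min_len consecutive N (or n) characters are reported.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


# Toy sequence: 20 bases, a 1000-N gap, 4 bases, a short 10-N run, 2 bases.
toy_sequence = "ACGT" * 5 + "N" * 1000 + "ACGT" + "N" * 10 + "AC"
long_gaps = find_n_runs(toy_sequence)             # only the 1000-N run
all_runs = find_n_runs(toy_sequence, min_len=10)  # both runs
```

Lowering `min_len` also reports the shorter runs, which may be useful when inspecting unfinished clones.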

All of the BAC/PAC clones were annotated using our automated/semi-automated rice annotation pipeline (click here to see the details). The current release (Osa1 version 5.0) contains 372,077,801 bp of non-overlapping rice genome sequence from the 12 rice chromosomes. We identified 56,278 genes (loci), of which 6,498 have 10,432 additional alternative splicing isoforms, resulting in a total of 66,710 transcripts (or gene models) in the rice genome. Note that we have excluded 740 small gene models (<50 amino acids) from our annotated gene set.
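The relationship between loci, additional isoforms, and transcripts, together with the 50 amino acid cutoff, can be sketched as follows. The data structure, locus names, and protein lengths are invented for illustration only; the final assertion restates the release arithmetic (56,278 loci plus 10,432 additional isoforms gives 66,710 transcripts).

```python
MIN_PROTEIN_LENGTH = 50  # amino acids, the cutoff stated in the text


def summarize(models_by_locus):
    """Count loci and transcripts after dropping small gene models.

    models_by_locus maps a locus ID to a list of protein lengths,
    one entry per gene model (hypothetical input format).
    """
    loci = 0
    transcripts = 0
    for lengths in models_by_locus.values():
        kept = [n for n in lengths if n >= MIN_PROTEIN_LENGTH]
        if kept:  # a locus counts only if at least one model survives
            loci += 1
            transcripts += len(kept)
    return {
        "loci": loci,
        "transcripts": transcripts,
        "additional_isoforms": transcripts - loci,
    }


# Toy data: locus_A has two isoforms, locus_B only a 45 aa model, locus_C one model.
toy_summary = summarize({"locus_A": [320, 298], "locus_B": [45], "locus_C": [150]})

# The same identity with the release 5 numbers from the text.
assert 56_278 + 10_432 == 66_710
```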

Transposable element-related (TE-related) gene models were identified using two approaches: BLASTN searches against the TIGR Oryza Repeat Database and identification of gene models containing TE-related Pfam domains. These genes (15,232) and their models (15,424) were annotated based on the Pfam domain or the nomenclature in the TIGR Oryza Repeat Database. Pack-MULEs were identified only on chromosomes 1 and 10. They were manually annotated as described in Jiang et al. 2004. Transduplicate MULEs identified by Juretic et al. 2005 were aligned to the TIGR v5 Pseudomolecules. Note that the Jiang Pack-MULEs and the transduplicate MULEs are shown only in the Genome Browser and are not included in our functional annotation.

A total of 33,882 gene models (24,435 genes) were further improved based on the experimental evidence provided by EST and full-length cDNA sequences. This was done using the TIGR PASA program. A portion of the models that failed PASA validation was manually reviewed and curated. The structures of 1,648 gene models were manually annotated using EST pairing information and comparative genomics analyses (Zhu and Buell, Genome Research, 2007). Using the structural annotation from the Community Annotation project (CA), we modified 43 loci encompassing 9 different CA protein families. In addition, we added 20 new loci from 5 different CA protein families to the TIGR annotation. We updated the functional assignment for 378 loci using the Community Annotation.

Please note that these pseudomolecules are constructed from finished and unfinished sequence and a majority of the gene models have not been manually curated.




Features of the TIGR Rice Annotation Release 5

  • Our rice genome browser has been updated and now contains 62 tracks of annotation. These tracks have been updated to include the latest evidence and datasets.
  • Gene model structure has been improved for 33,882 gene models (28,706 genes) with evidence from ~1.2 million ESTs and/or full-length cDNAs using the TIGR PASA program. A portion of the models has been manually curated using expression evidence and comparative genomics studies.
  • Rice gene expression Anatomy Viewer/Digital Northern and Tissue Specific Expression pages have been created.
  • The rice community annotation has been carried out, and the results have been integrated into the TIGR rice annotation.
  • Locus names have been assigned to chloroplast and mitochondrial genes.
  • A new protein function category, conserved hypothetical protein, has been introduced for proteins that match only proteins of unknown function in other organisms.



Table of Rice Pseudomolecule, Loci, and Gene Models in Release 5

Chr        BAC/PAC  Sequence length in   Genes/Loci (a)              Gene models (a)             Ordered list of  Graphic  Download
no.        no.      pseudomolecule (bp)  TE (b)  Non-TE (c)  Total   TE (b)  Non-TE (c)  Total   BAC/PAC clones   view     sequences
1 (d)      393      43,596,771           1,307   5,313       6,620   1,334   6,766       8,100   Chr01            Chr01    Download
2          358      35,925,388           1,096   4,319       5,415   1,112   5,608       6,720   Chr02            Chr02    Download
3          327      36,345,490           1,038   4,559       5,597   1,058   6,027       7,085   Chr03            Chr03    Download
4          292      35,244,269           1,759   3,613       5,372   1,773   4,464       6,237   Chr04            Chr04    Download
5          286      29,874,162           1,344   3,298       4,642   1,363   4,216       5,579   Chr05            Chr05    Download
6          281      31,246,789           1,321   3,420       4,741   1,341   4,158       5,499   Chr06            Chr06    Download
7          287      29,688,601           1,237   3,270       4,507   1,264   3,988       5,252   Chr07            Chr07    Download
8          275      28,309,179           1,299   2,905       4,204   1,302   3,585       4,887   Chr08            Chr08    Download
9          223      23,011,239           1,033   2,399       3,432   1,046   2,926       3,972   Chr09            Chr09    Download
10 (d)     202      22,876,596           1,071   2,404       3,475   1,078   2,942       4,020   Chr10            Chr10    Download
11         257      28,462,103           1,288   2,936       4,224   1,295   3,439       4,734   Chr11            Chr11    Download
12         269      27,497,214           1,439   2,610       4,049   1,458   3,167       4,625   Chr12            Chr12    Download
Total (e)  3,450    372,077,801          15,232  41,046      56,278  15,424  51,286      66,710


(a) Excluding small gene models (< 50 amino acids).
(b) TE: Transposable element-related genes and gene models. The rice proteome was searched against the TIGR Oryza Repeat Database with TBLASTN and against the TE-related Pfam domains with hmmpfam. Genes and gene models with matches above cut-offs were annotated as TE-related.
(c) Non-TE: Genes and gene models that are not TE-related.
(d) Pack-MULEs were only annotated using data available from Jiang et al. 2004 and Juretic et al. 2005.
(e) Note that these pseudomolecules do not represent the official IRGSP pseudomolecules.
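The totals row of the table can be reproduced from the per-chromosome values. The following Python sketch simply re-enters the numbers from the table and sums each column; the variable and function names are illustrative only.

```python
# (chromosome, BAC/PAC clones, length in bp,
#  TE genes, non-TE genes, TE gene models, non-TE gene models)
ROWS = [
    ("1", 393, 43_596_771, 1_307, 5_313, 1_334, 6_766),
    ("2", 358, 35_925_388, 1_096, 4_319, 1_112, 5_608),
    ("3", 327, 36_345_490, 1_038, 4_559, 1_058, 6_027),
    ("4", 292, 35_244_269, 1_759, 3_613, 1_773, 4_464),
    ("5", 286, 29_874_162, 1_344, 3_298, 1_363, 4_216),
    ("6", 281, 31_246_789, 1_321, 3_420, 1_341, 4_158),
    ("7", 287, 29_688_601, 1_237, 3_270, 1_264, 3_988),
    ("8", 275, 28_309_179, 1_299, 2_905, 1_302, 3_585),
    ("9", 223, 23_011_239, 1_033, 2_399, 1_046, 2_926),
    ("10", 202, 22_876_596, 1_071, 2_404, 1_078, 2_942),
    ("11", 257, 28_462_103, 1_288, 2_936, 1_295, 3_439),
    ("12", 269, 27_497_214, 1_439, 2_610, 1_458, 3_167),
]


def column_totals(rows):
    """Sum every numeric column (all columns except the chromosome label)."""
    return tuple(sum(row[i] for row in rows) for i in range(1, 7))


clones, length_bp, te_genes, non_te_genes, te_models, non_te_models = column_totals(ROWS)
total_genes = te_genes + non_te_genes      # genes/loci across all chromosomes
total_models = te_models + non_te_models   # gene models across all chromosomes
```

The same pattern works for checking any single row, since each row's total is the sum of its TE and non-TE counts.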

You can also get a sub-dataset of the TIGR rice pseudomolecules using the TIGR Rice Genome Data Extractor.



Rice Pseudomolecule Gap Table

 
   
 
For Rice Comments/Questions send mail to the TIGR rice team.
 
Photographs courtesy of Robin Buell (TIGR), Jiming Jiang (University of Wisconsin), and the USDA Agricultural Research Service