Summary
This site is a catalog of small RNA-producing regions of plant genomes. These regions have been annotated empirically using high-throughput small RNA sequencing (sRNA-seq) data. In general, the workflow is:
- Identify and obtain public sRNA-seq data for a given species. Datasets are chosen with an eye toward diversity in tissue type and environmental conditions, and must meet certain minimal quality and depth standards.
- Merge the sRNA-seq datasets into a 'Reference Set', which is a single conglomerate dataset of expressed small RNAs.
- Use ShortStack to generate genome-wide, de-novo small RNA locus annotations.
- When available, analyze pre-existing small RNA locus annotations (most often, MIRNA loci from miRBase) using the reference set of sRNA-seq data.
Citation
If you make use of these data and/or this website, please cite: Lunardon et al. (2020) Genome Research https://doi.org/10.1101/gr.256750.119
How to use
- Bulk data : All bulk data are user-accessible. Annotations of small RNA loci are available in GFF3 or CSV format from the genome home page. Go to the plants page and select a genome, and you will find links for downloads. FASTA files of small RNA locus sequences, and of their predominant small RNAs, can also be obtained. The sRNA-seq alignments themselves are available as BAM-formatted files. The entire BAM file can be downloaded, or you can access sub-regions of the alignment from your local command line using samtools (instructions and URLs are given on the genome pages). Finally, most bulk data files are also available for download from the genome's browser.
- Search : Go to the Search page for searches. You can search for a specific small RNA sequence (across all genomes or a chosen single genome), or search for specific locus types (see below), annotation names, or families. In addition, you can search all genomes or small RNA loci sets using BLASTn.
- Genome Browsers : There is a genome browser available for each genome. Go to the plants page and select a genome, and there will be a link to the genome browser. By default, each browser shows three tracks: the annotated mRNAs, the ShortStack-derived de novo small RNA annotations, and a coverage plot of the reference set of sRNA-seq data. Additional tracks are selectable using the 'Select tracks' tab. Small RNA data and annotations are color-coded based on small RNA length (see below). Clicking a small RNA annotation brings you to a summary page describing the locus. Annotated loci can be found by typing their name directly into the address bar of the browser. Additionally, the browser's highlight tool automatically triggers a de novo ShortStack analysis of the indicated region.
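As a rough illustration of the sub-region access mentioned under Bulk data, the Python sketch below assembles a `samtools view` command for one region of a remote BAM file. The URL, chromosome name, coordinates, and output filename are placeholders and are not taken from this site; the real URLs and instructions are given on each genome page. The sketch assumes samtools is installed locally and that an index for the BAM file is available next to it.

```python
import shlex

def samtools_region_command(bam_url, chrom, start, end, out_path):
    """Build a `samtools view` argument list that extracts one region
    from an indexed BAM file (local path or remote URL) into out_path."""
    if start < 1 or end < start:
        raise ValueError("expected 1-based coordinates with start <= end")
    region = f"{chrom}:{start}-{end}"
    # -b writes BAM output; -o names the output file.
    return ["samtools", "view", "-b", "-o", out_path, bam_url, region]

# Placeholder values for illustration only (hypothetical URL and region).
cmd = samtools_region_command(
    "https://example.org/path/to/reference_set.bam",
    "Chr1", 10000, 20000, "region.bam",
)
print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the argument list can be passed to `subprocess.run(cmd, check=True)` once a real URL from a genome page has been substituted.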
Nomenclature and conventions
- Reference set : A reference set is a collection of sRNA-seq data from multiple sRNA-seq libraries, all merged together. The de-novo genome-wide annotations performed for each species were based on a reference set alignment. We attempted to gather as much diverse data as possible for each reference set, in order to capture as many small RNA loci as possible.
- Locus names : All de-novo small RNA locus annotations use the following systematic nomenclature.
- Example: can-b1.6r1-10510
- can : Three-letter code representing the genus and species. These follow the same codes as used in miRBase. If the species wasn't in miRBase, we assigned a new three-letter code. This example, can, is Capsicum annuum.
- -b1.6 : Genome build version. In this example, it is genome build (b) 1.6.
- r1 : Reference annotation number. In this example, number 1. The number is needed because there could be more than one annotation version for a given genome build.
- -10510 : Unique number of the locus. Loci are simply numbered sequentially in order of discovery.
- Locus types : Small RNA loci are automatically classified into one of the following types, based on the sizes of aligned small RNAs and their alignment patterns.
- MIRNA : The locus was called a microRNA by ShortStack. These calls are independent of any prior annotations and are supported entirely by the aligned small RNAs and the predicted secondary structure.
- nearMIRNA : The locus met all criteria for a MIRNA with the exception that the exact predicted microRNA* was not present.
- siRNA20 : The predominant small RNA size was 20 nucleotides, the locus was not called a MIRNA or nearMIRNA, and >= 80% of aligned reads at the locus were 20-24 nucleotides in length.
- siRNA21 : The predominant small RNA size was 21 nucleotides, the locus was not called a MIRNA or nearMIRNA, and >= 80% of aligned reads at the locus were 20-24 nucleotides in length.
- siRNA22 : The predominant small RNA size was 22 nucleotides, the locus was not called a MIRNA or nearMIRNA, and >= 80% of aligned reads at the locus were 20-24 nucleotides in length.
- siRNA23 : The predominant small RNA size was 23 nucleotides, the locus was not called a MIRNA or nearMIRNA, and >= 80% of aligned reads at the locus were 20-24 nucleotides in length.
- siRNA24 : The predominant small RNA size was 24 nucleotides, the locus was not called a MIRNA or nearMIRNA, and >= 80% of aligned reads at the locus were 20-24 nucleotides in length.
- OtherRNA : < 80% of aligned reads at the locus were 20-24 nucleotides in length. These loci are mostly degraded fragments of other RNAs, such as rRNAs, tRNAs, or mRNAs.
- NotExpressed : No small RNAs were aligned at the locus in the reference set of sRNA-seq reads. These are annotations imported from other sources that had no support in our analysis.
- Color codes : A consistent color scheme is used to indicate small RNA sizes, both on the genome browsers and in other data displays. The colors were chosen to be discernible by individuals with the most common forms of color-blindness. The color scheme was inspired by, and modified from, the one used by Brigette Hofmeister's small RNA JBrowse plugin.
- 20 nucleotides (skyblue)
- 21 nucleotides (blue)
- 22 nucleotides (mediumseagreen)
- 23 nucleotides (orange)
- 24 nucleotides (tomato)
- Other sizes (NOT 20-24) (gray)
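The conventions above can be sketched in a few lines of Python. This is an illustrative sketch, not the site's or ShortStack's actual code. The function names are hypothetical, and the regular expression assumes build versions contain only digits and dots, as in the example name. The classifier covers only the size-based rules; MIRNA and nearMIRNA calls depend on ShortStack's secondary-structure analysis and are not reproduced. It also assumes that the predominant size is chosen among 20-24 nucleotides and that ties go to the smaller size, neither of which is stated above.

```python
import re

# Pattern for names like "can-b1.6r1-10510" (assumes numeric, dotted build versions).
LOCUS_NAME = re.compile(
    r"^(?P<species>[a-z]{3})-b(?P<build>[0-9.]+)r(?P<annotation>\d+)-(?P<number>\d+)$"
)

# Color names as listed above; any other size is shown in gray.
SIZE_COLORS = {20: "skyblue", 21: "blue", 22: "mediumseagreen",
               23: "orange", 24: "tomato"}

def parse_locus_name(name):
    """Split a locus name into species code, genome build, annotation number, and locus number."""
    match = LOCUS_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized locus name: {name!r}")
    return {
        "species": match["species"],
        "build": match["build"],
        "annotation": int(match["annotation"]),
        "number": int(match["number"]),
    }

def classify_by_size(length_counts):
    """Apply the size-based rules to a {read length: aligned read count} mapping.
    Does not attempt MIRNA or nearMIRNA calls."""
    total = sum(length_counts.values())
    if total == 0:
        return "NotExpressed"
    in_range = sum(n for length, n in length_counts.items() if 20 <= length <= 24)
    # Integer arithmetic for "fewer than 80% of reads are 20-24 nucleotides".
    if in_range * 100 < total * 80:
        return "OtherRNA"
    # Assumption: predominant size taken within 20-24; ties go to the smaller size.
    predominant = max(range(20, 25), key=lambda size: length_counts.get(size, 0))
    return f"siRNA{predominant}"

def color_for_length(length):
    """Return the display color name for a small RNA of the given length."""
    return SIZE_COLORS.get(length, "gray")

print(parse_locus_name("can-b1.6r1-10510"))
print(classify_by_size({21: 90, 30: 10}), color_for_length(21))
```

For example, a locus with 90 reads of 21 nucleotides and 10 reads of 30 nucleotides has 90% of its reads in the 20-24 range, so this sketch labels it siRNA21 and colors its 21-nucleotide reads blue.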