About Digital Expression Explorer

The compendium is designed to bring biologists closer to large-scale gene expression data sets. We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and made them freely accessible. We would like to thank the folks at the NCBI Sequence Read Archive (SRA) for hosting the raw data sets.

About us

The compendium is brought to you by the Epigenetics in Human Health and Disease Laboratory and the Computational Biology Unit at Baker IDI. We value your feedback, so feel free to contact us by email.

Data processing

Our data processing procedure entails:

  1. Download data from NCBI SRA
  2. Diagnose the sequence format
  3. Trim sequences by quality
  4. Align reads to the genome
  5. Quantify aligned reads
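This page does not name the tools used for format diagnosis and quality trimming, so the sketch below is illustrative only. It shows the kind of logic steps 2 and 3 involve: guessing whether FASTQ quality strings use the Phred+33 or Phred+64 encoding, then applying a BWA-style 3' trimming rule (the running-sum algorithm that Cutadapt documents). The function names, the Q20 default threshold, and the encoding heuristic are assumptions, not the compendium's actual settings.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset (33 or 64) from the lowest character seen.

    Assumed heuristic: characters below ';' (ASCII 59) can only occur in
    Phred+33 data; if none are seen, Phred+64 is assumed. A high-quality
    Phred+33 sample could be misclassified, so inspect many reads.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        raise ValueError("no quality characters to inspect")
    return 33 if min(codes) < 59 else 64


def trim_3prime(quality, offset=33, threshold=20):
    """Return how many leading bases to keep after BWA-style 3' trimming.

    Walk from the 3' end, accumulating (threshold - quality); the cut point
    is where this running sum peaks, and the scan stops once it goes negative.
    """
    running = 0
    best = 0
    cut = len(quality)
    for i in range(len(quality) - 1, -1, -1):
        running += threshold - (ord(quality[i]) - offset)
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


if __name__ == "__main__":
    qual = "IIIII###"  # hypothetical read: five Q40 bases, then three Q2 bases
    offset = guess_phred_offset([qual])
    keep = trim_3prime(qual, offset)
    print(offset, qual[:keep])
```

Running the sketch prints `33 IIIII`: the '#' characters mark the read as Phred+33, and the low-quality tail is removed. Trimming on a running sum rather than at the first bad base tolerates an isolated low-quality call inside an otherwise good region.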

A detailed description of our data processing method is available here.

Reference genome information

The compendium relies on reference genome sequence and annotation information provided by Ensembl Genomes.

Species, with genome reference sequence and annotation files:

Arabidopsis thaliana
  Sequence: Arabidopsis_thaliana.TAIR10.25.dna_sm.toplevel.fa
  Annotation: Arabidopsis_thaliana.TAIR10.25.gtf
Caenorhabditis elegans
  Sequence: Caenorhabditis_elegans.WBcel235.dna_sm.toplevel.fa
  Annotation: Caenorhabditis_elegans.WBcel235.78.gtf
Drosophila melanogaster
  Sequence: Drosophila_melanogaster.BDGP5.dna_sm.toplevel.fa
  Annotation: Drosophila_melanogaster.BDGP5.78.gtf
Danio rerio
  Sequence: Danio_rerio.Zv9.dna.toplevel.fa
  Annotation: Danio_rerio.Zv9.78.gtf
Escherichia coli
  Sequence: Escherichia_coli_str_k_12_substr_dh10b.GCA_000019425.1.25.dna_sm.toplevel.fa
  Annotation: Escherichia_coli_str_k_12_substr_dh10b.GCA_000019425.1.25.gtf
Homo sapiens
  Sequence: Homo_sapiens.GRCh38.dna.primary_assembly.fa
  Annotation: Homo_sapiens.GRCh38.78.gtf
Mus musculus
  Sequence: Mus_musculus.GRCm38.dna.primary_assembly.fa
  Annotation: Mus_musculus.GRCm38.78.gtf
Rattus norvegicus
  Sequence: Rattus_norvegicus.Rnor_5.0.dna.toplevel.fa
  Annotation: Rattus_norvegicus.Rnor_5.0.78.gtf
Saccharomyces cerevisiae
  Sequence: Saccharomyces_cerevisiae.R64-1-1.25.dna_sm.toplevel.fa
  Annotation: Saccharomyces_cerevisiae.R64-1-1.25.gtf
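Assuming reads are quantified at the gene level, the GTF annotation files listed above define the set of genes that counts are reported against. As a rough sketch (not part of the compendium's documented pipeline), the snippet below collects the distinct `gene_id` values from a GTF file, for example to check how many genes a given annotation release contains. The function names are hypothetical; the parsing follows the standard nine-column, tab-separated GTF layout with `gene_id "..."` in the attribute column.

```python
import gzip
import re

# Matches the gene_id attribute in GTF column 9, e.g. gene_id "ENSG00000223972";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_ids_from_gtf(lines):
    """Return the set of distinct gene_id values in an iterable of GTF lines."""
    genes = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GTF record
        match = GENE_ID.search(fields[8])
        if match:
            genes.add(match.group(1))
    return genes


def load_gene_ids(path):
    """Read a plain or gzip-compressed GTF file and return its gene IDs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return gene_ids_from_gtf(handle)
```

For example, `len(load_gene_ids("Homo_sapiens.GRCh38.78.gtf"))` would count the genes in the human annotation, assuming the file has been downloaded locally. Gene, transcript and exon records all carry the same `gene_id`, so collecting into a set counts each gene once.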

Update schedule

New data sets deposited in SRA will be incorporated into the compendium with each weekly update. When an updated genome build is released, we intend to update the data for that organism within a year; the previous version will be archived and remain available for bulk download only. For consistency, gene annotation sets will not be updated independently of the genome build.
