Data Specifications

The datasets in this portal conform to the following specifications to facilitate comparisons across tissues. The raw data for these tissues come from multiple studies. The processed data, however, may differ from the results reported in each study for two reasons:

  1. Each study may have made its own processing decisions, whereas the data in this portal are processed uniformly to facilitate comparison across tissues.
  2. The published study results are immutable, while new versions of data in this portal may be released, for example to improve quality or to use updated reference genome and annotations.

Nevertheless, we also host the original results from individual studies on the Download page when available.

Processing

The ratgtex-pipeline repository contains all code for the RNA-Seq processing and eQTL/sQTL mapping pipelines. The steps are also documented on protocols.io: http://dx.doi.org/10.17504/protocols.io.rm7vzyk92lx1/v1. The ratgtex-server-data repository contains code that processes those results into additional data files for download and for the API.

Formats

Gene expression ({tissue}.{genome}.expr.{units}.bed.gz)

Expression tables are provided in BED format, with four columns describing the gene plus one column per sample. Only each gene's transcription start site (TSS) is specified, not the full gene span.
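A minimal Python sketch of reading such a table with only the standard library. It assumes a tab-separated file with one header row whose first four fields are chrom, start, end and gene ID (the common phenotype-BED layout), followed by one sample ID per column; the exact column names are not specified here, and parse_expression_bed / read_expression_bed are hypothetical helpers.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Tuple

N_GENE_COLS = 4  # assumed layout: chrom, start, end, gene_id


def parse_expression_bed(lines: Iterable[str]) -> Tuple[List[str], Dict[str, dict]]:
    """Parse BED-formatted expression lines into sample IDs and per-gene records."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[N_GENE_COLS:]
    genes: Dict[str, dict] = {}
    for row in reader:
        chrom, start, end, gene_id = row[:N_GENE_COLS]
        genes[gene_id] = {
            "chrom": chrom,
            # BED is 0-based and half-open; the TSS is assumed to be a single-base interval
            "start": int(start),
            "end": int(end),
            "expr": dict(zip(samples, map(float, row[N_GENE_COLS:]))),
        }
    return samples, genes


def read_expression_bed(path: str) -> Tuple[List[str], Dict[str, dict]]:
    """Open a gzipped file such as {tissue}.{genome}.expr.{units}.bed.gz and parse it."""
    with gzip.open(path, "rt", newline="") as fh:
        return parse_expression_bed(fh)
```

Keeping the parser separate from the gzip wrapper makes it easy to test on in-memory lines.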

All significant cis-eQTL gene-SNP pairs ({tissue}.{genome}.cis_qtl_signif.txt.gz)

A compressed tab-separated table.
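Since the columns of these tables are not described here, the sketch below only shows how a file name is assembled from its template (including the .splice. variant used for sQTLs) and how the gzipped, tab-separated rows can be streamed as dictionaries keyed by the header row. The tissue, genome and column names used when calling it are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def signif_filename(tissue: str, genome: str, splice: bool = False) -> str:
    """Build {tissue}.{genome}.cis_qtl_signif.txt.gz or its .splice. counterpart."""
    middle = ".splice" if splice else ""
    return f"{tissue}.{genome}{middle}.cis_qtl_signif.txt.gz"


def parse_tsv(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row, keyed by the header row's column names."""
    yield from csv.DictReader(lines, delimiter="\t")


def iter_tsv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a gzip-compressed tab-separated table without loading it all."""
    with gzip.open(path, "rt", newline="") as fh:
        yield from parse_tsv(fh)
```

Streaming with a generator keeps memory use flat, which matters for large pair tables.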

Strong associations from trans-eQTL mapping ({tissue}.{genome}.trans_qtl_pairs.txt.gz)

All measured SNPs genome-wide were tested against the expression of each gene, and pairs with p-value < 1e-5 and a variant-to-TSS distance greater than 5 Mb are included here. Rows are sorted by variant location.
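The inclusion rule can be written as a simple filter. This sketch uses a hypothetical Pair record; it treats a variant on a different chromosome from the gene's TSS as satisfying the distance criterion, which is an assumption, since only the 5 Mb cutoff is stated above.

```python
from typing import NamedTuple, Optional

P_THRESHOLD = 1e-5
MIN_TSS_DISTANCE = 5_000_000  # 5 Mb


class Pair(NamedTuple):
    gene_id: str
    tss_chrom: str
    tss_pos: int
    variant_chrom: str
    variant_pos: int
    pval: float


def tss_distance(pair: Pair) -> Optional[int]:
    """Variant-to-TSS distance in bp, or None if they are on different chromosomes."""
    if pair.variant_chrom != pair.tss_chrom:
        return None
    return abs(pair.variant_pos - pair.tss_pos)


def keep_trans(pair: Pair) -> bool:
    """True if the pair passes both the p-value and the distance criterion."""
    dist = tss_distance(pair)
    # Assumption: inter-chromosomal pairs count as farther than 5 Mb
    far_enough = dist is None or dist > MIN_TSS_DISTANCE
    return pair.pval < P_THRESHOLD and far_enough
```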

Conditionally independent cis-eQTLs ({genome}.eqtls_indep.txt)

A table of cis-eQTLs from all tissues. Stepwise regression was used to identify additional cis-eQTLs for each gene that are independent of (uncorrelated with) the top cis-eQTL.
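To illustrate how variants beyond, and uncorrelated with, the top cis-eQTL can be found, here is a simplified forward-selection sketch in plain Python. At each step it picks the variant with the largest partial correlation with expression after removing the variants already selected (via Gram-Schmidt orthogonalization), and it stops when that correlation falls below a hypothetical cutoff min_abs_r. This is not the ratgtex-pipeline implementation, which uses its own significance thresholds and may include a backward pass.

```python
import math
from typing import Dict, List, Sequence


def _center(x: Sequence[float]) -> List[float]:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def _residualize(x: Sequence[float], basis: List[List[float]]) -> List[float]:
    """Subtract the projection of x onto each orthonormal basis vector."""
    r = list(x)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def forward_stepwise(
    expr: Sequence[float],
    genotypes: Dict[str, Sequence[float]],
    min_abs_r: float = 0.3,  # hypothetical stopping rule, not the pipeline's threshold
    max_steps: int = 5,
) -> List[str]:
    """Greedily select variants with the largest partial correlation to expr."""
    y = _center(expr)
    basis: List[List[float]] = []  # orthonormal span of selected (centered) genotypes
    selected: List[str] = []
    for _ in range(max_steps):
        y_res = _residualize(y, basis)
        yy = _dot(y_res, y_res)
        if yy < 1e-12:
            break  # expression already fully explained by selected variants
        best_id, best_r, best_g = None, 0.0, None
        for vid, g in genotypes.items():
            if vid in selected:
                continue
            g_res = _residualize(_center(g), basis)
            gg = _dot(g_res, g_res)
            if gg < 1e-12:
                continue  # collinear with variants already selected
            r = _dot(g_res, y_res) / math.sqrt(gg * yy)
            if abs(r) > abs(best_r):
                best_id, best_r, best_g = vid, r, g_res
        if best_id is None or abs(best_r) < min_abs_r:
            break
        norm = math.sqrt(_dot(best_g, best_g))
        basis.append([v / norm for v in best_g])
        selected.append(best_id)
    return selected
```

The first variant selected plays the role of the top cis-eQTL; every later one is chosen only for signal the earlier ones cannot explain.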

Top association per gene ({genome}.top_assoc.txt)

A table of the strongest variant-gene association per gene per tissue, even if not significant.
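Selecting the strongest association amounts to keeping one row per gene and tissue. A sketch over already-parsed rows, assuming "strongest" means smallest p-value; the tissue, gene_id, variant_id and pval keys are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def top_assoc(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Keep the row with the smallest p-value for each (tissue, gene) pair.

    No significance filter is applied, so every tested gene keeps one row.
    """
    best: Dict[Tuple[str, str], dict] = {}
    for row in rows:
        key = (row["tissue"], row["gene_id"])
        if key not in best or row["pval"] < best[key]["pval"]:
            best[key] = row
    return best
```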

Splice phenotypes ({tissue}.{genome}.leafcutter.bed.gz)

Splice phenotypes are provided in BED format, with four columns describing the splice junctions and genes plus one column per sample. Only each gene's transcription start site is specified, not the full gene span or the splice junction itself, because the TSS is used to define the gene's cis-window for QTL mapping. See the leafcutter documentation for descriptions of these junctions and junction clusters.
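Because every splice phenotype is anchored at its gene's TSS, all junctions from the same gene share one cis-window. The sketch below makes that concrete; the 1 Mb half-width (CIS_WINDOW) and the phenotype IDs are assumptions for illustration, since the window size is not stated here, and positions are assumed to be on the gene's chromosome.

```python
from typing import Dict, Tuple

CIS_WINDOW = 1_000_000  # assumed half-width in bp; not specified in this document


def cis_window(tss: int, window: int = CIS_WINDOW) -> Tuple[int, int]:
    """Return the (start, end) cis-window around a TSS, clipped at position 0."""
    return max(0, tss - window), tss + window


def windows_for_phenotypes(
    pheno_to_gene: Dict[str, str],
    gene_tss: Dict[str, int],
    window: int = CIS_WINDOW,
) -> Dict[str, Tuple[int, int]]:
    """Assign each splice phenotype the cis-window of the gene it belongs to."""
    return {ph: cis_window(gene_tss[gene], window) for ph, gene in pheno_to_gene.items()}
```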

All significant cis-sQTL phenotype-SNP pairs ({tissue}.{genome}.splice.cis_qtl_signif.txt.gz)

A compressed tab-separated table.

Strong associations from trans-sQTL mapping ({tissue}.{genome}.splice.trans_qtl_pairs.txt.gz)

All measured SNPs genome-wide were tested against each splice phenotype, and pairs with p-value < 1e-5 and a variant-to-TSS distance greater than 5 Mb are included here. Rows are sorted by variant location.
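Because rows are sorted by variant location, rows within each chromosome are ordered by position, so a genomic region can be located by binary search instead of a full scan. A sketch using the standard bisect module, assuming each row has already been parsed into a hypothetical (chrom, variant_pos, phenotype_id) tuple. Grouping by chromosome first avoids depending on how chromosome names themselves are ordered in the file.

```python
import bisect
from typing import Dict, List, Tuple

Row = Tuple[str, int, str]  # hypothetical (chrom, variant_pos, phenotype_id)
Index = Dict[str, Tuple[List[int], List[Row]]]


def index_by_chrom(rows: List[Row]) -> Index:
    """Group rows by chromosome, keeping file order (sorted by position)."""
    index: Index = {}
    for row in rows:
        positions, chrom_rows = index.setdefault(row[0], ([], []))
        positions.append(row[1])
        chrom_rows.append(row)
    return index


def rows_in_region(index: Index, chrom: str, start: int, end: int) -> List[Row]:
    """Return rows on chrom with start <= variant_pos <= end via binary search."""
    if chrom not in index:
        return []
    positions, chrom_rows = index[chrom]
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_right(positions, end)
    return chrom_rows[lo:hi]
```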

Conditionally independent cis-sQTLs ({genome}.sqtls_indep.txt)

A table of cis-sQTLs from all tissues. Stepwise regression was used to identify additional cis-sQTLs for each gene that are independent of (uncorrelated with) the top cis-sQTL.

Top splice association per gene ({genome}.top_assoc_splice.txt)

A table of the strongest variant-splice phenotype association per gene per tissue, even if not significant.