FB2024_02, released April 23, 2024
Project: modENCODE_mRNA-Seq_development
General Information
Name
modENCODE_mRNA-Seq_development
Species
D. melanogaster
Project type
FlyBase ID
FBlc0000085
Parent Project
Data Provider
[modENCODE: Drosophila transcriptome](http://www.modencode.org/celniker/)
Title
Transcriptional profile of D. melanogaster developmental stages, unstranded RNA-Seq, modENCODE.
Overview
Description

Umbrella record for 30 collections that differ by developmental stage, from 0-2-hr embryos to 30-day adults. Consists of 76-nt single-end and paired-end reads; transcriptome represented as frequency of reads along genome (data in wiggle format; see GBrowse presentation). RNA junctions identified and characterized (presented in GBrowse and in RNA junction reports). Restricted to uniquely aligned reads.

Reagent type
Key genes
SO term(s)
Details
Sample preparation
Protocol

Frozen samples were homogenized and extracted using the TRIzol reagent protocol (Invitrogen). RNA was purified on an RNeasy spin column (Qiagen) and DNase treated. Polyadenylated RNAs were purified from total RNA extracts via oligo(dT) binding, using the standard Illumina protocol. The poly(A)+ RNA was fragmented using divalent cations under elevated temperature, followed by first and second strand cDNA synthesis primed with random hexamers. The cDNA fragments were end-repaired using T4 DNA polymerase and Klenow DNA polymerase, and phosphorylated at their 5' ends with T4 polynucleotide kinase. After adding A bases to the 3' end of the DNA fragments, Illumina adaptor oligonucleotides were ligated to the ends, and ~300 bp fragments were isolated from an agarose gel, enriched by PCR amplification, and gel-purified again.

Mode of Assay

The samples were quantitated using a Nanodrop, loaded onto a flow cell for cluster generation, and sequenced on an Illumina Genome Analyzer II using either single-read or paired-end protocols (Illumina).

Data analysis

Reads in the FASTQ files were aligned to Dmel_Release_6 using the STAR aligner v2.3.0e (Linux x86_64) with default parameters, generating multiply-mapped BAM files. These were filtered to keep only reads with a single aligned hit (NH:i:1 attribute), generating uniquely-mapped BAM files. A custom script (bam2bedgraph.cc) was used to convert the BAM files into bedgraph files.
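
The unique-hit filter described above can be illustrated with a minimal sketch. It is not the pipeline's actual code: it assumes the alignments are available as SAM-format text lines (for example, after converting a BAM file to SAM), and the function names are hypothetical.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """Return True if a SAM alignment line carries the exact tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # In SAM, the first 11 columns are mandatory; optional tags follow.
    # Comparing whole fields avoids matching NH:i:10, NH:i:12, etc.
    return "NH:i:1" in fields[11:]


def filter_unique(sam_lines):
    """Yield header lines unchanged, plus alignments with a single hit."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line


if __name__ == "__main__":
    # Illustrative records only; these are not real modENCODE reads.
    example = [
        "@HD\tVN:1.4",
        "r1\t0\t2L\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
        "r2\t0\t2L\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    ]
    for kept in filter_unique(example):
        print(kept)
```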

Note that for each pair of paired-end reads, the two reads were mapped independently, and only those reads mapping uniquely to the genome were included in the data submission to FlyBase. In other words, information from one read was not used to resolve ambiguous mapping of its paired read.

FlyBase reports gene expression levels calculated from RNA-Seq coverage data as RPKM (reads per kilobase of exon model per million mapped reads). The RPKM value is calculated as follows. The uniquely transcribed region(s) for each gene are determined by taking the regions covered by exons of the gene and excluding transcribed regions from any overlapping genes: genes lying on the same strand (for calculations using strand-specific RNA-Seq coverage data) or genes on either strand (for calculations using unstranded RNA-Seq coverage data). RNA-Seq coverage read-count data were then correlated by location with the uniquely transcribed region(s) of each gene to produce the sum of reads over the entire uniquely transcribed region for the gene. RPKM was then calculated using the method from Mortazavi et al., Nat. Methods 5, 621-628 (2008): RPKM = 10^9 * C / (N * L * R), where C = number of reads in gene, N = number of uniquely mappable reads in the experiment, L = sum of uniquely transcribed bases in bp, and R = read length in bp.
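
As a worked illustration of the formula, the sketch below computes RPKM from the four quantities C, N, L and R. It assumes N, L and R together form the denominator, and it assumes C is a sum of per-base coverage over the uniquely transcribed region (which is why dividing by the read length R recovers a read count). Both are interpretations of the text, not a confirmed FlyBase implementation. The function and parameter names are hypothetical.

```python
def rpkm(coverage_sum: float, mapped_reads: int,
         unique_bases: int, read_length: int) -> float:
    """Compute RPKM = 10^9 * C / (N * L * R).

    coverage_sum -- C, summed read coverage over the gene's unique region
    mapped_reads -- N, uniquely mappable reads in the experiment
    unique_bases -- L, uniquely transcribed bases (bp)
    read_length  -- R, read length (bp)
    """
    if mapped_reads <= 0 or unique_bases <= 0 or read_length <= 0:
        raise ValueError("N, L and R must all be positive")
    # Assumption: the denominator is the full product N * L * R.
    return 1e9 * coverage_sum / (mapped_reads * unique_bases * read_length)


if __name__ == "__main__":
    # Invented numbers: 1,000 reads of 76 nt over 2,000 unique bases,
    # in an experiment with 10 million uniquely mapped reads.
    print(rpkm(coverage_sum=1000 * 76, mapped_reads=10_000_000,
               unique_bases=2000, read_length=76))
```

With these invented inputs, 1,000 reads over 2 kb in a 10-million-read experiment correspond to 500 reads per kb, or 50 reads per kb per million mapped reads.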

The RPKM values are binned into eight expression levels (percentiles are approximate):
Bin 0: No/Extremely low expression (0)
Bin 1: Very low expression (1-3), percentiles 1-25
Bin 2: Low expression (4-10), percentiles 26-50
Bin 3: Moderate expression (11-25), percentiles 51-75
Bin 4: Moderately high expression (26-50), percentiles 76-85
Bin 5: High expression (51-100), percentiles 86-95
Bin 6: Very high expression (101-1000), percentiles 96-99
Bin 7: Extremely high expression (>1000), the 100th percentile
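
The binning can be sketched as a lookup against the upper bound of each bin. Because the bins are stated as integer ranges, this sketch assumes RPKM values are rounded to the nearest integer before binning. How FlyBase treats fractional values is not stated here, so that step and the function name are assumptions.

```python
from bisect import bisect_left

# Inclusive upper bounds of bins 0 through 6; anything above 1000 is bin 7.
BIN_UPPER_BOUNDS = [0, 3, 10, 25, 50, 100, 1000]

BIN_LABELS = [
    "No/Extremely low expression",
    "Very low expression",
    "Low expression",
    "Moderate expression",
    "Moderately high expression",
    "High expression",
    "Very high expression",
    "Extremely high expression",
]


def expression_bin(rpkm_value: float) -> int:
    """Map an RPKM value to an expression bin number (0-7)."""
    if rpkm_value < 0:
        raise ValueError("RPKM cannot be negative")
    # Assumption: round to an integer first, since bins are integer ranges.
    level = round(rpkm_value)
    # bisect_left returns the index of the first bound >= level.
    return bisect_left(BIN_UPPER_BOUNDS, level)


if __name__ == "__main__":
    for value in (0, 2, 18, 75, 4200):
        b = expression_bin(value)
        print(value, b, BIN_LABELS[b])
```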

FlyBase RPKM data for all genes can be downloaded from the FlyBase Downloads page (link in the blue navigation bar at the top of all FlyBase web pages).

Comments

Quantitation of expression at gene level: coverage data for each developmental stage were intersected with FlyBase exons (D. melanogaster Annotation Release 5.26) to calculate a single value reflecting average coverage per kb per gene. Each gene data point was then classified into one of nine expression level bins, and the graphical and text summaries were produced from the binned values.

Additional Information

[modENCODE Developmental Time Course Profile](http://intermine.modencode.org/release-33/experiment.do?experiment=Developmental+Time+Course+Transcriptional+Profiling+of+D.+melanogaster+Using+Illumina+poly%28A%29%2B+RNA-Seq)

The RNA-seq profiles displayed by FlyBase in GBrowse and used for RPKM calculation can be accessed at the FTP link below as .wig files. Please take note of how these FlyBase .wig files represent data for a contiguous sequence of bases with the same signal value. The value is declared only for the first position of that region, and applies to all positions that follow (these are not explicitly listed) until a new value at a new base position is declared.
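
To make the "value applies until the next declared position" convention concrete, here is a minimal sketch of a per-base lookup. It assumes the .wig records for one chromosome arm have already been parsed into a sorted list of (position, value) pairs. The exact file layout is not described here, so the parsing step is omitted and all names are hypothetical.

```python
from bisect import bisect_right


def signal_at(breakpoints, position):
    """Return the signal value in effect at `position`.

    breakpoints -- list of (start_position, value) pairs sorted by position;
                   each value holds until the next declared position.
    Returns None for positions before the first declared value.
    """
    starts = [start for start, _ in breakpoints]
    # Index of the last declared position that is <= the query position.
    i = bisect_right(starts, position) - 1
    if i < 0:
        return None
    return breakpoints[i][1]


if __name__ == "__main__":
    # Invented records: signal 5 from base 100, 0 from 104, 2 from 110.
    records = [(100, 5.0), (104, 0.0), (110, 2.0)]
    for pos in (99, 100, 103, 104, 110, 250):
        print(pos, signal_at(records, pos))
```

Note that under this convention the last declared value extends indefinitely, so a real parser would also need the sequence length or an explicit terminating record to know where the signal ends.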

Synonyms and Secondary IDs (6)
Reported As
Symbol Synonym
Drosophila transcriptome
modENCODE_574
modENCODE_mRNA-Seq_U
Name Synonyms
Transcriptional profile of D. melanogaster developmental stages, unstranded RNA-Seq, modENCODE.
Secondary FlyBase IDs
    References (13)