FB2024_03, released June 25, 2024
Reference Report
Reference
Citation
FlyBase Genome Annotators, (2019-). Gene model assessment based on new PhyloCSF data. 
FlyBase ID
FBrf0243886
Publication Type
FlyBase analysis
Related Publication(s)
Research paper

Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci.
Mudge et al., 2019, Genome Res. 29(12): 2073-2087 [FBrf0244540]

Associated Information
Comments

FlyBase curators have been re-evaluating existing gene model annotations and creating new annotations based on conserved genomic extents with protein-coding signatures, using an updated PhyloCSF analysis by Irwin Jungreis and colleagues; see: https://data.broadinstitute.org/compbio1/PhyloCSFtracks/trackHub/hub.DOC.html and https://data.broadinstitute.org/compbio1/Novel_PhyloCSF_Regions/.

As of release 6.32, this analysis has resulted in 42 new protein-coding gene annotations, 8 of which correspond to lncRNA genes reannotated as coding and 16 of which are dicistronic or polycistronic. Thirty-one new pseudogenes have been annotated, including 6 protein-coding genes reannotated as pseudogenes and 4 lncRNA genes reannotated as pseudogenes. Five existing protein-coding genes were newly identified as carrying mutations in the sequenced strain, and 2 new protein-coding annotations correspond to a gene split supported by the PhyloCSF data. Almost 150 additional gene models were improved or corrected, including 35 new stop-codon readthroughs and 2 cases annotated with a non-AUG start codon. For 22 genes, a comment has been added indicating that a non-AUG translation start may be used, although no such alternative start was annotated. Fifteen calls correspond to prior updates, including 10 stop-codon readthroughs based on a similar analysis. The PhyloCSF assessment also flags regions of known mutations in the sequenced strain; these have not been included in the list of associated genes below.

Associated Files
Other Information
Secondary IDs
    Language of Publication
    English
    Data From Reference
    Genes (259)