FlyBase curators have been re-evaluating existing gene model annotations and creating new annotations based on conserved genomic extents with protein-coding signatures, using an updated PhyloCSF analysis by Irwin Jungreis and colleagues; see: https://data.broadinstitute.org/compbio1/PhyloCSFtracks/trackHub/hub.DOC.html and https://data.broadinstitute.org/compbio1/Novel_PhyloCSF_Regions/.
As of release 6.32, this analysis has resulted in 42 new protein-coding gene annotations, 8 of which correspond to lncRNA genes reannotated as coding and 16 of which are dicistronic or polycistronic. Thirty-one new pseudogenes have been annotated, including 6 protein-coding genes reannotated as pseudogenes and 4 lncRNA genes reannotated as pseudogenes. Five existing protein-coding genes were newly identified as carrying mutations in the sequenced strain, and 2 new protein-coding annotations correspond to a gene split supported by the PhyloCSF data. Almost 150 additional gene models were improved or corrected, including 35 new stop-codon readthroughs and 2 cases annotated with a non-AUG start codon. For 22 genes, a comment has been added indicating that a non-AUG translation start may be used, although no such alternative start has been annotated. Fifteen calls correspond to prior updates, including 10 stop-codon readthroughs based on a similar analysis. The PhyloCSF assessment also flags regions of known mutations in the strain; these have not been included in the list of associated genes below.