Associations with GLEANR gene prediction models (which comprise the Release 1.0 annotation sets) were determined for previously described genes for which nucleotide or protein GenBank accessions are available. A tBLASTn alignment (where protein sequence data were available) or a BLASTn alignment (where only nucleotide sequence data were available) was used to identify the corresponding region of the genome and any GLEANR prediction in that region.
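The association step described above can be sketched roughly as follows. This is a minimal illustration and not the pipeline actually used: the `Interval` type, the function names, the scaffold names, the coordinates, and the prediction identifiers are all hypothetical, and the sketch assumes the alignment has already been run and reduced to a single hit region. Strand is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """A genomic interval on one scaffold (1-based, inclusive coordinates)."""
    scaffold: str
    start: int
    end: int
    name: str = ""


def choose_program(has_protein: bool) -> str:
    """Pick tBLASTn when protein sequence is available, BLASTn otherwise."""
    return "tblastn" if has_protein else "blastn"


def overlapping_predictions(hit: Interval, predictions: List[Interval]) -> List[str]:
    """Return names of predictions that share at least one base with the hit."""
    return [
        p.name
        for p in predictions
        if p.scaffold == hit.scaffold and p.start <= hit.end and hit.start <= p.end
    ]


# Hypothetical example data; coordinates and identifiers are invented.
predictions = [
    Interval("scaffold_1", 100, 900, "GLEANR_1"),
    Interval("scaffold_1", 1500, 2400, "GLEANR_2"),
    Interval("scaffold_2", 100, 900, "GLEANR_3"),
]
hit = Interval("scaffold_1", 850, 1600)

print(choose_program(has_protein=True))        # tblastn
print(overlapping_predictions(hit, predictions))  # ['GLEANR_1', 'GLEANR_2']
```

In this invented example the hit region overlaps two predictions on the same scaffold, which is the sort of situation a curator might inspect further before recording an association.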
The numbers of such associations made for each of the newly sequenced species are indicated below. (Analysis for D. pseudoobscura is described in a separate communication.)
D. ananassae 51
D. erecta 192
D. grimshawi 26
D. mojavensis 46
D. persimilis 29
D. sechellia 101
D. simulans 501
D. virilis 214
D. willistoni 57
D. yakuba 864
Detailed comparisons were beyond the scope of this analysis, but major discrepancies (such as incomplete annotations or annotations that need to be merged) and atypical cases (such as unresolved dicistronic genes) were noted. No changes were made to the GLEANR gene prediction models.