
APPENDIX 3 Codon Usage in C. elegans

Paul M. Sharp, Keith R. Bradnam

Abstract


Synonymous codon usage in C. elegans was investigated by Stenico et al. (1994). The major conclusion of this analysis was that synonymous codon usage patterns vary among genes in a manner correlated with their expression level. Some genes have extremely biased codon usage: these genes appear to be expressed at higher levels, and it was inferred that natural selection has favored a limited number of translationally optimal codons. Other genes (apparently those expressed at low levels) have relatively unbiased codon usage, although there was some nonrandomness consistent with context-dependent mutational biases. These results echo those found in a number of unicellular eukaryotes and in Drosophila melanogaster (see Sharp et al. 1995). Here the analyses of codon usage in C. elegans have been updated using similar techniques, but with a much larger dataset.

All nuclear protein-coding sequences annotated as deriving from C. elegans were extracted from the GenBank/EMBL/DDBJ DNA sequence data library (GenBank release 92), using the ACNUC retrieval system (Gouy et al. 1985). Duplicate sequences, partial sequences, and sequences containing ambiguous codons or multiple stop codons were excluded, yielding a total dataset of 4027 open reading frames (ORFs). Although some of these sequences were determined by the “traditional” approach (i.e., the genes were identified and sequenced because of some known function or phenotype), many others were found within cosmids sequenced as part of the genome project, and many of the genes thus identified remain putative. Therefore, gene sequences were first designated as (1) “genes” if the sequence was determined by...
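
The exclusion criteria described above (duplicates, partial sequences, ambiguous codons, multiple stop codons) can be illustrated with a minimal Python sketch. This is not the authors' procedure, which relied on the ACNUC retrieval system; it only shows one plausible reading of the filters. The function names are hypothetical, and several definitions are assumptions made for illustration: a "duplicate" is taken to be an identical nucleotide sequence, a "partial" sequence is one that does not start with ATG, does not end with a stop codon, or has a length that is not a multiple of three, and the stop codons are those of the standard nuclear genetic code.

```python
# Illustrative sketch only; the filtering rules below are assumptions,
# not the criteria actually implemented by the authors.

STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard nuclear genetic code
VALID_BASES = set("ACGT")             # anything else counts as ambiguous


def split_codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_acceptable_orf(seq):
    """Return True if seq looks like a complete, unambiguous ORF."""
    seq = seq.upper()
    # Assumed definition of "partial": wrong length, no ATG start, no final stop.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Reject sequences with ambiguous bases (e.g. N, R, Y).
    if not set(seq) <= VALID_BASES:
        return False
    codons = split_codons(seq)
    if codons[0] != "ATG":
        return False
    # Exactly one stop codon is allowed, and it must be the last codon.
    n_stops = sum(1 for codon in codons if codon in STOP_CODONS)
    return n_stops == 1 and codons[-1] in STOP_CODONS


def filter_orfs(records):
    """Keep the first copy of each acceptable ORF from (name, sequence) pairs."""
    seen = set()
    kept = []
    for name, seq in records:
        seq = seq.upper()
        if seq in seen:          # assumed definition of "duplicate"
            continue
        seen.add(seq)
        if is_acceptable_orf(seq):
            kept.append((name, seq))
    return kept


example_records = [
    ("a", "ATGAAATAA"),      # complete ORF
    ("b", "ATGAAATAA"),      # duplicate of a
    ("c", "ATGNAATAA"),      # ambiguous codon
    ("d", "ATGTAAAAATAA"),   # internal stop codon
    ("e", "AAATAA"),         # no start codon
    ("f", "ATGAAA"),         # no stop codon
]
filtered = filter_orfs(example_records)
```

In this toy input only record `a` passes all four filters. A real pipeline would also need to handle annotated introns and database-level redundancy, which this sketch ignores.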




DOI: http://dx.doi.org/10.1101/0.1053-1057