We have analyzed the sequences of 77 nuclear genes of N. crassa thought to be transcribed by RNA polymerase II (References 1-72) which should represent virtually all of the presently published nuclear gene sequences for this fungus. Kozak (1988, Nucl. Acids Res. 15:8125-) analyzed 699 vertebrate genes leading to identification of the vertebrate consensus sequence for initiation of translation, or Kozak Sequence:
G44C39C53(A61/G36)(C49/A27)C55A100T100G100G46
We show here that the N. crassa Kozak sequence is:
C57NNNC77A81(A44/C43)"T"3A99T100G99G51C53
where the subscript number indicates the % occurrence of the particular nucleotide and "T" indicates the conserved absence of that particular nucleotide.
We arbitrarily decided that a nucleotide was to be included in the consensus only if it was present in at least 50% of all the sequences analyzed. If two nucleotides, each represented at less than 50%, gave a summed total of at least 75% representation for a single position, then both are shown in brackets.
Table I. Consensus for initiation of translation and stop codons in Neurospora crassa
No. Ref. Gene Distance from +1 Kozak Sequence Stop codon to ATG (bases) Consensus:CNNNCAATGGC 1 1 acp 46 AATATCACAATGGCG TAA 2 2 acu-3 - CTGCCCATCATGGCT TAG 3 3 acu-5 103 ATACGAGTTATGGCG TAA 4 4 acu-8 - TCACCAACCATGGCG TAA 5 5 acu-9 60 CTTTTCACAATGGCT TAA 6 6 al-1 - ACAGACAAAATGGCT TAG 7 7 al-3 90 CACGTCACCATGGCC TGA 8 8 alc 54 TCCCTCACCATGACC TAA 9 9 am 109 ACCTTCAAAATGTCT TAA 10 10 arg-2 118 CAAGTCAAGATGTTC TAA 11 11 atp-1 90 CTCCACAACATGTTC TAA 12 11 atp-2 58 ATCGTCAAGATGTTC TAA 13 12 bli-7 110 ACCGCCAAAATGCAG TAA 14 13 Bml - ACCGTCAAGATGCGT TAA 15 14 chs-1 69 TCCGCAACCATGGCG TGA 16 15 cmt 127 TCTATCAAAATGGGT TAA 17 16 con-8 221 ACAATAACCATGGAT TGA 18 17 con-10 91 ATCGTCAACATGGCT TAG 19 18 con-13 86 CGTCGCAAGATGCCC TGA 20 19 cot-1 - GGTACCAAGATGGAC TAA 21 20 cpc-1 622 TCCATCAAGATGCGT TAA 22 21 cpi - TTAGTGAAAATGTTT TAA 23 22 crp-1 - GCAGACAACATGGTA TAA 24 23 crp-2 62 ACCGTCAAGATGCCC TGA 25 24 crp-3 58 GCCGGCAAAATGGGT TAA 26 25 cya-4 146 GCCGCCACCATGCTT TAA 27 26 cys-3 30 CATGGCACAATGTCT TAA 28 27 cys-14 32 GACACTCAGATGGCT TAA 29 28 cyt-2 - TCAGTCGCAATGGGT TAA 30 29 cyt-18 - TCACATCAAATGCTG TAA 31 30 cyt-20 57 GTCCTCTGGATGCCG TAA 32 31 cyt-21 125 CGGTCCAACATGGTT TGA 33 32 for 66 TCAGTCACCATGTCT TAA 34 33 frq - GAAACCTGAGTTGGA TGA 35 34 grg-1 89 TCAACCAAAATGGAT TAA 36 35 H3 - ACCATCACAATGGCC TAA 37 35 H4 - CATATCAAAATGACT TAA 38 36 his-3 124 GAAAACACCATGGAG TAA 39 37 hsp30 120 AAGTCAAAAATGGCG TAA 40 38 ilv-2 - TCCATCACAATGGCC TAA 41 39 laccase 190 TTTATCACCATGAAA TAG 42 40 leu-5 146 CACAACGCGATGCCT TAG 43 41 leu-6 220 TAAACAAACATGGCC TAA 44 42 lox 123 TCATACAAGATGAAG TGA 45 43 met-7 98 ATCACAGCCATGCTT TGA 46 44 mrp-3 - CCTCTCACCATGATC TAA 47 45 mta-1 - ACCGAAACAATGGAC TGA 48 46 mtA-1 - AGAAACACGATGTCG TAG 49 47 nac 162 CCGGTGACAATGACG TAA 50 48 ncypt1 - TTGCCCATCATGAAC TAA 51 49 nit-2 284 TGTGCGACAATGGCG TAA 52 50 nit-3 110 AGCATCATCATGGAG TGA 53 51 nit-4 39 CCCCGGCAGATGAAC TGA 54 52 nuc-1 - GCGGGCGTGATGAAC TAA 55 53 nur22 - ACCGTCAAGATGGCG TGA 56 54 nur40 - ACTCACAAGATGGCT TGA 57 55 nur49 - CAAACAACAATGGCG TAA 58 56 pho-4 145 TCGTTCAAGATGGTT TGA 59 57+58 pma-1 56 ATAACGCCAATGGCG TAA 60 59 preg - GGATTTGTGATGCTG TAA 61 60 pyr-4 61 ACAGCCAACATGTCG TAG 62 61 qa-1F 330 AATCCCAACATGCCG TAG 63 61+62 qa-1S 346 GCCGCCATCATGAAC TGA 64 61 qa-2 85 CCAAACACAATGGCG TGA 65 61 qa-3 83 TATATCACCATGTCG TGA 66 61+63 qa-4 190 CCTTTCGCCATGCCG TAA 67 61 qa-x 84 TCAGCAGCCATGACA TGA 68 61 qa-y 133 CGCGTCAAGATGACT TAA 69 64 sod-1 - TCCGTCAAAATGGTC TAA 70 65 spe-1 535 TCTTGGGATATGGTT TAA 71 66 T 94 GCAGCAACCATGAGC TGA 72 67 trp-1 29 CCAATCACAATGTCG TAA 73 68 trp-3 147 TCATACACAATGGAG TAA 74 69 Ubi - ACCCCCATCATGCAG TAA 75 70 ucr - ACCGACACAATGGCG TAA 76 71 vma-1 - TCGCCCAAGATGGCT TGA 77 72 vma-2 - TCTTCCACAATGGCC TAAKey: - in the Distance from +1 to ATG (bases) means that the authors had not determined the +1 position
The reason why the methionine start codon (ATG) is not 100% perfectly conserved within the Kozak consensus is that, for reasons unknown, the gene frq (Ref 33) starts its protein sequence with a valine (GTT).
It is also interesting to note that the choice of the second codon appears to be limited in that about half of the second codons have a guanosine in the first position and another half have a cytosine in the second position.
On the whole, our consensus shows a good resemblance to the mammalian Kozak sequence with a similar hierarchy of nucleotide preference for a given position, although the degree of preference may be shifted. An exception is the nucleotide position immediately preceding the initiator methionine codon (ATG) where N. crassa exhibits a definite suppression of thymine in contrast to a positive preference for any other nucleotide.
Fifty genes among the 77 analyzed have a determined mRNA 5' end. When several 5' ends were presented, +1 was taken to be the most distal from the ATG except when given by the authors themselves. In this way the mRNA sequences before the ATG have lengths between 30 and 622 bases.
The stop codon, determined by computer analysis by the authors, TAA in 62% of the cases, TGA in 27% and TAG in 11%
REFERENCES
1 Arends and Sebald, 1984, EMBO J., 3:377-382
2 Gainey et al. 1992, Curr. Genet., 21:43-47
3 Connerton et al. 1990, Molec. Microbiol., 4:451-460
4 Marathe et al. 1990, Mol. Cell. Biol., 10:2638-2644
5 Sandeman et al. 1991, Mol. Gen. Genet., 228:445-452
6 Schmidhauser et al. 1990, Mol. Cell. Biol., 10:5064-5070
7 Carrattoli et al. 1991, J. Biol. Chem., 266:5854-5859
8 Lee et al. 1990, Biochemistry 29:8779-8787
9 Kinnaird and Fincham, 1983, Gene, 26:253-260
10 Orbach et al. 1990, J. Biol. Chem., 265:10981-10987
11 Bowman and Knock, 1992, Gene, 114:157-163
12 Eberle and Russo, 1992, DNA Sequence, 3:131-141
13 Orbach et al. 1986, Mol. Cell. Biol., 6:2452-2461
14 Yarden and Yanofsky, 1991, Genes Dev., 4:2420-2430
15 Munger et al. 1985, EMBO J., 4:2665-2668
16 Roberts and Yanofsky, 1989, Nucl. Acids Res., 17:197-214
17 Roberts et al. 1988, Mol. Cell. Biol., 8:2411-2418
18 Hager and Yanofsky, 1990, Gene, 96:153-159
19 Yarden et al. 1992, EMBO J., 11:2159-2166
20 Paluh et al. 1988, Proc. Natl. Acad. Sci. USA, 85:3728-3732
21 Tropschug, 1990, Nucl. Acids Res., 18:190
22 Kreader and Heckman, 1987, Nucl. Acids Res., 15:9027-9042
23 Tyler and Harrison, 1990, Nucl. Acids Res., 18:5759-5766
24 Shi and Tyler, 1991,Nucl. Acids Res., 19:6511-6517
25 Sachs et al. 1989, Mol. Cell. Biol., 9:566-577
26 Fu et al. 1989, Mol. Cell. Biol., 9:1120-1127
27 Ketter et al. 1991, Biochemistry, 30:1780-1787
28 Drygas et al. 1989, J. Biol. Chem., 264:17897-17906
29 Akins and Lambowitz, 1987, Cell, 50:331-345
30 Kubelik et al. 1991, Mol. Cell. Biol., 11:4022-4035
31 Kuiper et al. 1988, J. Biol. Chem., 263:2840-2852
32 McClung et al. 1992, Mol. Cell. Biol., 12:1412-1421
33 McClung et al. 1989, Nature, 339:558-562
34 McNally and Free, 1988, Curr. Genet., 14:545-551
35 Woudt et al. 1983 Nucl. Acids Res., 11:5347-5366
36 Legerton and Yanofsky, 1985, Gene, 39:129-140
37 Plesofsky-Vig and Brambl, 1990, J. Biol. Chem 265:15432-15440
38 Sista and Bowman, 1992, Gene, 120:115-118
39 Germann et al. 1988, J. Biol. Chem. 263:885-896
40 Chow et al. 1989, Mol. Cell. Biol., 9:4631-4644
41 Chow and RajBhandary, 1989, Mol. Cell. Biol., 9:4645-4652
42 Niedermann and Lerch, 1990, J. Biol. Chem., 265:17246-17251
43 Crawford et al. 1992, Gene, 111:265-266
44 Kreader et al. 1989, J. Biol. Chem., 264:317-327
45 Staben and Yanofsky, 1990, Proc. Natl. Acad. Sci. USA 87:4917-4921
46 Glass et al. 1990, Proc. Natl. Acad. Sci. USA 87:4912-4916
47 Kore-eda et al. 1991, Jap. J. Genet., 66:317-334
48 Heintz et al. 1992, Mol. Gen. Genet., 235:413-421
49 Fu and Marzluf, 1990, Mol. Cell. Biol., 10:1056-1065
50 Okamoto et al. 1991, Mol. Gen. Genet., 227:213-223
51 Yuan et al. 1991, Mol. Cell. Biol., 11:5735-5745
52 Kang and Metzenberg, 1990, Mol. Cell Biol., 10:5839-5848
53 Nehls et al. 1991, Biochim. Biophys. Acta, 1088:325-326
54 Rohlen et al. 1991, FEBS, 278:75-78
55 Preis et al. 1990, Curr. Genet., 18:59-64
56 Mann et al. 1989, Gene, 83:281-290
57 Aaronson et al. 1988, J. Biol. Chem., 263:14552-14558
58 Hager et al. 1986, Proc. Natl. Acad. Sci. USA 83:7693-7697
59 Kang and Metzenberg, 1993, Genetics, 133:193-202
60 Glazebrook et al. 1987, Mol. Gen. Genet., 209:399-402
61 Geever et al. 1989, J. Mol. Biol., 207:15-34
62 Huiet and Giles. 1986, Proc. Natl. Acad. Sci. USA, 83:3381-3385
63 Rutledge, 1984, Gene, 32:275-287
64 Chary et al. 1990, J. Biol. Chem., 265:18961-18967
65 Williams et al. 1992, Mol. Cell Biol., 12:347-359
66 Kupper et al. 1989. J. Biol. Chem 264:17250-17258
67 Schechtman and Yanofsky, 1983, J Molec. Appl. Genet., 2:83-99
68 Burns and Yanofsky, 1989, J. Biol. Chem., 264:3840-3848
69 Taccioli et al. 1989, Nucl. Acids Res., 17:6153-6166
70 Harnish et al. 1985, Eur. J. Biochem., 149:95-99
71 Bowman et al. 1988, J. Biol. Chem. 263:13994-14001
72 Bowman et al. 1988, J. Biol. Chem., 263:14002-14007