This paper reports sequence features of nuclear genes from Sordaria macrospora. Eight nuclear gene sequences were analyzed for codon usage, GC content, intron regulatory sequences, and translation initiation sites.
The homothallic ascomycete Sordaria macrospora is an excellent model system for studying not only meiotic pairing and recombination (Zickler 1977 Chromosoma 61:29-316) but also fruiting body development (Esser and Straub 1958 Z. Vererbungslehre 89:729-746). Recently, these studies have been extended to the molecular level (Walz and Kück 1995 Curr. Genet. 29:88-95), and knowledge of typical sequence features would be a helpful tool for sequence analysis. Until now, sequence information from S. macrospora was available for only a single nuclear gene (LeChevanton and Leblon 1989 Gene 77:39-49). Here we compile sequence data from eight recently sequenced genes to determine common features of nuclear genes from S. macrospora.
We provide a consensus sequence for the translation initiation site (Table 1), a codon usage table (Table 2), and consensus sequences for intron regulatory signals (Table 3). Comparison of these data with sequence features of the well-studied ascomycete Neurospora crassa (data taken from Brucherez et al. 1993 Fungal Genet. Newsl. 40:85-95 and Edelman and Staben 1994 Exp. Mycol. 18:70-81) shows that S. macrospora sequence characteristics are very similar to those of N. crassa genes.
Table 1. Translation initiation context
Gene | Gene product | Translation initiation context (positions -8 to +5) | Reference (a) |
EF1-α | translation elongation factor EF1-α | CCGTCAAAATGGG | 1 |
tuba | α-tubulin | CATACAAAATGCG | 2 |
ura3 | orotidine-5'-monophosphate decarboxylase | CCGCCACAATGTC | 3 |
ura5 | orotate phosphoribosyltransferase | CCAGCACAATGGC | 4 |
SmtA-1 | mating-type protein | GAAGTACGATGTC | 5 |
SmtA-2 | mating-type protein | CGACTGACATGGA | 5 |
SmtA-3 | mating-type protein | CTTTCAGCATGTC | 6 |
Smta-1 | mating-type protein | TCGAAACAATGGA | 5 |
(a) (1) Gagny, Koll and Silar, unpublished (Accession # X96615); (2) Pöggeler et al., submitted (Accession # Z70290); (3) Nouwrousian, unpublished (Accession # Z70291); (4) Nouwrousian, unpublished; (5) Pöggeler et al., submitted (Accession # Y10616); (6) Pöggeler, unpublished.
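Consensus translation initiation (positions -8 to +5; numbers give the percentage occurrence of the particular nucleotide among the eight genes of Table 1, and alternative nucleotides at one position are separated by a slash)
S. macrospora: C75 C50 A38/G38 N C63 A88 C50/A38 A63 A100 T100 G100 G50/T38 C50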
N. crassa: C N N N C A A A T G G C, with A and C given as alternatives at one additional position
The S. macrospora translation initiation consensus shows a high degree of identity to the N. crassa consensus sequence. As in N. crassa, G and C are the most frequent nucleotides at the two positions following the ATG (+4 and +5), which in the consensus corresponds to an alanine codon (GCN) for the residue following the initiator methionine. In the individual genes listed in Table 1, however, GC at these two positions occurs only in ura5.
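The per-position percentages of the consensus can be re-derived from the eight sequences in Table 1. The minimal Python sketch below does this; the position numbering (A of the ATG = +1, no position 0) and rounding halves upward are our assumptions, chosen because they reproduce the published values (Python's built-in round() would turn 62.5 into 62). It also lists the genes in which GC actually follows the ATG.

```python
from collections import Counter

# Initiation contexts from Table 1; each covers positions -8 to +5
# (assumed numbering: the A of the ATG start codon is +1, there is no position 0).
CONTEXTS = {
    "EF1-alpha": "CCGTCAAAATGGG",
    "tuba": "CATACAAAATGCG",
    "ura3": "CCGCCACAATGTC",
    "ura5": "CCAGCACAATGGC",
    "SmtA-1": "GAAGTACGATGTC",
    "SmtA-2": "CGACTGACATGGA",
    "SmtA-3": "CTTTCAGCATGTC",
    "Smta-1": "TCGAAACAATGGA",
}
POSITIONS = list(range(-8, 0)) + list(range(1, 6))


def percent(count, total):
    """Percentage rounded half up (assumption; round() would round half to even)."""
    return int(100 * count / total + 0.5)


def position_frequencies(seqs):
    """Map each position to (nucleotide, percent) pairs, most frequent first."""
    freqs = {}
    for pos, column in zip(POSITIONS, zip(*seqs)):
        counts = Counter(column)
        freqs[pos] = [(nt, percent(c, len(seqs))) for nt, c in counts.most_common()]
    return freqs


freqs = position_frequencies(list(CONTEXTS.values()))
for pos in POSITIONS:
    print(f"{pos:+d}", freqs[pos][:2])

# Genes whose second codon really begins with GC (alanine, GCN);
# positions +4 and +5 are the last two characters of each context.
gc_second = [gene for gene, seq in CONTEXTS.items() if seq[-2:] == "GC"]
print(gc_second)  # ['ura5']
```

Because each position is tallied independently, the consensus G at +4 and C at +5 need not occur together in any single gene; in this data set only ura5 carries the pair.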
Table 2. Codon usage based on 2497 codons. Each entry gives the codon, the encoded amino acid (Ter = stop), the number of occurrences and, in parentheses, the percentage of that codon among all codons for the same amino acid (or among all stop codons).
TTT-Phe 24 (25.0%) | TCT-Ser 22 (16.4%) | TAT-Tyr 21 (26.9%) | TGT-Cys 3 (8.6%) |
TTC-Phe 72 (75.0%) | TCC-Ser 42 (31.3%) | TAC-Tyr 57 (73.1%) | TGC-Cys 32 (91.4%) |
TTA-Leu 3 (1.7%) | TCA-Ser 12 (9.0%) | TAA-Ter 4 (57.1%) | TGA-Ter 1 (14.3%) |
TTG-Leu 21 (11.6%) | TCG-Ser 28 (20.9%) | TAG-Ter 2 (28.6%) | TGG-Trp 32 (100.0%) |
CTT-Leu 45 (24.9%) | CCT-Pro 41 (30.6%) | CAT-His 23 (30.7%) | CGT-Arg 34 (27.6%) |
CTC-Leu 80 (44.2%) | CCC-Pro 68 (50.7%) | CAC-His 52 (69.3%) | CGC-Arg 57 (46.3%) |
CTA-Leu 3 (1.7%) | CCA-Pro 13 (9.7%) | CAA-Gln 24 (23.8%) | CGA-Arg 7 (5.7%) |
CTG-Leu 29 (16.0%) | CCG-Pro 12 (9.0%) | CAG-Gln 77 (76.2%) | CGG-Arg 6 (4.9%) |
ATT-Ile 47 (32.6%) | ACT-Thr 25 (19.2%) | AAT-Asn 17 (17.3%) | AGT-Ser 5 (3.7%) |
ATC-Ile 95 (66.0%) | ACC-Thr 71 (54.6%) | AAC-Asn 81 (82.7%) | AGC-Ser 25 (18.7%) |
ATA-Ile 2 (1.4%) | ACA-Thr 15 (11.5%) | AAA-Lys 12 (7.2%) | AGA-Arg 7 (5.7%) |
ATG-Met 63 (100.0%) | ACG-Thr 19 (14.6%) | AAG-Lys 154 (92.8%) | AGG-Arg 12 (9.8%) |
GTT-Val 40 (24.2%) | GCT-Ala 70 (30.7%) | GAT-Asp 61 (42.4%) | GGT-Gly 61 (32.3%) |
GTC-Val 101 (61.2%) | GCC-Ala 115 (50.4%) | GAC-Asp 83 (57.6%) | GGC-Gly 94 (49.7%) |
GTA-Val 5 (3.0%) | GCA-Ala 21 (9.2%) | GAA-Glu 33 (19.0%) | GGA-Gly 24 (12.7%) |
GTG-Val 19 (11.5%) | GCG-Ala 22 (9.6%) | GAG-Glu 141 (81.0%) | GGG-Gly 10 (5.3%) |
The GC content of the 7491 nucleotides of coding sequence (2497 codons) is 56.7%. For comparison, the GC content of N. crassa coding regions is 58.6% (54.1% in total DNA). Like many other organisms, S. macrospora does not use synonymous codons equally for amino acids encoded by more than one codon (Table 2).
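The GC value can be checked against Table 2 alone, because 2497 codons correspond to exactly 7491 nucleotides. The sketch below (Python; the whitespace-separated layout of the pasted counts is our own choice) sums G and C at each codon position. Stop codons are included, as they are in the codon total.

```python
# Codon counts copied from Table 2 (codon followed by its count).
TABLE2 = """
TTT 24 TCT 22 TAT 21 TGT 3 TTC 72 TCC 42 TAC 57 TGC 32
TTA 3 TCA 12 TAA 4 TGA 1 TTG 21 TCG 28 TAG 2 TGG 32
CTT 45 CCT 41 CAT 23 CGT 34 CTC 80 CCC 68 CAC 52 CGC 57
CTA 3 CCA 13 CAA 24 CGA 7 CTG 29 CCG 12 CAG 77 CGG 6
ATT 47 ACT 25 AAT 17 AGT 5 ATC 95 ACC 71 AAC 81 AGC 25
ATA 2 ACA 15 AAA 12 AGA 7 ATG 63 ACG 19 AAG 154 AGG 12
GTT 40 GCT 70 GAT 61 GGT 61 GTC 101 GCC 115 GAC 83 GGC 94
GTA 5 GCA 21 GAA 33 GGA 24 GTG 19 GCG 22 GAG 141 GGG 10
"""


def parse_counts(text):
    """Turn 'CODON COUNT CODON COUNT ...' into a dict."""
    tokens = text.split()
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


counts = parse_counts(TABLE2)
n_codons = sum(counts.values())  # 2497
n_nt = 3 * n_codons              # 7491

# G+C count at codon positions 1, 2 and 3
gc_by_position = [0, 0, 0]
for codon, count in counts.items():
    for i, base in enumerate(codon):
        if base in "GC":
            gc_by_position[i] += count

gc_total = sum(gc_by_position)
print(f"GC overall: {100 * gc_total / n_nt:.1f}%")  # GC overall: 56.7%
for i, gc in enumerate(gc_by_position, start=1):
    print(f"GC{i}: {100 * gc / n_codons:.1f}%")
```

By this calculation GC is unevenly distributed over the three codon positions (about 58.9%, 40.3% and 71.0% for positions 1, 2 and 3); these per-position values are not given in the text but are consistent with the preference for codons ending in C.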
In S. macrospora, as in N. crassa, codons with a C in the third position are preferred, and in four-codon families the codon ending in T is usually preferred over those ending in A or G (the Ser family TCN, in which TCG is used more often than TCT, is an exception). The stop codon TAA (4 of 7 stop codons) is used more frequently than TAG (2) or TGA (1). By absolute count, the six least used codons in S. macrospora are ATA (Ile), TTA (Leu), CTA (Leu), TGT (Cys), GTA (Val), and AGT (Ser); all six are also low-usage codons in N. crassa. Zhang et al. (1991 Gene 105:61-67) reported that in many organisms low-usage codons are clearly avoided in abundant proteins and may therefore affect translation rates.
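The codon-preference statements can be tested directly on the Table 2 counts. In this minimal Python sketch the standard genetic code is written out as the usual 64-letter string in TCAG order (an assumption of the sketch, not taken from the text); it recomputes the within-family percentages, ranks sense codons by absolute count, and checks in which four-codon boxes the T-ending codon outnumbers both the A- and the G-ending codon.

```python
from collections import defaultdict

# Codon counts copied from Table 2 (codon followed by its count).
TABLE2 = """
TTT 24 TCT 22 TAT 21 TGT 3 TTC 72 TCC 42 TAC 57 TGC 32
TTA 3 TCA 12 TAA 4 TGA 1 TTG 21 TCG 28 TAG 2 TGG 32
CTT 45 CCT 41 CAT 23 CGT 34 CTC 80 CCC 68 CAC 52 CGC 57
CTA 3 CCA 13 CAA 24 CGA 7 CTG 29 CCG 12 CAG 77 CGG 6
ATT 47 ACT 25 AAT 17 AGT 5 ATC 95 ACC 71 AAC 81 AGC 25
ATA 2 ACA 15 AAA 12 AGA 7 ATG 63 ACG 19 AAG 154 AGG 12
GTT 40 GCT 70 GAT 61 GGT 61 GTC 101 GCC 115 GAC 83 GGC 94
GTA 5 GCA 21 GAA 33 GGA 24 GTG 19 GCG 22 GAG 141 GGG 10
"""
tokens = TABLE2.split()
counts = {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}

# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Share of each codon within its synonymous family, as in Table 2
family_total = defaultdict(int)
for codon, n in counts.items():
    family_total[CODE[codon]] += n
share = {codon: 100 * n / family_total[CODE[codon]] for codon, n in counts.items()}
print(f"{share['AAG']:.1f}")  # 92.8

# Six least-used sense codons by absolute count (ties broken alphabetically)
sense = [codon for codon in counts if CODE[codon] != "*"]
least_used = sorted(sense, key=lambda codon: (counts[codon], codon))[:6]
print(least_used)  # ['ATA', 'CTA', 'TGT', 'TTA', 'AGT', 'GTA']

# Four-codon boxes (first two bases): does the T-ending codon outnumber
# both the A-ending and the G-ending codon?
BOXES = ["CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"]
t_preferred = {
    box: counts[box + "T"] > max(counts[box + "A"], counts[box + "G"])
    for box in BOXES
}
print([box for box, ok in t_preferred.items() if not ok])  # ['TC']
```

Tie-breaking does not affect which six codons are selected, since the seventh least used sense codon (CGG, 6) is already less rare than all of them. The only four-codon box in which the T-ending codon loses is TCN (Ser), where TCG (28) outnumbers TCT (22).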
Table 3. Intron regulatory sequences and intron length
Intron | 5' intron donor | Branch site | Distance to 3' splice site / nt | 3' intron acceptor | Intron length / bp |
SmtA-1/1a | T^GTAAGT | ACTGATT | -19- | TTCAG^ | 58 |
SmtA-1/2a | G^GTTAGT | ACTCGTG | -21- | GGCAG^ | 60 |
SmtA-2/1a | G^GTAACA | ACTGATG | -14- | GCCAG^ | 57 |
SmtA-2/2a | G^GTGAGT | ACTGACA | -12- | GATAG^ | 71 |
SmtA-2/3a | T^GTAAGA | ACTAATA | -12- | GACAG^ | 47 |
SmtA-2/4a | G^GTTTGC | GCTAACA | -16- | GACAG^ | 55 |
SmtA-3/1a | C^GTGAGT | ACTGACT | -12- | GTTAG^ | 54 |
Smta-1/1a | A^GTAAGT | ACTGACC | -15- | TTTAG^ | 53 |
Smta-1/2a | T^GTAGGT | ACTAACC | -12- | CTTAG^ | 57 |
tuba/1 | G^GTACGT | GCTAACG | -22- | TCTAG^ | 256 |
tuba/2 | G^GTAGGT | GCTAACC | -15- | ATTAG^ | 149 |
tuba/3 | G^GTAAGC | GCTAACC | -17- | TACAG^ | 80 |
tuba/4 | G^GTACAT | GCTTACA | -18- | CACAG^ | 60 |
tuba/5 | G^GTATGT | ACTAACT | -16- | CTTAG^ | 64 |
tuba/6 | T^GTAAGT | GCTAACT | -14- | CCTAG^ | 57 |
ef1/1 | G^GTAATG | GCTAACG | -14- | AACAG^ | 100 |
ef1/2 | G^GTTAGT | ACTGACT | -15- | AACAG^ | 243 |
ef1/3 | G^GTATGT | GCTAACT | -17- | AAAAG^ | 60 |
Consensus 5' Intron-Donor
S. macrospora: G67^G100 T100 A72 A61 G83 T72
N. crassa: G^G T A A G T
Consensus Intron Branch Site
S. macrospora: A56/G44 C100 T100 A56/G33 A94 C78 N
N. crassa: A/G C T A/G A C C/A
Consensus 3' Intron-Acceptor
S. macrospora: G33/A27 A39/T39 C56/T44 A100 G100
N. crassa: A/T A/T T/C A G
In S. macrospora genes, intron length ranges from 47 bp to 256 bp, with an average of 88 bp and a median of 60 bp. Intron length in N. crassa ranges from 46 to 856 bp, with a tendency toward 60 to 70 bp. Of the eight genes analyzed so far, two, ura3 and ura5, contain no introns. In S. macrospora introns, the distance from the C of the branch site to the G of the 3' splice site is between 12 and 22 nt; in N. crassa this distance varies from 14 to 30 nt. The S. macrospora intron signals (5' donor site, branch site and 3' acceptor site) are very similar to the N. crassa intron consensus sequences.
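The length and distance ranges, and the per-position percentages of the 5' donor and branch-site consensus sequences, can be recomputed from the rows of Table 3. The Python sketch below is a minimal check under our own assumptions: the ^ boundary marker is removed before counting, and percentages are rounded half up.

```python
from collections import Counter
from statistics import mean, median

# Rows of Table 3: (intron, 5' donor, branch site, distance/nt, 3' acceptor, length/bp)
INTRONS = [
    ("SmtA-1/1", "T^GTAAGT", "ACTGATT", 19, "TTCAG^", 58),
    ("SmtA-1/2", "G^GTTAGT", "ACTCGTG", 21, "GGCAG^", 60),
    ("SmtA-2/1", "G^GTAACA", "ACTGATG", 14, "GCCAG^", 57),
    ("SmtA-2/2", "G^GTGAGT", "ACTGACA", 12, "GATAG^", 71),
    ("SmtA-2/3", "T^GTAAGA", "ACTAATA", 12, "GACAG^", 47),
    ("SmtA-2/4", "G^GTTTGC", "GCTAACA", 16, "GACAG^", 55),
    ("SmtA-3/1", "C^GTGAGT", "ACTGACT", 12, "GTTAG^", 54),
    ("Smta-1/1", "A^GTAAGT", "ACTGACC", 15, "TTTAG^", 53),
    ("Smta-1/2", "T^GTAGGT", "ACTAACC", 12, "CTTAG^", 57),
    ("tuba/1", "G^GTACGT", "GCTAACG", 22, "TCTAG^", 256),
    ("tuba/2", "G^GTAGGT", "GCTAACC", 15, "ATTAG^", 149),
    ("tuba/3", "G^GTAAGC", "GCTAACC", 17, "TACAG^", 80),
    ("tuba/4", "G^GTACAT", "GCTTACA", 18, "CACAG^", 60),
    ("tuba/5", "G^GTATGT", "ACTAACT", 16, "CTTAG^", 64),
    ("tuba/6", "T^GTAAGT", "GCTAACT", 14, "CCTAG^", 57),
    ("ef1/1", "G^GTAATG", "GCTAACG", 14, "AACAG^", 100),
    ("ef1/2", "G^GTTAGT", "ACTGACT", 15, "AACAG^", 243),
    ("ef1/3", "G^GTATGT", "GCTAACT", 17, "AAAAG^", 60),
]

lengths = [row[5] for row in INTRONS]
distances = [row[3] for row in INTRONS]
print(min(lengths), max(lengths), round(mean(lengths)), median(lengths))  # 47 256 88 60.0
print(min(distances), max(distances))  # 12 22


def consensus(seqs):
    """Most frequent nucleotide per column, with its percentage (rounded half up)."""
    result = []
    for column in zip(*seqs):
        nt, count = Counter(column).most_common(1)[0]
        result.append((nt, int(100 * count / len(seqs) + 0.5)))
    return result


donors = [row[1].replace("^", "") for row in INTRONS]  # positions -1, +1..+6
branch_sites = [row[2] for row in INTRONS]
print(consensus(donors))
print(consensus(branch_sites))
```

The seventh branch-site column has no clear majority (its most frequent nucleotide, T, reaches only 33%), consistent with the N in the consensus.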
Acknowledgments
I would like to thank Prof. Dr. U. Kück (Bochum) for his generous support, M. Nouwrousian for providing the unpublished sequences of the S. macrospora ura3 and ura5 genes, and D. Hahn for critical reading of the manuscript. This work was supported by a grant from the Deutsche Forschungsgemeinschaft (Bonn-Bad Godesberg).
Return to the FGN 44 Table of Contents