Identification of putative regulatory signals including the HAP1 binding site in the upstream sequence of the Aspergillus nidulans cytochrome c gene (cycA).
Linda J. Johnson and Rosie E. Bradshaw - Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand.
We speculate that a HAP1-like protein, similar to those that mediate oxygen-induced transcriptional activation of many yeast respiratory genes, also regulates the A. nidulans cytochrome c (cycA) gene. During a study of the significance of a putative HAP1 (Haem Activator Protein) binding site in the regulatory region of the cycA gene, routine sequencing revealed an error in the published sequence (Raitt et al. 1994 Mol. Gen. Genet. 242: 17-22). Examination of the corrected sequence, together with RT-PCR analysis of cycA mRNA, showed that an extra intron was present and that the published translational start site was incorrect. Consequently, the putative HAP1-binding site proposed by Raitt et al. could not be a regulatory element. However, further analysis of the upstream sequence of the corrected cycA gene revealed putative regulatory signals, including possible HAP1 binding sites which are a closer match to recently reported yeast consensus sequences (Ha et al. 1996 Nucl. Acids Res. 24: 1453-1459).
Two amplification products of 596 bp and 298 bp, indicative of the splicing of three introns, were produced by RT-PCR using two different primer pairs (positioned at -266 and 867 nt, and at -266 and 415 nt, respectively, relative to the translational start site at +1). Sequencing of both products confirmed that the cycA gene contains three introns rather than the published two. Consequently, the published ATG initiation codon falls within the previously undetected intron (Intron I).
We propose a new translational start site, which is in the correct reading frame to produce the conserved cytochrome c protein after the new intron is spliced out. This ATG initiation codon is preceded by a strong Kozak sequence for initiation of translation, and is the first AUG in the mRNA. In addition, the new predicted N-terminal region is very similar in sequence to the N-terminal region encoded by the S. cerevisiae CYC7 gene, and the position of the additional intron (Intron I) is conserved: it is identical to that of an intron in the Neurospora crassa cytochrome c gene.
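The reading-frame argument can be checked against Figure 1. The following is a minimal Python sketch, not the authors' procedure: it takes the row of Figure 1 that starts at -7 (from the proposed ATG onward) and the row that starts at 414, drops the lower-case intron bases, and translates what remains with the standard genetic code. The rows between them (54 to 413) consist only of lower-case Intron I sequence and are omitted here for brevity; the function and variable names are illustrative.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(genomic):
    """Remove introns, which Figure 1 writes in lower case."""
    return "".join(ch for ch in genomic if ch.isupper())


def translate(cds):
    """Translate complete codons, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Figure 1, row -7 from the proposed ATG (+1) onward:
# exon 1 followed by the start of Intron I.
ROW_FROM_ATG = "ATGGCTAAGGGCGGTGACAGCTACTCTCCTGgtaagtagtttgaattcatctc"
# Figure 1, row 414: the end of Intron I followed by the start of exon 2.
ROW_414 = "ttgttttttccagGCGACTCTACCAAGGGTGCTAAGCTCTTCGAGACCCGTTGCAAGCAG"

peptide = translate(splice(ROW_FROM_ATG + ROW_414))
print(peptide)
```

The printed peptide is MAKGGDSYSPGDSTKGAKLFETRCKQ, which agrees with the amino acids shown under these two rows in Figure 1; the glycine codon GGC is the one interrupted by Intron I.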
Because the translational start codon has been relocated, the putative HAP1 binding site proposed by Raitt et al. lies within the coding region of the gene and hence is not an upstream regulatory element. To determine whether alternative HAP1 binding sites and/or other upstream regulatory elements are present, the upstream (5') sequence of the cycA gene was examined.
A 2.1 kb EcoRI fragment containing the 5' region of the cycA gene was identified from an Aspergillus nidulans λ library (kindly provided by Michael Hynes, University of Melbourne) and an additional 1297 bp of cycA sequence was obtained upstream of the coding sequence (Figure 1). Analysis of this region revealed possible consensus sequences for the binding of regulatory proteins.
The CCAAT motif, which is a recognition site for the A. nidulans AnCF complex (A. nidulans CCAAT binding Factor), was found at position -446 nt (Figure 1). This complex is required to set the basal level of amdS transcription in A. nidulans (Bonnefoy et al. 1995 Mol. Gen. Genet. 246: 223-227). If the AnCF complex acts via the CCAAT sequence in the cycA promoter, it is likely to affect expression of the cycA gene by setting a basal level of transcription.
In addition, three candidate HAP1 binding sites similar to the 'optimal' HAP1 binding site CGG N3 TA N CGG N3 TA (Ha et al. 1996 Nucl. Acids Res. 24: 1453-1459) were found in the cycA upstream region. These are aligned with the known yeast HAP1 binding sites in Figure 2. The study by Ha et al. showed that HAP1 will only bind to direct CGG repeats with a 6-bp spacer. If the CGG repeats are not conserved (i.e. are degenerate forms), the TA repeats positioned asymmetrically in the spacer region are then essential for HAP1 binding.
The A. nidulans site at -669 of the cycA sequence is a strong candidate for a HAP1 binding site, as it has a direct repeat of GGC and has the TA sequence to stabilise protein binding. The other putative HAP1 site at -634 is also a good candidate, since it has a direct repeat of the optimal CGG triplet, although it lacks the TA sequence. The sequence at -905 has some features of a HAP1 binding site but lies about 900 bp upstream of the ATG start codon, whereas most regulatory binding sites are found closer to the ATG. Thus the -634 region is proposed to contain the most likely binding site for HAP1, although the -634 and -669 sites, only 35 bp apart, may both be HAP1 sites.
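The search for direct triplet repeats separated by a 6-bp spacer can be sketched in Python as below. This is an illustration under stated assumptions rather than the method used in this study: the function names are hypothetical, the input is the pair of Figure 1 rows that start at -727 and -667, and the TA test looks only at spacer positions 4 and 5 (the first TA of CGG NNNTAN CGG NNNTA), not at the second TA downstream of the second triplet.

```python
def to_coord(first_coord, offset):
    """Convert a 0-based offset into Figure 1 numbering, which has no position 0."""
    coord = first_coord + offset
    if first_coord < 0 and coord >= 0:
        coord += 1
    return coord


def find_direct_repeats(seq, first_coord, triplet="CGG", spacer_len=6):
    """Return (coordinate, spacer, spacer_has_TA) for each triplet-spacer-triplet site."""
    seq = seq.upper()
    k = len(triplet)
    site_len = 2 * k + spacer_len
    hits = []
    for i in range(len(seq) - site_len + 1):
        first = seq[i:i + k]
        second = seq[i + k + spacer_len:i + site_len]
        if first == triplet and second == triplet:
            spacer = seq[i + k:i + k + spacer_len]
            # Assumption: "TA present" means TA at spacer positions 4-5 (NNNTAN).
            hits.append((to_coord(first_coord, i), spacer, spacer[3:5] == "TA"))
    return hits


# Figure 1, rows starting at -727 and -667 (120 nt in total).
UPSTREAM = (
    "CTTCTGCCGCCGGGAATAAGGCATCAAGGCATCAAGGCACAAACACATTTTTTCTAATGG"
    "CCGCTAAGGCATCAGGCCACTTCGGATTAGGGCCGGGGAGAGCGGGAAAAACTCGCCATG"
)

cgg_sites = find_direct_repeats(UPSTREAM, -727, triplet="CGG")
ggc_sites = find_direct_repeats(UPSTREAM, -727, triplet="GGC")
print(cgg_sites)
print(ggc_sites)
```

On this 120-nt window the CGG search reports a single site at -634 with spacer GGAGAG and no TA, and the GGC search reports a single site at -669 with spacer CGCTAA and a TA, consistent with the description of the two sites given above.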
Given these results, we propose that a HAP1-type protein is involved in the oxygen-induced transcriptional activation of the cycA gene, and that the A. nidulans AnCF complex, which is analogous to the yeast HAP2/3/4/5 complex, regulates the cycA gene by setting the basal level of transcription. Examination of the functional significance of the proposed cycA regulatory motifs is underway in our laboratory.
-1247  TCACATAGCTCCCAACCCAGAAAGCAGTTTGCGGGTAAAT
-1207  GAGTACGCACAAAAGCAATCCAGACATGAATCCACCGACTCGTCAAAAACCGAAACATGA
-1147  CCGTCCCCTCGGGCGGGAACATATTCGGGTACTTCTTTTTTGCCCGCTCCGCCTCTTCTT
-1087  TCTCAGAGAACTTGGGACCGGGGTAGTTAACGACTTTACCATTGCGTGTTGGAACGACGC
-1027  GGCGCGCAGGATGTGAGGGCGGGCGAAATTTATCTGGTTGCGCGAGGACGCGCGGATTTT
 -967  GGCTGTTGTCACTTGAGGAATTTGTTGATGCGCAGCGGTGCGTCGGTGGGAGCCTGCCGA
 -907  TGCGGGAGGACCGGGACAGGAATAGGATTGCTCGTGCTGAGGGAACACCTCGGGGAAGGA
 -847  GGAATGGGGGAATGGCTGATGGGGGCATTCTGTACGAATTGCTGGTTCTTGGCTGCGATT
 -787  TCTCTATATGCTAGCTTCTGGTCGCCGGCATAACATTTTGGTGTTGATATAATCATGTGA
 -727  CTTCTGCCGCCGGGAATAAGGCATCAAGGCATCAAGGCACAAACACATTTTTTCTAATGG
 -667  CCGCTAAGGCATCAGGCCACTTCGGATTAGGGCCGGGGAGAGCGGGAAAAACTCGCCATG
 -607  ACTAGGCGAATGAAAGGATGCAGATTTGTTATTACGGGGAGGGCTACTCCGGCCTCCGTA
 -547  GCCCACCGTTGCCCATTCCCCGAGACAGACAGTGCAGAGCTCCAAGTAACCAGCGTCTCT
       . . . . _______ .
 -487  ATGCGCTGGAATGAGGTCATGCCATGACGGCATGCAGAATCACCAATCATCTTTTACAGT
 -427  ATAGTAGTTAGGCTCTATGATAGATAGATGTCATAGAAGGTGCATTGTTGCTATCTAGAG
       . . . ______ . ________
 -367  CTGCATAACTGAGCCCTTAGACGTAGTATATAGGATTACAATAGTCTCTAAATAAAGCTT
       . . ______ . . .
 -307  CATCCAGCCAGGCGTTATTTGCAGTGACCGATTCCTGACCTCAGACCCGGCAGCGCCCGT
       . . * . . . .
 -247  TACTCTGAGCACAGTGCTGAATCATCCTACCTCTGATTGGTCAATTCCCAGATCACGGGT
 -187  GTCGTCGGGGGCGCGACCAAGAAACCAGCTCTACAAATTCCCTCCAAGTTTTTTTCTTCC
 -127  CTTTGGCCAGTCCGCTTGACTTGAATTCTGTCTTTCATCTTCTCTGCTACATACAACTCT
  -67  TTGTACTATACCACTTACCTCTTTACATAACCCTTTCTCTCTACCTCTTTATTTTTTATC
   -7  ACTCACAATGGCTAAGGGCGGTGACAGCTACTCTCCTGgtaagtagtttgaattcatctc
       M A K G G D S Y S P G
   54  ttcgtttcggctaggcgctctctgctgggggtgatattcactccactgcgtgcattgtcg
  114  aagtgaacgatatgtgagacacaggctgtgaatgatgtgttgggtctggtgagaacattg
  174  tccgatccaacacaagctcaaagttgcccacttctggatcgccatttgatcgccagcaca
  234  atacaatttctacttctatcgcgcgctggcaatcctcactttgcagcggatgctttattc
  294  ttcatcgtcgtcgccacaatgacgagcttcgacgcttactgcttcacgaccactctcagc
  354  atgcgcgaagccgaactggaagagctggagaaagagaaagcggaccagaatgctaatcaa
  414  ttgttttttccagGCGACTCTACCAAGGGTGCTAAGCTCTTCGAGACCCGTTGCAAGCAG
       D S T K G A K L F E T R C K Q
  474  TGCCACACTGTCGAGAACGGCGGCGGCCACAAGGTCGGCCCCAACCTCCACGGTCTCTTC
       C H T V E N G G G H K V G P N L H G L F
  534  GGCCGTAAGACTGGTCAGGCTGGAGGCTACGCCTACACCGATGCCAACAAGCAGGCCGAC
       G R K T G Q A G G Y A Y T D A N K Q A D
  594  GTCACCTGGGACGAGAACTCTCTGgtacaatcccatgacagctctaacagctctgggccg
       V T W D E N S L
  654  attgctaactttctttcaccaacagTTCAAGTACCTCGAGAACCCCAAGAAGTACATCCC
       F K Y L E N P K K Y I P
  714  TGGTACCAAGATGGCTTTCGGTGGTCTCAAGAAGACCAAGGAGAGGAACGATCTCATCAC
       G T K M A F G G L K K T K E R N D L I T
  774  gtatgtaacgctgcttaccacggatagggcacataggctaacaggatgcacagCTACCTC
       Y L
  834  AAGGAGAGCACTGCTTAAATCGTTCGCGATTAGACGAGATAAACCGCCCCCCTGGGATTA
       K E S T A end
  894  GACGAGGGCCTCTGGCTAGGTGACAGGCGGGTACTGTAACATTACACCTAGACCTGGTTT
  954  TGAAGGTCGTCGGGACATGGAGGATATTATAGATCTTGTTTCCTTCGCCATCCTTGTCTA
 1014  TATCTTATTCTTTCCTTTACCTTGACGAGTGTTTTTTCAGCTTTGTGGTACC
Figure 1. The nucleotide sequence of the cycA gene. The nucleotide sequence of the cycA gene published by Raitt et al. has been presented here in its corrected form, along with the additional upstream sequence (-1247 to -248 nt) obtained from this study. Nucleotides are numbered from the A of the newly proposed translational start codon (+1). The major transcriptional start site (revealed by primer extension) is indicated by an asterisk. Three intervening regions (Introns I, II and III) are displayed in lower case letters. The predicted amino acid sequence is shown below the coding strand. The position of the observed sequencing error within Intron I is underlined. The HAP1 consensus sequences are double underlined, the putative AnCF complex binding site is both over and underlined, and putative TATA motifs are overlined.
Known Yeast UASs of HAP1:
CYC1 TGGC CGG GGTTTA CGG ACGATGA
CYC7 CCCT CGC TATTAT CGC TATTAGC
CTT1 GGAA TGG AGATAA CGG AGGTTCT
CYB2 GGCA AGG AGATAT CGG CAGGCTT
CYT1 CCGC CGG AAATAC CGG CCGCCCA
CYT1 (reverse) CGGC CGG TATTTC CGG CGGCCAA
KlCYC1 (reverse) ATTT CGG GAACAT CGG TCAAGAC
A. nidulans putative HAP1 UAS:
CYCA (-634) CCGC CGG GGAGAG CGG GAAAAGG
CYCA (-669) TAAT GGC CGCTAA GGC ATCAGGC
CYCA (-905) GATG CGG GAGGAC CGG GACAGGA
OPTIMAL CGG NNNTAN CGG NNNTA
Figure 2. Comparison of known yeast and putative A. nidulans HAP1 UASs.
The known (functional) HAP1 binding sites from yeast are aligned with the putative HAP1 binding sites from the A. nidulans cycA gene. Characters underlined indicate nucleotides which match the conserved CGG triplets or TA repeats given in the 'optimal' HAP1 sequence.
Genes with HAP1 sites (Ha et al. 1996 Nucl. Acids Res. 24: 1453-1459):
CYC1: Iso-1-cytochrome c from S. cerevisiae
CYC7: Iso-2-cytochrome c from S. cerevisiae
CTT1: Catalase T from S. cerevisiae
CYB2: Cytochrome b2 from S. cerevisiae
CYT1: Cytochrome c1 from S. cerevisiae
KlCYC1: Cytochrome c from Kluyveromyces lactis