Dolan et al, FGN 47:

Neurospora Proteome 2000.

Patricia L. Dolan, Donald O. Natvig and Mary Anne Nelson - Department of Biology, University of New Mexico, Albuquerque, NM 87131

The filamentous fungus Neurospora crassa has an eminent history as a central organism in the elucidation of the tenets of classical and biochemical genetics. Of particular significance are the experiments of George Beadle and Edward Tatum in the 1940s with N. crassa that led to the "one gene-one enzyme" hypothesis (Beadle and Tatum 1941 Proc. Natl. Acad. Sci. USA 27:499-506). In six decades, over 1,000 genes have been mapped and characterized (Perkins, Radford and Sachs 2000 The Neurospora Compendium: Chromosomal Loci. Academic Press; Perkins 2000 Fungal Genet. Newsl., this volume), but that leaves perhaps 10,000 or more genes not yet identified by classical genetics. High-throughput, automated partial sequencing of cDNA libraries to generate expressed sequence tags (ESTs) allows for the rapid identification and characterization of preferentially expressed genes in different tissues, as well as the discovery of novel genes (Adams et al. 1991 Science 252:1651-1656; Okubo et al. 1992 Nature Genet. 1:173-179).

In 1995, the first systematic analysis of the Neurospora genome was undertaken by the Neurospora Genome Project at the University of New Mexico (Nelson et al. 1997 Fungal Genet. Biol. 21:348-363). Initially, three cDNA libraries were constructed for this project, using mRNA isolated from conidial (germinating asexual spores), starved mycelial (branching hyphae), and perithecial (fertilized fruiting bodies) tissues. Single-pass, partial sequencing of cDNA clones was used to determine the nature of the encoded products. Following the initial phase, subtractive hybridization was used to remove the most abundantly expressed genes from the cDNA libraries. In addition, a fourth cDNA library (Westergaard) was constructed using mRNA from unfertilized tissue grown under mating conditions (Westergaard and Mitchell 1947 Amer. J. Botany 34:573-577). cDNA clones from these subtracted libraries (conidial, mycelial and perithecial) and the Westergaard library have been sequenced and analyzed in this, the second phase of the Neurospora Genome Project at UNM.

The Neurospora Proteome represents the first systematic, functional classification of the known genes of a filamentous fungus. Proteome refers to the complement of proteins expressed by a genome or tissue. In order to identify putative protein homologs to the expressed genes of Neurospora, the BLASTX algorithm (Altschul et al. 1990 J. Mol. Biol. 215:403-410; Altschul et al. 1997 Nucl. Acids Res. 25:3389-3402) was used to translate each EST nucleotide sequence into the six possible reading frames and compare the predicted protein sequences with the protein sequence database available through the National Center for Biotechnology Information (NCBI, Bethesda, MD). Low complexity regions that could generate artificially high scores were removed by filtering, using the SEG program available through NCBI. The database matches were divided into groups according to statistical significance values (P or E values): highly significant (P/E values ≤10^-20), moderately significant (10^-5 to 10^-19), weakly significant (10^-2 to 10^-4), and not statistically significant (>10^-2).
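As a rough illustration of the two steps described above, the following Python sketch translates a nucleotide sequence in all six reading frames and assigns a P/E value to one of the four significance groups. It is not the BLASTX implementation, which performs translation and scoring internally. The function names and the example sequence are hypothetical, the standard genetic code is assumed, and values that fall between the stated ranges are assigned to the less significant group.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (an assumption), amino acids listed in TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unrecognized codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return {frame: protein} for forward frames +1..+3 and reverse frames -1..-3."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


def significance(p_or_e):
    """Map a P/E value to the four groups used in the text.

    Boundaries between the stated ranges are an assumption of this sketch.
    """
    if p_or_e <= 1e-20:
        return "highly significant"
    if p_or_e <= 1e-5:
        return "moderately significant"
    if p_or_e <= 1e-2:
        return "weakly significant"
    return "not statistically significant"


if __name__ == "__main__":
    # Hypothetical six-base sequence, used only to show the frame layout.
    for frame, protein in sorted(six_frames("ATGGCC").items()):
        print(f"{frame:+d}: {protein}")
    print(significance(3e-12))
```

Matches in the first two groups correspond to the cDNAs counted as having putative identifications; everything else is treated as novel.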

A total of 3,397 cDNA clones were analyzed for this version of the Neurospora Proteome (summarized in Table 1). Significantly more highly and moderately significant matches were detected in the conidial and Westergaard libraries (66.5% and 70.0%, respectively) than in the mycelial and perithecial libraries (37.0% and 40.5%). Correspondingly fewer cDNAs showing no similarity to any previously characterized genes were present in the conidial and Westergaard libraries (33.5% and 30.0%, respectively), while the mycelial and perithecial libraries had greater percentages of cDNAs encoding apparently novel genes (63.0% and 59.5%, respectively).

Those cDNA clones encoding products with highly or moderately significant matches in the NCBI nonredundant (nr) database are identified in Table 2, for a total of 680 different genes. They are categorized according to the classification scheme utilized by the Expressed Gene Anatomy Database, or EGAD, developed at The Institute for Genomic Research (TIGR, Rockville, MD), and available at: http://www.tigr.org/docs/tigr-scripts/egad_scripts/role_report.spl (White and Kerlavage 1996 Methods Enzymol. 266:24-40). Division into these major functional categories (I-VII) facilitates examination of the tissue-specific expression of functional classes of genes.

The pattern of tissue-specific gene expression has remained very similar to that identified in the first phase of the Neurospora Genome Project (Nelson et al. 1997 Fungal Genet. Biol. 21:348-363). The majority of identified genes from all four libraries encode products involved in metabolism or protein synthesis. Interestingly, the new Westergaard library appeared to mirror the conidial library in its pattern of expression. For example, of the genes involved in sugar metabolism, 38% and 50% were found in the Westergaard and conidial libraries, respectively; for genes encoding ribosomal proteins, the corresponding figures were 39.4% and 49.6%. Of the genes involved in secondary metabolism, 80% were found only in the perithecial library, and the remaining 20% were from the mycelial library. An unexpected finding was the high expression of the nmt1 and nmt2 genes, particularly nmt1, in unfertilized sexual tissue (the Westergaard library). Homologs of nmt1 account for 10.4% of the Westergaard cDNA clones, as compared with 1.1%, 0.3% and 0.1% of the conidial/subtracted conidial, mycelial/subtracted mycelial and perithecial/subtracted perithecial libraries, respectively. Both of these genes encode products involved in the biosynthesis of thiamine (Maundrell 1990 J. Biol. Chem. 265:10857-10864; Manetti et al. 1994 Yeast 10:1075-1082).

The burgeoning field of genomics has provided molecular biologists a revolutionary opportunity for large-scale study of complete genomes, their transcriptional expression patterns, and the products encoded by identified genes, i.e., the proteome. The Neurospora Proteome represents the systematic, in silico classification of those genes that appear to be expressed in specific developmental stages of N. crassa, as well as the first of what will likely be many such proteomes of the filamentous fungi.

Acknowledgements

We thank Matthew Crawford for his invaluable computer skills and his development of our relational database. We are especially grateful for the dedication and hard work of all the undergraduate students who participated in the Neurospora Genome Project. This research was supported by the National Science Foundation RIMI (Research Improvements in Minority Institutions) Program Grant HRD-9550649 to D.O.N. and M.A.N., the U.S. Public Health Service Grant GM47374 to M.A.N. and NSF Grant MCB-9874488 to M.A.N.

Table 1: Summary of cDNA clones^a

Library^b	Percent with Putative Identification	Percent Previously Uncharacterized (Novel)	Number cDNAs Analyzed
Conidial (C)	66.5%	33.5%	872
Mycelial (M)	37.0%	63.0%	1,104
Perithecial (P)	40.5%	59.5%	730
Westergaard (W)	70.0%	30.0%	691
Total	52.0%	48.0%	3,397

^aThose cDNAs with P/E values equal to or less than 10^-5 were classified as encoding proteins with putative identifications, while those with P/E values greater than 10^-5 were classified as novel genes.

^bEach library includes clones from both subtracted and unsubtracted libraries.
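The Total row of Table 1 can be recomputed from the per-library rows as a clone-weighted average. The short Python sketch below does this using only the percentages and clone counts printed in the table; because those percentages are rounded to one decimal place, the pooled figures are approximate. The variable and function names are illustrative only.

```python
# (percent with putative identification, number of cDNAs analyzed), from Table 1
LIBRARIES = {
    "Conidial": (66.5, 872),
    "Mycelial": (37.0, 1104),
    "Perithecial": (40.5, 730),
    "Westergaard": (70.0, 691),
}


def pooled_summary(libraries):
    """Return (total clones, pooled % identified, pooled % novel)."""
    total = sum(n for _, n in libraries.values())
    # Estimated number of identified clones, weighting each library by its size.
    identified = sum(pct / 100 * n for pct, n in libraries.values())
    pct_identified = 100 * identified / total
    return total, pct_identified, 100 - pct_identified


total, pct_identified, pct_novel = pooled_summary(LIBRARIES)
print(f"{total} clones: {pct_identified:.1f}% identified, {pct_novel:.1f}% novel")
```

Weighting by library size matters here because the mycelial library, which has the lowest identification rate, also contributes the most clones.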

Table 2. Putative Identification of Neurospora cDNAs

Legend and Footnotes to Table 2: The EGAD cellular role classification scheme (White and Kerlavage 1996 Methods Enzymol. 266:24-40) has been used whenever possible. Those identified Neurospora ORFs lacking homologs in the EGAD classification were classified as appropriate under Secondary metabolism, etc. Only those sequences with BLASTX P/E values of 10^-5 or lower (highly or moderately significant) are reported in this table, except as noted. Fourteen sequences falling within this range were determined to reflect spurious matches (e.g., proline-rich regions) and not actual homology; those sequences were not included in this table. As additional sequence information becomes available, expanded versions of this table can be accessed at our web site: http://www.unm.edu/~ngp/

a A single representative clone ID is given in those cases in which multiple (duplicate) cDNAs have been identified.

b MatchAcc generally indicates the best match (identified by its accession number) to a sequence in the NCBI nonredundant protein database; however, in those cases in which the best match was to an unidentified open reading frame, a less optimal match to an identified sequence is shown.

c Identification refers to the reported match in the NCBI protein database. The organism of the best match is indicated in parentheses (see list of abbreviations below).

d The BLASTX P/E value is that obtained with the respective Neurospora cDNA clone and the sequence identified in the MatchAcc and Identification columns.

e The tissues from which the respective cDNAs were isolated are identified, where C indicates conidial, M denotes mycelial, P is perithecial and W is Westergaard (unfertilized sexual tissue). The number preceding these abbreviations indicates the number of duplicate cDNAs isolated from each tissue.

f The BLASTN P value is reported. The corresponding BLASTX P value was greater than 10^-5, and so was not considered significant (see text).

g The following 40S ribosomal proteins were identified: MRP2, P40 homolog B, RP10, RP41, S2-18, S20-22, S24-28, S30-31, S33 and the putative ribosome-associated protein similar to ribosomal protein SA. The identified 60S ribosomal proteins included: L1-5, L7, L9-19, L22-23, L25-30, L32, L35, L37, L39-40 and the acidic ribosomal proteins P0, P1 and P2.
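Footnote e describes a compact encoding of tissue distribution: a clone count followed by a library letter. A minimal Python sketch of a parser for such entries follows. The exact cell layout of Table 2 is not reproduced here, so the comma-separated form used in the example ("3C, 1W") and the function name are assumptions, as is the choice to count a letter with no preceding number as a single clone.

```python
import re

# Library letters as defined in footnote e.
TISSUES = {"C": "conidial", "M": "mycelial", "P": "perithecial", "W": "Westergaard"}


def parse_distribution(entry):
    """Turn an entry such as '3C, 1W' into {'conidial': 3, 'Westergaard': 1}."""
    counts = {}
    for number, code in re.findall(r"(\d*)\s*([CMPW])", entry):
        tissue = TISSUES[code]
        # Assumption: a bare letter means one clone from that tissue.
        counts[tissue] = counts.get(tissue, 0) + (int(number) if number else 1)
    return counts


print(parse_distribution("3C, 1W"))  # hypothetical entry, not taken from Table 2
```

Summing the parsed counts across rows would give the per-library clone totals for any functional category.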

Abbreviations of Organisms, Table 2
Aa, Alternaria alternata
Ab, Agaricus bisporus
Ac, Ajellomyces capsulatus
Acb, Acinetobacter sp.
Ae, Alcaligenes eutrophus
Ai, Ascobolus immersus
Am, Amanita muscaria
Aspa, Aspergillus awamori
Aspac, Aspergillus aculeatus
Aspfl, Aspergillus flavus
Aspfu, Aspergillus fumigatus
Aspn, Aspergillus niger
Aspo, Aspergillus oryzae
Aspp, Aspergillus parasiticus
At, Arabidopsis thaliana
Av, Agrobacterium vitis
Bn, Brassica napus
Bs, Bacillus subtilis
Bt, Bos taurus
Ca, Candida albicans
Cc, Cochliobolus carbonum
Ce, Caenorhabditis elegans
Cg, Colletotrichum gloeosporioides (valvae)
Ci, Coccidioides immitis
Ck, Cercospora kikuchii
Cl, Cylindrocarpon lichenicola
Cm, Candida maltosa
Ct, Candida tropicalis
Ctl, Colletotrichum lagenarium
Cum, Cucurbita maxima
Dd, Dictyostelium discoideum
Dm, Drosophila melanogaster
Dr, Deinococcus radiodurans
Ec, Escherichia coli
En, Emericella nidulans
Fn, Filobasidiella neoformans
Fo, Fusarium oxysporum
Fs, Fusarium sporotrichioides
Galg, Gallus gallus
Gc, Glomerella cingulata
Gg, Gaeumannomyces graminis (graminis)
Hh, Halobacterium halobium
Hi, Haemophilus influenzae
Hp, Hansenula polymorpha
Hs, Homo sapiens
Hui, Humicola insolens
Hv, Hordeum vulgare
Km, Kluyveromyces marxianus var. lactis
Le, Lycopersicon esculentum
Lj, Lotus japonicus
Lm, Leishmania major
Ma, Metarhizium anisopliae
Mg, Magnaporthe grisea
Mm, Mus musculus
Mor, Moraxella sp.
Mp, Mycoplasma pneumoniae
Mt, Mycobacterium tuberculosis
Mx, Myxococcus xanthus
Nc, Neurospora crassa
Nh, Nectria haematococca
Nt, Nicotiana tabacum
Oc, Oryctolagus cuniculus
Os, Oryza sativa

Pa, Pichia angusta
Pg, Pyricularia grisea
Phi, Phytophthora infestans
Pip, Pichia pastoris
Pm, Pichia methanolica
Pnch, Penicillium chrysogenum
Pnci, Penicillium citrinum
Pnj, Penicillium janthinellum
Pnp, Penicillium patulum
Pnpur, Penicillium purpurogenum
Pp, Physcomitrella patens
Psae, Pseudomonas aeruginosa
Pspc, Pseudomonas paucimobilis
Pspt, Pseudomonas putida
Pss, Pseudomonas syringae
Pyh, Pyrococcus horikoshii
Rh, Rosa hybrida
Rhrh, Rhodococcus rhodochrous
Rhz, Rhizobium sp.
Rn, Rattus norvegicus
Rr, Rattus rattus
Sa, Sulfolobus acidocaldarius
Sc, Saccharomyces cerevisiae
Sf, Saccharomycopsis fibuligera
Sm, Streptococcus mutans
Som, Sordaria macrospora
Sp, Schizosaccharomyces pombe
Spp, Sphingomonas paucimobilis
Ssd, Sus scrofa domestica
Strc, Streptomyces coelicolor
Strg, Streptomyces griseus
Strr, Streptomyces reticuli
Strv, Streptomyces violaceoruber
Syn, Synechocystis sp.
Tc, Trypanosoma cruzi
Tl, Thermococcus litoralis
Tm, Thermotoga maritima
Trh, Trichoderma harzianum
Trm, Trichophyton mentagrophytes
Trr, Trichoderma reesei
Um, Ustilago maydis
Vc, Volvox carteri
Yl, Yarrowia lipolytica

Last modified 8/3/00 KMC