
Cathy H. Wu, Ph.D.

Professor, Department of Biochemistry and Molecular & Cellular Biology
Department of Oncology
Director, Protein Information Resource
Georgetown University Medical Center
wuc@georgetown.edu
10/16/01: Dr. Wu appears in The Scientist
Cathy Wu at the Crossroads: She saved the Protein Information Resource database and now aims to restore it to the world's best
Primary Expertise
Dr. Wu has conducted bioinformatics research since 1990 and has developed several protein classification systems and databases. She has managed large software and database projects, has led the bioinformatics effort of the Protein Information Resource (PIR) since 1999, and became PIR Director in 2001. Her research interests include protein family classification and functional annotation, biological data integration, and literature mining.

Academic Appointments
1989-1994 Assistant Professor, Department of Computer Science, University of Texas at Tyler
1990-1999 Assistant Professor (90-94); Associate Professor (94-98); Professor (98-99) of Biomathematics, University of Texas Health Center at Tyler
1999-2002 Director of Bioinformatics, PIR (99-02); Vice President (00-02), National Biomedical Research Foundation, Washington, D.C.
2001-present Professor, Department of Biochemistry & Molecular Biology; Director, PIR, Georgetown University Medical Center (GUMC)
2002-present Professor, Department of Oncology; Member, Lombardi Comprehensive Cancer Center, GUMC

Professional Activities
Member, Advisory Committee, Protein Structure Initiative, NIGMS, NIH (2002-present).
Member, Board of Directors, International Society for Computational Biology (2002-2004).
Over 15 Conference Organizing/Program Committees, including: ISMB, PSB, EITC, CBGI, BIOKDD
Over 20 Grant Review Panels/Study Sections at NIH, NSF, and DOE
Over 70 Invited Presentations/Lectures at international conferences, workshops, academia, and industry

Education
B.S., Plant Pathology, National Taiwan University, Taiwan, 1978
M.S., Plant Pathology, Purdue University, W. Lafayette, IN, 1982
Ph.D., Molecular Plant Pathology, Purdue University, W. Lafayette, IN, 1984
Postdoc, Molecular Biology, Michigan State University, E. Lansing, MI, 1986
M.S., Computer Science, University of Texas at Tyler, Tyler, TX, 1989

Patent
United States Patent No. 5,845,049, December 1, 1998, C. H. Wu. A neural network system with n-gram term weighting method for molecular sequence classification and motif identification

Publications
BOOK: Bioinformatics for Comparative Proteomics.
Wu CH, Chen C (Eds.).
Methods in Molecular Biology, Volume 694, Series Editor J.M. Walker, Humana Press. 2011.
BOOK: Computational Biology and Genome Informatics.
Wang J, Wu CH, Wang P (Eds.).
World Scientific. 2003.
BOOK: Neural Networks and Genome Informatics.
Wu CH, McLarty JM (Eds.).
Methods in Computational Biology and Biochemistry, Volume 1, Series Editor A. K. Konopka, Elsevier Science. 2000.
A comprehensive protein-centric ID mapping service for molecular data integration.
Huang H, McGarvey PB, Suzek BE, Mazumder R, Zhang J, Chen Y, Wu CH.
Bioinformatics. Apr 15;27(8):1190-1. 2011.
Protein-centric data integration for functional analysis of comparative proteomics data.
McGarvey PB, Zhang J, Natale DA, Wu CH, Huang H.
Methods Mol Biol. 694:323-39. 2011.
Structure-guided rule-based annotation of protein functional sites in UniProt Knowledgebase.
Vasudevan S, Vinayaka CR, Natale DA, Huang H, Kahsay RY, Wu CH.
Methods Mol Biol. 694:91-105. 2011.
eFIP: a tool for mining functional impact of phosphorylation from literature.
Arighi CN, Siu AY, Tudor CO, Nchoutmboube JA, Wu CH, Shanker VK.
Methods Mol Biol. 694:63-75. 2011.
Protein bioinformatics databases and resources.
Chen C, Huang H, Wu CH.
Methods Mol Biol. 694:3-24. 2011.
Ongoing and future developments at the Universal Protein Resource.
UniProt Consortium.
Nucleic Acids Res. 39(Database issue):D214-9. 2011.
The Protein Ontology: a structured representation of protein forms and complexes.
Natale DA, Arighi CN, Barker WC, Blake JA, Bult CJ, Caudy M, Drabkin HJ, D'Eustachio P, Evsikov AV, Huang H, Nchoutmboube J, Roberts NV, Smith B, Zhang J, Wu CH.
Nucleic Acids Res. 39(Database issue):D539-45. 2011.
Phylogenomic analysis of marine Roseobacters.
Tang K, Huang H, Jiao N, Wu CH.
PLoS One. 5(7):e11604. 2010.
Document classification for mining host pathogen protein-protein interactions.
Yin L, Xu G, Torii M, Niu Z, Maisog JM, Wu C, Hu Z, Liu H.
Artif Intell Med. 49(3):155-60. 2010.
Molecular mechanisms mediating the effect of mono-(2-ethylhexyl) phthalate on hormone-stimulated steroidogenesis in MA-10 mouse tumor Leydig cells.
Fan J, Traore K, Li W, Amri H, Huang H, Wu C, Chen H, Zirkin B, Papadopoulos V.
Endocrinology. 151(7):3348-62. 2010.
Prediction of Catalytic Residues in Proteins Using a Consensus of Prediction (CoP) Approach.
Petrova NV, Wu CH.
Proceedings of the IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 226-231. 2010.
From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase.
Hinz U; UniProt Consortium.
Cell Mol Life Sci. 67(7):1049-64. 2010.
Community annotation in biology.
Mazumder R, Natale DA, Julio JA, Yeh LS, Wu CH.
Biol Direct. 5:12. 2010.
Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput omics Data.
Chen C, McGarvey PB, Huang H, Wu CH.
Adv Bioinformatics. 2010:423589. 2010.
The Universal Protein Resource (UniProt) in 2010.
UniProt Consortium.
Nucleic Acids Res. 38(Database issue):D142-8. 2010.
Systems integration of biodefense omics data for analysis of pathogen-host interactions and identification of potential targets.
McGarvey PB, Huang H, Mazumder R, Zhang J, Chen Y, Zhang C, Cammer S, Will R, Odle M, Sobral B, Moore M, Wu CH.
PLoS One. 4(9):e7162. 2009.
Sequence signatures in envelope protein may determine whether flaviviruses produce hemorrhagic or encephalitic syndromes.
Barker WC, Mazumder R, Vasudevan S, Sagripanti JL, Wu CH.
Virus Genes. 39(1):1-9. 2009.
TGF-beta signaling proteins and the Protein Ontology.
Arighi CN, Liu H, Natale DA, Barker WC, Drabkin H, Blake JA, Smith B, Wu CH.
BMC Bioinformatics. 10 Suppl 5:S3. 2009.
BioTagger-GM: a gene/protein name recognition system.
Torii M, Hu Z, Wu CH, Liu H.
J Am Med Inform Assoc. 16(2):247-55. 2009.
InterPro: the integrative protein signature database.
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C.
Nucleic Acids Res. 37(Database issue):D211-5. 2009.
The Universal Protein Resource (UniProt) 2009.
UniProt Consortium.
Nucleic Acids Res. 37(Database issue):D169-74. 2009.
Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data.
Hu ZZ, Huang H, Cheema A, Jung M, Dritschilo A, Wu CH.
J Proteomics Bioinform. 1(2):47-60. 2008.
Protein Bioinformatics.
McGarvey P, Huang H, Wu CH.
in: Medical Applications of Mass Spectrometry. Part III Biomolecules, Chapter 10:203-222. K Vekey, A Telekes, A Vertes (Eds.), Elsevier Science. 2008.
An emerging cyberinfrastructure for biodefense pathogen and pathogen-host data.
Zhang C, Crasta O, Cammer S, Will R, Kenyon R, Sullivan D, Yu Q, Sun W, Jha R, Liu D, Xue T, Zhang Y, Moore M, McGarvey P, Huang H, Chen Y, Zhang J, Mazumder R, Wu C, Sobral B.
Nucleic Acids Res. 36(Database issue):D884-91. 2008.
Bioinformatic Databases.
Herbert KG, Spirollari J, Wang JTL, Piel WH, Westbrook J, Barker WC, Hu ZZ, Wu CH.
in: Wiley Encyclopedia of Computer Science and Engineering (Cassie Craig, Assistant Editor), John Wiley & Sons, Ltd. 2007.
A comparison study on algorithms of detecting long forms for short forms in biomedical text.
Torii M, Hu ZZ, Song M, Wu CH, Liu H.
BMC Bioinformatics. 8 Suppl 9:S5. 2007.
Framework for a protein ontology.
Natale DA, Arighi CN, Barker WC, Blake J, Chang TC, Hu Z, Liu H, Smith B, Wu CH.
BMC Bioinformatics. 8 Suppl 9:S1. 2007.
Computational analysis and identification of amino acid sites in dengue E proteins relevant to development of diagnostics and vaccines.
Mazumder R, Hu ZZ, Vinayaka CR, Sagripanti JL, Frost SD, Kosakovsky Pond SL, Wu CH.
Virus Genes. 35(2):175-86. 2007.
Integration of bioinformatics resources for functional analysis of gene expression and proteomic data.
Huang H, Hu ZZ, Arighi CN, Wu CH.
Front Biosci. 12:5071-88. 2007.
UniRef: comprehensive and non-redundant UniProt reference clusters.
Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH.
Bioinformatics. 23(10):1282-8. 2007.
PIRSF family classification system for protein functional and evolutionary analysis.
Nikolskaya AN, Arighi CN, Huang H, Barker WC, Wu CH.
Evol Bioinform Online. 2:197-209. 2007.
Dependence network modeling for biomarker identification.
Qiu P, Wang ZJ, Liu KJ, Hu ZZ, Wu CH.
Bioinformatics. 23(2):198-206. 2007.
Comparative Bioinformatics Analyses and Profiling of Lysosome-Related Organelle Proteomes.
Hu ZZ, Valencia JC, Huang H, Chi A, Shabanowitz J, Hearing VJ, Appella E, Wu C.
Int J Mass Spectrom. 259(1-3):147-160. 2007.
New developments in the InterPro database.
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, Courcelle E, Das U, Daugherty L, Dibley M, Finn R, Fleischmann W, Gough J, Haft D, Hulo N, Hunter S, Kahn D, Kanapin A, Kejariwal A, Labarga A, Langendijk-Genevaux PS, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Nikolskaya AN, Orchard S, Orengo C, Petryszak R, Selengut JD, Sigrist CJ, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C.
Nucleic Acids Res. 35(Database issue):D224-8. 2007.
The Universal Protein Resource (UniProt).
UniProt Consortium.
Nucleic Acids Res. 35(Database issue):D193-7. 2007.
Proteomic and bioinformatic characterization of the biogenesis and function of melanosomes.
Chi A, Valencia JC, Hu ZZ, Watabe H, Yamaguchi H, Mangini NJ, Huang H, Canfield VA, Cheng KC, Yang F, Abe R, Yamagishi S, Shabanowitz J, Hearing VJ, Wu C, Appella E, Hunt DF.
J Proteome Res. 5(11):3135-44. 2006.
Substring selection for biomedical document classification.
Han B, Obradovic Z, Hu ZZ, Wu CH, Vucetic S.
Bioinformatics. 22(17):2136-42. 2006.
Quantitative assessment of dictionary-based protein named entity tagging.
Liu H, Hu ZZ, Torii M, Wu C, Friedman C.
J Am Med Inform Assoc. 13(5):497-507. 2006.
An online literature mining tool for protein phosphorylation.
Yuan X, Hu ZZ, Wu HT, Torii M, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH.
Bioinformatics. 22(13):1668-9. 2006.
Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties.
Petrova NV, Wu CH.
BMC Bioinformatics. 7:312. 2006.
BioThesaurus: a web-based thesaurus of protein and gene names.
Liu H, Hu ZZ, Zhang J, Wu C.
Bioinformatics. 22(1):103-5. 2006.
The Universal Protein Resource (UniProt): an expanding universe of protein information.
Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B.
Nucleic Acids Res. 34(Database issue):D187-91. 2006.
Computational identification of strain-, species- and genus-specific proteins.
Mazumder R, Natale DA, Murthy S, Thiagarajan R, Wu CH.
BMC Bioinformatics. 6:279. 2005.
Large-scale, classification-driven, rule-based functional annotation of proteins.
Natale DA, Vinayaka CR, Wu CH.
in: Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, Part 4 Bioinformatics, Section 3 Protein Function and Annotation, Chapter 36. S Subramaniam (Ed.), John Wiley & Sons, Ltd. 2005.
DynGO: a tool for visualizing and mining of Gene Ontology and its associations.
Liu H, Hu ZZ, Wu CH.
BMC Bioinformatics. 6:201. 2005.
Literature mining and database annotation of protein phosphorylation using a rule-based system.
Hu ZZ, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH.
Bioinformatics. 21(11):2759-65. 2005.
Plant protein annotation in the UniProt Knowledgebase.
Schneider M, Bairoch A, Wu CH, Apweiler R.
Plant Physiol. 138(1):59-66. 2005.
The PIR superfamily (PIRSF) classification system.
Barker WC, Mazumder R, Nikolskaya A, Wu CH.
in: Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, Part 3 Proteomics, Section 6 Proteome Families, Chapter 87. MJ Dunn (Ed.), John Wiley & Sons, Ltd. 2005.
Protein name tagging guidelines: lessons learned.
Mani I, Hu Z, Jang SB, Samuel K, Krause M, Phillips J, Wu CH.
Comp Funct Genomics. 6(1-2):72-6. 2005.
InterPro, progress and status in 2005.
Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti L, Copley R, Courcelle E, Das U, Durbin R, Fleischmann W, Gough J, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lonsdale D, Lopez R, Letunic I, Madera M, Maslen J, McDowall J, Mitchell A, Nikolskaya AN, Orchard S, Pagni M, Ponting CP, Quevillon E, Selengut J, Sigrist CJ, Silventoinen V, Studholme DJ, Vaughan R, Wu CH.
Nucleic Acids Res. 33(Database issue):D201-5. 2005.
The Universal Protein Resource (UniProt).
Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS.
Nucleic Acids Res. 33(Database issue):D154-9. 2005.
Family classification and integrative associative analysis for protein functional annotation.
Huang H, Nikolskaya AN, Vinayaka CR, Chung S, Zhang J, Wu CH.
in: Trends in Bioinformatics Research. Chapter II:33-57. PV Yan (Ed.), Nova Science Publishers, Inc. 2005.
Information flow and data integration of databanks.
Wu CH, Barker WC.
in: Database Annotation in Molecular Biology: Principles and Practice. III Database Design and Integration, Chapter 11:187-201. AM Lesk (Ed.), John Wiley & Sons, Ltd. 2005.
Annotation of protein sequences.
Wu CH, Barker WC.
in: Database Annotation in Molecular Biology: Principles and Practice. II The Basis for Annotation, Chapter 8:131-147. AM Lesk (Ed.), John Wiley & Sons, Ltd. 2005.
iProLINK: an integrated protein resource for literature mining.
Hu ZZ, Mani I, Hermoso V, Liu H, Wu CH.
Comput Biol Chem. 28(5-6):409-16. 2004.
Gene and protein profiling of the response of MA-10 Leydig tumor cells to human chorionic gonadotropin.
Li W, Amri H, Huang H, Wu C, Papadopoulos V.
J Androl. 25(6):900-13. 2004.
A family classification approach to functional annotation of proteins.
Wu CH, Barker WC.
in: The Practical Bioinformatician. Chapter 19:417-434. L Wong (Ed.), World Scientific. 2004.
Update on genome completion and annotations: Protein Information Resource.
Wu C, Nebert DW.
Hum Genomics. 1(3):229-33. 2004.
The iProClass integrated database for protein functional analysis.
Wu CH, Huang H, Nikolskaya A, Hu Z, Barker WC.
Comput Biol Chem. 28(1):87-96. 2004.
Protein sequence databases.
Apweiler R, Bairoch A, Wu CH.
Curr Opin Chem Biol. 8(1):76-80. 2004.
UniProt: the Universal Protein knowledgebase.
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS.
Nucleic Acids Res. 32(Database issue):D115-9. 2004.
PIRSF: family classification system at the Protein Information Resource.
Wu CH, Nikolskaya A, Huang H, Yeh LS, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, Ledley RS, Suzek BE, Arminski L, Chen Y, Zhang J, Cardenas JL, Chung S, Castro-Alvear J, Dinkov G, Barker WC.
Nucleic Acids Res. 32(Database issue):D112-4. 2004.
Protein family classification and functional annotation.
Wu CH, Huang H, Yeh LS, Barker WC.
Comput Biol Chem. 27(1):37-47. 2003.
iProClass: an integrated database of protein family, function and structure information.
Huang H, Barker WC, Chen Y, Wu CH.
Nucleic Acids Res. 31(1):390-2. 2003.
The Protein Information Resource.
Wu CH, Yeh LS, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu Z, Kourtesis P, Ledley RS, Suzek BE, Vinayaka CR, Zhang J, Barker WC.
Nucleic Acids Res. 31(1):345-7. 2003.
Accomplishments and challenges in literature data mining for biology.
Hirschman L, Park JC, Tsujii J, Wong L, Wu CH.
Bioinformatics. 18(12):1553-61. 2002.
The Protein Information Resource: an integrated public resource of functional annotation of proteins.
Wu CH, Huang H, Arminski L, Castro-Alvear J, Chen Y, Hu ZZ, Ledley RS, Lewis KC, Mewes HW, Orcutt BC, Suzek BE, Tsugita A, Vinayaka CR, Yeh LS, Zhang J, Barker WC.
Nucleic Acids Res. 30(1):35-7. 2002.
iProClass: an integrated, comprehensive and annotated protein classification database.
Wu CH, Xiao C, Hou Z, Huang H, Barker WC.
Nucleic Acids Res. 29(1):52-4. 2001.
Protein Information Resource: a community resource for expert annotation of protein data.
Barker WC, Garavelli JS, Hou Z, Huang H, Ledley RS, McGarvey PB, Mewes HW, Orcutt BC, Pfeiffer F, Tsugita A, Vinayaka CR, Xiao C, Yeh LS, Wu C.
Nucleic Acids Res. 29(1):29-32. 2001.
PIR: a new resource for bioinformatics.
McGarvey PB, Huang H, Barker WC, Orcutt BC, Garavelli JS, Srinivasarao GY, Yeh LS, Xiao C, Wu CH.
Bioinformatics. 16(3):290-1. 2000.
ProClass protein family database.
Huang H, Xiao C, Wu CH.
Nucleic Acids Res. 28(1):273-6. 2000.
The protein information resource (PIR).
Barker WC, Garavelli JS, Huang H, McGarvey PB, Orcutt BC, Srinivasarao GY, Xiao C, Yeh LS, Ledley RS, Janda JF, Pfeiffer F, Mewes HW, Tsugita A, Wu C.
Nucleic Acids Res. 28(1):41-4. 2000.
The PIR-International Protein Sequence Database.
Barker WC, Garavelli JS, McGarvey PB, Marzec CR, Orcutt BC, Srinivasarao GY, Yeh LS, Ledley RS, Mewes HW, Pfeiffer F, Tsugita A, Wu C.
Nucleic Acids Res. 27(1):39-43. 1999.


Revised 04/06/2011

©2009 Protein Information Resource