CBSB: Cancer Bioinformatics and Systems Biology

 Pathway and Network Modeling for Cancer Molecular Targets













Cancers are complex diseases defined mainly by uncontrolled growth, immortality and invasion of other tissues. They result from many alterations in cells at the genetic, epigenetic, transcriptomic, proteomic and metabolomic levels, and are manifested in changes in cell signaling, gene regulation, metabolism, cell division and many other fundamental cell processes. Post-genome-era high-throughput data, including genome sequences, global expression of coding genes and microRNAs, and protein expression and post-translational modifications, provide unprecedented opportunities for studying the mechanisms underlying cancer development and progression and for identifying molecular targets for diagnostics and therapeutics. They also present huge challenges in analyzing, interpreting and managing large-scale omics data.

Bioinformatics and systems biology are emerging fields that integrate computer science and technology into the biomedical sciences and hold great promise for meeting these challenges. Given the interdisciplinary nature of these fields, we strive to bridge the biological and computational communities by providing expertise in analyzing omics data, whether generated by collaborating laboratories or obtained from public repositories, and by conducting computational biology research and developing tools to facilitate basic cancer research. In doing so, we provide functional interpretation of large-scale omics data from biological pathway and network perspectives, and generate hypotheses on molecular targets or biomarkers that can be tested in the laboratory.


Our research interests are to better understand complex cell signaling pathways and regulatory networks in cancer vs. normal cells using bioinformatics and computational biology approaches, particularly through the integration and analysis of modern omics data. Our long-term goal is to conduct systems biology and translational research through pathway- and network-based modeling to identify molecular targets for diagnostics and therapeutics development, as applied to common diseases such as cancer and endocrine or immune disorders. Current research at CBSB includes:

1. Estrogen-induced apoptosis in breast cancer cells. An experimental model of 17β-estradiol-induced apoptosis in breast cancer cells is being exploited to design new hormonal therapeutic strategies for antiestrogen-resistant breast cancers. Signaling pathways and protein networks of estrogen-induced apoptosis vs. growth are being modeled using bioinformatics approaches based on genomic and proteomic data generated from breast cancer cell lines.

2. Radiation-induced DNA damage repair. The ATM serine/threonine protein kinase plays critical roles in radiation responses, including cell cycle arrest and DNA repair. Fibroblasts expressing the wild-type or a mutant ATM gene are used as an experimental model to construct ATM- and p53-mediated signaling and metabolic pathways and networks, with the aim of identifying molecular targets that confer sensitivity or resistance of cancer cells to radiation therapy.

3. Protein modification networks. Cellular signal transduction and protein-protein interactions are largely carried out through specific protein modifications, such as phosphorylation, glycosylation, acetylation, methylation and ubiquitination. Modified proteins, the enzymes that add the modifications and the enzymes that remove them form intricate protein modification networks that control specific cellular processes. Changes in modification networks in cancer cells are of particular interest for the identification of molecular targets. We are particularly interested in the dynamic interplay between protein O-GlcNAcylation and phosphorylation in the regulation of cell signaling, which together form an intricate network called nPGAP (network of Protein O-GlcNAcylation and Phosphorylation). To this end, we recently developed a database of O-GlcNAcylated proteins and sites (dbOGAP), which serves as a nucleus for our exploration of the roles of nPGAP in cell signaling.

4. Tissue specificity of cancer signaling pathways and networks. Cell signaling pathways show a certain degree of redundancy as well as tissue specificity. The goal of this work is to identify gene or protein biomarkers based on pathway tissue-specificity information. Cell-type-specific pathways will be modeled based on the tissue expression and interaction patterns of pathway proteins, including known cancer biomarkers and their spliced isoforms.

5. Text mining and ontology. Literature text mining plays an increasingly important role in systems biology through the development of algorithms and tools for systematically mining experimental data and constructing structured knowledgebases. Of fundamental importance to biological text mining are the recognition of gene or protein entities in text and the extraction of related information such as protein-protein interactions, protein modifications, and associated cell types, biological pathways and phenotypes. Biological ontologies are formal representations of biological concepts or entities that use controlled vocabularies and define relations between those concepts or entities, as in the Gene Ontology and the Protein Ontology. They are used for heterogeneous data integration and for more intelligent data query and reasoning.
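As a rough illustration of the two text mining tasks named in item 5 (recognizing protein names and extracting a related modification), the sketch below applies two hand-written patterns to short sentences. It is only a toy under stated assumptions: the protein lexicon, the two sentence patterns, and the names PhosphoEvent and extract_events are hypothetical and are not taken from any of the tools listed on this page.

```python
import re
from typing import List, NamedTuple, Optional


class PhosphoEvent(NamedTuple):
    kinase: Optional[str]   # None when the sentence names no kinase
    substrate: str
    site: Optional[str]     # None when the sentence names no site


# Hypothetical toy lexicon; a real system would use a large name dictionary.
PROTEIN_NAMES = ["ATM", "p53", "CHK2", "BRCA1"]
_NAME = "(?:" + "|".join(re.escape(n) for n in PROTEIN_NAMES) + ")"
_SITE = r"(?:Ser|Thr|Tyr)-?\d+"

# Pattern 1: "<kinase> phosphorylates <substrate> [at|on <site>]"
ACTIVE = re.compile(
    rf"\b(?P<kinase>{_NAME})\s+phosphorylates\s+(?P<substrate>{_NAME})\b"
    rf"(?:\s+(?:at|on)\s+(?P<site>{_SITE}))?"
)

# Pattern 2: "<substrate> is phosphorylated [at|on <site>] [by <kinase>]"
PASSIVE = re.compile(
    rf"\b(?P<substrate>{_NAME})\s+is\s+phosphorylated"
    rf"(?:\s+(?:at|on)\s+(?P<site>{_SITE}))?"
    rf"(?:\s+by\s+(?P<kinase>{_NAME})\b)?"
)


def extract_events(text: str) -> List[PhosphoEvent]:
    """Return every phosphorylation statement matched by the two patterns."""
    events = []
    for pattern in (ACTIVE, PASSIVE):
        for m in pattern.finditer(text):
            events.append(
                PhosphoEvent(m.group("kinase"), m.group("substrate"), m.group("site"))
            )
    return events


if __name__ == "__main__":
    # Illustrative input sentences, not drawn from a curated corpus.
    for sentence in (
        "ATM phosphorylates p53 at Ser15.",
        "CHK2 is phosphorylated on Thr68 by ATM.",
    ):
        print(extract_events(sentence))
```

Fixed patterns like these are easy to read and audit but miss most real phrasing; they are shown only to make the entity-plus-relation idea concrete.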



  • Dr. Zhang-Zhi Hu (Department of Oncology)

Postdoctoral fellow

  • Dr. Jinlian Wang (Department of Oncology)

Collaborating faculty

  • Dr. Hongfang Liu (Department of Biostatistics, Bioinformatics and Biomathematics)

  • Dr. Manabu Torii (Department of Radiology, ISIS)

Other collaborations

Bioinformatics and systems biology are highly interdisciplinary and demand collaboration across the biological, computational, statistical and mathematical sciences. We have a broad base of collaborations with Medical Center researchers, particularly at the Lombardi Cancer Center.

Bioinformatics and computational biology

  • Dr. Cathy Wu (Protein Information Resource)

  • Dr. Hongzhan Huang (Protein Information Resource)

Cancer biology, diagnostics and therapeutics at Lombardi

  • Dr. Anton Wellstein 

  • Dr. Anna Riegel 

  • Dr. V. Craig Jordan 

  • Dr. Anatoly Dritschilo 

  • Dr. Rado Goldman 

  • Dr. Yun-Ling Zheng  


Protein annotations, sequence and -omic data analysis

1. dbOGAP: a database of O-GlcNAcylated proteins and sites, with O-GlcNAcylation site prediction.

2. iProXpress: an integrated bioinformatics system for gene- or protein-related omics data analysis.

3. PIR: the Protein Information Resource, which provides integrated protein informatics resources for genomics and proteomics research.

4. UniProt: the Universal Protein Resource, which provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information.

Biomedical literature text mining

1. iProLINK: an integrated protein literature mining resource of annotated literature data sets and text mining tools.

2. RLIMS-P: a rule-based literature mining system for information extraction of protein phosphorylation data from MEDLINE abstracts.

3. BioThesaurus: a comprehensive collection of gene and protein names from multiple sources for resolving synonyms and ambiguous names.

Selected publications

1. Torii M, Liu H, Hu ZZ. (2009) Support vector machine-based mucin-type O-glycosylation site prediction using enhanced sequence feature encoding. J Am Med Inform Assoc. Accepted.

2. Torii M, Hu ZZ, Wu C, Liu H. (2009) BioTagger-GM: A gene/protein name recognition system. J Am Med Inform Assoc 16:247-255.

3. Hu ZZ, Cohen KB, Hirschman L, Valencia A, Liu H, Giglio MG, Wu CH. (2008) iProLINK: A Framework for Linking Text Mining with Ontology and Systems Biology. In Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2008), Philadelphia.

4. Hu ZZ, Huang H, Cheema A, Jung M, Dritschilo A, Wu CH. (2008) Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data. J Proteomics & Bioinformatics 1(2):047-060.

5. Torii M, Hu ZZ, Song M, Wu CH, Liu H. (2007) A comparison study on algorithms of detecting long forms for short forms in biomedical text. BMC Bioinformatics 8 Suppl 9:S5.

6. Natale DA, Arighi CN, Barker W, Blake J, Chang TC, Hu ZZ, Liu H, Smith B, Wu CH. (2007) Framework for a Protein Ontology. BMC Bioinformatics 8 Suppl 9:S1.

7. Huang H, Hu ZZ, Arighi CN, Wu CH. (2007) Integration of bioinformatics resources for functional analysis of gene expression and proteomic data. Frontiers in Bioscience 12:5071-5088.

8. Qiu P, Wang J, Ray Liu KJ, Hu ZZ, Wu CH. (2007) Dependence network modeling for biomarker identification. Bioinformatics 23:198-206.

9. Hu ZZ, Valencia JC, Huang H, Chi A, Shabanowitz J, Hearing VJ, Appella E, Wu CH. (2007) Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes. Int J Mass Spec 259:147-160.

10. Chi A, Valencia JC, Hu ZZ, Watabe H, Yamaguchi H, Mangini NJ, Huang H, Canfield VA, Cheng KC, Yang F, Abe R, Yamagishi S, Shabanowitz J, Hearing VJ, Wu C, Appella E, Hunt DF. (2006) Proteomic and Bioinformatic Characterization of the Biogenesis and Function of Melanosomes. J Proteome Res 5:3135-3144.

11. Han B, Obradovic Z, Hu ZZ, Wu CH, Vucetic S. (2006) Substring selection for biomedical document classification. Bioinformatics 22:2136-42.

12. Liu H, Hu ZZ, Torii M, Wu C, Friedman C. (2006) Quantitative Assessment of Dictionary-based Protein Named Entity Tagging. J Am Med Inform Assoc 13:497-507.

13. Yuan X, Hu ZZ, Wu HT, Torii M, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, and Wu CH. (2006) An Online Literature Mining Tool for Protein Phosphorylation. Bioinformatics 22:1668-9.

14. Liu H, Hu ZZ, Wu CH (2006). BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics 22: 103-105.

15. Liu H, Hu ZZ, Wu CH (2005). DynGO: a tool for visualizing and mining of Gene Ontology and its associations. BMC Bioinformatics 6: 201.

16. Hu ZZ, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH (2005). Literature Mining and Database Annotation of Protein Phosphorylation Using a Rule-Based System. Bioinformatics 21(11): 2759-2765.

17. Hu ZZ, Mani I, Hermoso V, Liu H and Wu CH (2004). iProLINK: an integrated protein resource for literature mining. Comput Biol Chem 28: 409-416.



Currently there is one position open at the Hu Lab.

3300 Whitehaven Street, NW, Suite 1200, Washington, DC 20007