iProLINK Tutorial

iProLINK (integrated Protein Literature, INformation and Knowledge) is a resource that facilitates text mining in the areas of literature-based database curation, named entity recognition, and protein ontology development. Computational and biological researchers can use its collection of data sources to explore literature information on proteins and their features or properties (Hu et al., 2004).

Bibliography Mapping/Annotation Extraction

Bibliography Display/Submission
Annotation-Tagged Corpora
RLIMS-P Text Mining Tool
eFIP Text Mining Tool
eGIFT Text Mining Tool
iSimp Sentence Simplification System

Entity Recognition/Ontology Development

BioThesaurus Name Mapping
Name Tagging Guidelines/Corpora
Protein Ontology Development

iProLINK Paper
  • The data sources for bibliography mapping and feature evidence attribution include mapped citations (PubMed ID to protein entry and feature line mapping) and annotation-tagged literature corpora. The corpora comprise several hundred abstracts and full-text articles tagged with experimentally validated post-translational modifications (PTMs) annotated in the PIR protein sequence database.

  • The data sources for entity recognition and ontology development include protein name dictionaries, word token dictionaries, protein name-tagged literature corpora along with tagging guidelines, and a protein ontology based on PIRSF protein family names.

  • iProLINK also provides tools developed using PIR data sources, e.g. RLIMS-P for text mining of protein phosphorylation and BioThesaurus for mapping protein/gene names to UniProtKB entries.
  • iProLINK is partly supported by NSF grants for the Protein Ontology project (ITR-0205470) and the BioTagger project (IIS-0639062).

