Identity Mapping Service (IMS)
Open PHACTS integrates information from multiple databases, many of which use their own identifiers. The Identity Mapping Service (IMS) ensures these identifiers are linked and can be used interchangeably throughout the Open PHACTS Discovery Platform.
Mapping identities across datasets is a nontrivial task, and is both user- and task-dependent. The IMS supports the creation of different scientific profiles that control which equality relationships are applied. For instance, in some contexts genes can be used as proxies for proteins; in others this is not appropriate and the two must be kept distinct. Similarly, giving users the option of ignoring molecular stereochemistry requires linking chemical identifiers that are otherwise distinct.
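The idea of profile-dependent equality can be illustrated with a minimal Python sketch. It is not the IMS or BridgeDB implementation: the identifiers, the justification labels, the profile names and the expand function are all hypothetical, and following links transitively is an assumption made for the illustration. Each link between two identifiers carries a justification, and a profile is simply the set of justifications a user is willing to treat as equality.

```python
from collections import defaultdict

# Hypothetical links between identifiers; each carries a justification label.
# None of these identifiers or labels come from Open PHACTS itself.
LINKS = [
    ("gene:G1", "protein:P1", "gene-protein-proxy"),
    ("protein:P1", "protein-db2:X9", "same-protein"),
    ("chem:C1-R", "chem:C1-S", "ignore-stereochemistry"),
    ("chem:C1-R", "chem-db2:M7", "same-structure"),
]

# A profile (hypothetical names) is the set of justifications accepted as equality.
PROFILES = {
    "strict": {"same-protein", "same-structure"},
    "gene-as-protein": {"same-protein", "same-structure", "gene-protein-proxy"},
    "ignore-stereo": {"same-protein", "same-structure", "ignore-stereochemistry"},
}


def expand(identifier, profile):
    """Return every identifier treated as equal to `identifier` under `profile`."""
    accepted = PROFILES[profile]

    # Keep only the links whose justification the profile accepts.
    neighbours = defaultdict(set)
    for left, right, justification in LINKS:
        if justification in accepted:
            neighbours[left].add(right)
            neighbours[right].add(left)

    # Follow accepted links transitively (an assumption of this sketch).
    seen = {identifier}
    stack = [identifier]
    while stack:
        current = stack.pop()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


if __name__ == "__main__":
    for name in PROFILES:
        print(name, sorted(expand("gene:G1", name)))
```

Under the "strict" profile the gene identifier expands only to itself, whereas under "gene-as-protein" it also reaches the linked protein identifiers. The same query therefore yields different results depending on the profile chosen.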
The IMS has been implemented as an extension to BridgeDB and can be used through an independent query expansion service or directly in the Open PHACTS Discovery Platform.
Identity Resolution Service (IRS) and ConceptWiki
The Identity Resolution Service is provided by ConceptWiki. It helps manage vocabulary heterogeneity and provides interoperability using open and extensible standards. A large-scale, community-editable store of disambiguated scientific concepts was important for the sustainability of the Open PHACTS project. ConceptWiki is an open access system that accepts an essentially unlimited number of synonyms, in multiple languages, and maps each term back to one unique concept identifier, alleviating vocabulary problems and identifier differences. The concept-to-vocabulary mappings are provided as linksets to the IMS (see above).
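The mapping from many synonyms to one concept identifier can be sketched in a few lines of Python. This is an illustration only, not the ConceptWiki API: the ConceptStore class, its methods, the concept identifier format and the normalisation rule (trimming whitespace and ignoring case) are all assumptions.

```python
class ConceptStore:
    """Toy synonym store: many labels, in any language, map to one concept id."""

    def __init__(self):
        # normalised label -> set of concept ids (a set, because a label may be ambiguous)
        self._by_label = {}

    @staticmethod
    def _normalise(label):
        # Assumed normalisation: trim whitespace and compare case-insensitively.
        return label.strip().casefold()

    def add_synonym(self, concept_id, label, language="en"):
        # The language tag is accepted but not used for lookup in this sketch.
        self._by_label.setdefault(self._normalise(label), set()).add(concept_id)

    def resolve(self, label):
        """Return the sorted concept ids registered for `label` (empty list if none)."""
        return sorted(self._by_label.get(self._normalise(label), set()))


# Hypothetical concept id and synonyms, for illustration only.
store = ConceptStore()
store.add_synonym("concept:0001", "Aspirin")
store.add_synonym("concept:0001", "acetylsalicylic acid")
store.add_synonym("concept:0001", "Acide acétylsalicylique", language="fr")

if __name__ == "__main__":
    print(store.resolve("ASPIRIN"))
    print(store.resolve("acide acétylsalicylique"))
```

Returning a list rather than a single identifier is a design choice of the sketch: it leaves room for a label that is registered under more than one concept and therefore still needs disambiguation.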
ConceptWiki distinguishes between ‘authority’ and ‘community’ data and permits general editing only on the community data branch. This distinction is intended to reassure authorities that donating and integrating their data into the system is prudent. Additionally, marking which displayed data come from the authority branch and which from the community branch lets users make their own value judgements about them. Currently, ConceptWiki contains the biomedical terminology of the Unified Medical Language System, mapped where appropriate to the protein terminology from Swiss-Prot, the chemical terminology from ChemSpider for biologically relevant molecules, and comprehensive names for the ChEMBL target classification. Each concept within ConceptWiki is annotated with one or more semantic types and with basic information such as a definition.