The research leading to these results has received support from the Innovative Medicines Initiative (IMI) Joint Undertaking under grant agreement n° 115191, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution.
This dataset and linkset description specification is intended for data providers who want to publish their data as RDF, and link it to other datasets. A basic knowledge of RDF is assumed.
Details for converting a dataset and publishing it as RDF are given in the Open PHACTS RDF guidelines specification [[OPS-RDF]].
The Open PHACTS platform [[OPS-ARCH]] relies on data and the interlinks published by a variety of sources. For example, details of chemicals are derived from ChemSpider, ChEMBL, and DrugBank. This specification provides details of the metadata expected to describe the datasets and the links that relate the instances in those datasets.
The Open PHACTS project has produced a set of guidelines aimed at data providers for publishing their data within the OPS system [[!OPS-RDF]]. The RDF guide provides details about modelling your data as RDF. This specification builds on the RDF Guidelines by defining the metadata that should be published to describe the dataset and the links to other datasets.
The dataset description defined in this specification declares the properties that should be included in the description of a dataset or its links. The information is exchanged using the Vocabulary of Interlinked Datasets [[VOID]]. The VoID Editor can be used to create dataset descriptions, although in general they should be generated as part of the data creation pipeline.
All examples in this document are written in the Turtle RDF syntax [[TURTLE]]. Throughout the document, the following namespaces are used:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix freq: <http://purl.org/cld/freq/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix pav: <http://purl.org/pav/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
Furthermore, we assume that the empty prefix is bound to the base URL of the current file like this:
@prefix : <#> .
This allows us to quickly mint new identifiers in the local namespace.
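As a minimal sketch of minting such an identifier, the fragment below declares a dataset under the empty prefix. The local name :exampleDataset is hypothetical and is not used elsewhere in this specification.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

# :exampleDataset is a hypothetical local name. It expands to the URL of
# this file followed by "#exampleDataset".
:exampleDataset a void:Dataset .
```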
In this section, we introduce the vocabularies that will be used for capturing the dataset descriptions and the mappings between the datasets.
The Vocabulary of Interlinked Datasets (VoID) [[!VOID]] is a W3C interest group note that specifies a vocabulary for describing the metadata about a dataset and its relationship with other datasets. The vocabulary builds upon existing metadata vocabularies, e.g. Dublin Core Terms [[DCTERMS]], and captures four categories of metadata: general metadata, access metadata, structural metadata, and descriptions of the links between datasets.
The VoID specification [[VOID]] defines a dataset to be, "a set of RDF triples that are published, maintained or aggregated by a single provider." The dataset itself may contain logical subsets, which can be captured in the VoID description of the dataset, e.g. ChEMBL can be split into subsets for compounds, targets, etc.
The information captured about a dataset focuses on the general metadata, the access metadata, and the structural metadata.
The VoID specification [[VOID]] defines a link to be, "an RDF triple whose subject and object are described in different datasets." The links capture the mapping of an identifier in one dataset to be related to an identifier in another dataset. VoID is agnostic to the relationship to use: possible predicates are given in the next section.
The VoID specification defines a linkset to be, "a collection of such RDF links between two datasets." The linkset captures details of the links, i.e. the datasets that are linked and the relationship, as well as the metadata associated with the links, e.g. provenance information about who created the mapping and the specific versions of the datasets related. The VoID specification enables a separation between (1) the datasets involved in a linkset, and (2) who publishes the linkset.
Note that a VoID linkset is defined to link two datasets via a single link predicate (void:linkPredicate). As such, there can exist multiple linksets relating the same pair of datasets, as illustrated in the figure below. The figure depicts four distinct linksets: two sourced from ChemSpider depicted in blue which use different link predicates; one sourced from ChEMBL depicted in red; and one sourced from a third party depicted in green. Each of the linksets uses a different link relationship. Those shown with a double arrow head are symmetric while those with just a single arrow are directional links.
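The following fragment sketches how two linksets can relate the same pair of datasets with different link predicates. The linkset and dataset names are hypothetical and chosen for illustration only; they are not taken from a published VoID description.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

# Hypothetical names: two linksets over the same pair of datasets,
# each with exactly one link predicate.
:cs2chembl_exact a void:Linkset ;
    void:subjectsTarget :chemSpiderDataset ;
    void:objectsTarget :chemblDataset ;
    void:linkPredicate skos:exactMatch .

:cs2chembl_close a void:Linkset ;
    void:subjectsTarget :chemSpiderDataset ;
    void:objectsTarget :chemblDataset ;
    void:linkPredicate skos:closeMatch .
```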
A mapping is expressed as a VoID link, i.e. it is an RDF triple that relates an identifier in one dataset with an identifier in another dataset with some predicate which provides the meaning of the mapping. A justification for the mapping, e.g. two chemical compounds are deemed equivalent as they have the same InChI key, is expressed with the link in the linkset metadata.
The mapping predicate captures the way in which the two identifiers are related. The mapping should respect the semantics of the relationship, e.g. the owl:sameAs relationship must only be used when the two identifiers are completely interchangeable.
Standard, and widely used, generic mapping relationships are given in the table below. (A fuller mapping ontology is given in [[Halpin2010]], but it is expected that the main relationships used will be those given in the table.)
| Relationship | Description | Properties |
|---|---|---|
| rdfs:seeAlso | General link that indicates that the resource linked to is relevant to the subject. See http://www.w3.org/TR/rdf-schema/#ch_seealso. | |
| skos:relatedMatch | This link indicates that the linked resources are in some way associated. See http://www.w3.org/TR/skos-reference/#mapping. | Symmetric |
| skos:closeMatch | This link indicates that the linked resources are the same, under some assumptions or applications. See http://www.w3.org/TR/skos-reference/#mapping. | Symmetric |
| skos:exactMatch | This link indicates that the linked resources are the same, under the assumptions of most applications. See http://www.w3.org/TR/skos-reference/#mapping. | Transitive, Symmetric |
| owl:sameAs | This link indicates that the linked resources are the same under all assumptions and can be used interchangeably. Note that if this link is used for classes, then reasoning tasks will fall under OWL Full semantics. See http://www.w3.org/TR/2009/REC-owl2-quick-reference-20091027/#Axioms. | Transitive, Symmetric |
| owl:equivalentClass | This link indicates that the linked resources (which are both classes in some ontology) are the same. See http://www.w3.org/TR/2009/REC-owl2-quick-reference-20091027/#Axioms. | Transitive, Symmetric |
Note that there are domain-specific relationships that may also be used to express the relationship between two data instances. For example, ChEBI provides a set of relationships including http://purl.obolibrary.org/obo#has_part and http://purl.obolibrary.org/obo#is_conjugate_acid_of.
A key feature of the Open PHACTS platform is its ability to allow multiple views of the linked data which is achieved by applying scientific lenses over the data [[OPS-LENSES]]. For example, when performing an early stage exploratory task it is desirable to retrieve as much data as possible and as such the system enables the use of a lens whereby compounds are matched on their structural skeleton. Once the research has progressed further, the need for stricter relationships becomes apparent and the structure lens is switched for one which matches compounds on their full chemical structure, i.e. including their charges and stereo-chemistry.
The ability to classify linksets for use under different scientific lenses relies on the justification given in the linkset. A set of vocabulary terms for providing justifications are given in Appendix A.3.
The following example declares that the ChemSpider α-Ketoisovaleric acid concept with the CSID 48 shares many properties with the ChEMBL 3-Methyl-2-oxobutanoic acid concept with the ChEMBL-RDF ChEMBL ID CHEMBL146554. The relationship is drawn from ChemSpider and is justified by the compounds sharing the same structure according to their InChI Keys (in this case, the compound has the InChI Key QHKABHOOEWYVLI-UHFFFAOYSA-N in both datasets). Only the triples directly related to declaring the link are given in the example.
@prefix chembl: <http://linkedchemistry.info/chembl/chemblid/> .
:cs2chembl_inchi void:linkPredicate skos:exactMatch .
:cs2chembl_inchi dul:expresses <http://semanticscience.org/resource/CHEMINF_000059> .
<http://rdf.chemspider.com/48> skos:exactMatch chembl:CHEMBL146554 .
We assume that an RDF representation of the dataset has been generated according to the Open PHACTS RDF Guidelines [[OPS-RDF]]. This section describes the information that must appear in the VoID description for the dataset.
It is recommended that the VoID description for a dataset is generated as part of the creation of the RDF version of the dataset. To help you get started and to understand some of the principles of the VoID description, there is a VoID Generator which supports the creation of dataset descriptions.
void:Dataset.
For more details see Section 1.3 of the VoID Specification.
dcterms:title predicate.
For more details see Section 2.2 of the VoID Specification.
dcterms:description predicate.
For more details see Section 2.2 of the VoID Specification.
foaf:homepage predicate.
foaf:page predicate.
An example of an external page that may be linked is the bioDBcore [[BioDBcore]] entry, see Section 7.2.
For more details see Section 2.1 of the VoID Specification.
dcterms:license predicate.
Where possible, we recommend publishing data under an open license to enable reuse of the data with appropriate acknowledgement. For Open PHACTS, the recommended license is CC-BY-SA. A list of alternative licenses is available in Section 2.4 of the W3C note on VoID.
For more details see Section 2.4 of the VoID Specification.
void:uriSpace predicate.
Note that the object of the void:uriSpace predicate is a literal, not a URI.
For more details see Section 4.2 of the VoID Specification.
pav:version predicate, where this exists for the dataset.
pav:previousVersion predicate.
Not all datasets have the notion of a version number. Where the dataset does have this notion, it must be given in the description. For other datasets, the versioning information will be inferred from the last modified date provided as part of the provenance information.
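The general metadata predicates discussed above can be sketched together as follows. The dataset name, titles, version string, and example.org URIs are hypothetical placeholders; only the predicates themselves come from this specification.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix pav: <http://purl.org/pav/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <#> .

# Hypothetical dataset; all example.org URIs are placeholders.
:exampleDataset a void:Dataset ;
    dcterms:title "Example Dataset"@en ;
    dcterms:description "A hypothetical dataset used for illustration."@en ;
    foaf:homepage <http://example.org/> ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
    # The object of void:uriSpace is a literal, not a URI.
    void:uriSpace "http://example.org/resource/"^^xsd:string ;
    # Only given when the dataset has the notion of a version number.
    pav:version "1.0" .
```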
Details of the origin of the data must be provided using one of the following groups of predicates.
dcterms:publisher to capture who published the data.
dcterms:created to capture when the dataset was created.
dcterms:modified to capture when the last update was made.
dcterms:created and dcterms:modified to be declared. This ensures that we know which version of the dataset has been used.
pav:retrievedFrom to capture the location of the original source.
pav:retrievedOn to capture when the retrieval was made.
pav:retrievedBy to capture who made the retrieval.
pav:importedFrom to capture the location of the original source.
pav:importedOn to capture when the import was made.
pav:importedBy to capture who made the import.
pav:derivedFrom to capture the relationship to the original data source.
pav:createdOn to capture when the dataset was made.
pav:createdBy to capture who made the dataset.
Note that the object of the pav:xxxBy predicate is a URI.
We currently do not require a specific form of URI for capturing the details of a person or entity. Future versions of this specification may recommend the use of ORCID Identifiers [[ORCID]].
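As an illustration of one of the provenance groups above, the sketch below uses the retrieval predicates. The dataset name, source location, timestamp, and agent URI are hypothetical; the imported, derived, and published groups follow the same pattern with their own predicates.

```turtle
@prefix pav: <http://purl.org/pav/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <#> .

# Hypothetical dataset retrieved unchanged from an upstream source.
:exampleDataset a void:Dataset ;
    pav:retrievedFrom <http://example.org/downloads/data.sdf.gz> ;
    pav:retrievedOn "2012-08-02T10:23:56Z"^^xsd:dateTime ;
    # The object of a pav:xxxBy predicate is a URI, not a literal name.
    pav:retrievedBy <http://example.org/people#curator> .
```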
pav:createdWith predicate.
void:subset predicate.
Metadata that is common between the subsets can be declared for the parent only. However, the subsets allow for more specific linking between datasets and for providing details of the subject.
In the VoID note [[VOID]], the void:subset property is used for both subset of and has subset. The declared semantics for the property is has subset. Within Open PHACTS, the property must be used with the has subset semantics.
For more details see Section 4.4 of the VoID Specification.
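A minimal sketch of the has subset direction is given below: the parent dataset is the subject of void:subset and each subset is the object. The dataset names are hypothetical; the license and topic URIs are reused from the examples in this specification.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

# Hypothetical parent dataset: metadata common to all subsets is declared
# here once, and the parent points at its subsets ("has subset").
:parentDataset a void:Dataset ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
    void:subset :compoundsSubset, :targetsSubset .

# Subsets carry the more specific details, such as the subject.
:compoundsSubset a void:Dataset ;
    dcterms:subject <http://dbpedia.org/resource/Molecule> .

:targetsSubset a void:Dataset ;
    dcterms:subject <http://dbpedia.org/resource/Protein> .
```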
void:vocabulary predicate.
For more details see Section 4.3 of the VoID Specification.
dcterms:subject predicate.
Multiple topics can be declared. If the data is split into subsets, then the topics should be associated with the subsets. BioPortal [[BioPortalWeb]] [[BioPortal]] can be used to search for suitable vocabulary terms for topics. DBPedia URIs may also be used. A list of common terms relevant for Open PHACTS is given in Appendix A.1.
For more details see Section 2.5 of the VoID Specification.
void:exampleResource predicate.
Multiple resources can be declared. If the data is split into subsets, then the example resources should be associated with the subsets.
For more details see Section 4.1 of the VoID Specification.
void:dataDump predicate.
For more details see Section 3.3 of the VoID Specification.
void:sparqlEndpoint predicate.
For more details see Section 3.2 of the VoID Specification.
voag:frequencyOfChange predicate together with a frequency term drawn from the Frequency Vocabulary.
The terms from the Frequency Vocabulary have been reproduced in Appendix A.2.
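The access and update frequency predicates can be sketched as follows. The dataset name and the example.org locations are hypothetical, and freq:quarterly is chosen arbitrarily from the Frequency Vocabulary terms.

```turtle
@prefix freq: <http://purl.org/cld/freq/> .
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

# Hypothetical dataset with placeholder access locations.
:exampleDataset a void:Dataset ;
    void:dataDump <http://example.org/downloads/example.nt.gz> ;
    void:sparqlEndpoint <http://example.org/sparql> ;
    voag:frequencyOfChange freq:quarterly .
```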
Other metadata that can be associated with a dataset may be included, see the VoID specification [[VOID]] for additional properties that may be incorporated. The dataset may also have a provenance graph associated with it providing more detailed information about the creation, authorship, and derivation of the dataset. This provenance graph should be expressed using the W3C Provenance Ontology [[PROV-O]].
Example VoID dataset descriptions can be found in Appendix B. The first example is for the ChemSpider dataset (see Appendix B.1). Note this example is derived from the existing non-conformant ChemSpider VoID description available from http://rdf.chemspider.com/void.rdf.
The second example is for the RDF representation of the ChEMBL database, given in Appendix B.2. This demonstrates the level of information required to track the provenance from a source dataset through to the RDF representation. It also contains subset definitions.
A linkset is itself a dataset, and as such should provide the metadata about its content and how it was created. The metadata associated with a link is essential for enabling its reuse by others. It enables a consumer of the link to understand which datasets are linked (including which version), who claimed the link, under what circumstances, and which (if any) tools were used to generate the link (e.g. [[SILK]]).
void:Linkset.
For more details see Section 1.4 of the VoID Specification.
dcterms:title predicate.
For more details see Section 2.2 of the VoID Specification.
dcterms:description predicate.
For more details see Section 2.2 of the VoID Specification.
Note that for linksets there is unlikely to be a web page directly dedicated to the linkset; as such, that part of the dataset metadata is not stated as a requirement.
dcterms:license predicate.
Where possible, we recommend publishing data under an open license to enable reuse of the data with appropriate acknowledgement. For Open PHACTS, the recommended license is CC-BY-SA. A list of alternative licenses is available in Section 2.4 of the W3C note on VoID.
The license under which a linkset is published may be different from that of the datasets that it links, even if it is a subset of one of the datasets.
For more details see Section 2.4 of the VoID Specification.
void:subjectsTarget and void:objectsTarget predicates.
Note that the objects of the above predicates are the URIs of the respective datasets as declared in a VoID description. The datasets may themselves be a subset of a larger dataset. Where the datasets do not provide a VoID description, the minimal required information must be provided in the linkset description. This is detailed in Section 5.5.
For more details see Section 5.1 of the VoID Specification.
void:linkPredicate predicate.
For more details see Section 5.3 of the VoID Specification.
dul:expresses predicate and an object from an open vocabulary.
The link justification provides the reason/circumstance why instances in the two datasets can be considered equivalent. For example, two chemical compounds have the same InChI Key or have the same chemical formula.
Need to ensure that by using dul:expresses we are not inferring other contradictory consequences.
Currently recommended vocabulary terms for the justification can be found in Appendix A.3.
void:subset predicate.
Note that the dataset URI is the subject of the void:subset triple and the linkset URI is the object. If the dataset VoID description contains the declaration, there is no need to repeat it in the linkset document. However, the linkset documents can be used to declare additional subsets. A subset inherits properties from its parent.
For more details see Section 5.2 of the VoID Specification.
pav:authoredBy to capture who made the assertions encoded in the links.
pav:authoredOn to capture when the assertions were made.
pav:createdWith.
Additional information about the settings used when running an automated link generation tool such as Silk [[SILK]] can be captured in an associated provenance graph encoded using PROV-O [[PROV-O]].
pav:createdBy.
pav:createdOn predicate.
In this version of the specification, it is assumed that the creator of the linkset has accessed the most recent version of the dataset.
pav:version predicate.
pav:previousVersion predicate.
void:triples predicate.
Providing the number of triples included in the linkset allows for applications using the linkset to validate that the entire linkset has been successfully loaded.
For more details see Section 4.6 of the VoID Specification.
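Putting the linkset predicates together, a description might look like the sketch below. It reuses the :cs2chembl_inchi linkset, the justification URI, and the single link from the earlier ChemSpider to ChEMBL example. The dataset names, title, description, agent URIs, and timestamp are hypothetical placeholders.

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix pav: <http://purl.org/pav/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <#> .

:cs2chembl_inchi a void:Linkset ;
    dcterms:title "ChemSpider to ChEMBL linkset (illustrative)"@en ;
    dcterms:description "Hypothetical linkset matching compounds on InChI Key."@en ;
    dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
    # Hypothetical dataset URIs; these would normally come from the
    # VoID descriptions published by the dataset providers.
    void:subjectsTarget :chemSpiderDataset ;
    void:objectsTarget :chemblDataset ;
    void:linkPredicate skos:exactMatch ;
    # Justification term taken from the earlier example.
    dul:expresses <http://semanticscience.org/resource/CHEMINF_000059> ;
    # Placeholder agent URIs and timestamp.
    pav:authoredBy <http://example.org/people#author> ;
    pav:createdBy <http://example.org/people#creator> ;
    pav:createdOn "2012-08-02T10:23:56Z"^^xsd:dateTime ;
    # One link triple is included below, so the count is 1.
    void:triples "1"^^xsd:integer .

# The dataset is the subject of void:subset; the linkset is the object.
:chemSpiderDataset void:subset :cs2chembl_inchi .

<http://rdf.chemspider.com/48> skos:exactMatch
    <http://linkedchemistry.info/chembl/chemblid/CHEMBL146554> .
```

The void:triples value counts only the link triples, which lets a consumer check that the whole linkset has been loaded.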
A linkset should point to the dataset descriptions of the datasets that it uses. These descriptions should be provided by the dataset provider as part of the dataset publishing process [[OPS-RDF]]. However, there are occasions when one or both of the linked datasets do not provide a VoID dataset description. The following set of properties must then be provided in the linkset document. Other properties may also be given.
Example VoID linkset descriptions can be found in Appendix C. The first example shows how existing dataset descriptors can be reused (see Appendix C.1). The second example shows the declaration of a dataset's metadata in the linkset file (see Appendix C.2).
The two preceding sections have prescribed the metadata required to describe datasets and the linksets that inter-relate them. This section outlines the expected deployment and exchange mechanisms and should be read in conjunction with Section 6 of the VoID specification for more details.
VoID documents describing datasets and linksets must contain a metadata block describing the VoID document using the following properties:
void:DatasetDescription.
pav:createdBy.
pav:createdOn.
pav:lastUpdateOn.
foaf:primaryTopic.
Of course, other properties may also be declared, e.g. a title using dcterms:title. For more details, see Section 6.2 of the VoID specification [[VOID]].
An example is given below for the ChemSpider deployment. Note the use of an empty-string relative URI (<>) as a syntactic shortcut for the URI of the document that contains the statements.
<> a void:DatasetDescription ;
dcterms:title "ChemSpider VoID Description"^^xsd:string ;
pav:createdBy <http://www.chemspider.com/> ;
pav:createdOn "2012-05-02T13:50:34Z"^^xsd:dateTime;
pav:lastUpdateOn "2012-08-10T13:52:12Z"^^xsd:dateTime;
foaf:primaryTopic :chemSpiderDataset .
Several mechanisms for deploying VoID descriptions are given in Section 6 of the VoID Note [[VOID]].
For datasets that are being published in their own domain with dereferenceable URIs, we recommend placing the dataset's VoID description in the root directory in a file called void.ttl, with a local "hash URI" for the dataset (and any subsets). For example, for ChemSpider we would have a URI such as http://rdf.chemspider.com/void.ttl#chemSpiderDataset.
When the code is being hosted on an external service, then the VoID descriptor should be provided in the dataset's home directory. For example, for the RDF encoding of ChEMBL this would be http://linkedchemistry.info/chembl/void.ttl#chembl-rdf.
Examples of the VoID descriptors are given in Appendix B.
Each RDF document containing the data should then contain a backlink to the dataset descriptor. For example, for the ChEMBL-RDF molecule m1, there would be the triple:
<http://linkedchemistry.info/chembl/chemblid/molecule/m1> void:inDataset
<http://linkedchemistry.info/chembl/chemblid/void.ttl#chembl-rdf_compounds> .
For the purposes of Open PHACTS, it is anticipated that linksets will be materialised as separate documents from the datasets. This is to allow their loading into the identity mapping service [[IMS]]. These linkset files will contain the metadata about the linkset as well as the links.
It is also permitted to separate the link triples from the linkset metadata. In this case, the file containing the links must provide a link back to the linkset description using the void:inDataset predicate. The example below shows how a set of links can refer back to the linkset description given in the ChEMBL-RDF VoID description file.
<> void:inDataset <http://linkedchemistry.info/chembl/chemblid/void.ttl#chembl-rdf_targets-uniprot-linkset> .
<http://linkedchemistry.info/chembl/target/t1> skos:exactMatch <http://purl.uniprot.org/uniprot/O43451> .
...
Tooling being developed within Open PHACTS should support the predicates stated in this document. However, it should also be able to read VoID files from external sources that do not comply completely with this specification, but do comply with the VoID standard [[VOID]]. An example would be the use of the void:target predicate instead of the void:subjectsTarget and void:objectsTarget predicates. Such usage should not be the norm and should result in warnings being generated.
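The difference between the two forms can be sketched as follows. The linkset and dataset names are hypothetical. The first form is valid VoID but does not state the direction of the links; the second is the form expected by this specification.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <#> .

# Valid VoID, but tooling should warn: void:target does not say which
# dataset supplies the subjects and which supplies the objects.
:externalLinkset a void:Linkset ;
    void:target :datasetA, :datasetB ;
    void:linkPredicate skos:closeMatch .

# Form expected by this specification: the link direction is explicit.
:conformantLinkset a void:Linkset ;
    void:subjectsTarget :datasetA ;
    void:objectsTarget :datasetB ;
    void:linkPredicate skos:closeMatch .
```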
Nanopublications [[NANOPUB]] provide a means for data providers to obtain credit for their data contribution, in particular data that can be described in the form of a minimal set of assertions: a minimal piece of information that represents value for which credit is due. Such information is closely related to a link relating instances in two datasets. In some cases it may be desirable to publish a link as a nanopublication. This should not violate a link being published in a linkset according to this specification.
BioDBcore defines the following properties as the set of metadata that should be published in relation to a dataset. The aim of BioDBcore is different from that of VoID, but many of the elements defined are covered in the Open PHACTS dataset description.
An example BioDBcore record for ChEMBL.
The metadata specified in Section 4 covers the functional data required from BioDBcore. The aspects not covered are those relating to discovering who is responsible for a dataset and the publications about the dataset. It is expected that such information can be discovered from the dataset's homepage and is not within the use case scope for the description of the dataset. Such information may be added as additional statements in the VoID description.
A wide range of provenance vocabularies has been proposed. This section gives brief pointers to related vocabularies that could be used in a dataset or linkset description. For more information about the state of provenance vocabularies, the interested reader is referred to [[PROV-XG]].
The Provenance Ontology (PROV-O) [[PROV-O]] is a W3C candidate recommendation for representing provenance information about documents, datasets, workflow runs, etc. It is broadly based on the Open Provenance Model [[OPM]]. It is capable of expressing complex provenance relationships.
The Provenance, Authoring and Versioning Ontology (PAV) [[PAV]] provides a comprehensive set of relationships for capturing basic provenance information.
The Provenance Vocabulary [[PRV]] is another lightweight vocabulary of provenance predicates with an emphasis on data creation and data access on the Web.
Defined as an extension to VoID, the vocabulary for data and dataset provenance (voidp) [[VOIDP]] is a vocabulary for defining provenance relationships of data and datasets. The vocabulary focuses on four specific pieces of provenance information:
"for a piece of data, x :
- when was x derived,
- how was x derived,
- what data had been used to derive x,
- who carried out the transformations that resulted in the current value of x." [[VOIDP]]
These are a subset of the information that needs to be captured for the Open PHACTS linksets.
Complete list of suggested URIs.
This section lists suggested vocabulary terms for the dataset topics metadata that are relevant for Open PHACTS. The term in bold is the preferred URI.
Below are the terms from the frequency of change vocabulary.
freq:triennial
freq:biennial
freq:annual
freq:semiannual
freq:threeTimesAYear
freq:quarterly
freq:bimonthly
freq:monthly
freq:semimonthly
freq:biweekly
freq:threeTimesAMonth
freq:weekly
freq:semiweekly
freq:threeTimesAWeek
freq:daily
freq:continuous
freq:irregular
Add list of known vocabulary terms to be used for link justification.
The ChemSpider site already has a VoID descriptor available from http://rdf.chemspider.com/void.rdf. This has been created with a previous version of the VoID specification. Below is a suggested updated version which conforms with the requirements of this specification.
@prefix : <http://rdf.chemspider.com/void.ttl#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix freq: <http://purl.org/cld/freq/> .
@prefix pav: <http://purl.org/pav/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
# Metadata about this file
<http://rdf.chemspider.com/void.ttl>
a void:DatasetDescription;
dcterms:title "A VoID Description of the ChemSpider Dataset"@en;
dcterms:description
"""This is an example VoID description for the ChemSpider dataset.
It is derived from the existing VoID description and updating it to the latest
version of the VoID specification."""@en;
pav:createdBy <http://www.cs.man.ac.uk/~graya/me.ttl>;
pav:createdOn "2012-05-02T14:48:03Z"^^xsd:dateTime;
pav:lastUpdateOn "2012-08-10T09:32:49Z"^^xsd:dateTime;
pav:derivedFrom <http://rdf.chemspider.com/void.rdf>;
foaf:primaryTopic :chemSpiderDataset;
.
# Description of the ChemSpider dataset
:chemSpiderDataset
# General metadata
a void:Dataset;
dcterms:title "ChemSpider"@en;
dcterms:description "ChemSpider's Public Dataset"@en;
foaf:homepage <http://rdf.chemspider.com/>;
foaf:page <http://www.chemspider.com/>;
dcterms:license <http://www.chemspider.com/Disclaimer.aspx>;
void:uriSpace "http://rdf.chemspider.com/"^^xsd:string;
#Provenance
dcterms:publisher <http://www.chemspider.com/>;
dcterms:created "2007-03-01T00:00:00"^^xsd:dateTime;
dcterms:modified "2012-10-16T00:00:00"^^xsd:dateTime;
#Subsets
void:subset :chemSpiderDataset_chembl_subset,:chemSpiderDataset_drugbank_subset;
#Vocabularies, topics, resources
void:vocabulary <http://purl.org/dc/elements/1.1/>,
<http://purl.org/dc/terms/>,
<http://www.openarchives.org/ore/terms/>,
<http://www.polymerinformatics.com/ChemAxiom/ChemDomain.owl#>,
<http://xmlns.com/foaf/0.1/>;
dcterms:subject <http://dbpedia.org/resource/Molecule>;
void:exampleResource <http://rdf.chemspider.com/2157>;
#Dataset Access
void:sparqlEndpoint <http://rdf.chemspider.com/sparql>;
#Update Frequency
voag:frequencyOfChange freq:continuous;
#Other Metadata
# Technical features
void:feature <http://www.w3.org/ns/formats/RDF_XML>;
# Dataset statistics
void:triples "1157624328"^^xsd:nonNegativeInteger;
.
:chemSpiderDataset_chembl_subset
#General Metadata
a void:Dataset;
dcterms:title "ChemSpider ChEMBL Subset"@en;
dcterms:description "The slice of ChemSpider data that corresponds to ChEMBL molecules."@en;
#Provenance
pav:retrievedFrom <ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_13/chembl_13.sdf.gz>;
pav:retrievedOn "2012-08-02T10:23:56Z"^^xsd:dateTime;
pav:retrievedBy <http://www.chemspider.com/> ;
#Dataset Access
void:dataDump <https://www.dropbox.com/sh/6zboa8z9i9vrzyl/7NxayhkUH0/ChEMBL20120731.zip>;
.
# Description of the ChemSpider subset relating to DrugBank
:chemSpiderDataset_drugbank_subset
#General Metadata
a void:Dataset;
dcterms:title "ChemSpider DrugBank Subset"@en;
dcterms:description "Data corresponding to DrugBank."@en;
#Provenance
pav:retrievedFrom <http://www.drugbank.ca/system/downloads/current/structures/all.sdf.zip>;
pav:retrievedOn "2012-08-02T10:24:06Z"^^xsd:dateTime;
pav:retrievedBy <http://www.chemspider.com/> ;
#Dataset Access
void:dataDump <https://www.dropbox.com/sh/6zboa8z9i9vrzyl/qcFZzbLM77/DrugBank20120731.zip>;
.
Below we provide the VoID document for the ChEMBL-RDF dataset, which is a conversion of the ChEMBL database. The VoID file would be located at http://linkedchemistry.info/void.ttl.
@prefix : <http://linkedchemistry.info/void.ttl#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix freq: <http://purl.org/cld/freq/> .
@prefix pav: <http://purl.org/pav/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
# Metadata about this file
<>
a void:DatasetDescription;
dcterms:title "ChEMBL-RDF VoID Description"@en;
dcterms:description
"""This is the VoID description for a ChEMBL-RDF dataset."""@en;
pav:createdBy <http://egonw.github.com/#me> ;
pav:createdOn "2012-08-12T16:56:07Z"^^xsd:dateTime;
pav:lastUpdateOn "2012-09-14T12:21:12Z"^^xsd:dateTime;
pav:previousVersion <http://semantics.bigcat.unimaas.nl/chembl/v13_ops/chembl-rdf-void.ttl>;
foaf:primaryTopic :chemblrdf_dataset.
:chemblrdf_dataset
# General metadata
a void:Dataset;
dcterms:title "ChEMBL-RDF 13.OPS.2"@en;
dcterms:description "The ChEMBL database in RDF format."@en;
foaf:homepage <http://github.com/egonw/chembl.rdf/>;
foaf:page <http://www.biosharing.org/biodbcore-000015>;
dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
void:uriSpace "http://linkedchemistry.info/chembl/"^^xsd:string ;
# Provenance
pav:version "v13_ops";
pav:previousVersion <http://semantics.bigcat.unimaas.nl/chembl/v13_ops/chembl-rdf-void.ttl#chemblrdf_dataset>;
pav:importedFrom :chembl_dataset;
pav:importedBy <http://egonw.github.com/#me> ;
pav:importedOn "2012-05-15T15:34:40Z"^^xsd:dateTime;
pav:createdWith <https://github.com/openphacts/chembl.rdf>;
# Subsets
void:subset :chemblrdf_compounds, :chemblrdf_targets ;
# Vocabularies, topics, resources
void:vocabulary
<http://www.w3.org/1999/02/22-rdf-syntax-ns#>,
<http://www.w3.org/2000/01/rdf-schema#>,
<http://www.w3.org/2002/07/owl#>,
<http://www.w3.org/2001/XMLSchema#>,
<http://purl.org/dc/elements/1.1/>,
<http://purl.org/ontology/bibo/>,
<http://xmlns.com/foaf/0.1/>,
<http://purl.org/spar/cito/>,
<http://www.w3.org/2004/02/skos/core#> ,
<http://purl.obolibrary.org/obo#>,
<http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#>,
<http://www.blueobelisk.org/chemistryblogs/>,
<http://www.nmrshiftdb.org/onto#>,
<http://www.ifomis.org/bfo/1.1/snap#>,
<http://semanticscience.org/resource/>,
<http://purl.org/obo/owl/CHEBI#>;
# Update frequency
#Need to verify the update frequency!
voag:frequencyOfChange freq:semiannual;
# Other metadata
void:distinctSubjects "18018451"^^xsd:integer ;
.
:chembl_dataset
# Metadata about the original data source
a dcterms:Dataset;
dcterms:title "ChEMBL"@en;
dcterms:description """ChEMBL is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, Molecular Weight, Lipinski Parameters, etc.) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data)."""@en;
foaf:homepage <http://www.ebi.ac.uk/chembl/>;
dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
pav:version "13"^^xsd:integer ;
dcterms:publisher <http://www.ebi.ac.uk/chembl/>;
dcterms:created "2012-02-29T00:00:00Z"^^xsd:dateTime;
dcterms:modified "2012-02-29T00:00:00Z"^^xsd:dateTime;
.
:chemblrdf_compounds
# Subset metadata
a void:Dataset ;
dcterms:title "ChEMBL Molecules"@en;
dcterms:description "The subset of ChEMBL data relating to molecules."@en;
void:uriSpace "http://linkedchemistry.info/chembl/molecule/"^^xsd:string ;
dcterms:subject <http://dbpedia.org/resource/Molecule> ;
void:exampleResource <http://linkedchemistry.info/chembl/molecule/m1>;
void:dataDump <http://semantics.bigcat.unimaas.nl/chembl/v13_ops/compounds.nt.gz> ;
.
:chemblrdf_targets
# Subset metadata
a void:Dataset ;
dcterms:title "ChEMBL Targets"@en;
dcterms:description "The subset of ChEMBL data relating to targets which are single proteins."@en;
void:uriSpace "http://linkedchemistry.info/chembl/target/"^^xsd:string ;
dcterms:subject <http://dbpedia.org/resource/Protein> ;
void:exampleResource <http://linkedchemistry.info/chembl/target/t1>;
void:dataDump <http://semantics.bigcat.unimaas.nl/chembl/v13_ops/targets.nt.gz> ;
.
Below is the start of the linkset file relating ChemSpider compounds with ChEMBL-RDF compounds. The linkset reuses the VoID descriptions already provided for the datasets, but augments these with additional metadata.
@prefix : <http://linkedchemistry.info/void.ttl#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix freq: <http://purl.org/cld/freq/> .
@prefix pav: <http://purl.org/pav/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix chembl: <http://linkedchemistry.info/chemblid/> .
# Metadata about this file
<>
a void:DatasetDescription;
dcterms:title "ChEMBL-RDF Compounds to ChemSpider linkset"@en;
dcterms:description """A VoID linkset that links compounds in ChEMBL-RDF
with compounds in ChemSpider."""@en;
pav:createdBy <http://egonw.github.com/#me> ;
pav:createdOn "2012-08-13T15:00:00Z"^^xsd:dateTime ;
pav:lastUpdateOn "2012-09-14T11:15:32Z"^^xsd:dateTime ;
foaf:primaryTopic :chembl-rdf-compounds_cs_linkset ;
.
# Pointer to the subset description in the ChEMBL VoID file
<http://linkedchemistry.info/void.ttl#chemblrdf_compounds>
# Linkset declared as a subset. Inherits properties
void:subset :chembl-rdf-compounds_cs_linkset .
:chembl-rdf-compounds_cs_linkset
# General linkset metadata
a void:Linkset ;
dcterms:title "ChEMBL-RDF Compounds ChemSpider Linkset"@en;
dcterms:description "Linkset relating ChEMBL-RDF compounds to ChemSpider compounds."@en;
#Explicit declaration of license for the linkset is preferred
dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
# Link information
void:subjectsTarget <http://linkedchemistry.info/void.ttl#chemblrdf_compounds> ;
void:objectsTarget <http://rdf.chemspider.com/void.ttl#chemSpiderDataset_chembl_subset> ;
void:linkPredicate skos:exactMatch ;
dul:expresses <http://semanticscience.org/resource/CHEMINF_000059> ;
# Linkset provenance
pav:authoredOn "2012-02-22T10:59:38Z"^^xsd:dateTime ;
pav:authoredBy <http://www.chemspider.com/> ;
pav:createdBy <http://egonw.github.com/#me> ;
pav:createdOn "2012-05-15T11:29:01Z"^^xsd:dateTime ;
# Linkset statistics
void:triples "1073967"^^xsd:integer ;
.
chembl:CHEMBL1236438 skos:exactMatch <http://rdf.chemspider.com/43> .
chembl:CHEMBL144103 skos:exactMatch <http://rdf.chemspider.com/60> .
#...
An alternative deployment strategy for this linkset would be to
include the linkset metadata in the ChEMBL-RDF VoID description
file as a subset of the ChEMBL-RDF dataset. The linkset metadata
must then include the predicate void:dataDump to point
to the file containing the links. The file containing the links
must include the corresponding predicate void:inDataset
pointing back to the linkset description.
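The alternative deployment can be sketched as follows. This is a minimal illustration only: it reuses the identifiers from the examples above, and the void:dataDump location under example.org is a hypothetical placeholder, not a published file. In the ChEMBL-RDF VoID description file, the linkset would be declared as a subset and given a pointer to the links file:

```turtle
@prefix : <http://linkedchemistry.info/void.ttl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .

# Linkset declared as a subset of the compounds dataset
:chemblrdf_compounds
void:subset :chembl-rdf-compounds_cs_linkset .

:chembl-rdf-compounds_cs_linkset
a void:Linkset ;
void:linkPredicate skos:exactMatch ;
# Hypothetical location of the file containing the links (assumption)
void:dataDump <http://example.org/chembl-chemspider-links.ttl> .
```

The file containing the links would then carry the backlink to the linkset description, followed by the links themselves:

```turtle
@prefix chembl: <http://linkedchemistry.info/chemblid/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .

# Backlink from this document to the linkset description
<> void:inDataset <http://linkedchemistry.info/void.ttl#chembl-rdf-compounds_cs_linkset> .

chembl:CHEMBL1236438 skos:exactMatch <http://rdf.chemspider.com/43> .
chembl:CHEMBL144103 skos:exactMatch <http://rdf.chemspider.com/60> .
#...
```

With this layout the links file stays free of descriptive metadata, and a consumer can move between the description and the links in either direction.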
Below is the linkset file relating ChemSpider compounds to DrugBank drugs. As there is no VoID file available for the DrugBank dataset, the VoID information is included in the linkset, following the minimal information prescribed in Section 5.5. For ChemSpider, the VoID information is imported from the ChemSpider location.
@prefix : <http://rdf.chemspider.com/void.ttl#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix freq: <http://purl.org/cld/freq/> .
@prefix pav: <http://purl.org/pav/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix voag: <http://voag.linkedmodel.org/schema/voag#> .
@prefix void: <http://rdfs.org/ns/void#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
# Metadata about this file
<>
a void:DatasetDescription;
dcterms:title "ChemSpider to DrugBank Linkset VoID Description"@en;
dcterms:description """This is an example VoID description for a ChemSpider linkset. The linkset relates ChemSpider identifiers with DrugBank identifiers. The links have been generated as the chemicals share the same structure."""@en;
pav:createdBy <http://www.cs.man.ac.uk/~graya/me.ttl>;
pav:createdOn "2012-08-13T16:43:25Z"^^xsd:dateTime;
pav:lastUpdateOn "2012-10-16T10:22:43Z"^^xsd:dateTime;
foaf:primaryTopic :chemSpider_drugbank_linkset;
.
# Pointer to the ChemSpider dataset
<http://rdf.chemspider.com/void.ttl#chemSpiderDataset_drugbank_subset>
# Declare that the linkset is a subset of ChemSpider
void:subset :chemSpider_drugbank_linkset;
.
# Need to declare dataset metadata about DrugBank as no VoID file is available
## Note this is just a minimal amount of information as prescribed in Section 5.5
:drugbank_drugs_dataset
# General Dataset Metadata
a void:Dataset;
dcterms:title "DrugBank Drugs Dataset"@en;
dcterms:description "A subset of the DrugBank database containing the information relating to drugs."@en;
foaf:homepage <http://www4.wiwiss.fu-berlin.de/drugbank/>;
foaf:page <http://www.drugbank.ca/>;
dcterms:license <http://www.drugbank.ca/about#cite>;
void:uriSpace "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/"^^xsd:string;
# Provenance and Version
pav:version "3.0";
pav:retrievedOn "2011-11-30T11:01:59Z"^^xsd:dateTime;
pav:retrievedFrom <http://www4.wiwiss.fu-berlin.de/drugbank/drugbank_dump.nt.bz2>;
pav:retrievedBy [
a foaf:Person;
foaf:name "Antonis Loizou"^^xsd:string
];
.
# Description of the linkset from ChemSpider to DrugBank
:chemSpider_drugbank_linkset
# General Linkset Metadata
a void:Linkset;
dcterms:title "ChemSpider DrugBank Linkset"@en;
dcterms:description "Linkset relating ChemSpider compounds to DrugBank drugs."@en;
dcterms:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
# Link Information
void:subjectsTarget <http://rdf.chemspider.com/void.ttl#chemSpiderDataset_drugbank_subset>;
void:objectsTarget :drugbank_drugs_dataset;
void:linkPredicate skos:exactMatch;
dul:expresses <http://semanticscience.org/resource/CHEMINF_000059>;
# Linkset Provenance
pav:authoredBy <http://www.chemspider.com/>;
pav:authoredOn "2012-02-23T09:08:29Z"^^xsd:dateTime;
pav:createdBy <http://www.cs.man.ac.uk/~graya/me.ttl>;
pav:createdOn "2012-08-13T15:29:31Z"^^xsd:dateTime;
# Linkset statistics
void:triples "6428"^^xsd:nonNegativeInteger;
.
# Location of the triples
## Note that the triples would not contain the void header information as
## that is in this file. I have assumed the same location as the subset
## above, but this needs to change to a correct location.
## The file containing the triples should contain the following backlink
## |uriOfTheData| void:inDataset <http://rdf.chemspider.com/void-example.rdf#chemSpider_drugbank_linkset>.