Mining severe drug-drug interaction adverse events using Semantic Web technologies: a case study

Background Drug-drug interactions (DDIs) are a major contributing factor for unexpected adverse drug events (ADEs). However, few of knowledge resources cover the severity information of ADEs that is critical for prioritizing the medical need. The objective of the study is to develop and evaluate a Semantic Web-based approach for mining severe DDI-induced ADEs. Methods We utilized a normalized FDA Adverse Event Report System (AERS) dataset and performed a case study of three frequently prescribed cardiovascular drugs: Warfarin, Clopidogrel and Simvastatin. We extracted putative DDI-ADE pairs and their associated outcome codes. We developed a pipeline to filter the associations using ADE datasets from SIDER and PharmGKB. We also performed a signal enrichment using electronic medical records (EMR) data. We leveraged the Common Terminology Criteria for Adverse Event (CTCAE) grading system and classified the DDI-induced ADEs into the CTCAE in the Web Ontology Language (OWL). Results We identified 601 DDI-ADE pairs for the three drugs using the filtering pipeline, of which 61 pairs are in Grade 5, 56 pairs in Grade 4 and 484 pairs in Grade 3. Among 601 pairs, the signals of 59 DDI-ADE pairs were identified from the EMR data. Conclusions The approach developed could be generalized to detect the signals of putative severe ADEs induced by DDIs in other drug domains and would be useful for supporting translational and pharmacovigilance study of severe ADEs.


Introduction
Drug-drug interactions (DDIs) are a major contributing factor for unexpected adverse drug events (ADEs) [1]. A semantically coded knowledge base of DDI-induced ADEs with severity information is critical for clinical decision support systems and translational research applications. In particular, there is emerging interest in investigating genetic susceptibility of DDI-induced ADEs and developing genetic tests to identify all those at risk of ADEs prior to prescribing potentially dangerous medication [2,3], in which the severity information is essential for prioritizing the medical need to evaluate the potential impact of pharmacogenomics information in reducing ADEs [4]. However, few of knowledge resources cover severity information of ADEs.
While recognizing, explaining and ultimately predicting DDIs constitute a huge challenge for medicine and public health, informatics-based approaches are increasingly used in dealing with the challenge [5]. Semantic Web technologies provide a scalable framework for data standardization and data integration from heterogeneous resources. For instance, Samwald et al. [6] developed a Semantic Web-based knowledge base for query answering and decision support in clinical pharmacogenetics, in which three dataset components are integrated. In our previous and ongoing study, we developed a standardized knowledge base of ADEs known as ADEpedia (http://adepedia.org) leveraging Semantic Web technologies [7]. The ADEpedia is intended to integrate existing known ADE knowledge for drug safety surveillance from disparate resources such as Food and Drug Administration (FDA) Structured Product Labeling (SPL) [7], FDA Adverse Event Reporting System (AERS) [8], and the Unified Medical Language System (UMLS) [9].
The objective of the study is to develop and evaluate a Semantic Web-based approach for mining severe DDI-induced ADEs. We utilized a normalized FDA AERS dataset and performed a case study of three frequently prescribed cardiovascular drugs: Warfarin, Clopidogrel and Simvastatin. We extracted putative DDI-ADE pairs and their associated outcome codes. We developed a pipeline to filter the associations using ADE datasets from SIDER and PharmGKB. We also performed a signal enrichment using electronic medical records (EMR) data. We leveraged the Common Terminology Criteria for Adverse Event (CTCAE) grading system and classified the DDI-induced ADEs into the CTCAE in the Web Ontology Language (OWL).

FDA Adverse Event Reporting System (AERS)
FDA AERS is a database that provides information on adverse event and medication error reports submitted to FDA [10]. By the definition of FDA, the "serious" means that one or more of the following outcomes were documented in the report: death (DE), hospitalization (HO), life threatening (LT), disability (DS), congenital anomaly (CA) and/ or other (OT) serious outcome. In our previous study, we produced a normalized AERS dataset known as AERS-DM [11]. The dataset contains 4,639,613 unique putative Drug-ADE pairs in which the drugs are represented by RxNorm [12] codes and the putative ADEs are represented by MedDRA [13] codes. The data set also contains the unique ID number (known as ISR) for each corresponding AERS report, which is a primary link field between the AERS data file. We used the ISR field to identify the outcome codes of each AERS report. Table 1 shows the outcome code definitions in AERS database. CTCAE is a widely accepted, standard grading scale for adverse events throughout the oncology research community [14]. The current released version is CTCAE 4.0. This version contains 764 AE terms and 26 "Other, specify" options for reporting text terms not listed in CTCAE. Each AE term is associated with a 5-point severity scale. The AE terms are grouped by MedDRA Primary SOC classes. In the CTCAE, "Grade" refers to the severity of the adverse event (AE). The CTCAE displays Grades 1 through 5 with unique clinical descriptions of severity for each AE based on a general guideline. Table 2 shows the grade definitions in the CTCAE grading system.

ADE datasets
SIDER (SIDe Effect Resource) is a public, computer-readable side effect resource that contains information on marketed medicines and their recorded adverse drug reactions [15]. The information is extracted from public documents and package inserts, in particular, from the US FDA Structured Product Labels (SPLs). The current version was released on October 17, 2012. PharmGKB DDI-ADE Dataset is a database of DDI side effects based on FDA AERS reporting data [16], in which the confounding factors for prediction of the side effects are corrected through leveraging covariates in observational clinical data [17].

Semantic Web technologies
The World Wide Web consortium (W3C) is the main standards body for the World Wide Web [18]. The goal of the W3C is to develop interoperable technologies and tools as well as specifications and guidelines to lead the web to its full potential. The resource description framework (RDF), web ontology language (OWL), and SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) specifications have all achieved the level of W3C recommendations, and are becoming generally accepted and widely used. RDF is a model of directed, labeled graphs that use a set of triples. Each triple is modeled in the form of subject, predicate and object. SPARQL is a standard query language for RDF graphs. OWL is a standard ontology language used for ontology modeling.

Methods
We utilized a normalized AERS dataset known as AERS-DM that was produced in a previous study [11]. The dataset contains 4,639,613 unique putative Drug-ADE pairs in Table 2 Grade definitions in the CTCAE grading system Note: Activities of Daily Living (ADL); *Instrumental ADL refer to preparing meals, shopping for groceries or clothes, using the telephone, managing money, etc.; **Self care ADL refer to bathing, dressing and undressing, feeding self, using the toilet, taking medications, and not bedridden.
which the drugs are represented by RxNorm codes and the putative ADEs are represented by MedDRA codes. The AERS-DM dataset is organized in two database files in the Tab Separated Values (TSV) format and accessible at: http://informatics.mayo.edu/ adepedia/index.php/Download. Figure 1 shows the system architecture of our approach. We first extracted a subset of putative DDI-ADE pairs (in which only two drugs are listed on a report) with their associated outcome codes from original AERS-DM dataset.
Second, we developed a filtering pipeline that comprises three datasets. The first dataset is a subset of original AERS-DM in which only one drug is listed on a report. This dataset was used to build a knowledge base of severe ADEs in a previous study. The second dataset is the SIDER 2 dataset. Table 3 shows a list of drug-ADE pair examples from the dataset, in which drug names are coded in STICH ID (http://stitch. embl.de) and ADE names are coded in MedDRA. We excluded the putative DDI-ADE pairs based on the Drug-ADE pairs of the two datasets. The filtering would ensure that the reported ADEs could not be explained by a single drug effect. The third dataset is a PharmGKB dataset that is used as "silver" standard. Table 4 shows a list of DDI-ADE examples from the dataset, in which drug names are coded in STICH ID and ADE names are coded in UMLS Concept Unique Identifiers (CUIs).
Third, we converted all the datasets used in this study into the Semantic Web RDF format and loaded them into an open source RDF store known as 4store [19]. We established a SPARQL endpoint that provides standard query services against the RDF store. And then we developed the extraction and filtering algorithms using Java-based Jena ARQ APIs [20].
Third, to enrich the signals of the DDI-induced ADEs, we used the NLP-processed EMR data of a cohort of 138 k patients with health home care provided by Mayo Clinic  Rochester where medications and problems have been extracted and normalized to RxNorm codes and the UMLS concepts from the current medication and problem list sections of clinical notes using MedXN and MedTagger (http:// www.ohnlp.org/). For each DDI-induced ADE triples (D1, D2, P), we obtained the number of patients who are administrated with any of the two drugs or both (i.e., N(D1), N(D2), and N(D1,D2)) and the number of patients with putative ADEs (i.e., N(D1,P), N(D2,P), and N(D1,D2,P) after taking the drugs. An occurrence of problem P is considered as putative ADE if it happens within 36 days of drug administration [17] and there is no occurrence of P in the EMR before the drug administration. We then developed the following metric to measure the signal enrichment of DDI-induced ADE: Finally, we developed the mappings between AERS outcome codes and CTCAE grades and classified the filtered DDI-ADEs into the CTCAE. We asserted that DE in AERS corresponds to Grade 5 in CTCAE; LT corresponds to Grade 4; the rest of outcome codes (HO, DS, CA, RI and OT) correspond to Grade 3. In this study, we utilized the CTCAE version 4.0 [14] rendered in OWL format. Figure 2 shows a screenshot of a Protégé4 environment displaying the categories and severity grades in CTCAE classification.

Results
We were able to extract a set of putative DDI-ADE pairs and their associated outcome codes for the three target drugs: Warfarin, Clopidogrel and Simvastatin from normalized AERS-DM dataset. We then filtered the putative DDI-ADE pairs using the filtering pipeline based on three datasets. Table 5 shows the number of filtered DDI-ADE pairs for each target drug. In total, 601 pairs were filtered. Of them, 61 pairs are classified in Grade 5, 56 pairs in Grade 4 and 484 pairs in Grade 3. Table 6 shows a list of filtered DDI-ADE pair examples for the drug "Simvastatin", in which, drugs are coded in RxNorm RxCUIs and ADEs are coded in MedDRA codes. For the signal enrichment using the EMR data, we found that, there are 89 drug pairs prescribed concomitantly in 9.5 k patients, accounting for 6.9% of all patients in the EMR dataset we used. Out of 601 putative DDI-ADE pairs, the signals of 59 (D1, D2, P) pairs were identified. Table 7 shows the detailed statistics of those pairs occurred in no less than five patients.
For integrating the filtered DDI-ADE pairs with the CTCAE, we produced an OWL rendering for each pair, asserting the filtered DDI-ADEs under AE terms in CTCAE (see Figure 3 for an example).

Discussion
In a previous study, we used a similar Semantic Web-based approach to build a knowledge base of severe ADEs using the FDA AERS reporting data [8]. In this study, we focused on mining the DDI-induced ADEs and their severity information, and configured the filtering pipeline differently using a collection of ADE datasets. The standardization of ADE datasets is essential for enabling interoperability and comparability among heterogeneous data sources. We used a normalized AERS dataset, in which the drug names are normalized using standard drug ontologies RxNorm and NDF-RT and the ADEs are normalized using MedDRA, whereas the datasets from SIDER and PharmGKB used STITCH compound IDs to code drug names and used UMLS CUIs to code ADEs. Apparently, the solid mappings between RxNorm codes and STITCH IDs would be required in future, which will be part of our research efforts in constructing a standardized drug and pharmacological class network [21]. We also tested the signals of putative DDI-ADE pairs filtered by the pipeline using a large EMR data. We were able to detect some strong signals indicated by the enrichment score as illustrated in Table 7. This would potentially provide a very useful tool for the knowledge-driven detection of the DDI-induced ADEs from the EMR, though a rigorous patient chart review with a panel of clinicians would be needed in future to verify the signals to establish the causality of the drug-drug interaction.
For measuring the severity of ADEs, we used the CTCAE severity grading system. We found that the AERS outcome codes used to record serious patient outcomes in the AERS reporting data correspond well to the CTCAE Grades 3 to 5. Semantic Web OWL rendering of the DDI-ADE dataset provides seamless integration with the CTCAE itself, enabling a standard infrastructure for automatic classification of ADEs based on the severity conditions specified in the CTCAE.
There are several limitations in this study. First, we used the logic that a putative DDI-ADE combination is extracted if there exists an AERS report involving two drugs and the ADE. We understand that the AERS reports themselves do not make it easy to report concomitant drugs and these are known to be under-reported. This means the putative DDI-ADE pairs extracted in this study only reflect a portion of all DDI interactions and should not be considered as a comprehensive list. Second, the PharmGKB "silver standard" itself contains signals that have not been validated for causality. This is part of the reasons why we introduce the EMR-based signal enrichment metric in this study. Third, some signals identified from EMR data may not be valid and further rigorous validation approach will be needed in future to filter them out.

Conclusions
In summary, we developed a Semantic Web-based approach to mine severe DDIinduced ADEs. The dataset produced in this study will be publicly available from our ADEpedia website (http://adepedia.org). The approach developed could be generalized to detect the signals from EMR for putative severe ADEs induced by DDIs in other drug domains and would be useful for supporting translational and pharmacovigilance study of severe ADEs.