Linked vaccine adverse event data from VAERS for biomedical data analysis and longitudinal studies
© Tao et al.; licensee BioMed Central. 2014
Received: 16 October 2014
Accepted: 6 December 2014
Published: 31 December 2014
Vaccines have been one of the most successful public health interventions to date. The use of vaccination, however, sometimes comes with possible adverse events. The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) currently contains more than 200,000 reports for post-vaccination events that occur after the administration of vaccines licensed in the United States. Although the data from the VAERS has been applied to many public health and vaccine safety studies, an individual report does not necessarily indicate a causal relationship between the vaccine and the reported symptoms. Further statistical analysis and summarization need to be done before this data can be leveraged.
This paper introduces our efforts to represent the vaccine-symptom correlations and their corresponding meta-information extracted from the VAERS database using the Resource Description Framework (RDF). The numbers of occurrences of vaccine-symptom pairs reported to the VAERS were summarized, and the corresponding proportional reporting ratios (PRR) were calculated. All the data was stored in an RDF file. We then applied network analysis approaches to the RDF data to illustrate a use case of the data for longitudinal studies. We further discussed our vision for integrating the data with vaccine information from other sources using an RDF linked data approach to facilitate more comprehensive analyses.
The 1990–2013 data was extracted from the VAERS database. There are 83,148 unique vaccine-symptom pairs with 75 vaccine types and 5,865 different reported symptoms. The yearly and overall PRR values for each reported vaccine-symptom pair were calculated. The network properties of networks consisting of significant vaccine-symptom associations (i.e., PRR larger than 1) were then investigated. The results indicated that the vaccine-symptom association network is a dense network, with any given node connected to all other nodes through an average of approximately two other nodes and a maximum of five nodes.
Vaccines have been one of the most successful public health interventions to date, with most vaccine-preventable diseases having declined in the United States by at least 95-99%. However, vaccines are pharmaceutical products that carry risks. They interact with the human immune system and could permanently alter gene molecular structures. “Under the National Childhood Vaccine Injury Act of 1986, over $2 billion has been awarded to children and adults for whom the risks of vaccine injury were 100%”. Potential relationships between vaccines and particular vaccine adverse events (VAE) may exist but have not yet been well studied. The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS) is a national vaccine safety surveillance program for post-vaccination adverse events (AE) that occur after the administration of vaccines licensed in the United States. Currently the VAERS contains more than 200,000 reports in total. Patients or healthcare providers submit reports of adverse events through the VAERS website, providing information ranging from vaccine type, gender, age, symptoms, and a detailed description of the symptoms to onset dates, life-threatening status, hospitalization status, and death status. The objectives of VAERS are to detect new, unusual, or rare vaccine adverse events; determine patient risk factors for particular types of adverse events; identify vaccine lots with increased numbers or types of reported adverse events; and assess the safety of newly licensed vaccines.
The submission of a report to the VAERS system is by no means a declaration that the vaccine is directly related to the reported symptoms. A causal relationship between a vaccine and an adverse event cannot simply be assumed from a VAERS report. In this study, we focus not only on the raw data from the VAERS system, but also on the correlation between vaccines and symptoms. Through statistical analysis, the correlation can be better assessed by relating the frequency of a specific symptom for the corresponding vaccine to the frequency of the same symptom across all the vaccines in the system. We represent information obtained and summarized from the VAERS database in the Resource Description Framework (RDF) format to facilitate further integration with other vaccine-relevant data for more comprehensive analysis. Armed with such knowledge, it may become possible to predict adverse events or to design new vaccine approaches that minimize or eliminate serious vaccine-related reactions, consistent with a more personalized or individual approach to vaccine practice. In the following sections of this paper, we use “symptom” and “adverse event” interchangeably.
After possible vaccine-adverse event correlations are identified, organizing these high-dimensional correlation data and facilitating pattern recognition by clinical researchers remains a big challenge. In recent years, network analysis has emerged as a very promising approach to address this. Network analysis allows simultaneous representation of complex associations (e.g., protein-protein interactions) among key elements (e.g., genes or proteins) in a system (e.g., gene regulatory networks). For example, in social networks, the nodes are individuals, organizations, or even entire societies, and the edges are social relationships between the nodes. During the last two decades, network-based computational approaches have gained popularity and have become a new paradigm to investigate associations among biological entities (e.g., drugs, diseases, and genes). Applications of these approaches include drug repositioning, disease gene prioritization, and identification of disease relationships. These network analysis approaches are usually developed based on observations from real-world networks. First, most real-world networks (e.g., WWW network, protein-protein interaction network, and social network) are not randomly organized but are driven by preferential attachment and growth (e.g., some nodes have more connections than others). Such networks are called “scale-free” networks. In a “scale-free” network, the most highly connected nodes are called “hub” nodes. Second, most real-world networks are modular, comprised of small, densely connected groups of nodes. Network analysis metrics and algorithms have been designed to identify network hub nodes and modules in a scale-free network. For instance, in our previous work, we developed a network analysis approach to identify vaccine-related networks and their underlying structural information from PubMed literature abstracts, which were consistent with that captured by the Vaccine Ontology (VO).
The modular structure and hub nodes of these vaccine networks reveal important, previously unidentified knowledge critical to biomedical research and public health and help generate testable hypotheses for future experimental verification.
The rest of the paper is organized as follows. In Section 2, we discuss our methodology for data collection, summarization, representation, and analysis. In Section 3, we discuss the results of our preliminary study. In Section 4, we introduce our vision for further integrating the VAERS data with more vaccine data sources. Finally, in Section 5, we conclude the paper and discuss future directions.
The VAERS data preparation
All of the VAERS data was downloaded from the reporting system’s website (http://vaers.hhs.gov/index). The necessary files from 1990 to 2013 were then loaded into a MySQL relational database. More specifically, three tables are included in the database: Data, Vaccine, and Symptom. The Data table contains information including the VAERS ID, the date the report was received, the patient’s state, age, and sex, and a detailed description of the symptom (e.g., whether the symptom was life threatening, whether the patient died and, if so, the date of death, whether the patient visited the ER for treatment, and, if so, how many days the patient stayed in the hospital). The Vaccine table includes information about the vaccine administered to the patient, such as the vaccine manufacturer, type of vaccine, dosage of the vaccine, vaccination route, vaccination site, and vaccine name. Vaccine types are annotated with a Vaccine Code (https://vaers.hhs.gov/glossary/). The Symptom table contains a list of symptom terms (MedDRA terms) involved in the report. Complete information about one report can be obtained by joining the three tables on the VAERS ID.
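The join on VAERS ID described above can be sketched as follows. This is a minimal illustration rather than the actual schema: it uses an in-memory SQLite database in place of MySQL so that it runs on its own, the column names (`vaers_id`, `recv_date`, `vax_type`, and so on) are assumptions, and the rows are made-up toy values.

```python
import sqlite3


def build_demo_db():
    """Create a toy three-table database (Data, Vaccine, Symptom)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Data (vaers_id INTEGER PRIMARY KEY, recv_date TEXT,
                           state TEXT, age REAL, sex TEXT);
        CREATE TABLE Vaccine (vaers_id INTEGER, vax_type TEXT, vax_manu TEXT);
        CREATE TABLE Symptom (vaers_id INTEGER, symptom TEXT);
        """
    )
    # Hypothetical rows, for illustration only.
    conn.executemany(
        "INSERT INTO Data VALUES (?, ?, ?, ?, ?)",
        [(1, "2009-01-15", "MN", 34.0, "F"), (2, "2009-02-03", "TX", 5.0, "M")],
    )
    conn.executemany(
        "INSERT INTO Vaccine VALUES (?, ?, ?)",
        [(1, "FLU", "Manufacturer A"), (2, "MMR", "Manufacturer B")],
    )
    conn.executemany(
        "INSERT INTO Symptom VALUES (?, ?)",
        [(1, "Pyrexia"), (1, "Headache"), (2, "Rash")],
    )
    conn.commit()
    return conn


def full_report(conn, vaers_id):
    """Join the three tables on the VAERS ID; one row per reported symptom."""
    query = """
        SELECT d.vaers_id, d.recv_date, d.state, d.age, d.sex,
               v.vax_type, s.symptom
        FROM Data d
        JOIN Vaccine v ON v.vaers_id = d.vaers_id
        JOIN Symptom s ON s.vaers_id = d.vaers_id
        WHERE d.vaers_id = ?
        ORDER BY s.symptom
    """
    return conn.execute(query, (vaers_id,)).fetchall()


if __name__ == "__main__":
    for row in full_report(build_demo_db(), 1):
        print(row)
```

Because a report can list several vaccines and several symptoms, a join of this kind yields one row per vaccine-symptom combination for that report, which is the unit counted in the summarization step.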
The VAERS data summarization
As discussed above, the VAERS is a spontaneous reporting system that contains unverified reports with inconsistent data quality. Symptoms reported as occurring after vaccination do not necessarily have a causal association with the vaccine. In addition to the raw data downloaded from VAERS, we also used statistical methods to summarize meta-level features of vaccine-symptom pairs. For each vaccine-symptom pair, we calculated the following features: (1) the number of reports that contain the pair in each year (from 1990 to 2013); (2) the distribution of reports by gender in each year; (3) the distribution of reports by age group; and (4) the overall proportional reporting ratio (PRR) and yearly PRRs. A PRR is the ratio between the frequency with which a specific symptom (adverse event) occurs for a vaccine of interest (relative to all symptoms reported for the vaccine) and the frequency with which the same symptom occurs for all vaccines reported to the VAERS (relative to all symptoms for all vaccines reported to VAERS). A yearly PRR is calculated using only the data for one particular year (e.g., reports for year 2013). A PRR greater than 1 suggests that the post-vaccination symptom (adverse event) is more commonly observed for individuals administered the particular vaccine, relative to all other vaccines reported to the VAERS.
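As an illustration of the PRR definition above, the following sketch computes PRRs from a list of (vaccine, symptom) mentions. It follows the wording of the definition literally, so the comparison frequency is taken over all vaccines; variants that exclude the vaccine of interest from the comparison group would give somewhat different values. The function name `prr_table` and the toy counts are hypothetical.

```python
from collections import Counter


def prr_table(mentions):
    """Compute a PRR for every (vaccine, symptom) pair.

    `mentions` is an iterable of (vaccine, symptom) tuples, one per
    symptom mention in a report. Passing only the mentions of a single
    year gives the yearly PRRs.
    """
    pair_counts = Counter(mentions)
    vaccine_totals = Counter()
    symptom_totals = Counter()
    for (vaccine, symptom), n in pair_counts.items():
        vaccine_totals[vaccine] += n
        symptom_totals[symptom] += n
    grand_total = sum(pair_counts.values())

    prr = {}
    for (vaccine, symptom), n in pair_counts.items():
        # Share of this symptom among all symptoms reported for the vaccine.
        freq_for_vaccine = n / vaccine_totals[vaccine]
        # Share of this symptom among all symptoms for all vaccines
        # (assumption: the vaccine of interest is included here).
        freq_overall = symptom_totals[symptom] / grand_total
        prr[(vaccine, symptom)] = freq_for_vaccine / freq_overall
    return prr


if __name__ == "__main__":
    # Made-up toy data, not VAERS counts.
    toy = (
        [("FLU", "Pyrexia")] * 3
        + [("FLU", "Headache")]
        + [("MMR", "Pyrexia")]
        + [("MMR", "Rash")] * 3
    )
    for pair, value in sorted(prr_table(toy).items()):
        print(pair, round(value, 2))
```

In the toy data, pyrexia accounts for 3 of the 4 symptom mentions for FLU but only 4 of the 8 mentions overall, so the FLU-pyrexia PRR is 0.75 / 0.5 = 1.5.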
We represented vaccine-symptom pairs as well as the summarization features in the Resource Description Framework (RDF). RDF is a W3C standard that specifies a graph-based data model for representing data. Each piece of information is represented as a triple: subject, predicate, and object. The RDF representation allows efficient querying and visualization of relationships between important biomedical entities. A distinguishing characteristic of RDF and ontologies compared to the conventional relational database is “their degree of connectedness, their ability to model coherent, linked relationships”. Representing the associations as RDF graphs enables us to leverage existing Semantic Web tools to explore the Semantic Web Linked Data in a flexible and scalable way. Moreover, it enables powerful data integration among heterogeneous data sets, which is a well-known challenge in the translational science community.
vaers:hasYear "2009"^^xsd:long ];
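To make the triple-based representation more concrete, the sketch below serializes one vaccine-symptom pair and its yearly summaries as Turtle text. Only the `vaers:hasYear` property and the `xsd:long` datatype appear in the fragment above; the namespace URI, the resource naming scheme, and the other property names (`vaers:hasVaccine`, `vaers:hasSymptom`, `vaers:hasOverallPRR`, `vaers:hasYearlySummary`, `vaers:hasCount`, `vaers:hasPRR`) are assumptions made for illustration, as are the values.

```python
PREFIXES = (
    "@prefix vaers: <http://example.org/vaers#> .\n"  # hypothetical namespace
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
)


def pair_to_turtle(pair_id, vaccine, symptom, overall_prr, yearly):
    """Serialize one vaccine-symptom pair as a Turtle document.

    `yearly` is a list of (year, count, prr) tuples; each becomes a blank
    node. String values are assumed to contain no quotes or backslashes,
    since no escaping is applied here.
    """
    statements = [
        f'vaers:hasVaccine "{vaccine}"',
        f'vaers:hasSymptom "{symptom}"',
        f'vaers:hasOverallPRR "{overall_prr}"^^xsd:double',
    ]
    for year, count, prr in yearly:
        statements.append(
            "vaers:hasYearlySummary [\n"
            f'        vaers:hasCount "{count}"^^xsd:long ;\n'
            f'        vaers:hasPRR "{prr}"^^xsd:double ;\n'
            f'        vaers:hasYear "{year}"^^xsd:long ]'
        )
    # Predicate-object pairs are separated by ";" and the subject block
    # is closed with ".".
    body = " ;\n    ".join(statements)
    return f"{PREFIXES}vaers:{pair_id}\n    {body} .\n"


if __name__ == "__main__":
    # Made-up values for illustration.
    print(pair_to_turtle("pair_FLU_Pyrexia", "FLU", "Pyrexia", 1.5,
                         [(2009, 3, 1.5), (2010, 5, 1.2)]))
```

Grouping each year's count and PRR under a blank node keeps the yearly figures attached to their year, which a flat list of properties on the pair would not.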
The network analysis and visualization were performed with Cytoscape. Cytoscape is an open-source platform for integration, visualization, and analysis of biological networks. Its functionality can be extended through Cytoscape plugins. Scientists from different research fields have contributed more than 160 useful plugins so far. These comprehensive features allow us to perform thorough network-level analyses, visualize our association tables, and integrate them with other biological networks in the future. We used the NetworkAnalyzer plugin (http://med.bioinf.mpi-inf.mpg.de/netanalyzer//index.php) to calculate the average node degree, average path length, and network diameter for each vaccine-adverse event network generated from VAERS.
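The three network measures named above can be illustrated with a small stand-alone sketch. It is not the NetworkAnalyzer implementation, whose exact conventions (for example, its handling of disconnected components) may differ; here the network is treated as undirected and unweighted, and path statistics are taken over reachable node pairs only. The function names and the toy edges are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (vaccine, symptom) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_summary(edges):
    """Average node degree, average shortest path length, and diameter."""
    adj = build_graph(edges)
    if not adj:
        return {"avg_degree": 0.0, "avg_path_length": 0.0, "diameter": 0}
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    avg_path = total / pairs if pairs else 0.0
    return {"avg_degree": avg_degree, "avg_path_length": avg_path,
            "diameter": diameter}


if __name__ == "__main__":
    # Toy edges standing in for vaccine-symptom pairs with PRR > 1.
    toy_edges = [("FLU", "Pyrexia"), ("FLU", "Headache"),
                 ("MMR", "Pyrexia"), ("MMR", "Rash")]
    print(network_summary(toy_edges))
```

The toy network is a chain of five nodes (Headache, FLU, Pyrexia, MMR, Rash), so its average degree is 1.6, its average path length is 2.0, and its diameter is 4.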
General characteristics of the networks
Average path length
Linking with other resources
Discussion, future directions, and conclusion
PRR is not the only data mining method for identifying significant associations between vaccines and post-vaccination symptoms. A PRR value greater than 1 is not an indication that the pair has a causal relationship. For example, if a symptom appeared only once for one vaccine type and not for any other vaccine type, the PRR would be a relatively large number. Given that it happened only once, however, it could be a coincidence. Therefore, we may need to add a threshold on the PRR values or on the number of occurrences to filter out such extreme cases. We may also want to add other statistical indicators besides PRR to facilitate further analysis. In addition, more advanced network approaches could be applied to identify underlying associations among vaccines and adverse events, such as subnetwork analysis and network alignments among different populations.
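The kind of filtering suggested above could look like the following sketch, which keeps a pair only if both its PRR and its number of occurrences pass a threshold. The function name, the default threshold values, and the toy statistics are placeholders chosen for illustration, not values used in this study.

```python
def significant_pairs(stats, min_prr=1.0, min_count=3):
    """Return pairs whose PRR exceeds `min_prr` with enough occurrences.

    `stats` maps (vaccine, symptom) -> (count, prr). Both thresholds are
    hypothetical defaults and would need to be chosen for the data at hand.
    """
    return sorted(
        pair
        for pair, (count, prr) in stats.items()
        if prr > min_prr and count >= min_count
    )


if __name__ == "__main__":
    # Made-up statistics for illustration.
    toy_stats = {
        ("FLU", "Pyrexia"): (120, 1.8),       # frequent and elevated: kept
        ("MMR", "RareSymptom"): (1, 45.0),    # large PRR from one report: dropped
        ("FLU", "Headache"): (300, 0.9),      # frequent but PRR below 1: dropped
    }
    print(significant_pairs(toy_stats))
```

With these toy values only the FLU-pyrexia pair is retained; the single-report pair is removed despite its large PRR.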
There are a few future directions we plan to pursue: (1) identification of network modules in the vaccine-adverse event network; (2) investigation of vaccine-vaccine associations using a bipartite network projection strategy; and (3) incorporation of more vaccine-disease association databases (e.g., Semantic MEDLINE database, Vaccine Adverse Event Ontology) to construct more complete vaccine-related networks. Also, in this study, we focused on comparing the overall network properties of the vaccine-adverse event association networks generated for different years. In the future, we plan to explore such differences using more advanced network-based computational approaches at different network levels, such as the subnetwork level and the single-association level.
In summary, we discussed our effort to represent data summarized from the VAERS database using RDF. We then applied network analysis on top of the data to illustrate how network-based analysis can be applied to identify underlying association patterns among vaccines and adverse events.
Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM011829 and the National Cancer Institute of the National Institutes of Health under Award Number P30CA134274.
- National Vaccine Information Center. Available: http://www.nvic.org/.
- The U.S. FDA/CDC Vaccine Adverse Event Reporting System (VAERS). Available: http://vaers.hhs.gov/index.
- Arrell DK, Terzic A: Network systems biology for drug discovery. Clin Pharmacol Ther. 2010, 88: 120-125. 10.1038/clpt.2010.91.
- Dudley JT, Deshpande T, Butte AJ: Exploiting drug-disease relationships for computational drug repositioning. Brief Bioinform. 2011, 12: 303-311. 10.1093/bib/bbr013.
- Piro RM, Di Cunto F: Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J. 2012, 279: 678-696. 10.1111/j.1742-4658.2012.08471.x.
- Kohler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008, 82: 949-958. 10.1016/j.ajhg.2008.02.013.
- Chen J, Aronow BJ, Jegga AG: Disease candidate gene identification and prioritization using protein interaction networks. BMC Bioinformatics. 2009, 10: 73. 10.1186/1471-2105-10-73.
- Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci U S A. 2007, 104: 8685-8690. 10.1073/pnas.0701361104.
- Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ: Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol. 2010, 6: e1000662. 10.1371/journal.pcbi.1000662.
- Zhang Y, Tao C, He Y, Kanjamala P, Liu H: Network-based analysis of vaccine-related associations reveals consistent knowledge with the vaccine ontology. J Biomed Semantics. 2013, 4: 33. 10.1186/2041-1480-4-33.
- Rothman KJ, Lanes S, Sacks ST: The reporting odds ratio and its advantages over the proportional reporting ratio. Pharmacoepidemiol Drug Saf. 2004, 13: 519-523. 10.1002/pds.1001.
- Banks D, Woo EJ, Burwen DR, Perucci P, Braun MM, Ball R: Comparing data mining methods on the VAERS database. Pharmacoepidemiol Drug Saf. 2005, 14: 601-609. 10.1002/pds.1107.
- An Executive Intro to Ontologies. 2009.
- Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27: 431-432. 10.1093/bioinformatics/btq675.
- Tao C, Zhang Y, Jiang G, Bouamrane M, Chute C: Optimizing Semantic MEDLINE for Translational Science Studies Using Semantic Web Technologies. International Conference on Information and Knowledge Management. 2012, ACM New York, NY, 53-58.
- Rindflesch TC, Kilicoglu H, Fiszman M, Rosemblat G, Shin D: Semantic MEDLINE: an advanced information management application for biomedicine. Inform Serv Use. 2011, 31: 15-21.
- Lin Y, He Y: Ontology representation and analysis of vaccine formulation and administration and their effects on vaccine immune responses. J Biomed Semantics. 2012, 3: 17. 10.1186/2041-1480-3-17.
- Marcos E, Zhao B, He Y: The Ontology of Vaccine Adverse Events (OVAE) and its usage in representing and analyzing adverse events associated with US-licensed human vaccines. J Biomed Semantics. 2013, 4: 40. 10.1186/2041-1480-4-40.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.