Performance of Probabilistic Method to Detect Duplicate Individual Case Safety Reports

  • Original Research Article
  • Drug Safety

Abstract

Background

Individual case reports of suspected harm from medicines are fundamental for signal detection in postmarketing surveillance. Their effective analysis requires reliable data, and one challenge is report duplication: multiple unlinked records describing the same suspected adverse drug reaction (ADR) in a particular patient. Duplicates distort statistical screening and can mislead clinical assessment. Many organisations rely on rule-based detection, but probabilistic record matching is an alternative.

Objectives

The aim of this study was to evaluate probabilistic record matching for duplicate detection, and to characterise the main sources of duplicate reports within each data set.

Research Design

vigiMatch™, a published probabilistic record matching algorithm, was applied to the WHO global individual case safety reports database, VigiBase®, for reports submitted between 2000 and 2010. Reported drugs, ADRs, patient age, sex, country of origin, and date of onset were considered in the matching. Suspected duplicates for the UK, Denmark, and Spain were reviewed and classified by the respective national centre. This included evaluation to determine whether confirmed duplicates had already been identified by in-house, rule-based screening. Furthermore, each confirmed duplicate was classified with respect to the likely source of duplication.
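As a rough illustration of how probabilistic record matching can score a pair of reports on the fields listed above, the sketch below sums per-field weights that reward agreement and penalise disagreement, in the style of classical record linkage. It is not the vigiMatch implementation: the `Report` type, every weight value, and the one-year tolerance on age are hypothetical, chosen only to show the mechanics.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Report:
    """Hypothetical, simplified view of an individual case safety report."""
    drugs: frozenset
    adrs: frozenset
    age: Optional[int]
    sex: Optional[str]
    country: Optional[str]
    onset: Optional[date]


# Hypothetical weights: reward for agreement (HIT), penalty for disagreement (MISS).
HIT = {"drug": 3.0, "adr": 2.0, "age": 1.5, "sex": 0.5, "country": 1.0, "onset": 4.0}
MISS = {"drug": -0.5, "adr": -0.5, "age": -2.0, "sex": -3.0, "country": -4.0, "onset": -2.0}


def _single(name, a, b, equal=None):
    """Score one single-valued field; missing values contribute nothing."""
    if a is None or b is None:
        return 0.0
    same = equal(a, b) if equal else a == b
    return HIT[name] if same else MISS[name]


def _multi(name, a, b):
    """Score a set-valued field: reward each shared item, penalise each unshared one."""
    return len(a & b) * HIT[name] + len(a ^ b) * MISS[name]


def match_score(a: Report, b: Report) -> float:
    """Sum the per-field contributions into one match score for the pair."""
    return (
        _multi("drug", a.drugs, b.drugs)
        + _multi("adr", a.adrs, b.adrs)
        + _single("age", a.age, b.age, lambda x, y: abs(x - y) <= 1)
        + _single("sex", a.sex, b.sex)
        + _single("country", a.country, b.country)
        + _single("onset", a.onset, b.onset)
    )
```

Pairs whose score exceeds a chosen threshold would be flagged as suspected duplicates for manual review. Treating a missing value as neutral, rather than as a mismatch, is one design choice among several; a rule-based screen would instead require exact agreement on a fixed set of fields.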

Measures

For each country, the proportions of suspected duplicates classified as confirmed duplicates, likely duplicates, otherwise related, and unrelated were obtained. The proportions of confirmed or likely duplicates that were not previously known by the national organisation were determined, and variations in the rates of suspected duplicates across subsets of reports were characterised.
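To make these measures concrete, the following sketch computes them from a list of reviewed suspected duplicates. The function name, the label strings, and the input layout (a classification paired with a flag for whether the case was already known in-house) are assumptions made for illustration; no study data are reproduced here.

```python
from collections import Counter

CATEGORIES = ("confirmed", "likely", "otherwise_related", "unrelated")


def review_measures(reviews):
    """Summarise reviewer classifications of suspected duplicates.

    `reviews` is a list of (classification, known_in_house) tuples, where
    classification is one of CATEGORIES and known_in_house is a bool.
    """
    if not reviews:
        raise ValueError("no reviewed suspected duplicates")
    counts = Counter(label for label, _ in reviews)
    total = len(reviews)
    proportions = {c: counts[c] / total for c in CATEGORIES}

    # Predictive value: share of suspected duplicates judged confirmed or likely.
    true_dups = [(label, known) for label, known in reviews
                 if label in ("confirmed", "likely")]
    predictive_value = len(true_dups) / total

    # Share of confirmed or likely duplicates not previously known in-house.
    if true_dups:
        unknown = sum(1 for _, known in true_dups if not known)
        previously_unknown = unknown / len(true_dups)
    else:
        previously_unknown = None

    return {
        "proportions": proportions,
        "predictive_value": predictive_value,
        "previously_unknown": previously_unknown,
    }
```

With ten hypothetical reviews of which six are confirmed or likely, and three of those six were already known in-house, the predictive value is 0.6 and the previously unknown proportion is 0.5.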

Results

Overall, 2.5 % of the reports with sufficient information to be evaluated by vigiMatch were classified as suspected duplicates. The rates for the three countries considered in this study were 1.4 % (UK), 1.0 % (Denmark), and 0.7 % (Spain). Higher rates of suspected duplicates were observed for literature reports (11 %) and reports with fatal outcome (5 %), whereas a lower rate was observed for reports from consumers and non-health professionals (0.5 %). The predictive value for confirmed or likely duplicates among reports flagged as suspected duplicates by vigiMatch ranged from 86 % for the UK, to 64 % for Denmark and 33 % for Spain. The proportions of confirmed duplicates that were previously unknown to national centres ranged from 89 % for Spain, to 60 % for the UK and 38 % for Denmark, despite in-house duplicate detection processes in routine use. The proportion of unrelated cases among suspected duplicates was below 10 % for each national centre in the study.

Conclusions

Probabilistic record matching, as implemented in vigiMatch, achieved good predictive value for confirmed or likely duplicates in each data source. Most of the false positives corresponded to otherwise related reports; fewer than 10 % were altogether unrelated. A substantial proportion of the correctly identified duplicates had not previously been detected by national centre activity. On the one hand, vigiMatch highlighted duplicates that had been missed by rule-based methods; on the other hand, its lower total number of suspected duplicates to review improved the accuracy of manual review.

Notes

  1. Matching drugs tend to be highly rewarded by vigiMatch, so the more drugs are listed on a report, the more likely the report is to receive a sufficiently high score when matched against itself.
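One reading of this note is that a report counts as having sufficient information when its score against itself clears a threshold. The sketch below shows why a longer drug list helps under that reading; `DRUG_WEIGHT`, `THRESHOLD`, and both function names are hypothetical and are not taken from vigiMatch.

```python
DRUG_WEIGHT = 3.0   # hypothetical reward per matching drug
THRESHOLD = 10.0    # hypothetical minimum self-match score


def self_match_score(n_drugs, other_fields_score=0.0):
    """Score of a report matched against itself: every listed drug matches."""
    return n_drugs * DRUG_WEIGHT + other_fields_score


def has_sufficient_information(n_drugs, other_fields_score=0.0):
    """True when the self-match score reaches the threshold."""
    return self_match_score(n_drugs, other_fields_score) >= THRESHOLD
```

Under these assumed values, a report listing two drugs with little other information falls short of the threshold, whereas a report listing four drugs clears it on the drug fields alone.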

Acknowledgments

The research leading to these results was conducted as part of the PROTECT consortium (Pharmacoepidemiological Research on Outcomes of Therapeutics by a European ConsorTium, www.imi-protect.eu) which is a public-private partnership coordinated by the European Medicines Agency.

The PROTECT project has received support from the Innovative Medicine Initiative Joint Undertaking (www.imi.europa.eu) under Grant Agreement no. 115004, resources of which are composed of financial contribution from the European Union’s Seventh Framework Program (FP7/2007–2013) and companies of the European Federation of Pharmaceutical Industries and Associations (EFPIA) in-kind contribution.

The authors would like to thank Johan Hopstadius, previously with the Uppsala Monitoring Centre, for contributions to the early phases of this project.

Conflicts of Interest

G. Niklas Norén is an employee of the Uppsala Monitoring Centre who has developed and implemented the vigiMatch algorithm and may make it available as a commercial offering and/or as open source. Philip Michael Tregunno, Dorthe Bech Fink, Cristina Fernandez-Fernandez, and Edurne Lázaro-Bengoa have no conflicts of interest that are directly relevant to the content of this study.

Author information

Corresponding author

Correspondence to Philip Michael Tregunno.

About this article

Cite this article

Tregunno, P.M., Fink, D.B., Fernandez-Fernandez, C. et al. Performance of Probabilistic Method to Detect Duplicate Individual Case Safety Reports. Drug Saf 37, 249–258 (2014). https://doi.org/10.1007/s40264-014-0146-y

  • DOI: https://doi.org/10.1007/s40264-014-0146-y
