NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0)
hosted by University of Massachusetts Medical School
Adverse drug events (ADEs) are common, occurring in approximately 2-5% of hospitalized adult patients. Each ADE is estimated to increase healthcare costs by more than $3,200. Severe ADEs rank among the top five or six causes of death in the United States. Prevention, early detection and mitigation of ADEs could save both lives and dollars. Applying natural language processing (NLP) techniques to electronic health records (EHRs) offers an effective means of real-time pharmacovigilance and drug safety surveillance.
We have annotated 1092 EHR notes with medications, as well as relations to their corresponding attributes, indications and adverse events. This corpus is a valuable resource for developing NLP systems that automatically detect these clinically important entities. We are therefore happy to announce a public NLP challenge, MADE1.0, which aims to promote innovation in related research tasks and to bring researchers and professionals together to exchange research ideas and share expertise. The ultimate goal is to advance ADE detection techniques in order to improve patient safety and health care quality.
- Registration: begins August 1st, 2017
- Training data release: October 2nd, 2017
- System submission: January 2nd, 2018
- Workshop: in conjunction with AMIA summit 2018, March 2018
The entire dataset contains 1092 de-identified EHR notes from 21 cancer patients. Each EHR note was annotated with medication information (medication name, dosage, route, frequency, duration), ADEs, indications, other signs and symptoms, and relations among those entities. We split the data into a training set consisting of ~900 notes and a test set consisting of ~180 notes. Both will be released in BioC format.
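Because both sets are released in BioC format, it may help to see how entity and relation annotations can be read from a BioC XML collection. The following is a minimal sketch using only the Python standard library. The sample note, the annotation ids, and the type labels ("Drug", "Dose", "dosage") are invented for illustration and are assumptions, not the actual MADE1.0 label set; only the element names (collection, document, passage, annotation, infon, location, relation, node) follow the general BioC layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC fragment: the text, ids and type labels are invented
# for illustration and do not come from the MADE1.0 corpus.
SAMPLE = """<collection>
  <source>example</source>
  <document>
    <id>note_1</id>
    <passage>
      <offset>0</offset>
      <text>Started prednisone 10 mg daily.</text>
      <annotation id="T1">
        <infon key="type">Drug</infon>
        <location offset="8" length="10"/>
        <text>prednisone</text>
      </annotation>
      <annotation id="T2">
        <infon key="type">Dose</infon>
        <location offset="19" length="5"/>
        <text>10 mg</text>
      </annotation>
      <relation id="R1">
        <infon key="type">dosage</infon>
        <node refid="T1" role="drug"/>
        <node refid="T2" role="dose"/>
      </relation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (entities, relations) as lists of plain dictionaries."""
    root = ET.fromstring(xml_text)
    entities, relations = [], []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for ann in doc.iter("annotation"):
            loc = ann.find("location")
            start = int(loc.get("offset"))
            entities.append({
                "doc": doc_id,
                "id": ann.get("id"),
                "type": ann.findtext("infon[@key='type']"),
                "start": start,
                "end": start + int(loc.get("length")),  # end is exclusive
                "text": ann.findtext("text"),
            })
        for rel in doc.iter("relation"):
            relations.append({
                "doc": doc_id,
                "id": rel.get("id"),
                "type": rel.findtext("infon[@key='type']"),
                # Each node points at an annotation id and names its role.
                "args": [(n.get("role"), n.get("refid")) for n in rel.findall("node")],
            })
    return entities, relations


entities, relations = read_annotations(SAMPLE)
```

The character offsets make it possible to map each annotation back to the note text, which is what both the NER and the relation tasks operate on.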
The MADE1.0 challenge consists of three tasks, defined as follows.
- Named entity recognition (NER): develop systems to automatically detect mentions of medication names and their attributes (dosage, frequency, route, duration), as well as mentions of ADEs, indications, and other signs and symptoms.
- Relation identification (RI): given the gold-standard entity annotations, build systems to identify relations between medication name entities and their attribute entities, as well as relations between medication name entities and ADEs, indications, and other signs and symptoms.
- Integrated task (IT): design and develop an integrated system that performs the above two tasks together.
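To make the relation identification setting concrete, here is a minimal sketch of one common first step: enumerating candidate (medication, other entity) pairs from the given entity annotations, which a classifier would then label with a relation type or reject. This is an illustration, not a prescribed method. The `Entity` class, the "Drug" type label, and the `max_gap` character window are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


MEDICATION = "Drug"  # hypothetical label for medication name entities


def candidate_pairs(entities, max_gap=200):
    """Pair each medication with every other entity within max_gap characters."""
    meds = [e for e in entities if e.type == MEDICATION]
    others = [e for e in entities if e.type != MEDICATION]
    pairs = []
    for med, other in product(meds, others):
        # Distance between the two spans; negative if they overlap.
        gap = max(med.start, other.start) - min(med.end, other.end)
        if gap <= max_gap:
            pairs.append((med, other))
    return pairs


example = [
    Entity("T1", "Drug", 8, 18),
    Entity("T2", "Dose", 19, 24),
    Entity("T3", "ADE", 500, 504),
]
pairs = candidate_pairs(example, max_gap=50)
```

In the integrated task, the same step would run on entities predicted by the NER system rather than on the provided annotations, so NER errors carry over into relation identification.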
Participating teams will not receive the test data itself. Instead, we will provide a platform that accepts system submissions; all submitted systems will be run on the same withheld test data, in the same computing environment, and evaluated with the same evaluation scripts and the same metrics for each task. Each team may submit up to three versions of its system for each task, and each team may choose to participate in one or more tasks in this challenge.
Participants are asked to submit a short paper describing their methodology. It may include a graphical summary of the proposed architecture. The document should not exceed 4 pages, with 1.5 line spacing and 12-point font. The authors of top-performing systems or of particularly novel approaches will be invited to present or demonstrate their systems at the workshop. A special issue of a journal will be organized following the workshop.
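The evaluation metrics are not specified above. As an illustration only, the sketch below computes strict-match precision, recall and F1, a measure commonly used for NER and relation tasks. Treating each annotation as a (document id, start, end, type) tuple and requiring an exact match are assumptions of this example, not the official MADE1.0 scoring rules.

```python
def precision_recall_f1(gold, predicted):
    """Strict-match scores: an item counts only if it appears in both sets."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations as (document id, start, end, type) tuples.
gold = {("note_1", 8, 18, "Drug"), ("note_1", 19, 24, "Dose")}
predicted = {
    ("note_1", 8, 18, "Drug"),   # exact match
    ("note_1", 19, 23, "Dose"),  # boundary off by one: no credit under strict matching
    ("note_1", 30, 35, "ADE"),   # spurious prediction
}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Under strict matching, a prediction with a boundary error counts as both a false positive and a false negative, which is why only one of the three predictions above contributes to the score.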