
2023
(6)
Associations Between Natural Language Processing–Enriched Social Determinants of Health and Suicide Death Among US Veterans.
Mitra, A.; Pradhan, R.; Melamed, R. D.; Chen, K.; Hoaglin, D. C.; Tucker, K. L.; Reisman, J. I.; Yang, Z.; Liu, W.; Tsai, J.; and Yu, H.
JAMA Network Open, 6(3): e233079–e233079. March 2023.
_eprint: https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2802468/mitra_2023_oi_230126_1678209361.50297.pdf
Paper
doi
link
bibtex
abstract
@article{mitra_associations_2023, title = {Associations {Between} {Natural} {Language} {Processing}–{Enriched} {Social} {Determinants} of {Health} and {Suicide} {Death} {Among} {US} {Veterans}}, volume = {6}, issn = {2574-3805}, url = {https://doi.org/10.1001/jamanetworkopen.2023.3079}, doi = {10.1001/jamanetworkopen.2023.3079}, abstract = {Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes.To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data.This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing instability, legal problems, barriers to care, violence, transition of care, and food insecurity), and combining them yielded 9 SDOHs. Data were analyzed in May 2022.Occurrence of SDOHs over a maximum span of 2 years compared with no occurrence of SDOH.Cases of suicide death were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. Suicide was ascertained by National Death Index, and patients were followed up for up to 2 years after cohort entry with a study end date of September 30, 2015. Adjusted odds ratios (aORs) and 95\% CIs were estimated using conditional logistic regression.Of 6 122 785 veterans, 8821 committed suicide during 23 725 382 person-years of follow-up (incidence rate 37.18 per 100 000 person-years). These 8821 veterans were matched with 35 284 control participants. The cohort was mostly male (42 540 [96.45\%]) and White (34 930 [79.20\%]), with 6227 (14.12\%) Black veterans. The mean (SD) age was 58.64 (17.41) years. Across the 5 common SDOHs, NLP-extracted SDOH, on average, retained 49.92\% of structured SDOHs and covered 80.03\% of all SDOH occurrences. SDOHs, obtained by structured data and/or NLP, were significantly associated with increased risk of suicide. The 3 SDOHs with the largest effect sizes were legal problems (aOR, 2.66; 95\% CI, 2.46-2.89), violence (aOR, 2.12; 95\% CI, 1.98-2.27), and nonspecific psychosocial needs (aOR, 2.07; 95\% CI, 1.92-2.23), when obtained by combining structured data and NLP.In this study, NLP-extracted SDOHs, with and without structured SDOHs, were associated with increased risk of suicide among veterans, suggesting the potential utility of NLP in public health studies.}, number = {3}, journal = {JAMA Network Open}, author = {Mitra, Avijit and Pradhan, Richeek and Melamed, Rachel D. and Chen, Kun and Hoaglin, David C. and Tucker, Katherine L. and Reisman, Joel I. and Yang, Zhichao and Liu, Weisong and Tsai, Jack and Yu, Hong}, month = mar, year = {2023}, note = {\_eprint: https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2802468/mitra\_2023\_oi\_230126\_1678209361.50297.pdf}, pages = {e233079--e233079}, }
Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing instability, legal problems, barriers to care, violence, transition of care, and food insecurity), and combining them yielded 9 SDOHs. Data were analyzed in May 2022. Occurrence of SDOHs over a maximum span of 2 years compared with no occurrence of SDOH. Cases of suicide death were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. Suicide was ascertained by National Death Index, and patients were followed up for up to 2 years after cohort entry with a study end date of September 30, 2015. Adjusted odds ratios (aORs) and 95% CIs were estimated using conditional logistic regression. Of 6 122 785 veterans, 8821 committed suicide during 23 725 382 person-years of follow-up (incidence rate 37.18 per 100 000 person-years). These 8821 veterans were matched with 35 284 control participants. The cohort was mostly male (42 540 [96.45%]) and White (34 930 [79.20%]), with 6227 (14.12%) Black veterans. The mean (SD) age was 58.64 (17.41) years. Across the 5 common SDOHs, NLP-extracted SDOH, on average, retained 49.92% of structured SDOHs and covered 80.03% of all SDOH occurrences. SDOHs, obtained by structured data and/or NLP, were significantly associated with increased risk of suicide. The 3 SDOHs with the largest effect sizes were legal problems (aOR, 2.66; 95% CI, 2.46-2.89), violence (aOR, 2.12; 95% CI, 1.98-2.27), and nonspecific psychosocial needs (aOR, 2.07; 95% CI, 1.92-2.23), when obtained by combining structured data and NLP. In this study, NLP-extracted SDOHs, with and without structured SDOHs, were associated with increased risk of suicide among veterans, suggesting the potential utility of NLP in public health studies.
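As an illustration of the analysis style this abstract describes (conditional logistic regression on 1:4 matched case-control data, with coefficients exponentiated to adjusted odds ratios), here is a minimal sketch using statsmodels. The data are synthetic and the column names invented; the actual study used VHA records.

```python
# Hedged sketch: conditional logistic regression for matched case-control data,
# estimating adjusted odds ratios (aORs) with 95% CIs. Synthetic toy data only.
import numpy as np
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

rng = np.random.default_rng(0)
rows = []
for m in range(500):              # 500 matched sets: 1 case + 4 controls each
    for case in (1, 0, 0, 0, 0):
        rows.append({
            "match_id": m,
            "suicide": case,
            # toy binary SDOH exposures, made more likely among cases
            "legal_problems": rng.binomial(1, 0.30 if case else 0.15),
            "violence": rng.binomial(1, 0.25 if case else 0.12),
        })
df = pd.DataFrame(rows)

model = ConditionalLogit(df["suicide"],
                         df[["legal_problems", "violence"]],
                         groups=df["match_id"])   # conditions on the matched set
res = model.fit()

aor = np.exp(res.params)          # exponentiate coefficients -> odds ratios
ci = np.exp(res.conf_int())
print(pd.DataFrame({"aOR": aor, "2.5%": ci[0], "97.5%": ci[1]}))
```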
Web Information Extraction for Social Good: Food Pantry Answering As an Example.
Chen, H.; and Yu, H.
In Austin, TX, May 2023. ACM
The Web Conference 2023, Austin TX
doi link bibtex abstract
@inproceedings{chen_web_2023, address = {Austin, TX}, title = {Web {Information} {Extraction} for {Social} {Good}: {Food} {Pantry} {Answering} {As} an {Example}}, doi = {10.1145/3543507.3583880}, abstract = {Social Determinants of Health (SDH) have more influence on health outcome than clinical care or the physical environment, namely food insecurity, housing instability, and health literacy. Many researchers design applications as a bridge to connect between resource providers and the deprived population. In this study, we take food pantries as a solution to mitigate food insecurity as an example to illustrate an automatic system combining location-aware information retrieval, web information extraction and domain-specific answering. To acquire the latest knowledge, our proposed framework first retrieves pantry candidates based on geolocation of the user, and utilizes structural information from markup language to extract semantic chunks related to six common requests. We use BERT and RoBERTa as information extraction models and compare three different web page segmentation methods in the experiments.}, publisher = {ACM}, author = {Chen, Huan-Yuan and Yu, Hong}, month = may, year = {2023}, note = {The Web Conference 2023, Austin TX}, }
Social Determinants of Health (SDH), such as food insecurity, housing instability, and health literacy, have more influence on health outcomes than clinical care or the physical environment. Many researchers design applications as a bridge to connect resource providers with deprived populations. In this study, we take food pantries, a means of mitigating food insecurity, as an example and illustrate an automatic system combining location-aware information retrieval, web information extraction, and domain-specific answering. To acquire the latest knowledge, our proposed framework first retrieves pantry candidates based on the user's geolocation, and then utilizes structural information from the markup language to extract semantic chunks related to six common requests. We use BERT and RoBERTa as information extraction models and compare three different web page segmentation methods in our experiments.
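A minimal sketch of the shape of the extraction stage described above: classify text chunks from a pantry web page against common request types. The checkpoint name and the example chunks are hypothetical; the paper fine-tunes BERT/RoBERTa on its own annotated data.

```python
# Hedged sketch of chunk classification for pantry pages; the model name below
# is a placeholder, not a published checkpoint.
from transformers import pipeline

extractor = pipeline("text-classification",
                     model="my-org/pantry-chunk-classifier")  # hypothetical

chunks = [
    "Open Tuesdays and Thursdays, 9am-12pm.",
    "Please bring a photo ID and proof of address.",
]
for chunk in chunks:
    pred = extractor(chunk)[0]
    print(pred["label"], round(pred["score"], 3), "->", chunk)
```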
Evaluating the efficacy of NoteAid on EHR note comprehension among US Veterans through Amazon Mechanical Turk.
Lalor, J. P.; Wu, H.; Mazor, K. M.; and Yu, H.
International Journal of Medical Informatics, 172: 105006. April 2023.
Paper
doi
link
bibtex
abstract
@article{lalor_evaluating_2023, title = {Evaluating the efficacy of {NoteAid} on {EHR} note comprehension among {US} {Veterans} through {Amazon} {Mechanical} {Turk}}, volume = {172}, issn = {1386-5056}, url = {https://www.sciencedirect.com/science/article/pii/S1386505623000230}, doi = {10.1016/j.ijmedinf.2023.105006}, abstract = {Objective Low health literacy is a concern among US Veterans. In this study, we evaluated NoteAid, a system that provides lay definitions to medical jargon terms in EHR notes to help Veterans comprehend EHR notes. We expected that low initial scores for Veterans would be improved by using NoteAid. Materials and Methods We recruited Veterans from the Amazon Mechanical Turk crowd work platform (MTurk). We also recruited non-Veterans from MTurk as a control group for comparison. We randomly split recruited MTurk Veteran participants into control and intervention groups. We recruited non-Veteran participants into mutually exclusive control or intervention tasks on the MTurk platform. We showed participants de-identified EHR notes and asked them to answer comprehension questions related to the notes. We provided participants in the intervention group with EHR note content processed with NoteAid, while NoteAid was not available for participants in the control group. Results We recruited 94 Veterans and 181 non-Veterans. NoteAid leads to a significant improvement for non-Veterans but not for Veterans. Comparing Veterans recruited via MTurk with non-Veterans recruited via MTurk, we found that without NoteAid, Veterans have significantly higher raw scores than non-Veterans. This difference is not significant with NoteAid. Discussion That Veterans outperform a comparable population of non-Veterans is a surprising outcome. Without NoteAid, scores on the test are already high for Veterans, therefore, minimizing the ability of an intervention such as NoteAid to improve performance. With regards to Veterans, understanding the health literacy of Veterans has been an open question. We show here that Veterans score higher than a comparable, non-Veteran population. Conclusion Veterans on MTurk do not see improved scores when using NoteAid, but they already score high on the test, significantly higher than non-Veterans. When evaluating NoteAid, population specifics need to be considered, as performance may vary across groups. Future work investigating the effectiveness of NoteAid on improving comprehension with local Veterans and developing a more difficult test to assess groups with higher health literacy is needed.}, language = {en}, urldate = {2023-02-19}, journal = {International Journal of Medical Informatics}, author = {Lalor, John P. and Wu, Hao and Mazor, Kathleen M. and Yu, Hong}, month = apr, year = {2023}, keywords = {Electronic health records, Health information technology, Health literacy}, pages = {105006}, }
Objective: Low health literacy is a concern among US Veterans. In this study, we evaluated NoteAid, a system that provides lay definitions for medical jargon terms in EHR notes to help Veterans comprehend EHR notes. We expected that low initial scores for Veterans would be improved by using NoteAid. Materials and Methods: We recruited Veterans from the Amazon Mechanical Turk crowd work platform (MTurk), and recruited non-Veterans from MTurk as a control group for comparison. We randomly split recruited MTurk Veteran participants into control and intervention groups, and recruited non-Veteran participants into mutually exclusive control or intervention tasks on the MTurk platform. We showed participants de-identified EHR notes and asked them to answer comprehension questions related to the notes. Participants in the intervention group received EHR note content processed with NoteAid, while NoteAid was not available to participants in the control group. Results: We recruited 94 Veterans and 181 non-Veterans. NoteAid led to a significant improvement for non-Veterans but not for Veterans. Comparing Veterans and non-Veterans recruited via MTurk, we found that without NoteAid, Veterans had significantly higher raw scores than non-Veterans; this difference was not significant with NoteAid. Discussion: That Veterans outperform a comparable population of non-Veterans is a surprising outcome. Without NoteAid, Veterans' scores on the test are already high, minimizing the ability of an intervention such as NoteAid to improve performance. Understanding the health literacy of Veterans has been an open question; we show here that Veterans score higher than a comparable non-Veteran population. Conclusion: Veterans on MTurk do not see improved scores when using NoteAid, but they already score high on the test, significantly higher than non-Veterans. When evaluating NoteAid, population specifics need to be considered, as performance may vary across groups. Future work should investigate the effectiveness of NoteAid on improving comprehension with local Veterans and develop a more difficult test to assess groups with higher health literacy.
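For orientation only, a generic illustration of comparing comprehension scores between a control group and an intervention group. This is not the paper's actual statistical analysis, and the score arrays are invented.

```python
# Hedged, generic two-group comparison of comprehension scores (toy data).
import numpy as np
from scipy import stats

control = np.array([0.62, 0.71, 0.58, 0.66, 0.69])       # hypothetical scores
intervention = np.array([0.70, 0.74, 0.68, 0.77, 0.72])  # hypothetical scores

# Welch's t-test (does not assume equal variances).
t, p = stats.ttest_ind(control, intervention, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")
```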
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing.
Yao, Z.; Cao, Y.; Yang, Z.; and Yu, H.
March 2023.
AMIA 2023 Informatics Summit, Seattle WA
Paper
link
bibtex
abstract
@misc{yao_context_2023, address = {Seattle WA, USA}, title = {Context {Variance} {Evaluation} of {Pretrained} {Language} {Models} for {Prompt}-based {Biomedical} {Knowledge} {Probing}}, url = {http://arxiv.org/abs/2211.10265}, abstract = {Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric makes BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blank problems (e.g., cloze tests) are a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors, such as prompt-based probing biases, make the LAMA benchmark unreliable and unstable, and this problem is more prominent in BioLAMA. The severe long-tailed vocabulary distribution and large-N-M relations keep the performance gap between LAMA and BioLAMA notable. To address these issues, we introduce context variance into prompt generation and propose a new rank-change-based evaluation metric. Departing from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, we show that our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric make BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".
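A minimal sketch of the (Bio)LAMA-style probing setup the abstract builds on: fill a cloze prompt with a masked language model and check a Top-k hit. The prompt, gold answer, and model choice are illustrative, not the paper's benchmark.

```python
# Hedged sketch of prompt-based cloze probing with a Top-k accuracy check.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

prompt = "Insulin is used to treat [MASK]."   # illustrative triple prompt
gold = "diabetes"

inputs = tok(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]

topk_ids = logits.topk(10).indices.tolist()
predictions = tok.convert_ids_to_tokens(topk_ids)
print("top-10:", predictions)
print("hit@10:", gold in predictions)
```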
Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt.
Yang, Z.; Kwon, S.; Yao, Z.; and Yu, H.
February 2023.
AAAI 2023, Washington DC
Paper
link
bibtex
abstract
@misc{yang_multi-label_2023, address = {Washington DC USA}, title = {Multi-label {Few}-shot {ICD} {Coding} as {Autoregressive} {Generation} with {Prompt}}, url = {http://arxiv.org/abs/2211.13813}, abstract = {Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail challenge - Many ICD codes are infrequently assigned yet infrequent ICD codes are important clinically. This study addresses the long-tail challenge by transforming this multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective to generate free text diagnoses and procedure using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting the high dimensional space of ICD codes, our model generates the lower dimension of text descriptions, which then infer ICD codes. Third, we designed a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model with the benchmark of all code assignment (MIMIC-III-full) and few shot ICD code assignment evaluation benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model performs with a marco F1 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model (marco F1 4.3) and the model specifically designed for few/zero shot setting (marco F1 18.7). Finally, we design a novel ensemble learner, a cross attention reranker with prompts, to integrate previous SOTA and our best few-shot coding predictions. Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.}, urldate = {2022-12-18}, publisher = {arXiv}, author = {Yang, Zhichao and Kwon, Sunjae and Yao, Zonghai and Yu, Hong}, month = feb, year = {2023}, note = {AAAI 2023, Washington DC}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note averaging 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail challenge: many ICD codes are infrequently assigned, yet infrequent codes are clinically important. This study addresses the long-tail challenge by transforming this multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective to generate free-text diagnoses and procedures using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting in the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, from which ICD codes are then inferred. Third, we design a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model on the full code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model (macro F1 4.3) and the model specifically designed for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate the previous SOTA and our best few-shot coding predictions. Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.
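Since the abstract reports both macro and micro F1, here is a short sketch of how the two differ for multi-label code assignment: macro averages F1 over codes (so rare codes count equally), while micro pools all decisions. The codes and predictions below are toy examples, not MIMIC data.

```python
# Hedged sketch: macro vs micro F1 for multi-label ICD code assignment (toy data).
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

gold = [{"E11.9", "I10"}, {"I10"}, {"J45.909"}]        # toy gold code sets
pred = [{"E11.9"}, {"I10", "E11.9"}, {"J45.909"}]      # toy predicted code sets

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(gold)   # binary indicator matrix [notes x codes]
y_pred = mlb.transform(pred)

print("macro F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("micro F1:", f1_score(y_true, y_pred, average="micro", zero_division=0))
```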
H4H: A Comprehensive Repository of Housing Resources for Homelessness.
Osebe, S.; Tsai, J.; and Yu, H.
In Seattle WA, USA, March 2023.
AMIA 2023 Informatics Summit, Seattle WA
link bibtex
@inproceedings{osebe_h4h_2023, address = {Seattle WA, USA}, title = {{H4H}: {A} {Comprehensive} {Repository} of {Housing} {Resources} for {Homelessness}}, author = {Osebe, Samuel and Tsai, Jack and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, }
2022
(27)
Automated Identification of Eviction Status from Electronic Health Record Notes.
Yao, Z.; Tsai, J.; Liu, W.; Levy, D. A.; Druhl, E.; Reisman, J. I.; and Yu, H.
December 2022.
arXiv:2212.02762 [cs]
Paper
doi
link
bibtex
abstract
@misc{yao_automated_2022, title = {Automated {Identification} of {Eviction} {Status} from {Electronic} {Health} {Record} {Notes}}, url = {http://arxiv.org/abs/2212.02762}, doi = {10.48550/arXiv.2212.02762}, abstract = {Objective: Evictions are involved in a cascade of negative events that can lead to unemployment, homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction incidences and their attributes from electronic health record (EHR) notes. Materials and Methods: We annotated eviction status in 5000 EHR notes from the Veterans Health Administration. We developed a novel model, called Knowledge Injection based on Ripple Effects of Social and Behavioral Determinants of Health (KIRESH), that has shown to substantially outperform other state-of-the-art models such as fine-tuning pre-trained language models like BioBERT and Bio\_ClinicalBERT. Moreover, we designed a prompt to further improve the model performance by using the intrinsic connection between the two sub-tasks of eviction presence and period prediction. Finally, we used the Temperature Scaling-based Calibration on our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalance dataset. Results: KIRESH-Prompt achieved a Macro-F1 of 0.6273 (presence) and 0.7115 (period), which was significantly higher than 0.5382 (presence) and 0.67167 (period) for just fine-tuning Bio\_ClinicalBERT model. Conclusion and Future Work: KIRESH-Prompt has substantially improved eviction status classification. In future work, we will evaluate the generalizability of the model framework to other applications.}, urldate = {2023-02-19}, publisher = {arXiv}, author = {Yao, Zonghai and Tsai, Jack and Liu, Weisong and Levy, David A. and Druhl, Emily and Reisman, Joel I. and Yu, Hong}, month = dec, year = {2022}, note = {arXiv:2212.02762 [cs]}, keywords = {Computer Science - Computation and Language}, }
Objective: Evictions are involved in a cascade of negative events that can lead to unemployment, homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction incidents and their attributes from electronic health record (EHR) notes. Materials and Methods: We annotated eviction status in 5000 EHR notes from the Veterans Health Administration. We developed a novel model, called Knowledge Injection based on Ripple Effects of Social and Behavioral Determinants of Health (KIRESH), which has been shown to substantially outperform other state-of-the-art models, such as fine-tuned pre-trained language models like BioBERT and Bio_ClinicalBERT. Moreover, we designed a prompt to further improve model performance by using the intrinsic connection between the two sub-tasks of eviction presence and period prediction. Finally, we applied Temperature Scaling-based calibration to our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalanced dataset. Results: KIRESH-Prompt achieved a Macro-F1 of 0.6273 (presence) and 0.7115 (period), significantly higher than the 0.5382 (presence) and 0.67167 (period) obtained by fine-tuning the Bio_ClinicalBERT model alone. Conclusion and Future Work: KIRESH-Prompt substantially improves eviction status classification. In future work, we will evaluate the generalizability of the model framework to other applications.
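The calibration step named in this abstract, temperature scaling, is standard: learn a single scalar T on held-out validation logits by minimizing negative log-likelihood, then divide logits by T at inference. A minimal PyTorch sketch follows; the logits and labels are random placeholders for a real validation set.

```python
# Hedged sketch of temperature scaling calibration (placeholder validation data).
import torch

val_logits = torch.randn(256, 3)           # placeholder [n_examples, n_classes]
val_labels = torch.randint(0, 3, (256,))   # placeholder gold labels

log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

def nll():
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(val_logits / log_t.exp(), val_labels)
    loss.backward()
    return loss

opt.step(nll)
print("learned temperature T =", log_t.exp().item())
# At inference, divide logits by T before softmax to get calibrated probabilities.
```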
Enhancing the prediction of disease outcomes using electronic health records and pretrained deep learning models.
Yang, Z.; Liu, W.; Berlowitz, D.; and Yu, H.
December 2022.
arXiv:2212.12067 [cs]
Paper
doi
link
bibtex
abstract
@misc{yang_enhancing_2022, title = {Enhancing the prediction of disease outcomes using electronic health records and pretrained deep learning models}, url = {http://arxiv.org/abs/2212.12067}, doi = {10.48550/arXiv.2212.12067}, abstract = {Question: Can an encoder-decoder architecture pretrained on a large dataset of longitudinal electronic health records improves patient outcome predictions? Findings: In this prognostic study of 6.8 million patients, our denoising sequence-to-sequence prediction model of multiple outcomes outperformed state-of-the-art models scuh pretrained BERT on a broad range of patient outcomes, including intentional self-harm and pancreatic cancer. Meaning: Deep bidirectional and autoregressive representation improves patient outcome prediction.}, urldate = {2023-02-19}, publisher = {arXiv}, author = {Yang, Zhichao and Liu, Weisong and Berlowitz, Dan and Yu, Hong}, month = dec, year = {2022}, note = {arXiv:2212.12067 [cs]}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning}, }
Question: Can an encoder-decoder architecture pretrained on a large dataset of longitudinal electronic health records improve patient outcome predictions? Findings: In this prognostic study of 6.8 million patients, our denoising sequence-to-sequence prediction model of multiple outcomes outperformed state-of-the-art models such as pretrained BERT on a broad range of patient outcomes, including intentional self-harm and pancreatic cancer. Meaning: Deep bidirectional and autoregressive representation improves patient outcome prediction.
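To make "denoising sequence-to-sequence" concrete: the pretraining idea is to corrupt a patient's longitudinal code sequence and train the model to reconstruct the original. The sketch below shows a generic span-masking corruption on toy diagnosis codes; it is a simplified stand-in for the paper's actual objective.

```python
# Hedged sketch: build (corrupted input, reconstruction target) pairs for a
# denoising seq2seq objective over EHR code sequences. Toy codes, generic scheme.
import random

visits = ["E11.9", "I10", "N18.3", "I50.9", "J44.1"]  # toy diagnosis history

def corrupt(seq, span_len=2):
    """Replace a contiguous span of codes with a single mask token."""
    start = random.randrange(len(seq) - span_len + 1)
    noisy = seq[:start] + ["[MASK]"] + seq[start + span_len:]
    return noisy, seq  # (encoder input, decoder target)

src, tgt = corrupt(visits)
print("input :", src)   # e.g., ['E11.9', '[MASK]', 'I50.9', 'J44.1']
print("target:", tgt)   # the uncorrupted sequence
```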
Geographic Disparities in Prevalence of Opioid Use Disorders in US Veterans.
Li, W.; Leon, C.; Liu, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
In Boston MA, November 2022.
APHA 2022 Annual Meeting and Expo
link bibtex
@inproceedings{li_geographic_2022, address = {Boston MA}, title = {Geographic {Disparities} in {Prevalence} of {Opioid} {Use} {Disorders} in {US} {Veterans}}, author = {Li, Weijun and Leon, Casey and Liu, Weisong and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo}, }
Prevalence of Frailty and Associations with Oral Anticoagulant Prescribing in Atrial Fibrillation.
Sanghai, S. R.; Liu, W.; Wang, W.; Rongali, S.; Orkaby, A. R.; Saczynski, J. S.; Rose, A. J.; Kapoor, A.; Li, W.; Yu, H.; and McManus, D. D.
Journal of General Internal Medicine, 37(4): 730–736. March 2022.
Paper
doi
link
bibtex
abstract
@article{sanghai_prevalence_2022, title = {Prevalence of {Frailty} and {Associations} with {Oral} {Anticoagulant} {Prescribing} in {Atrial} {Fibrillation}}, volume = {37}, issn = {1525-1497}, url = {https://doi.org/10.1007/s11606-021-06834-1}, doi = {10.1007/s11606-021-06834-1}, abstract = {Frailty is often cited as a factor influencing oral anticoagulation (OAC) prescription in patients with non-valvular atrial fibrillation (NVAF). We sought to determine the prevalence of frailty and its association with OAC prescription in older veterans with NVAF.}, language = {en}, number = {4}, urldate = {2022-12-13}, journal = {Journal of General Internal Medicine}, author = {Sanghai, Saket R. and Liu, Weisong and Wang, Weijia and Rongali, Subendhu and Orkaby, Ariela R. and Saczynski, Jane S. and Rose, Adam J. and Kapoor, Alok and Li, Wenjun and Yu, Hong and McManus, David D.}, month = mar, year = {2022}, keywords = {atrial fibrillation, frailty, oral anticoagulation}, pages = {730--736}, }
Frailty is often cited as a factor influencing oral anticoagulation (OAC) prescription in patients with non-valvular atrial fibrillation (NVAF). We sought to determine the prevalence of frailty and its association with OAC prescription in older veterans with NVAF.
Learning as Conversation: Dialogue Systems Reinforced for Information Acquisition.
Cai, P.; Wan, H.; Liu, F.; Yu, M.; Yu, H.; and Joshi, S.
In Seattle WA, USA, July 2022.
NAACL 2022
Paper
link
bibtex
@inproceedings{cai_learning_2022, address = {Seattle WA, USA}, title = {Learning as {Conversation}: {Dialogue} {Systems} {Reinforced} for {Information} {Acquisition}}, shorttitle = {{NAACL} 2022}, url = {https://www.semanticscholar.org/reader/ea6b152a07dcd2e4ff6c4646d8efe1314346793c}, author = {Cai, Pengshan and Wan, Hui and Liu, Fei and Yu, Mo and Yu, Hong and Joshi, Sachindra}, month = jul, year = {2022}, note = {NAACL 2022}, }
Using data science to improve outcomes for persons with opioid use disorder.
Hayes, C. J.; Cucciare, M. A.; Martin, B. C.; Hudson, T. J.; Bush, K.; Lo-Ciganic, W.; Yu, H.; Charron, E.; and Gordon, A. J.
Substance Abuse, 43(1): 956–963. 2022.
Paper
doi
link
bibtex
abstract
@article{hayes_using_2022, title = {Using data science to improve outcomes for persons with opioid use disorder}, volume = {43}, issn = {1547-0164}, url = {https://pubmed.ncbi.nlm.nih.gov/35420927/}, doi = {10.1080/08897077.2022.2060446}, abstract = {Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using "big data" (e.g., electronic health record data, claims data mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.}, language = {eng}, number = {1}, journal = {Substance Abuse}, author = {Hayes, Corey J. and Cucciare, Michael A. and Martin, Bradley C. and Hudson, Teresa J. and Bush, Keith and Lo-Ciganic, Weihsuan and Yu, Hong and Charron, Elizabeth and Gordon, Adam J.}, year = {2022}, pmid = {35420927 PMCID: PMC9705076}, keywords = {Opioid-related disorders, big data, machine learning}, pages = {956--963}, }
Medication treatment for opioid use disorder (MOUD) is an effective evidence-based therapy for decreasing opioid-related adverse outcomes. Effective strategies for retaining persons on MOUD, an essential step to improving outcomes, are needed as roughly half of all persons initiating MOUD discontinue within a year. Data science may be valuable and promising for improving MOUD retention by using "big data" (e.g., electronic health record data, claims data, mobile/sensor data, social media data) and specific machine learning techniques (e.g., predictive modeling, natural language processing, reinforcement learning) to individualize patient care. Maximizing the utility of data science to improve MOUD retention requires a three-pronged approach: (1) increasing funding for data science research for OUD, (2) integrating data from multiple sources including treatment for OUD and general medical care as well as data not specific to medical care (e.g., mobile, sensor, and social media data), and (3) applying multiple data science approaches with integrated big data to provide insights and optimize advances in the OUD and overall addiction fields.
Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context.
Yao, Z.; Cao, Y.; Yang, Z.; Deshpande, V.; and Yu, H.
In Washington DC USA, November 2022.
AMIA Annual Symposium
Paper
link
bibtex
abstract
@inproceedings{yao_extracting_2022, address = {Washington DC USA}, title = {Extracting {Biomedical} {Factual} {Knowledge} {Using} {Pretrained} {Language} {Model} and {Electronic} {Health} {Record} {Context}}, url = {https://arxiv.org/ftp/arxiv/papers/2209/2209.07859.pdf}, abstract = {Language Models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted some experiments to use prompt methods to extract knowledge from LMs as new knowledge Bases (LMs as KBs). However, prompting can only be used as a low bound for knowledge extraction, and perform particularly poorly on biomedical domain KBs. In order to make LMs as KBs more in line with the actual application scenarios of the biomedical domain, we specifically add EHR notes as context to the prompt to improve the low bound in the biomedical domain. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by those language models can distinguish the correct knowledge from the noise knowledge in the EHR notes, and such distinguishing ability can also be used as a new metric to evaluate the amount of knowledge possessed by the model.}, language = {en}, author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Deshpande, Vijeta and Yu, Hong}, month = nov, year = {2022}, note = {AMIA Annual Symposium}, }
Language models (LMs) have performed well on biomedical natural language processing applications. In this study, we conducted experiments using prompt methods to extract knowledge from LMs as new knowledge bases (LMs as KBs). However, prompting yields only a lower bound for knowledge extraction and performs particularly poorly on biomedical-domain KBs. To make LMs as KBs more in line with the actual application scenarios of the biomedical domain, we add EHR notes as context to the prompt to improve this lower bound. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by those language models can distinguish correct knowledge from noisy knowledge in the EHR notes, and that this distinguishing ability can also serve as a new metric to evaluate the amount of knowledge possessed by the model.
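A minimal sketch of the context-enrichment idea described above: prepend an EHR snippet to the cloze prompt and compare the model's mask prediction with and without that context. The prompt, snippet, and model choice are illustrative, not the paper's setup.

```python
# Hedged sketch: probing a masked LM with and without EHR note context.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

prompt = "The patient's hypertension is treated with [MASK]."
context = "Note: continued lisinopril 20 mg daily for blood pressure control."

for text in (prompt, context + " " + prompt):
    top = fill(text)[0]                      # highest-probability completion
    print(top["token_str"], round(top["score"], 3))
```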
Generation of Patient After-Visit Summaries to Support Physicians.
Cai, P.; Liu, F.; Bajracharya, A.; Sills, J.; Kapoor, A.; Liu, W.; Berlowitz, D.; Levy, D.; Pradhan, R.; and Yu, H.
In Proceedings of the 29th International Conference on Computational Linguistics, pages 6234–6247, Gyeongju, Republic of Korea, October 2022. International Committee on Computational Linguistics
Paper
link
bibtex
abstract
@inproceedings{cai_generation_2022, address = {Gyeongju, Republic of Korea}, title = {Generation of {Patient} {After}-{Visit} {Summaries} to {Support} {Physicians}}, url = {https://aclanthology.org/2022.coling-1.544}, abstract = {An after-visit summary (AVS) is a summary note given to patients after their clinical visit. It recaps what happened during their clinical visit and guides patients' disease self-management. Studies have shown that a majority of patients found after-visit summaries useful. However, many physicians face excessive workloads and do not have time to write clear and informative summaries. In this paper, we study the problem of automatic generation of after-visit summaries and examine whether those summaries can convey the gist of clinical visits. We report our findings on a new clinical dataset that contains a large number of electronic health record (EHR) notes and their associated summaries. Our results suggest that generation of lay language after-visit summaries remains a challenging task. Crucially, we introduce a feedback mechanism that alerts physicians when an automatic summary fails to capture the important details of the clinical notes or when it contains hallucinated facts that are potentially detrimental to the summary quality. Automatic and human evaluation demonstrates the effectiveness of our approach in providing writing feedback and supporting physicians.}, urldate = {2022-12-18}, booktitle = {Proceedings of the 29th {International} {Conference} on {Computational} {Linguistics}}, publisher = {International Committee on Computational Linguistics}, author = {Cai, Pengshan and Liu, Fei and Bajracharya, Adarsha and Sills, Joe and Kapoor, Alok and Liu, Weisong and Berlowitz, Dan and Levy, David and Pradhan, Richeek and Yu, Hong}, month = oct, year = {2022}, pages = {6234--6247}, }
An after-visit summary (AVS) is a summary note given to patients after their clinical visit. It recaps what happened during their clinical visit and guides patients' disease self-management. Studies have shown that a majority of patients found after-visit summaries useful. However, many physicians face excessive workloads and do not have time to write clear and informative summaries. In this paper, we study the problem of automatic generation of after-visit summaries and examine whether those summaries can convey the gist of clinical visits. We report our findings on a new clinical dataset that contains a large number of electronic health record (EHR) notes and their associated summaries. Our results suggest that generation of lay language after-visit summaries remains a challenging task. Crucially, we introduce a feedback mechanism that alerts physicians when an automatic summary fails to capture the important details of the clinical notes or when it contains hallucinated facts that are potentially detrimental to the summary quality. Automatic and human evaluation demonstrates the effectiveness of our approach in providing writing feedback and supporting physicians.
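The feedback mechanism in this paper is model-based; as a rough intuition for the hallucination side of it, here is a deliberately crude lexical stand-in: flag summary terms that never appear in the source note so a physician can verify them. The note, summary, and stopword list are invented.

```python
# Hedged, simplified stand-in for hallucination flagging via lexical overlap.
note = "Patient seen for hypertension. Continue lisinopril. Follow up in 3 months."
summary = "You were seen for hypertension. Continue lisinopril and aspirin."

stop = {"you", "were", "for", "and", "continue", "seen", "the", "in"}
note_words = {w.strip(".,").lower() for w in note.split()}
flags = [w for w in summary.split()
         if (t := w.strip(".,").lower()) not in note_words and t not in stop]
print("terms to verify against the note:", flags)  # -> ['aspirin']
```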
Knowledge Injected Prompt Based Fine-tuning for Multi-label Few-shot ICD Coding.
Yang, Z.; Wang, S.; Rawat, B. P. S.; Mitra, A.; and Yu, H.
In Abu Dhabi, United Arab Emirates, December 2022.
Findings of the Association for Computational Linguistics: EMNLP 2022
Paper
link
bibtex
@inproceedings{yang_knowledge_2022, address = {Abu Dhabi, United Arab Emirates}, title = {Knowledge {Injected} {Prompt} {Based} {Fine}-tuning for {Multi}-label {Few}-shot {ICD} {Coding}}, url = {https://arxiv.org/pdf/2210.03304.pdf}, author = {Yang, Zhichao and Wang, Shufan and Rawat, Bhanu Pratap Singh and Mitra, Avijit and Yu, Hong}, month = dec, year = {2022}, note = {Findings of the Association for Computational Linguistics: EMNLP 2022}, }
ScAN: Suicide Attempt and Ideation Events Dataset.
Rawat, B. P. S.; Kovaly, S.; Pigeon, W. R.; and Yu, H.
July 2022.
NAACL 2022
Paper
link
bibtex
abstract
@misc{rawat_scan_2022, address = {Seattle WA, USA}, title = {{ScAN}: {Suicide} {Attempt} and {Ideation} {Events} {Dataset}}, shorttitle = {{ScAN}}, url = {http://arxiv.org/abs/2205.07872}, abstract = {Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12k+ EHR notes with 19k+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a prediction module to identify the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient's hospital-stay, respectively. ScAN and ScANER are publicly available.}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Rawat, Bhanu Pratap Singh and Kovaly, Samuel and Pigeon, Wilfred R. and Yu, Hong}, month = jul, year = {2022}, note = {NAACL 2022}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning}, }
Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideation (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI is frequently documented in electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and prediction of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built the Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning 12k+ EHR notes with 19k+ annotated SA and SI events. The annotations also contain attributes such as the method of suicide attempt. We also provide a strong baseline model, ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all relevant suicidal behavioral evidence from the EHR notes of a hospital stay, and a prediction module to identify the type of suicidal behavior (SA or SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidence, and macro F1-scores of 0.78 and 0.60 for classification of SA and SI for the patient's hospital stay, respectively. ScAN and ScANER are publicly available.
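To show the shape of the retrieval module's job, here is a minimal sketch that scores individual note sentences for suicidal-behavior evidence with a fine-tuned classifier. The checkpoint name is hypothetical (ScANER's released weights are distributed separately) and the sentences are invented.

```python
# Hedged sketch of sentence-level evidence scoring; placeholder model name.
from transformers import pipeline

clf = pipeline("text-classification",
               model="my-org/scaner-evidence-retriever")  # hypothetical

sentences = [
    "Patient denies current suicidal ideation.",
    "Admitted after an intentional overdose last night.",
]
for s in sentences:
    print(clf(s)[0], "->", s)
```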
MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score.
Kwon, S.; Yao, Z.; Jordan, H. S.; Levy, D. A.; Corner, B.; and Yu, H.
December 2022.
arXiv:2210.05875 [cs]. The 2022 Conference on Empirical Methods in Natural Language Processing
Paper
link
bibtex
abstract
@misc{kwon_medjex_2022, address = {Abu Dhabi, United Arab Emirates}, title = {{MedJEx}: {A} {Medical} {Jargon} {Extraction} {Model} with {Wiki}'s {Hyperlink} {Span} and {Contextualized} {Masked} {Language} {Model} {Score}}, shorttitle = {{MedJEx}}, url = {http://arxiv.org/abs/2210.05875}, abstract = {This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (\$MedJ\$). Then, we introduce a novel medical jargon extraction (\$MedJEx\$) model which has been shown to outperform existing state-of-the-art NLP models. First, MedJEx improved the overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Secondly, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Kwon, Sunjae and Yao, Zonghai and Jordan, Harmon S. and Levy, David A. and Corner, Brian and Yu, Hong}, month = dec, year = {2022}, note = {Number: arXiv:2210.05875 arXiv:2210.05875 [cs] The 2022 Conference on Empirical Methods in Natural Language Processing}, keywords = {Computer Science - Computation and Language}, }
This paper proposes a new natural language processing (NLP) application for identifying medical jargon terms potentially difficult for patients to comprehend from electronic health record (EHR) notes. We first present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences (MedJ). Then, we introduce a novel medical jargon extraction model (MedJEx), which is shown to outperform existing state-of-the-art NLP models. First, MedJEx improved overall performance when it was trained on an auxiliary Wikipedia hyperlink span dataset, where hyperlink spans provide additional Wikipedia articles to explain the spans (or terms), and then fine-tuned on the annotated MedJ data. Second, we found that a contextualized masked language model score was beneficial for detecting domain-specific unfamiliar jargon terms. Moreover, our results show that training on the auxiliary Wikipedia hyperlink span datasets improved six out of eight biomedical named entity recognition benchmark datasets. Both MedJ and MedJEx are publicly available.
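A minimal sketch of a contextualized masked language model score of the kind this abstract credits: mask the candidate term and read off the model's log-probability of reconstructing it in context (a low score suggests an unexpected, possibly unfamiliar term). Simplified here to a single-token term; the sentence, term, and model choice are illustrative, and MedJEx's actual scoring may differ in detail.

```python
# Hedged sketch: masked LM log-probability of a term in context (single token).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

sentence = "The patient presented with fever and chills."
term = "fever"  # chosen to be a single wordpiece in this vocabulary

ids = tok(sentence, return_tensors="pt").input_ids
term_id = tok.convert_tokens_to_ids(term)
masked = ids.clone()
masked[ids == term_id] = tok.mask_token_id   # mask the candidate term

with torch.no_grad():
    log_probs = mlm(input_ids=masked).logits.log_softmax(-1)

pos = (masked == tok.mask_token_id).nonzero()[0, 1]
score = log_probs[0, pos, term_id].item()
print(f"MLM log-prob of '{term}' in context: {score:.2f}")  # lower = less expected
```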
An Investigation of Social Determinants of Health in UMLS.
Rawat, B. P. S.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022
link bibtex
@inproceedings{rawat_investigation_2022, address = {Houston TX USA}, title = {An {Investigation} of {Social} {Determinants} of {Health} in {UMLS}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022}, }
Generating Coherent Narratives with Subtopic Planning to Answer How-to Questions.
Cai, P.; Yu, M.; Liu, F.; and Yu, H.
In Abu Dhabi, December 2022.
The GEM Workshop at EMNLP 2022
link bibtex
@inproceedings{cai_generating_2022, address = {Abu Dhabi}, title = {Generating {Coherent} {Narratives} with {Subtopic} {Planning} to {Answer} {How}-to {Questions}}, author = {Cai, Pengshan and Yu, Mo and Liu, Fei and Yu, Hong}, month = dec, year = {2022}, note = {The GEM Workshop at EMNLP 2022}, }
Parameter Efficient Transfer Learning for Suicide Attempt and Ideation Detection.
Rawat, B. P. S.; and Yu, H.
In Abu Dhabi, December 2022.
LOUHI 2022
link bibtex
@inproceedings{rawat_parameter_2022, address = {Abu Dhabi}, title = {Parameter {Efficient} {Transfer} {Learning} for {Suicide} {Attempt} and {Ideation} {Detection}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = dec, year = {2022}, note = {LOUHI 2022}, }
UMass A&P: An Assessment and Plan Reasoning System of UMass in the 2022 N2C2 Challenge.
Kwon, S.; Yang, Z.; and Yu, H.
November 2022.
2022 n2c2 Workshop, Washington DC
link bibtex
@misc{kwon_umass_2022, address = {Washington DC USA}, title = {{UMass} {A}\&{P}: {An} {Assessment} and {Plan} {Reasoning} {System} of {UMass} in the 2022 {N2C2} {Challenge}}, author = {Kwon, Sunjae and Yang, Zhichao and Yu, Hong}, month = nov, year = {2022}, note = {2022 n2c2 Workshop, Washington DC}, }
Racial differences in receipt of medications for opioid use disorder before and during the COVID-19 pandemic in the Veterans Health Administration.
Sung, M. L.; Li, W.; León, C.; Reisman, J.; Liu, W.; Kerns, R. D.; Yu, H.; and Becker, W. C.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
link bibtex
@misc{sung_racial_2022, address = {Boston MA, USA}, title = {Racial differences in receipt of medications for opioid use disorder before and during the {COVID}-19 pandemic in the {Veterans} {Health} {Administration}}, author = {Sung, Minhee L. and Li, Wenjun and León, Casey and Reisman, Joel and Liu, Weisong and Kerns, Robert D. and Yu, Hong and Becker, William C.}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
Using Machine Learning to Predict Opioid Overdose Using Electronic Health Record.
Wang, X.; Li, R.; Druhl, E.; Li, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
link bibtex
@misc{wang_using_2022, address = {Boston MA, USA}, title = {Using {Machine} {Learning} to {Predict} {Opioid} {Overdose} {Using} {Electronic} {Health} {Record}}, author = {Wang, Xun and Li, Rumeng and Druhl, Emily and Li, Wenjun and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
Automatically Detecting Opioid-Related Aberrant Behaviors from Electronic Health Records.
Wang, X.; Li, R.; Lingeman, J. M.; Druhl, E.; Li, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.
November 2022.
APHA 2022 Annual Meeting and Expo, Boston MA
link bibtex
@misc{wang_automatically_2022, address = {Boston MA, USA}, title = {Automatically {Detecting} {Opioid}-{Related} {Aberrant} {Behaviors} from {Electronic} {Health} {Records}}, author = {Wang, Xun and Li, Rumeng and Lingeman, Jesse M. and Druhl, Emily and Li, Wenjun and Sung, Minhee L. and Kerns, Robert D. and Becker, William C. and Yu, Hong}, month = nov, year = {2022}, note = {APHA 2022 Annual Meeting and Expo, Boston MA}, }
An Investigation of the Representation of Social Determinants of Health in the UMLS.
Rawat, B. P. S.; Keating, H.; Goodwin, R.; Druhl, E. B.; and Yu, H.
In Washington, D.C., November 2022.
AMIA 2022 Annual Symposium
link bibtex
@inproceedings{rawat_investigation_2022-1, address = {Washington, D.C.}, title = {An {Investigation} of the {Representation} of {Social} {Determinants} of {Health} in the {UMLS}}, author = {Rawat, Bhanu Pratap Singh and Keating, Heather and Goodwin, Raelene and Druhl, Emily B. and Yu, Hong}, month = nov, year = {2022}, note = {AMIA 2022 Annual Symposium}, }
Pretraining of Patient Representations On Structured Electronic Health Records for Patient Outcome Prediction: case study as self-harm screening tool.
Yang, Z.; and Yu, H.
In Washington DC USA, June 2022.
ARM2022
link bibtex
@inproceedings{yang_pretraining_2022, address = {Washington DC USA}, title = {Pretraining of {Patient} {Representations} {On} {Structured} {Electronic} {Health} {Records} for {Patient} {Outcome} {Prediction}: case study as self-harm screening tool}, shorttitle = {{ARM} 2022}, author = {Yang, Zhichao and Hong, Yu}, month = jun, year = {2022}, note = {ARM2022}, }
Risk Factors Associated with Nonfatal Opioid Overdose Leading to Intensive Care Unit Admission: A Cross-Sectional Study.
Mitra, A.; Ahsan, H.; Li, W.; Liu, W.; Kerns, R. D.; Tsai, J.; Becker, W. C.; Smelson, D. A.; and Yu, H.
In Washington DC USA, June 2022.
ARM 2022
link bibtex
@inproceedings{mitra_risk_2022, address = {Washington DC USA}, title = {Risk {Factors} {Associated} with {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}: {A} {Cross}-{Sectional} {Study}}, shorttitle = {{ARM} 2022}, author = {Mitra, Avijit and Ahsan, Hiba and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Tsai, Jack and Becker, William C. and Smelson, David A. and Yu, Hong}, month = jun, year = {2022}, note = {ARM 2022}, }
SBDH and Suicide: A Multi-Task Learning Framework for SBDH Detection in Electronic Health Records Using NLP.
Mitra, A.; Rawat, B. P. S.; Druhl, E. B.; Keating, H.; Goodwin, R.; Hu, W.; Liu, W.; Tsai, J.; Smelson, D. A.; and Yu, H.
In Washington DC USA, June 2022.
ARM 2022
link bibtex
@inproceedings{mitra_sbdh_2022, address = {Washington DC USA}, title = {{SBDH} and {Suicide}: {A} {Multi}-{Task} {Learning} {Framework} for {SBDH} {Detection} in {Electronic} {Health} {Records} {Using} {NLP}}, shorttitle = {{ARM} 2022}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and Druhl, Emily B. and Keating, Heather and Goodwin, Raelene and Hu, Wen and Liu, Weisong and Tsai, Jack and Smelson, David A. and Yu, Hong}, month = jun, year = {2022}, note = {ARM 2022}, }
Studying Association of Traumatic Brain Injury and Posttraumatic Stress Disorder Diagnoses with Hospitalized Self-Harm Among US Veterans, 2008-2017.
Rawat, B. P. S.; Reisman, J.; Rongali, S.; Liu, W.; Yu, H.; and Carlson, K.
In Washington DC USA, June 2022.
ARM 2022 (Poster)
link bibtex
@inproceedings{rawat_studying_2022, address = {Washington DC USA}, title = {Studying {Association} of {Traumatic} {Brain} {Injury} and {Posttraumatic} {Stress} {Disorder} {Diagnoses} with {Hospitalized} {Self}-{Harm} {Among} {US} {Veterans}, 2008-2017}, shorttitle = {{ARM} 2022}, author = {Rawat, Bhanu Pratap Singh and Reisman, Joel and Rongali, Subendhu and Liu, Weisong and Yu, Hong and Carlson, Kathleen}, month = jun, year = {2022}, note = {ARM 2022 (Poster)}, }
NLP and Annie App for Social Determinants of Health.
Mahapatra, S.; Chen, H.; Tsai, J.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022
link bibtex
@inproceedings{mahapatra_nlp_2022, address = {Houston TX USA}, title = {{NLP} and {Annie} {App} for {Social} {Determinants} of {Health}}, author = {Mahapatra, Sneha and Chen, Huan-Yuan and Tsai, Jack and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022}, }
EASE: A Tool to Extract Social Determinants of Health from Electronic Health Records.
Rawat, B. P. S.; and Yu, H.
In Houston TX USA, May 2022.
AMIA Clinical Informatics 2022 (System Demo)
link bibtex
@inproceedings{rawat_ease_2022, address = {Houston TX USA}, title = {{EASE}: {A} {Tool} to {Extract} {Social} {Determinants} of {Health} from {Electronic} {Health} {Records}}, author = {Rawat, Bhanu Pratap Singh and Yu, Hong}, month = may, year = {2022}, note = {AMIA Clinical Informatics 2022 (System Demo)}, }
The association of prescribed long-acting versus short-acting opioids and mortality among older adults.
Sung, M.; Smirnova, J.; Li, W.; Liu, W.; Kerns, R. D.; Reisman, J. I.; Yu, H.; and Becker, W. C.
In Society of General Internal Medicine Annual National Meeting, Orlando, Florida, USA, April 2022.
link bibtex
@inproceedings{sung_association_2022, address = {Orlando, Florida, USA}, title = {The association of prescribed long-acting versus short-acting opioids and mortality among older adults}, booktitle = {Society of {General} {Internal} {Medicine} {Annual} {National} {Meeting}}, author = {Sung, Minhee and Smirnova, Jimin and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Reisman, Joel I. and Yu, Hong and Becker, William C.}, month = apr, year = {2022}, }
EHR Cohort Development Using Natural Language Processing For Identifying Symptoms Of Alzheimer's Disease.
Yu, H.; Mitra, A.; Keating, H.; Liu, W.; Hu, W.; Xia, W.; Morin, P.; Berlowitz, D. R.; Bray, M.; Monfared, A.; and Zhang, Q.
In Barcelona, Spain (Online), March 2022.
AD/PD 2022
link bibtex
@inproceedings{yu_ehr_2022, address = {Barcelona, Spain (Online)}, title = {{EHR} {Cohort} {Development} {Using} {Natural} {Language} {Processing} {For} {Identifying} {Symptoms} {Of} {Alzheimer}'s {Disease}}, shorttitle = {{AD}/{PD} 2022}, author = {Yu, Hong and Mitra, Avijit and Keating, Heather and Liu, Weisong and Hu, Wen and Xia, Weiming and Morin, Peter and Berlowitz, Dan R. and Bray, Margaret and Monfared, Amir and Zhang, Quanwu}, month = mar, year = {2022}, note = {AD/PD 2022}, }
2021
(9)
Risk Factors Associated With Nonfatal Opioid Overdose Leading to Intensive Care Unit Admission: A Cross-sectional Study.
Mitra, A.; Ahsan, H.; Li, W.; Liu, W.; Kerns, R. D.; Tsai, J.; Becker, W.; Smelson, D. A.; and Yu, H.
JMIR Medical Informatics, 9(11): e32851. November 2021.
doi link bibtex abstract
@article{mitra_risk_2021, title = {Risk {Factors} {Associated} {With} {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}: {A} {Cross}-sectional {Study}}, volume = {9}, issn = {2291-9694}, shorttitle = {Risk {Factors} {Associated} {With} {Nonfatal} {Opioid} {Overdose} {Leading} to {Intensive} {Care} {Unit} {Admission}}, doi = {10.2196/32851}, abstract = {BACKGROUND: Opioid overdose (OD) and related deaths have significantly increased in the United States over the last 2 decades. Existing studies have mostly focused on demographic and clinical risk factors in noncritical care settings. Social and behavioral determinants of health (SBDH) are infrequently coded in the electronic health record (EHR) and usually buried in unstructured EHR notes, reflecting possible gaps in clinical care and observational research. Therefore, SBDH often receive less attention despite being important risk factors for OD. Natural language processing (NLP) can alleviate this problem. OBJECTIVE: The objectives of this study were two-fold: First, we examined the usefulness of NLP for SBDH extraction from unstructured EHR text, and second, for intensive care unit (ICU) admissions, we investigated risk factors including SBDH for nonfatal OD. METHODS: We performed a cross-sectional analysis of admission data from the EHR of patients in the ICU of Beth Israel Deaconess Medical Center between 2001 and 2012. We used patient admission data and International Classification of Diseases, Ninth Revision (ICD-9) diagnoses to extract demographics, nonfatal OD, SBDH, and other clinical variables. In addition to obtaining SBDH information from the ICD codes, an NLP model was developed to extract 6 SBDH variables from EHR notes, namely, housing insecurity, unemployment, social isolation, alcohol use, smoking, and illicit drug use. We adopted a sequential forward selection process to select relevant clinical variables. Multivariable logistic regression analysis was used to evaluate the associations with nonfatal OD, and relative risks were quantified as covariate-adjusted odds ratios (aOR). RESULTS: The strongest association with nonfatal OD was found to be drug use disorder (aOR 8.17, 95\% CI 5.44-12.27), followed by bipolar disorder (aOR 2.69, 95\% CI 1.68-4.29). Among others, major depressive disorder (aOR 2.57, 95\% CI 1.12-5.88), being on a Medicaid health insurance program (aOR 2.26, 95\% CI 1.43-3.58), history of illicit drug use (aOR 2.09, 95\% CI 1.15-3.79), and current use of illicit drugs (aOR 2.06, 95\% CI 1.20-3.55) were strongly associated with increased risk of nonfatal OD. Conversely, Blacks (aOR 0.51, 95\% CI 0.28-0.94), older age groups (40-64 years: aOR 0.65, 95\% CI 0.44-0.96; {\textgreater}64 years: aOR 0.16, 95\% CI 0.08-0.34) and those with tobacco use disorder (aOR 0.53, 95\% CI 0.32-0.89) or alcohol use disorder (aOR 0.64, 95\% CI 0.42-1.00) had decreased risk of nonfatal OD. Moreover, 99.82\% of all SBDH information was identified by the NLP model, in contrast to only 0.18\% identified by the ICD codes. CONCLUSIONS: This is the first study to analyze the risk factors for nonfatal OD in an ICU setting using NLP-extracted SBDH from EHR notes. We found several risk factors associated with nonfatal OD including SBDH. SBDH are richly described in EHR notes, supporting the importance of integrating NLP-derived SBDH into OD risk assessment. More studies in ICU settings can help health care systems better understand and respond to the opioid epidemic.}, language = {eng}, number = {11}, journal = {JMIR medical informatics}, author = {Mitra, Avijit and Ahsan, Hiba and Li, Wenjun and Liu, Weisong and Kerns, Robert D. and Tsai, Jack and Becker, William and Smelson, David A. and Yu, Hong}, month = nov, year = {2021}, pmid = {34747714}, pmcid = {PMC8663596}, keywords = {electronic health records, intensive care unit, natural language processing, opioids, overdose, risk factors, social and behavioral determinants of health}, pages = {e32851}, }
Evaluating the effectiveness of NoteAid in a community hospital setting: randomized trial of electronic health record note comprehension interventions with patients.
Lalor, J. P; Hu, W.; Tran, M.; Wu, H.; Mazor, K. M; and Yu, H.
Journal of Medical Internet Research, 23(5). 2021.
Paper
doi
link
bibtex
abstract
@article{lalor_evaluating_2021, title = {Evaluating the effectiveness of noteaid in a community hospital setting: randomized trial of electronic health record note comprehension interventions with patients}, volume = {23}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8160802/}, doi = {10.2196/26354}, abstract = {Background: Interventions to define medical jargon have been shown to improve electronic health record (EHR) note comprehension among crowdsourced participants on Amazon Mechanical Turk (AMT). However, AMT participants may not be representative of the general population or patients who are most at-risk for low health literacy. Objective: In this work, we assessed the efficacy of an intervention (NoteAid) for EHR note comprehension among participants in a community hospital setting. Methods: Participants were recruited from Lowell General Hospital (LGH), a community hospital in Massachusetts, to take the ComprehENotes test, a web-based test of EHR note comprehension. Participants were randomly assigned to control (n=85) or intervention (n=89) groups to take the test without or with NoteAid, respectively. For comparison, we used a sample of 200 participants recruited from AMT to take the ComprehENotes test (100 in the control group and 100 in the intervention group). Results: A total of 174 participants were recruited from LGH, and 200 participants were recruited from AMT. Participants in both intervention groups (community hospital and AMT) scored significantly higher than participants in the control groups (P{\textless}.001). The average score for the community hospital participants was significantly lower than the average score for the AMT participants (P{\textless}.001), consistent with the lower education levels in the community hospital sample. Education level had a significant effect on scores for the community hospital participants (P{\textless}.001). Conclusions: Use of NoteAid was associated with significantly improved EHR note comprehension in both community hospital and AMT samples. Our results demonstrate the generalizability of ComprehENotes as a test of EHR note comprehension and the effectiveness of NoteAid for improving EHR note comprehension.}, number = {5}, journal = {Journal of Medical Internet Research}, author = {Lalor, John P and Hu, Wen and Tran, Matthew and Wu, Hao and Mazor, Kathleen M and Yu, Hong}, year = {2021}, pmid = {33983124}, pmcid = {8160802}, }
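For readers who want to reproduce this kind of group comparison, the snippet below runs a two-sample test on fabricated comprehension scores; the score arrays, effect size, and the choice of a plain t-test are illustrative assumptions, not the trial's data or analysis plan.

# Illustrative two-group comparison; the arrays are fabricated stand-ins.
import numpy as np
from scipy import stats

control = np.array([41.0, 55.0, 38.0, 61.0, 47.0, 50.0])       # fabricated
intervention = np.array([58.0, 66.0, 52.0, 71.0, 63.0, 60.0])  # fabricated

t, p = stats.ttest_ind(intervention, control)
print(f"t = {t:.2f}, p = {p:.4f}")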
MIMIC-SBDH: A Dataset for Social and Behavioral Determinants of Health.
Ahsan, H.; Ohnuki, E.; Mitra, A.; and Yu, H.
Proceedings of Machine Learning Research, 149: 391–413. August 2021.
Paper
link
bibtex
abstract
@article{ahsan_mimic-sbdh_2021, title = {{MIMIC}-{SBDH}: {A} {Dataset} for {Social} and {Behavioral} {Determinants} of {Health}}, volume = {149}, issn = {2640-3498}, shorttitle = {{MIMIC}-{SBDH}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8734043/}, abstract = {Social and Behavioral Determinants of Health (SBDHs) are environmental and behavioral factors that have a profound impact on health and related outcomes. Given their importance, physicians document SBDHs of their patients in Electronic Health Records (EHRs). However, SBDHs are mostly documented in unstructured EHR notes. Determining the status of the SBDHs requires manually reviewing the notes which can be a tedious process. Therefore, there is a need to automate identifying the patients' SBDH status in EHR notes. In this work, we created MIMIC-SBDH, the first publicly available dataset of EHR notes annotated for patients' SBDH status. Specifically, we annotated 7,025 discharge summary notes for the status of 7 SBDHs as well as marked SBDH-related keywords. Using this annotated data for training and evaluation, we evaluated the performance of three machine learning models (Random Forest, XGBoost, and Bio-ClinicalBERT) on the task of identifying SBDH status in EHR notes. The performance ranged from the lowest 0.69 F1 score for Drug Use to the highest 0.96 F1 score for Community-Present. In addition to standard evaluation metrics such as the F1 score, we evaluated four capabilities that a model must possess to perform well on the task using the CheckList tool (Ribeiro et al., 2020). The results revealed several shortcomings of the models. Our results highlighted the need to perform more capability-centric evaluations in addition to standard metric comparisons.}, language = {eng}, journal = {Proceedings of Machine Learning Research}, author = {Ahsan, Hiba and Ohnuki, Emmie and Mitra, Avijit and Yu, Hong}, month = aug, year = {2021}, pmid = {35005628}, pmcid = {PMC8734043}, pages = {391--413}, }
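As a concrete starting point for the classification task this dataset supports, here is a minimal fine-tuning sketch using the public Bio-ClinicalBERT checkpoint evaluated in the paper; the toy examples, three-way label scheme, and hyperparameters are placeholder assumptions, not the authors' released pipeline.

# Minimal sketch: fine-tuning Bio-ClinicalBERT for SBDH status classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "emilyalsentzer/Bio_ClinicalBERT"  # public checkpoint used in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# Hypothetical SBDH status labels: 0 = absent, 1 = present, 2 = past.
train = [("Patient lives alone and reports no social support.", 1),
         ("Denies any history of alcohol or drug use.", 0)]

enc = tokenizer([t for t, _ in train], padding=True, truncation=True,
                max_length=128, return_tensors="pt")
labels = torch.tensor([y for _, y in train])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy batch
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()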
SBDH and Suicide: A Multi-task Learning Framework for SBDH in Electronic Health Records.
Mitra, A.; Rawat, B. P. S.; Druhl, E.; Keating, H.; Goodwin, R.; Hu, W.; Liu, W.; Tsai, J.; Smelson, D. A.; and Yu, H.
In Online, October 2021.
SciNLP 2021
link bibtex
@inproceedings{mitra_sbdh_2021, address = {Online}, title = {{SBDH} and {Suicide}: {A} {Multi}-task {Learning} {Framework} for {SBDH} in {Electronic} {Health} {Records}}, shorttitle = {{SciNLP} 2021}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and Druhl, Emily and Keating, Heather and Goodwin, Raelene and Hu, Wen and Liu, Weisong and Tsai, Jack and Smelson, David A. and Yu, Hong}, month = oct, year = {2021}, note = {SciNLP 2021}, }
Membership Inference Attack Susceptibility of Clinical Language Models.
Jagannatha, A.; Rawat, B. P. S.; and Yu, H.
CoRR, abs/2104.08305. 2021.
arXiv: 2104.08305
Paper
link
bibtex
abstract
@article{jagannatha_membership_2021, title = {Membership {Inference} {Attack} {Susceptibility} of {Clinical} {Language} {Models}}, volume = {abs/2104.08305}, url = {https://arxiv.org/abs/2104.08305}, abstract = {Deep Neural Network (DNN) models have been shown to have high empirical privacy leakages. Clinical language models (CLMs) trained on clinical data have been used to improve performance in biomedical natural language processing tasks. In this work, we investigate the risks of training-data leakage through white-box or black-box access to CLMs. We design and employ membership inference attacks to estimate the empirical privacy leaks for model architectures like BERT and GPT2. We show that membership inference attacks on CLMs lead to non-trivial privacy leakages of up to 7\%. Our results show that smaller models have lower empirical privacy leakages than larger ones, and masked LMs have lower leakages than auto-regressive LMs. We further show that differentially private CLMs can have improved model utility on clinical domain while ensuring low empirical privacy leakage. Lastly, we also study the effects of group-level membership inference and disease rarity on CLM privacy leakages.}, journal = {CoRR}, author = {Jagannatha, Abhyuday and Rawat, Bhanu Pratap Singh and Yu, Hong}, year = {2021}, note = {arXiv: 2104.08305}, }
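A minimal sketch of the loss-thresholding style of membership inference studied in this paper, shown on the public GPT-2 checkpoint; the threshold value is an assumption that a real attack would calibrate on known member and non-member sequences.

# Loss-based membership inference sketch on an off-the-shelf causal LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_loss(text: str) -> float:
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = lm(**enc, labels=enc["input_ids"])  # mean per-token cross-entropy
    return out.loss.item()

def likely_training_member(text: str, threshold: float = 3.5) -> bool:
    # Sequences the model fits unusually well (low loss) are flagged as
    # probable training members; the threshold here is an illustrative guess.
    return lm_loss(text) < threshold

print(likely_training_member("The patient was started on warfarin."))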
Guideline-discordant dosing of direct-acting oral anticoagulants in the Veterans Health Administration.
Rose, A. J.; Lee, J. S.; Berlowitz, D. R.; Liu, W.; Mitra, A.; and Yu, H.
BMC Health Services Research, 21(1): 1351. December 2021.
Paper
doi
link
bibtex
abstract
@article{rose_guideline-discordant_2021, title = {Guideline-discordant dosing of direct-acting oral anticoagulants in the veterans health administration}, volume = {21}, issn = {1472-6963}, url = {https://doi.org/10.1186/s12913-021-07397-x}, doi = {10.1186/s12913-021-07397-x}, abstract = {Clear guidelines exist to guide the dosing of direct-acting oral anticoagulants (DOACs). It is not known how consistently these guidelines are followed in practice.}, number = {1}, urldate = {2022-01-24}, journal = {BMC Health Services Research}, author = {Rose, Adam J. and Lee, Jong Soo and Berlowitz, Dan R. and Liu, Weisong and Mitra, Avijit and Yu, Hong}, month = dec, year = {2021}, keywords = {Anticoagulants, Atrial fibrillation, Medication therapy management, Quality of health care}, pages = {1351}, }
Improving Formality Style Transfer with Context-Aware Rule Injection.
Yao, Z.; and Yu, H.
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1561–1570, Online, August 2021. Association for Computational Linguistics
Paper
doi
link
bibtex
abstract
@inproceedings{yao_improving_2021, address = {Online}, title = {Improving {Formality} {Style} {Transfer} with {Context}-{Aware} {Rule} {Injection}}, url = {https://aclanthology.org/2021.acl-long.124}, doi = {10.18653/v1/2021.acl-long.124}, abstract = {Models pre-trained on large-scale regular text corpora often do not work well for user-generated data where the language styles differ significantly from the mainstream text. Here we present Context-Aware Rule Injection (CARI), an innovative method for formality style transfer (FST) by injecting multiple rules into an end-to-end BERT-based encoder and decoder model. CARI is able to learn to select optimal rules based on context. The intrinsic evaluation showed that CARI achieved the new highest performance on the FST benchmark dataset. Our extrinsic evaluation showed that CARI can greatly improve the regular pre-trained models' performance on several tweet sentiment analysis tasks. Our contributions are as follows: 1.We propose a new method, CARI, to integrate rules for pre-trained language models. CARI is context-aware and can trained end-to-end with the downstream NLP applications. 2.We have achieved new state-of-the-art results for FST on the benchmark GYAFC dataset. 3.We are the first to evaluate FST methods with extrinsic evaluation and specifically on sentiment classification tasks. We show that CARI outperformed existing rule-based FST approaches for sentiment classification.}, urldate = {2021-09-21}, booktitle = {Proceedings of the 59th {Annual} {Meeting} of the {Association} for {Computational} {Linguistics} and the 11th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({Volume} 1: {Long} {Papers})}, publisher = {Association for Computational Linguistics}, author = {Yao, Zonghai and Yu, Hong}, month = aug, year = {2021}, pages = {1561--1570}, }
Relation Classification for Bleeding Events From Electronic Health Records Using Deep Learning Systems: An Empirical Study.
Mitra, A.; Rawat, B. P. S.; McManus, D. D.; and Yu, H.
JMIR Medical Informatics, 9(7): e27527. July 2021.
Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada
Paper
doi
link
bibtex
abstract
@article{mitra_relation_2021, title = {Relation {Classification} for {Bleeding} {Events} {From} {Electronic} {Health} {Records} {Using} {Deep} {Learning} {Systems}: {An} {Empirical} {Study}}, volume = {9}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work ("first published in the Journal of Medical Internet Research...") is properly cited with original URL and bibliographic citation information. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.}, shorttitle = {Relation {Classification} for {Bleeding} {Events} {From} {Electronic} {Health} {Records} {Using} {Deep} {Learning} {Systems}}, url = {https://medinform.jmir.org/2021/7/e27527}, doi = {10.2196/27527}, abstract = {Background: Accurate detection of bleeding events from electronic health records (EHRs) is crucial for identifying and characterizing different common and serious medical problems. To extract such information from EHRs, it is essential to identify the relations between bleeding events and related clinical entities (eg, bleeding anatomic sites and lab tests). With the advent of natural language processing (NLP) and deep learning (DL)-based techniques, many studies have focused on their applicability for various clinical applications. However, no prior work has utilized DL to extract relations between bleeding events and relevant entities. Objective: In this study, we aimed to evaluate multiple DL systems on a novel EHR data set for bleeding event–related relation classification. Methods: We first expert annotated a new data set of 1046 deidentified EHR notes for bleeding events and their attributes. On this data set, we evaluated three state-of-the-art DL architectures for the bleeding event relation classification task, namely, convolutional neural network (CNN), attention-guided graph convolutional network (AGGCN), and Bidirectional Encoder Representations from Transformers (BERT). We used three BERT-based models, namely, BERT pretrained on biomedical data (BioBERT), BioBERT pretrained on clinical text (Bio+Clinical BERT), and BioBERT pretrained on EHR notes (EhrBERT). Results: Our experiments showed that the BERT-based models significantly outperformed the CNN and AGGCN models. Specifically, BioBERT achieved a macro F1 score of 0.842, outperforming both the AGGCN (macro F1 score, 0.828) and CNN models (macro F1 score, 0.763) by 1.4\% (P\<.001) and 7.9\% (P\<.001), respectively. Conclusions: In this comprehensive study, we explored and compared different DL systems to classify relations between bleeding events and other medical concepts. On our corpus, BERT-based models outperformed other DL models for identifying the relations of bleeding-related entities. In addition to pretrained contextualized word representation, BERT-based models benefited from the use of target entity representation over traditional sequence representation}, language = {EN}, number = {7}, urldate = {2021-07-02}, journal = {JMIR Medical Informatics}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and McManus, David D. and Yu, Hong}, month = jul, year = {2021}, note = {Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e27527}, }
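The entity-marker trick is one common way to give a BERT classifier the "target entity representation" the conclusion refers to. The sketch below is a generic illustration with a public BioBERT checkpoint; the marker tokens, example sentence, and binary label set are assumptions, not the paper's exact configuration.

# Sketch: relation classification with entity markers on a BioBERT encoder.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "dmis-lab/biobert-base-cased-v1.1"  # a public BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]})
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.resize_token_embeddings(len(tokenizer))  # make room for the markers

# Hypothetical input: a bleeding event and a candidate attribute, marked.
text = ("[E1] GI bleed [/E1] confirmed by [E2] hemoglobin drop [/E2] "
        "on admission labs.")
enc = tokenizer(text, return_tensors="pt")
logits = model(**enc).logits  # related vs. not related (untrained here)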
Epinoter: A Natural Language Processing Tool for Epidemiological Studies.
Liu, W.; Li, F.; Jin, Y.; Granillo, E.; Yarzebski, J.; Li, W.; and Yu, H.
In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, volume 5, pages 754–761, February 2021.
link bibtex
@inproceedings{liu_epinoter_2021, title = {Epinoter: {A} {Natural} {Language} {Processing} {Tool} for {Epidemiological} {Studies}.}, volume = {5}, booktitle = {Proceedings of the 14th {International} {Joint} {Conference} on {Biomedical} {Engineering} {Systems} and {Technologies}}, author = {Liu, Weisong and Li, Fei and Jin, Yonghao and Granillo, Edgard and Yarzebski, Jorge and Li, Wenjun and Yu, Hong}, month = feb, year = {2021}, pages = {754--761}, }
2020
(15)
Inferring ADR causality by predicting the Naranjo Score from Clinical Notes.
Rawat, B. P. S.; Jagannatha, A.; Liu, F.; and Yu, H.
In AMIA Fall Symposium, pages 1041–1049, 2020.
Paper
link
bibtex
abstract
@inproceedings{rawat_inferring_2020, title = {Inferring {ADR} causality by predicting the {Naranjo} {Score} from {Clinical} {Notes}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075501/}, abstract = {Clinical judgment studies are an integral part of drug safety surveillance and pharmacovigilance frameworks. They help quantify the causal relationship between medication and its adverse drug reactions (ADRs). To conduct such studies, physicians need to review patients’ charts manually to answer Naranjo questionnaire1. In this paper, we propose a methodology to automatically infer causal relations from patients’ discharge summaries by combining the capabilities of deep learning and statistical learning models. We use Bidirectional Encoder Representations from Transformers (BERT)2 to extract relevant paragraphs for each Naranjo question and then use a statistical learning model such as logistic regression to predict the Naranjo score and the causal relation between the medication and an ADR. Our methodology achieves a macro-averaged f1-score of 0.50 and weighted f1-score of 0.63.}, booktitle = {{AMIA} {Fall} {Symposium}}, author = {Rawat, Bhanu Pratap Singh and Jagannatha, Abhyuday and Liu, Feifan and Yu, Hong}, year = {2020}, pmcid = {PMC8075501}, pmid = {33936480}, pages = {1041--1049}, }
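A compressed sketch of the two-stage idea described in this abstract: an off-the-shelf sentence encoder stands in for the BERT paragraph retriever, and a logistic regression maps per-question evidence scores to a causality label. The encoder choice, feature construction, and toy labels are all assumptions.

# Stage 1: score paragraphs for relevance to each Naranjo-style question.
# Stage 2: feed per-question evidence features to a logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in retriever

questions = ["Did the adverse event appear after the drug was given?",
             "Did the reaction improve when the drug was discontinued?"]
paragraphs = ["Warfarin was held after the GI bleed and melena resolved.",
              "Patient started warfarin two weeks prior to admission."]

q_emb = encoder.encode(questions, convert_to_tensor=True)
p_emb = encoder.encode(paragraphs, convert_to_tensor=True)
# One feature per question: relevance of its best-matching paragraph.
features = util.cos_sim(q_emb, p_emb).max(dim=1).values.cpu().numpy()

# Toy training pairs of feature vectors -> causality label (illustrative).
X = np.vstack([features, np.zeros_like(features)])
y = np.array([1, 0])  # 1 = probable ADR, 0 = doubtful
clf = LogisticRegression().fit(X, y)
print(clf.predict(features.reshape(1, -1)))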
Calibrating Structured Output Predictors for Natural Language Processing.
Jagannatha, A.; and Yu, H.
In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), pages 2078–2092, July 2020.
NIHMSID: NIHMS1661932
Paper
doi
link
bibtex
abstract
@inproceedings{jagannatha_calibrating_2020, title = {Calibrating {Structured} {Output} {Predictors} for {Natural} {Language} {Processing}.}, volume = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics}, url = {https://aclanthology.org/2020.acl-main.188}, doi = {10.18653/v1/2020.acl-main.188}, abstract = {We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if the system is to be deployed in a safety-critical domain such as healthcare. However, the output space of such structured prediction models is often too large to adapt binary or multi-class calibration methods directly. In this study, we propose a general calibration scheme for output entities of interest in neural-network based structured prediction models. Our proposed method can be used with any binary class calibration scheme and a neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirements. We show that our method outperforms current calibration techniques for named-entity-recognition, part-of-speech and question answering. We also improve our model's performance from our decoding step across several tasks and benchmark datasets. Our method improves the calibration and model performance on out-of-domain test scenarios as well.}, booktitle = {2020 {Annual} {Conference} of the {Association} for {Computational} {Linguistics} ({ACL})}, author = {Jagannatha, Abhyuday and Yu, Hong}, month = jul, year = {2020}, pmcid = {PMC7890517}, pmid = {33612961}, note = {NIHMSID: NIHMS1661932}, pages = {2078--2092}, }
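The underlying recipe is post-hoc binary calibration applied to per-entity confidence scores. A minimal version with isotonic regression, one of the binary calibrators such a scheme can plug in, looks like this; the confidence values below are fabricated.

# Post-hoc calibration of entity-level confidences with isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical development-set data: the model's confidence in each
# predicted entity, and whether that entity was actually correct.
conf = np.array([0.95, 0.88, 0.83, 0.40, 0.92, 0.55, 0.30, 0.61])
correct = np.array([1, 1, 0, 0, 1, 1, 0, 0])

calibrator = IsotonicRegression(out_of_bounds="clip").fit(conf, correct)

# At test time, raw confidences become calibrated probabilities, which can
# also drive an uncertainty-aware decoding threshold.
print(calibrator.predict(np.array([0.85, 0.45])))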
Conversational machine comprehension: a literature review.
Gupta, S.; Rawat, B. P. S.; and Yu, H.
arXiv preprint arXiv:2006.00671, 2739–2753. December 2020.
COLING 2020
Paper
doi
link
bibtex
abstract
@article{gupta_conversational_2020, title = {Conversational machine comprehension: a literature review}, shorttitle = {Conversational machine comprehension}, url = {https://aclanthology.org/2020.coling-main.247}, doi = {10.18653/v1/2020.coling-main.247}, abstract = {Conversational Machine Comprehension (CMC), a research track in conversational AI, expects the machine to understand an open-domain natural language text and thereafter engage in a multi-turn conversation to answer questions related to the text. While most of the research in Machine Reading Comprehension (MRC) revolves around single-turn question answering (QA), multi-turn CMC has recently gained prominence, thanks to the advancement in natural language understanding via neural language models such as BERT and the introduction of large-scale conversational datasets such as CoQA and QuAC. The rise in interest has, however, led to a flurry of concurrent publications, each with a different yet structurally similar modeling approach and an inconsistent view of the surrounding literature. With the volume of model submissions to conversational datasets increasing every year, there exists a need to consolidate the scattered knowledge in this domain to streamline future research. This literature review attempts at providing a holistic overview of CMC with an emphasis on the common trends across recently published models, specifically in their approach to tackling conversational history. The review synthesizes a generic framework for CMC models while highlighting the differences in recent approaches and intends to serve as a compendium of CMC for future researchers.}, journal = {arXiv preprint arXiv:2006.00671}, author = {Gupta, Somil and Rawat, Bhanu Pratap Singh and Yu, Hong}, month = dec, year = {2020}, note = {COLING 2020}, pages = {2739--2753}, }
Bleeding Entity Recognition in Electronic Health Records: A Comprehensive Analysis of End-to-End Systems.
Mitra, A.; Rawat, B. P. S.; McManus, D.; Kapoor, A.; and Yu, H.
In AMIA Annu Symp Proc, pages 860–869, 2020.
Paper
link
bibtex
abstract
@inproceedings{mitra_bleeding_2020, title = {Bleeding {Entity} {Recognition} in {Electronic} {Health} {Records}: {A} {Comprehensive} {Analysis} of {End}-to-{End} {Systems}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075442/}, abstract = {A bleeding event is a common adverse drug reaction amongst patients on anticoagulation and factors critically into a clinician's decision to prescribe or continue anticoagulation for atrial fibrillation. However, bleeding events are not uniformly captured in the administrative data of electronic health records (EHR). As manual review is prohibitively expensive, we investigate the effectiveness of various natural language processing (NLP) methods for automatic extraction of bleeding events. Using our expert-annotated 1,079 de-identified EHR notes, we evaluated state-of-the-art NLP models such as biLSTM-CRF with language modeling, and different BERT variants for six entity types. On our dataset, the biLSTM-CRF surpassed other models resulting in a macro F1-score of 0.75 whereas the performance difference is negligible for sentence and document-level predictions with the best macro F1-scores of 0.84 and 0.96, respectively. Our error analyses suggest that the models' incorrect predictions can be attributed to variability in entity spans, memorization, and missing negation signals.}, booktitle = {{AMIA} {Annu} {Symp} {Proc}}, author = {Mitra, Avijit and Rawat, Bhanu Pratap Singh and McManus, David and Kapoor, Alok and Yu, Hong}, year = {2020}, pmid = {33936461 PMCID: PMC8075442}, pages = {860--869}, }
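For orientation, here is a compact biLSTM-CRF tagger of the kind that performed best in this study, built on the pytorch-crf package; the vocabulary size, dimensions, and BIO tag set are placeholders, and the paper's language-model pretraining is omitted.

# Compact biLSTM-CRF sequence tagger sketch (pip install pytorch-crf).
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, tokens, tags=None):
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        if tags is not None:
            return -self.crf(emissions, tags)  # negative log-likelihood loss
        return self.crf.decode(emissions)      # best tag sequence per input

model = BiLSTMCRF(vocab_size=5000, num_tags=5)  # e.g. BIO tags for entities
x = torch.randint(0, 5000, (2, 12))             # toy batch of token ids
loss = model(x, torch.randint(0, 5, (2, 12)))   # training mode
pred = model(x)                                 # inference mode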
Neural Multi-Task Learning for Adverse Drug Reaction Extraction.
Liu, F.; Zheng, X.; Yu, H.; and Tjia, J.
AMIA Annual Symposium Proceedings, 2020: 756–762. 2020.
Paper
link
bibtex
abstract
@article{liu_neural_2020, title = {Neural {Multi}-{Task} {Learning} for {Adverse} {Drug} {Reaction} {Extraction}}, volume = {2020}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075418/pdf/110_3417286.pdf}, abstract = {A reliable and searchable knowledge database of adverse drug reactions (ADRs) is highly important and valuable for improving patient safety at the point of care. In this paper, we proposed a neural multi-task learning system, NeuroADR, to extract ADRs as well as relevant modifiers from free-text drug labels. Specifically, the NeuroADR system exploited a hierarchical multi-task learning (HMTL) framework to perform named entity recognition (NER) and relation extraction (RE) jointly, where interactions among the learned deep encoder representations from different subtasks are explored. Different from the conventional HMTL approach, NeuroADR adopted a novel task decomposition strategy to generate auxiliary subtasks for more inter-task interactions and integrated a new label encoding schema for better handling discontinuous entities. Experimental results demonstrate the effectiveness of the proposed system.}, language = {eng}, journal = {AMIA ... Annual Symposium proceedings. AMIA Symposium}, author = {Liu, Feifan and Zheng, Xiaoyu and Yu, Hong and Tjia, Jennifer}, year = {2020}, pmid = {33936450}, pmcid = {PMC8075418}, keywords = {Data Mining, Databases, Factual, Deep Learning, Drug-Related Side Effects and Adverse Reactions, Humans, Machine Learning}, pages = {756--762}, }
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.
Jin, Y.; Li, F.; and Yu, H.
In 2020 Annual Conference of the Association for Computational Linguistics (ACL), pages 95–100, July 2020.
NIHMSID: NIHMS1644629
Paper
doi
link
bibtex
abstract
@inproceedings{jin_bento_2020, title = {{BENTO}: {A} {Visual} {Platform} for {Building} {Clinical} {NLP} {Pipelines} {Based} on {CodaLab}.}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7679080/}, doi = {10.18653/v1/2020.acl-demos.13}, abstract = {CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools which are usually scattered in different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphic user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI interface enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible to be tailored to any other domains.}, booktitle = {2020 {Annual} {Conference} of the {Association} for {Computational} {Linguistics} ({ACL})}, author = {Jin, Yonghao and Li, Fei and Yu, Hong}, month = jul, year = {2020}, pmcid = {PMC7679080}, pmid = {33223604}, note = {NIHMSID: NIHMS1644629}, pages = {95--100}, }
ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.
Li, F.; and Yu, H.
In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), pages 8180–8187, New York City, New York, February 2020.
doi link bibtex
@inproceedings{li_icd_2020, address = {New York City, New York}, title = {{ICD} {Coding} from {Clinical} {Text} {Using} {Multi}-{Filter} {Residual} {Convolutional} {Neural} {Network}.}, shorttitle = {{AAAI} 2020}, doi = {10.1609/AAAI.V34I05.6331}, booktitle = {The {Thirty}-{Fourth} {AAAI} {Conference} on {Artificial} {Intelligence} ({AAAI}-20)}, author = {Li, Fei and Yu, Hong}, month = feb, year = {2020}, keywords = {Computer Science - Computation and Language, Computer Science - Machine Learning}, pages = {8180--8187}, }
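A rough sketch of this paper's high-level design: parallel convolutions of several filter widths, each followed by a residual block, feeding a multi-label sigmoid output over ICD codes. Layer sizes are illustrative, and the published model's per-label attention is simplified here to max-pooling.

# Multi-filter convolutional encoder sketch for multi-label ICD coding.
import torch
import torch.nn as nn

class MultiFilterICD(nn.Module):
    def __init__(self, vocab=30000, emb=100, ch=50, num_codes=50):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        # Parallel convolutions with different filter widths.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, ch, k, padding=k // 2) for k in (3, 5, 9)])
        # One residual conv block per filter branch.
        self.res = nn.ModuleList(
            [nn.Conv1d(ch, ch, 3, padding=1) for _ in (3, 5, 9)])
        self.out = nn.Linear(3 * ch, num_codes)

    def forward(self, tokens):
        x = self.emb(tokens).transpose(1, 2)       # (batch, emb, time)
        feats = []
        for conv, res in zip(self.convs, self.res):
            h = torch.relu(conv(x))
            h = torch.relu(res(h)) + h             # residual connection
            feats.append(h.max(dim=2).values)      # pool over time
        return torch.sigmoid(self.out(torch.cat(feats, dim=1)))

probs = MultiFilterICD()(torch.randint(0, 30000, (2, 256)))  # (2, 50)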
Improved Pretraining for Domain-specific Contextual Embedding Models.
Rongali, S.; Jagannatha, A.; Rawat, B. P. S.; and Yu, H.
CoRR, abs/2004.02288. 2020.
arXiv: 2004.02288
Paper
link
bibtex
@article{rongali_improved_2020, title = {Improved {Pretraining} for {Domain}-specific {Contextual} {Embedding} {Models}}, volume = {abs/2004.02288}, url = {https://arxiv.org/abs/2004.02288}, journal = {CoRR}, author = {Rongali, Subendhu and Jagannatha, Abhyuday and Rawat, Bhanu Pratap Singh and Yu, Hong}, year = {2020}, note = {arXiv: 2004.02288}, }
Neural data-to-text generation with dynamic content planning.
Chen, K.; Li, F.; Hu, B.; Peng, W.; Chen, Q.; Yu, H.; and Xiang, Y.
Knowledge-Based Systems, 106610. November 2020.
Paper
doi
link
bibtex
abstract
@article{chen_neural_2020, title = {Neural data-to-text generation with dynamic content planning}, issn = {0950-7051}, url = {http://www.sciencedirect.com/science/article/pii/S0950705120307395}, doi = {10.1016/j.knosys.2020.106610}, abstract = {Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP 2 2This work was completed in cooperation with Baidu Inc.for abbreviation. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on ROTOWIRE and NBAZHN datasets, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP in most of time. And using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.}, language = {en}, urldate = {2020-12-29}, journal = {Knowledge-Based Systems}, author = {Chen, Kai and Li, Fayuan and Hu, Baotian and Peng, Weihua and Chen, Qingcai and Yu, Hong and Xiang, Yang}, month = nov, year = {2020}, keywords = {Data-to-text, Dynamic content planning, Reconstruction mechanism}, pages = {106610}, }
Generating Medical Assessments Using a Neural Network Model: Algorithm Development and Validation.
Hu, B.; Bajracharya, A.; and Yu, H.
JMIR Medical Informatics, 8(1): e14971. 2020.
Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada
Paper
doi
link
bibtex
abstract
@article{hu_generating_2020, title = {Generating {Medical} {Assessments} {Using} a {Neural} {Network} {Model}: {Algorithm} {Development} and {Validation}}, volume = {8}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Generating {Medical} {Assessments} {Using} a {Neural} {Network} {Model}}, url = {https://medinform.jmir.org/2020/1/e14971/}, doi = {10.2196/14971}, abstract = {Background: Since its inception, artificial intelligence has aimed to use computers to help make clinical diagnoses. Evidence-based medical reasoning is important for patient care. Inferring clinical diagnoses is a crucial step during the patient encounter. Previous works mainly used expert systems or machine learning–based methods to predict the International Classification of Diseases - Clinical Modification codes based on electronic health records. We report an alternative approach: inference of clinical diagnoses from patients’ reported symptoms and physicians’ clinical observations. Objective: We aimed to report a natural language processing system for generating medical assessments based on patient information described in the electronic health record (EHR) notes. Methods: We processed EHR notes into the Subjective, Objective, Assessment, and Plan sections. We trained a neural network model for medical assessment generation (N2MAG). Our N2MAG is an innovative deep neural model that uses the Subjective and Objective sections of an EHR note to automatically generate an “expert-like” assessment of the patient. N2MAG can be trained in an end-to-end fashion and does not require feature engineering and external knowledge resources. Results: We evaluated N2MAG and the baseline models both quantitatively and qualitatively. Evaluated by both the Recall-Oriented Understudy for Gisting Evaluation metrics and domain experts, our results show that N2MAG outperformed the existing state-of-the-art baseline models. Conclusions: N2MAG could generate a medical assessment from the Subject and Objective section descriptions in EHR notes. Future work will assess its potential for providing clinical decision support. [JMIR Med Inform 2020;8(1):e14971]}, language = {en}, number = {1}, urldate = {2020-04-07}, journal = {JMIR Medical Informatics}, author = {Hu, Baotian and Bajracharya, Adarsha and Yu, Hong}, year = {2020}, pmid = {31939742 PMCID: PMC7006435}, note = {Company: JMIR Medical Informatics Distributor: JMIR Medical Informatics Institution: JMIR Medical Informatics Label: JMIR Medical Informatics Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e14971}, }
Background: Since its inception, artificial intelligence has aimed to use computers to help make clinical diagnoses. Evidence-based medical reasoning is important for patient care. Inferring clinical diagnoses is a crucial step during the patient encounter. Previous works mainly used expert systems or machine learning–based methods to predict the International Classification of Diseases - Clinical Modification codes based on electronic health records. We report an alternative approach: inference of clinical diagnoses from patients’ reported symptoms and physicians’ clinical observations. Objective: We aimed to report a natural language processing system for generating medical assessments based on patient information described in the electronic health record (EHR) notes. Methods: We processed EHR notes into the Subjective, Objective, Assessment, and Plan sections. We trained a neural network model for medical assessment generation (N2MAG). Our N2MAG is an innovative deep neural model that uses the Subjective and Objective sections of an EHR note to automatically generate an “expert-like” assessment of the patient. N2MAG can be trained in an end-to-end fashion and does not require feature engineering and external knowledge resources. Results: We evaluated N2MAG and the baseline models both quantitatively and qualitatively. Evaluated by both the Recall-Oriented Understudy for Gisting Evaluation metrics and domain experts, our results show that N2MAG outperformed the existing state-of-the-art baseline models. Conclusions: N2MAG could generate a medical assessment from the Subjective and Objective section descriptions in EHR notes. Future work will assess its potential for providing clinical decision support. [JMIR Med Inform 2020;8(1):e14971]
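The Methods above hinge on first splitting each note into SOAP sections. A minimal regex splitter along those lines might look like the following; the header spellings are assumptions, since real EHR notes vary widely in formatting.

```python
# Minimal sketch: split an EHR note into SOAP sections by header.
# The header spellings are assumptions; real notes vary widely.
import re

HEADERS = ["Subjective", "Objective", "Assessment", "Plan"]
pattern = re.compile(r"^(%s)\s*:" % "|".join(HEADERS), re.MULTILINE)

def split_soap(note: str) -> dict:
    sections, matches = {}, list(pattern.finditer(note))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1)] = note[m.end():end].strip()
    return sections

note = "Subjective: c/o fatigue.\nObjective: BP 128/80.\nAssessment: stable.\nPlan: recheck in 3 months."
print(split_soap(note))
```

A model like N2MAG would then condition on the Subjective and Objective fields and be trained to emit the Assessment field.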
Dynamic Data Selection for Curriculum Learning via Ability Estimation.
Lalor, J. P.; and Yu, H.
In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 545–555, Online, November 2020. Association for Computational Linguistics
Paper
link
bibtex
abstract
@inproceedings{lalor_dynamic_2020, address = {Online}, title = {Dynamic {Data} {Selection} for {Curriculum} {Learning} via {Ability} {Estimation}}, url = {https://www.aclweb.org/anthology/2020.findings-emnlp.48}, abstract = {Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.}, urldate = {2020-11-29}, booktitle = {Findings of the {Association} for {Computational} {Linguistics}: {EMNLP} 2020}, publisher = {Association for Computational Linguistics}, author = {Lalor, John P. and Yu, Hong}, month = nov, year = {2020}, pmid = {33381774 PMCID: PMC7771727}, pages = {545--555}, }
Curriculum learning methods typically rely on heuristics to estimate the difficulty of training examples or the ability of the model. In this work, we propose replacing difficulty heuristics with learned difficulty parameters. We also propose Dynamic Data selection for Curriculum Learning via Ability Estimation (DDaCLAE), a strategy that probes model ability at each training epoch to select the best training examples at that point. We show that models using learned difficulty and/or ability outperform heuristic-based curriculum learning models on the GLUE classification tasks.
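The core loop of DDaCLAE is compact enough to sketch: after each epoch, estimate the model's latent ability from its responses on a probe set, then train only on examples whose learned difficulty does not exceed that ability. The version below uses a 1-parameter (Rasch) model and simulated probe responses; it is a schematic, not the paper's implementation.

```python
# Schematic DDaCLAE loop under a Rasch (1PL) model: train on examples
# whose learned difficulty b_i is at most the model's current ability.
# Difficulties here are random; in the paper they come from a fitted IRT model.
import numpy as np

rng = np.random.default_rng(1)
difficulty = rng.normal(size=200)        # per-example difficulty b_i

def estimate_ability(correct, b, grid=np.linspace(-4, 4, 801)):
    # Grid-search MLE of ability theta under the 1PL response model.
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
    loglik = (correct * np.log(p) + (1 - correct) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

for epoch in range(3):
    # Probe step: a real model would predict here; we simulate responses.
    true_theta = -1.0 + epoch            # pretend ability grows each epoch
    p_correct = 1 / (1 + np.exp(-(true_theta - difficulty)))
    correct = (rng.random(200) < p_correct).astype(float)
    theta = estimate_ability(correct, difficulty)
    train_idx = np.where(difficulty <= theta)[0]
    print(f"epoch {epoch}: ability={theta:.2f}, {len(train_idx)} examples selected")
```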
Generating Accurate Electronic Health Assessment from Medical Graph.
Yang, Z.; and Yu, H.
In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3764–3773, Online, November 2020. Association for Computational Linguistics
NIHMSID: NIHMS1658452
Paper
link
bibtex
abstract
@inproceedings{yang_generating_2020, address = {Online}, title = {Generating {Accurate} {Electronic} {Health} {Assessment} from {Medical} {Graph}}, url = {https://www.aclweb.org/anthology/2020.findings-emnlp.336}, abstract = {One of the fundamental goals of artificial intelligence is to build computer-based expert systems. Inferring clinical diagnoses to generate a clinical assessment during a patient encounter is a crucial step towards building a medical diagnostic system. Previous works were mainly based on either medical domain-specific knowledge, or patients' prior diagnoses and clinical encounters. In this paper, we propose a novel model for automated clinical assessment generation (MCAG). MCAG is built on an innovative graph neural network, where rich clinical knowledge is incorporated into an end-to-end corpus-learning system. Our evaluation results against physician generated gold standard show that MCAG significantly improves the BLEU and rouge score compared with competitive baseline models. Further, physicians' evaluation showed that MCAG could generate high-quality assessments.}, urldate = {2020-11-29}, booktitle = {Findings of the {Association} for {Computational} {Linguistics}: {EMNLP} 2020}, publisher = {Association for Computational Linguistics}, author = {Yang, Zhichao and Yu, Hong}, month = nov, year = {2020}, pmcid = {PMC7821471}, pmid = {33491009}, note = {NIHMSID: NIHMS1658452}, pages = {3764--3773}, }
One of the fundamental goals of artificial intelligence is to build computer-based expert systems. Inferring clinical diagnoses to generate a clinical assessment during a patient encounter is a crucial step towards building a medical diagnostic system. Previous works were mainly based on either medical domain-specific knowledge, or patients' prior diagnoses and clinical encounters. In this paper, we propose a novel model for automated clinical assessment generation (MCAG). MCAG is built on an innovative graph neural network, where rich clinical knowledge is incorporated into an end-to-end corpus-learning system. Our evaluation results against a physician-generated gold standard show that MCAG significantly improves BLEU and ROUGE scores compared with competitive baseline models. Further, physicians' evaluation showed that MCAG could generate high-quality assessments.
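For readers unfamiliar with the graph side of such models, one propagation step of a generic graph convolutional network over a toy clinical graph is shown below; this is the standard GCN update, not the MCAG architecture itself.

```python
# One GCN-style propagation step over a toy clinical graph:
# H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Generic sketch, not MCAG itself.
import numpy as np

A = np.array([[0, 1, 1],                 # toy graph: symptom -- disease -- drug
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                    # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
rng = np.random.default_rng(2)
H = rng.normal(size=(3, 8))              # node features (e.g., concept embeddings)
W = rng.normal(size=(8, 8))

H_next = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
print(H_next.shape)                      # (3, 8): updated node representations
```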
Neural Data-to-Text Generation with Dynamic Content Planning.
Chen, K.; Li, F.; Hu, B.; Peng, W.; Chen, Q.; and Yu, H.
arXiv:2004.07426 [cs]. April 2020.
arXiv: 2004.07426
Paper
link
bibtex
abstract
@article{chen_neural_2020, title = {Neural {Data}-to-{Text} {Generation} with {Dynamic} {Content} {Planning}}, url = {http://arxiv.org/abs/2004.07426}, abstract = {Neural data-to-text generation models have achieved significant advancement in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP for abbreviation. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on ROTOWIRE dataset, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP in most of time. And using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.}, urldate = {2020-12-29}, journal = {arXiv:2004.07426 [cs]}, author = {Chen, Kai and Li, Fayuan and Hu, Baotian and Peng, Weihua and Chen, Qingcai and Yu, Hong}, month = apr, year = {2020}, note = {arXiv: 2004.07426}, keywords = {Computer Science - Computation and Language}, }
Neural data-to-text generation models have achieved significant advances in recent years. However, these models have two shortcomings: the generated texts tend to miss some vital information, and they often generate descriptions that are not consistent with the structured input data. To alleviate these problems, we propose a Neural data-to-text generation model with Dynamic content Planning, named NDP for abbreviation. The NDP can utilize the previously generated text to dynamically select the appropriate entry from the given structured data. We further design a reconstruction mechanism with a novel objective function that can reconstruct the whole entry of the used data sequentially from the hidden states of the decoder, which aids the accuracy of the generated text. Empirical results show that the NDP achieves superior performance over the state-of-the-art on the ROTOWIRE dataset, in terms of relation generation (RG), content selection (CS), content ordering (CO) and BLEU metrics. The human evaluation result shows that the texts generated by the proposed NDP are better than the corresponding ones generated by NCP most of the time. Using the proposed reconstruction mechanism, the fidelity of the generated text can be further improved significantly.
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.
Jin, Y; Li, F; and Yu, H
In AMIA Fall Symposium, 2020.
link bibtex
@inproceedings{jin_bento_2020, title = {{BENTO}: {A} {Visual} {Platform} for {Building} {Clinical} {NLP} {Pipelines} {Based} on {CodaLab}.}, booktitle = {{AMIA} {Fall} {Symposium}}, author = {Jin, Y and Li, F and Yu, H}, year = {2020}, }
Learning Latent Space Representations to Predict Patient Outcomes: Model Development and Validation.
Rongali, S.; Rose, A. J.; McManus, D. D.; Bajracharya, A. S.; Kapoor, A.; Granillo, E.; and Yu, H.
Journal of Medical Internet Research, 22(3): e16374. 2020.
Publisher: JMIR Publications Inc., Toronto, Canada
Paper
doi
link
bibtex
abstract
@article{rongali_learning_2020, title = {Learning {Latent} {Space} {Representations} to {Predict} {Patient} {Outcomes}: {Model} {Development} and {Validation}}, volume = {22}, shorttitle = {Learning {Latent} {Space} {Representations} to {Predict} {Patient} {Outcomes}}, url = {https://www.jmir.org/2020/3/e16374/}, doi = {10.2196/16374}, abstract = {Background: Scalable and accurate health outcome prediction using electronic health record (EHR) data has gained much attention in research recently. Previous machine learning models mostly ignore relations between different types of clinical data (ie, laboratory components, International Classification of Diseases codes, and medications). Objective: This study aimed to model such relations and build predictive models using the EHR data from intensive care units. We developed innovative neural network models and compared them with the widely used logistic regression model and other state-of-the-art neural network models to predict the patient’s mortality using their longitudinal EHR data. Methods: We built a set of neural network models that we collectively called as long short-term memory (LSTM) outcome prediction using comprehensive feature relations or in short, CLOUT. Our CLOUT models use a correlational neural network model to identify a latent space representation between different types of discrete clinical features during a patient’s encounter and integrate the latent representation into an LSTM-based predictive model framework. In addition, we designed an ablation experiment to identify risk factors from our CLOUT models. Using physicians’ input as the gold standard, we compared the risk factors identified by both CLOUT and logistic regression models. Results: Experiments on the Medical Information Mart for Intensive Care-III dataset (selected patient population: 7537) show that CLOUT (area under the receiver operating characteristic curve=0.89) has surpassed logistic regression (0.82) and other baseline NN models (\<0.86). In addition, physicians’ agreement with the CLOUT-derived risk factor rankings was statistically significantly higher than the agreement with the logistic regression model. Conclusions: Our results support the applicability of CLOUT for real-world clinical use in identifying patients at high risk of mortality. Trial Registration: [J Med Internet Res 2020;22(3):e16374]}, language = {en}, number = {3}, urldate = {2020-04-07}, journal = {Journal of Medical Internet Research}, author = {Rongali, Subendhu and Rose, Adam J. and McManus, David D. and Bajracharya, Adarsha S. and Kapoor, Alok and Granillo, Edgard and Yu, Hong}, year = {2020}, pmid = {32202503 PMCID: PMC7136840}, note = {Company: Journal of Medical Internet Research Distributor: Journal of Medical Internet Research Institution: Journal of Medical Internet Research Label: Journal of Medical Internet Research Publisher: JMIR Publications Inc., Toronto, Canada}, pages = {e16374}, }
Background: Scalable and accurate health outcome prediction using electronic health record (EHR) data has gained much attention in research recently. Previous machine learning models mostly ignore relations between different types of clinical data (ie, laboratory components, International Classification of Diseases codes, and medications). Objective: This study aimed to model such relations and build predictive models using the EHR data from intensive care units. We developed innovative neural network models and compared them with the widely used logistic regression model and other state-of-the-art neural network models to predict the patient’s mortality using their longitudinal EHR data. Methods: We built a set of neural network models that we collectively called as long short-term memory (LSTM) outcome prediction using comprehensive feature relations or in short, CLOUT. Our CLOUT models use a correlational neural network model to identify a latent space representation between different types of discrete clinical features during a patient’s encounter and integrate the latent representation into an LSTM-based predictive model framework. In addition, we designed an ablation experiment to identify risk factors from our CLOUT models. Using physicians’ input as the gold standard, we compared the risk factors identified by both CLOUT and logistic regression models. Results: Experiments on the Medical Information Mart for Intensive Care-III dataset (selected patient population: 7537) show that CLOUT (area under the receiver operating characteristic curve=0.89) has surpassed logistic regression (0.82) and other baseline NN models (<0.86). In addition, physicians’ agreement with the CLOUT-derived risk factor rankings was statistically significantly higher than the agreement with the logistic regression model. Conclusions: Our results support the applicability of CLOUT for real-world clinical use in identifying patients at high risk of mortality. Trial Registration: [J Med Internet Res 2020;22(3):e16374]
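The "latent space representation between different types of discrete clinical features" can be pictured with a toy two-view setup: project diagnosis-code features and lab features into a shared space and measure how correlated the latent coordinates are. This numpy sketch only conveys the correlational objective; it is not the CLOUT model.

```python
# Toy two-view latent space in the spirit of a correlational neural
# network: training would adjust W1, W2 to maximize these correlations
# (plus reconstruction terms). Not the CLOUT implementation.
import numpy as np

rng = np.random.default_rng(3)
n = 100
codes = rng.normal(size=(n, 20))                      # view 1: diagnosis codes
labs = codes @ rng.normal(size=(20, 15)) \
       + 0.1 * rng.normal(size=(n, 15))               # view 2: related lab features

W1, W2 = rng.normal(size=(20, 4)), rng.normal(size=(15, 4))
z1, z2 = codes @ W1, labs @ W2                        # shared 4-d latent space

corr = [np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(4)]
print(np.round(corr, 2))
```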
2019
(15)
Improving electronic health record note comprehension with noteaid: randomized trial of electronic health record note comprehension interventions with crowdsourced workers.
Lalor, J. P.; Woolf, B.; and Yu, H.
Journal of Medical Internet Research, 21(1): e10793. 2019.
Paper
doi
link
bibtex
abstract
@article{lalor_improving_2019, title = {Improving electronic health record note comprehension with noteaid: randomized trial of electronic health record note comprehension interventions with crowdsourced workers}, volume = {21}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Improving electronic health record note comprehension with noteaid}, url = {https://www.jmir.org/2019/1/e10793/}, doi = {10.2196/jmir.10793}, abstract = {Background: Patient portals are becoming more common, and with them, the ability of patients to access their personal electronic health records (EHRs). EHRs, in particular the free-text EHR notes, often contain medical jargon and terms that are difficult for laypersons to understand. There are many Web-based resources for learning more about particular diseases or conditions, including systems that directly link to lay definitions or educational materials for medical concepts. Objective: Our goal is to determine whether use of one such tool, NoteAid, leads to higher EHR note comprehension ability. We use a new EHR note comprehension assessment tool instead of patient self-reported scores. Methods: In this work, we compare a passive, self-service educational resource (MedlinePlus) with an active resource (NoteAid) where definitions are provided to the user for medical concepts that the system identifies. We use Amazon Mechanical Turk (AMT) to recruit individuals to complete ComprehENotes, a new test of EHR note comprehension. Results: Mean scores for individuals with access to NoteAid are significantly higher than the mean baseline scores, both for raw scores (P=.008) and estimated ability (P=.02). Conclusions: In our experiments, we show that the active intervention leads to significantly higher scores on the comprehension test as compared with a baseline group with no resources provided. In contrast, there is no significant difference between the group that was provided with the passive intervention and the baseline group. Finally, we analyze the demographics of the individuals who participated in our AMT task and show differences between groups that align with the current understanding of health literacy between populations. This is the first work to show improvements in comprehension using tools such as NoteAid as measured by an EHR note comprehension assessment tool as opposed to patient self-reported scores. [J Med Internet Res 2019;21(1):e10793]}, language = {en}, number = {1}, urldate = {2019-01-31}, journal = {Journal of Medical Internet Research}, author = {Lalor, John P. and Woolf, Beverly and Yu, Hong}, year = {2019}, pmid = {30664453 PMCID: 6351990}, pages = {e10793}, }
Background: Patient portals are becoming more common, and with them, the ability of patients to access their personal electronic health records (EHRs). EHRs, in particular the free-text EHR notes, often contain medical jargon and terms that are difficult for laypersons to understand. There are many Web-based resources for learning more about particular diseases or conditions, including systems that directly link to lay definitions or educational materials for medical concepts. Objective: Our goal is to determine whether use of one such tool, NoteAid, leads to higher EHR note comprehension ability. We use a new EHR note comprehension assessment tool instead of patient self-reported scores. Methods: In this work, we compare a passive, self-service educational resource (MedlinePlus) with an active resource (NoteAid) where definitions are provided to the user for medical concepts that the system identifies. We use Amazon Mechanical Turk (AMT) to recruit individuals to complete ComprehENotes, a new test of EHR note comprehension. Results: Mean scores for individuals with access to NoteAid are significantly higher than the mean baseline scores, both for raw scores (P=.008) and estimated ability (P=.02). Conclusions: In our experiments, we show that the active intervention leads to significantly higher scores on the comprehension test as compared with a baseline group with no resources provided. In contrast, there is no significant difference between the group that was provided with the passive intervention and the baseline group. Finally, we analyze the demographics of the individuals who participated in our AMT task and show differences between groups that align with the current understanding of health literacy between populations. This is the first work to show improvements in comprehension using tools such as NoteAid as measured by an EHR note comprehension assessment tool as opposed to patient self-reported scores. [J Med Internet Res 2019;21(1):e10793]
Fine-Tuning Bidirectional Encoder Representations From Transformers (BERT)–Based Models on Large-Scale Electronic Health Record Notes: An Empirical Study.
Li, F.; Jin, Y.; Liu, W.; Rawat, B. P. S.; Cai, P.; and Yu, H.
JMIR Medical Informatics, 7(3): e14830. September 2019.
Paper
doi
link
bibtex
@article{li_fine-tuning_2019, title = {Fine-{Tuning} {Bidirectional} {Encoder} {Representations} {From} {Transformers} ({BERT})–{Based} {Models} on {Large}-{Scale} {Electronic} {Health} {Record} {Notes}: {An} {Empirical} {Study}}, volume = {7}, issn = {2291-9694}, shorttitle = {Fine-{Tuning} {Bidirectional} {Encoder} {Representations} {From} {Transformers} ({BERT})–{Based} {Models} on {Large}-{Scale} {Electronic} {Health} {Record} {Notes}}, url = {http://medinform.jmir.org/2019/3/e14830/}, doi = {10.2196/14830}, language = {en}, number = {3}, urldate = {2019-10-07}, journal = {JMIR Medical Informatics}, author = {Li, Fei and Jin, Yonghao and Liu, Weisong and Rawat, Bhanu Pratap Singh and Cai, Pengshan and Yu, Hong}, month = sep, year = {2019}, pmid = {31516126 PMCID: PMC6746103}, pages = {e14830}, }
Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance.
Chen, J.; Lalor, J.; Liu, W.; Druhl, E.; Granillo, E.; Vimalananda, V. G; and Yu, H.
Journal of Medical Internet Research, 21(3). March 2019.
Paper
doi
link
bibtex
abstract
@article{chen_detecting_2019, title = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}: {Using} {Cost}-{Sensitive} {Learning} and {Oversampling} to {Reduce} {Data} {Imbalance}}, volume = {21}, issn = {1439-4456}, shorttitle = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431826/}, doi = {10.2196/11990}, abstract = {Background Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. Methods An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80\%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.}, number = {3}, urldate = {2019-12-29}, journal = {Journal of Medical Internet Research}, author = {Chen, Jinying and Lalor, John and Liu, Weisong and Druhl, Emily and Granillo, Edgard and Vimalananda, Varsha G and Yu, Hong}, month = mar, year = {2019}, pmid = {30855231 PMCID: PMC6431826}, }
Background Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. Objective We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages. Methods An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. Results The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. Conclusions Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.
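Cost-sensitive learning of the kind that performed best here is easy to reproduce with standard tooling: weight the rare positive class more heavily in the loss. A minimal scikit-learn sketch with invented placeholder messages:

```python
# Cost-sensitive logistic regression for a rare positive class, in the
# spirit of HypoDetect. The messages and labels are synthetic placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "felt shaky and sweaty, sugar was 55",        # positive: hypoglycemia
    "please refill my statin prescription",       # negative
    "blood sugar dropped to 48 last night",       # positive
    "scheduling question about my appointment",   # negative
    "no symptoms, sugars stable this week",       # negative
]
labels = [1, 0, 1, 0, 0]

# class_weight="balanced" reweights classes inversely to their frequency,
# the simplest form of cost-sensitive learning.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(class_weight="balanced"))
clf.fit(messages, labels)
print(clf.predict(["my sugar fell to 50 and I felt dizzy"]))
```

With only 3.80% positives, an unweighted classifier tends to predict the majority class; class weights push the decision boundary toward recovering the rare events, which is exactly the sensitivity gain the abstract reports.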
Automatic Detection of Hypoglycemic Events From the Electronic Health Record Notes of Diabetes Patients: Empirical Study.
Jin, Y.; Li, F.; Vimalananda, V. G.; and Yu, H.
JMIR Medical Informatics, 7(4): e14340. 2019.
Paper
doi
link
bibtex
abstract
@article{jin_automatic_2019, title = {Automatic {Detection} of {Hypoglycemic} {Events} {From} the {Electronic} {Health} {Record} {Notes} of {Diabetes} {Patients}: {Empirical} {Study}}, volume = {7}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {Automatic {Detection} of {Hypoglycemic} {Events} {From} the {Electronic} {Health} {Record} {Notes} of {Diabetes} {Patients}}, url = {https://medinform.jmir.org/2019/4/e14340/}, doi = {10.2196/14340}, abstract = {Background: Hypoglycemic events are common and potentially dangerous conditions among patients being treated for diabetes. Automatic detection of such events could improve patient care and is valuable in population studies. Electronic health records (EHRs) are valuable resources for the detection of such events. Objective: In this study, we aim to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event or not. We used this annotated corpus to train and evaluate HYPE, the high-performance NLP system for hypoglycemia detection. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that neural network models outperformed the SVM model. The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by small and highly imbalanced data, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and population studies in diabetes patients. [JMIR Med Inform 2019;7(4):e14340]}, language = {en}, number = {4}, urldate = {2019-11-10}, journal = {JMIR Medical Informatics}, author = {Jin, Yonghao and Li, Fei and Vimalananda, Varsha G. and Yu, Hong}, year = {2019}, pmid = {31702562 PMCID: PMC6913754}, keywords = {adverse events, convolutional neural networks, hypoglycemia, natural language processing}, pages = {e14340}, }
Background: Hypoglycemic events are common and potentially dangerous conditions among patients being treated for diabetes. Automatic detection of such events could improve patient care and is valuable in population studies. Electronic health records (EHRs) are valuable resources for the detection of such events. Objective: In this study, we aim to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event or not. We used this annotated corpus to train and evaluate HYPE, the high-performance NLP system for hypoglycemia detection. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that neural network models outperformed the SVM model. The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by small and highly imbalanced data, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and population studies in diabetes patients. [JMIR Med Inform 2019;7(4):e14340]
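The CNN that performed best here follows the familiar sentence-classification pattern: embed tokens, apply 1-d convolutions of a few widths, max-pool over time, and classify. A compact PyTorch sketch with made-up dimensions (the paper's exact settings are not reproduced):

```python
# Minimal CNN sentence classifier of the kind evaluated by HYPE.
# Vocabulary size, filter counts, and widths are illustrative only.
import torch
import torch.nn as nn

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, n_filters=50, widths=(3, 4, 5)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(nn.Conv1d(emb_dim, n_filters, w) for w in widths)
        self.out = nn.Linear(n_filters * len(widths), 2)  # event / no event

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)    # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))

model = SentenceCNN()
logits = model(torch.randint(0, 5000, (8, 30)))  # batch of 8 tokenized sentences
print(logits.shape)                              # torch.Size([8, 2])
```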
Learning to detect and understand drug discontinuation events from clinical narratives.
Liu, F.; Pradhan, R.; Druhl, E.; Freund, E.; Liu, W.; Sauer, B. C.; Cunningham, F.; Gordon, A. J.; Peters, C. B.; and Yu, H.
Journal of the American Medical Informatics Association, 26(10): 943–951. October 2019.
Paper
doi
link
bibtex
abstract
@article{liu_learning_2019, title = {Learning to detect and understand drug discontinuation events from clinical narratives}, volume = {26}, url = {https://academic.oup.com/jamia/article/26/10/943/5481540}, doi = {10.1093/jamia/ocz048}, abstract = {AbstractObjective. Identifying drug discontinuation (DDC) events and understanding their reasons are important for medication management and drug safety survei}, language = {en}, number = {10}, urldate = {2019-12-29}, journal = {Journal of the American Medical Informatics Association}, author = {Liu, Feifan and Pradhan, Richeek and Druhl, Emily and Freund, Elaine and Liu, Weisong and Sauer, Brian C. and Cunningham, Fran and Gordon, Adam J. and Peters, Celena B. and Yu, Hong}, month = oct, year = {2019}, pmid = {31034028 PMCID: PMC6748801}, pages = {943--951}, }
Objective: Identifying drug discontinuation (DDC) events and understanding their reasons are important for medication management and drug safety surveillance.
Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0).
Jagannatha, A.; Liu, F.; Liu, W.; and Yu, H.
Drug Safety, (1): 99–111. January 2019.
doi link bibtex abstract
@article{jagannatha_overview_2019, title = {Overview of the {First} {Natural} {Language} {Processing} {Challenge} for {Extracting} {Medication}, {Indication}, and {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes} ({MADE} 1.0)}, issn = {1179-1942}, doi = {10.1007/s40264-018-0762-z}, abstract = {INTRODUCTION: This work describes the Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) corpus and provides an overview of the MADE 1.0 2018 challenge for extracting medication, indication, and adverse drug events (ADEs) from electronic health record (EHR) notes. OBJECTIVE: The goal of MADE is to provide a set of common evaluation tasks to assess the state of the art for natural language processing (NLP) systems applied to EHRs supporting drug safety surveillance and pharmacovigilance. We also provide benchmarks on the MADE dataset using the system submissions received in the MADE 2018 challenge. METHODS: The MADE 1.0 challenge has released an expert-annotated cohort of medication and ADE information comprising 1089 fully de-identified longitudinal EHR notes from 21 randomly selected patients with cancer at the University of Massachusetts Memorial Hospital. Using this cohort as a benchmark, the MADE 1.0 challenge designed three shared NLP tasks. The named entity recognition (NER) task identifies medications and their attributes (dosage, route, duration, and frequency), indications, ADEs, and severity. The relation identification (RI) task identifies relations between the named entities: medication-indication, medication-ADE, and attribute relations. The third shared task (NER-RI) evaluates NLP models that perform the NER and RI tasks jointly. In total, 11 teams from four countries participated in at least one of the three shared tasks, and 41 system submissions were received in total. RESULTS: The best systems F1 scores for NER, RI, and NER-RI were 0.82, 0.86, and 0.61, respectively. Ensemble classifiers using the team submissions improved the performance further, with an F1 score of 0.85, 0.87, and 0.66 for the three tasks, respectively. CONCLUSION: MADE results show that recent progress in NLP has led to remarkable improvements in NER and RI tasks for the clinical domain. However, some room for improvement remains, particularly in the NER-RI task.}, language = {eng}, number = {1}, journal = {Drug Safety}, author = {Jagannatha, Abhyuday and Liu, Feifan and Liu, Weisong and Yu, Hong}, month = jan, year = {2019}, pmid = {30649735 PMCID: PMC6860017}, pages = {99--111}, }
INTRODUCTION: This work describes the Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0) corpus and provides an overview of the MADE 1.0 2018 challenge for extracting medication, indication, and adverse drug events (ADEs) from electronic health record (EHR) notes. OBJECTIVE: The goal of MADE is to provide a set of common evaluation tasks to assess the state of the art for natural language processing (NLP) systems applied to EHRs supporting drug safety surveillance and pharmacovigilance. We also provide benchmarks on the MADE dataset using the system submissions received in the MADE 2018 challenge. METHODS: The MADE 1.0 challenge has released an expert-annotated cohort of medication and ADE information comprising 1089 fully de-identified longitudinal EHR notes from 21 randomly selected patients with cancer at the University of Massachusetts Memorial Hospital. Using this cohort as a benchmark, the MADE 1.0 challenge designed three shared NLP tasks. The named entity recognition (NER) task identifies medications and their attributes (dosage, route, duration, and frequency), indications, ADEs, and severity. The relation identification (RI) task identifies relations between the named entities: medication-indication, medication-ADE, and attribute relations. The third shared task (NER-RI) evaluates NLP models that perform the NER and RI tasks jointly. In total, 11 teams from four countries participated in at least one of the three shared tasks, and 41 system submissions were received in total. RESULTS: The best systems F1 scores for NER, RI, and NER-RI were 0.82, 0.86, and 0.61, respectively. Ensemble classifiers using the team submissions improved the performance further, with an F1 score of 0.85, 0.87, and 0.66 for the three tasks, respectively. CONCLUSION: MADE results show that recent progress in NLP has led to remarkable improvements in NER and RI tasks for the clinical domain. However, some room for improvement remains, particularly in the NER-RI task.
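The ensemble gains reported in the Results come from simple combination of the team submissions; for the NER task, the most common scheme is per-token majority voting over the systems' BIO tags. A toy version:

```python
# Toy per-token majority-vote ensemble over BIO tag sequences, the kind
# of simple combination that lifted the MADE 1.0 benchmark F1 scores.
from collections import Counter

def majority_vote(predictions):
    # predictions: one tag sequence per system, all of equal length.
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*predictions)]

system_a = ["O", "B-Drug", "I-Drug", "O", "B-ADE"]
system_b = ["O", "B-Drug", "O",      "O", "B-ADE"]
system_c = ["O", "B-Drug", "I-Drug", "O", "O"]
print(majority_vote([system_a, system_b, system_c]))
# -> ['O', 'B-Drug', 'I-Drug', 'O', 'B-ADE']
```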
Naranjo Question Answering using End-to-End Multi-task Learning Model.
Rawat, B. P; Li, F.; and Yu, H.
25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2547–2555. 2019.
doi link bibtex abstract
@article{rawat_naranjo_2019, title = {Naranjo {Question} {Answering} using {End}-to-{End} {Multi}-task {Learning} {Model}}, doi = {10.1145/3292500.3330770}, abstract = {In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians’ annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model with a good margin, achieving a macro-weighted f-score between 0.3652 – 0.5271 and micro-weighted f-score between 0.9523 – 0.9918.}, journal = {25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)}, author = {Rawat, Bhanu P and Li, Fei and Yu, Hong}, year = {2019}, pmid = {31799022 NIHMSID: NIHMS1058295 PMCID:PMC6887102}, pages = {2547--2555}, }
In the clinical domain, it is important to understand whether an adverse drug reaction (ADR) is caused by a particular medication. Clinical judgement studies help judge the causal relation between a medication and its ADRs. In this study, we present the first attempt to automatically infer the causality between a drug and an ADR from electronic health records (EHRs) by answering the Naranjo questionnaire, the validated clinical question answering set used by domain experts for ADR causality assessment. Using physicians’ annotation as the gold standard, our proposed joint model, which uses multi-task learning to predict the answers of a subset of the Naranjo questionnaire, significantly outperforms the baseline pipeline model with a good margin, achieving a macro-weighted f-score between 0.3652 – 0.5271 and micro-weighted f-score between 0.9523 – 0.9918.
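Multi-task learning in this setting means one shared encoder feeding a separate classification head per Naranjo question, trained with a joint loss. A schematic PyTorch module follows; the input dimension, number of questions, and answer set are placeholders, not the paper's configuration.

```python
# Schematic multi-task model: a shared encoder with one answer head per
# Naranjo question. All sizes are placeholders.
import torch
import torch.nn as nn

class NaranjoMultiTask(nn.Module):
    def __init__(self, input_dim=256, hidden=128, n_questions=4, n_answers=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.ReLU())
        # One head per question; answers might be yes / no / do-not-know.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, n_answers) for _ in range(n_questions)
        )

    def forward(self, x):
        h = self.encoder(x)                      # shared representation
        return [head(h) for head in self.heads]  # one logit vector per question

model = NaranjoMultiTask()
outs = model(torch.randn(8, 256))                # batch of 8 encoded EHR snippets
targets = [torch.randint(0, 3, (8,)) for _ in outs]
loss = sum(nn.functional.cross_entropy(o, t) for o, t in zip(outs, targets))
loss.backward()                                  # joint training across questions
```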
A neural abstractive summarization model guided with topic sentences.
Chen, C.; Hu, B.; Chen, Q.; and Yu, H.
In ICONIP, 2019.
link bibtex
@inproceedings{chen_neural_2019, title = {A neural abstractive summarization model guided with topic sentences}, booktitle = {{ICONIP}}, author = {Chen, Chen and Hu, Baotian and Chen, Qingcai and Yu, Hong}, year = {2019}, }
An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models.
Li, F.; and Yu, H.
Journal of the American Medical Informatics Association, 26(7): 646–654. July 2019.
Paper
doi
link
bibtex
abstract
@article{li_investigation_2019, title = {An investigation of single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes using advanced deep learning models}, volume = {26}, url = {https://academic.oup.com/jamia/article/26/7/646/5426087}, doi = {10.1093/jamia/ocz018}, abstract = {AbstractObjective. We aim to evaluate the effectiveness of advanced deep learning models (eg, capsule network [CapNet], adversarial training [ADV]) for single-}, language = {en}, number = {7}, urldate = {2019-12-09}, journal = {Journal of the American Medical Informatics Association}, author = {Li, Fei and Yu, Hong}, month = jul, year = {2019}, pages = {646--654}, }
Objective: We aim to evaluate the effectiveness of advanced deep learning models (eg, capsule network [CapNet], adversarial training [ADV]) for single-domain and multidomain medication and adverse drug event relation extraction from electronic health record notes.
Anticoagulant prescribing for non-valvular atrial fibrillation in the Veterans Health Administration.
Rose, A.; Goldberg, R; McManus, D.; Kapoor, A; Wang, V; Liu, W; and Yu, H
Journal of the American Heart Association. 2019.
doi link bibtex abstract
@article{rose_anticoagulant_2019, title = {Anticoagulant prescribing for non-valvular atrial fibrillation in the {Veterans} {Health} {Administration}}, doi = {10.1161/JAHA.119.012646}, abstract = {Background Direct acting oral anticoagulants (DOACs) theoretically could contribute to addressing underuse of anticoagulation in non-valvular atrial fibrillation (NVAF). Few studies have examined this prospect, however. The potential of DOACs to address underuse of anticoagulation in NVAF could be magnified within a healthcare system that sharply limits patients' exposure to out-of-pocket copayments, such as the Veterans Health Administration (VA). Methods and Results We used a clinical data set of all patients with NVAF treated within VA from 2007 to 2016 (n=987 373). We examined how the proportion of patients receiving any anticoagulation, and which agent was prescribed, changed over time. When first approved for VA use in 2011, DOACs constituted a tiny proportion of all prescriptions for anticoagulants (2\%); by 2016, this proportion had increased to 45\% of all prescriptions and 67\% of new prescriptions. Patient characteristics associated with receiving a DOAC, rather than warfarin, included white race, better kidney function, fewer comorbid conditions overall, and no history of stroke or bleeding. In 2007, before the introduction of DOACs, 56\% of VA patients with NVAF were receiving anticoagulation; this dipped to 44\% in 2012 just after the introduction of DOACs and had risen back to 51\% by 2016. Conclusions These results do not suggest that the availability of DOACs has led to an increased proportion of patients with NVAF receiving anticoagulation, even in the context of a healthcare system that sharply limits patients' exposure to out-of-pocket copayments.}, journal = {Journal of the American Heart Association}, author = {Rose, AJ and Goldberg, R and McManus, DD and Kapoor, A and Wang, V and Liu, W and Yu, H}, year = {2019}, pmid = {31441364 PMCID:PMC6755851}, }
Background Direct acting oral anticoagulants (DOACs) theoretically could contribute to addressing underuse of anticoagulation in non-valvular atrial fibrillation (NVAF). Few studies have examined this prospect, however. The potential of DOACs to address underuse of anticoagulation in NVAF could be magnified within a healthcare system that sharply limits patients' exposure to out-of-pocket copayments, such as the Veterans Health Administration (VA). Methods and Results We used a clinical data set of all patients with NVAF treated within VA from 2007 to 2016 (n=987 373). We examined how the proportion of patients receiving any anticoagulation, and which agent was prescribed, changed over time. When first approved for VA use in 2011, DOACs constituted a tiny proportion of all prescriptions for anticoagulants (2%); by 2016, this proportion had increased to 45% of all prescriptions and 67% of new prescriptions. Patient characteristics associated with receiving a DOAC, rather than warfarin, included white race, better kidney function, fewer comorbid conditions overall, and no history of stroke or bleeding. In 2007, before the introduction of DOACs, 56% of VA patients with NVAF were receiving anticoagulation; this dipped to 44% in 2012 just after the introduction of DOACs and had risen back to 51% by 2016. Conclusions These results do not suggest that the availability of DOACs has led to an increased proportion of patients with NVAF receiving anticoagulation, even in the context of a healthcare system that sharply limits patients' exposure to out-of-pocket copayments.
Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds.
Lalor, J. P.; Wu, H.; and Yu, H.
In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4240–4250, Hong Kong, China, November 2019. Association for Computational Linguistics
NIHMSID: NIHMS1059054
Paper
doi
link
bibtex
abstract
@inproceedings{lalor_learning_2019, address = {Hong Kong, China}, title = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}: {Item} {Response} {Theory} with {Artificial} {Crowds}}, shorttitle = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}}, url = {https://www.aclweb.org/anthology/D19-1434}, doi = {10.18653/v1/D19-1434}, abstract = {Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.}, urldate = {2019-11-11}, booktitle = {Proceedings of the 2019 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing} and the 9th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({EMNLP}-{IJCNLP})}, publisher = {Association for Computational Linguistics}, author = {Lalor, John P. and Wu, Hao and Yu, Hong}, month = nov, year = {2019}, pmcid = {PMC6892593}, pmid = {31803865}, note = {NIHMSID: NIHMS1059054}, pages = {4240--4250}, }
Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.
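The IRT machinery referenced here is compact: under a 2PL model, the probability that respondent j answers item i correctly is sigmoid(a_i(theta_j - b_i)), and the "artificial crowd" simply supplies the response matrix from many trained DNNs instead of humans. A numpy sketch of the response model (parameter fitting, e.g., by variational inference, is omitted):

```python
# 2PL IRT response model: P(correct) = sigmoid(a_i * (theta_j - b_i)).
# The response matrix here is simulated; in the paper it would come from
# an ensemble of trained DNNs (the "artificial crowd").
import numpy as np

rng = np.random.default_rng(4)
n_models, n_items = 50, 30
theta = rng.normal(size=n_models)            # per-respondent ability
a = rng.lognormal(sigma=0.3, size=n_items)   # item discrimination (positive)
b = rng.normal(size=n_items)                 # item difficulty

p = 1 / (1 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
responses = (rng.random((n_models, n_items)) < p).astype(int)
# Sanity check: easier items (low b) are answered correctly more often.
print(np.corrcoef(b, responses.mean(axis=0))[0, 1])   # strongly negative
```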
Clinical Question Answering from Electronic Health Records.
Singh, B.; Li, F.; and Yu, H.
In The MLHC 2019 research track proceedings, 2019.
Paper
link
bibtex
@inproceedings{singh_clinical_2019, title = {Clinical {Question} {Answering} from {Electronic} {Health} {Records}}, url = {https://static1.squarespace.com/static/59d5ac1780bd5ef9c396eda6/t/5d472f54d73cd5000124d13c/1564946262055/Rawat.pdf}, booktitle = {The {MLHC} 2019 research track proceedings}, author = {Singh, Bhanu and Li, Fei and Yu, Hong}, year = {2019}, }
Comparing Human and DNN-Ensemble Response Patterns for Item Response Theory Model Fitting.
Lalor, J.; Wu, H.; and Yu, H.
2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Workshop on Cognitive Modeling and Computational Linguistics (CMCL). 2019.
Paper
link
bibtex
@article{lalor_comparing_2019, title = {Comparing {Human} and {DNN}-{Ensemble} {Response} {Patterns} for {Item} {Response} {Theory} {Model} {Fitting}}, url = {http://jplalor.github.io/pdfs/cmcl19_irt.pdf}, journal = {2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Workshop on Cognitive Modeling and Computational Linguistics (CMCL)}, author = {Lalor, John and Wu, Hao and Yu, Hong}, year = {2019}, }
QuikLitE, a Framework for Quick Literacy Evaluation in Medicine: Development and Validation.
Zheng, J.; and Yu, H.
Journal of Medical Internet Research, 21(2): e12525. 2019.
Paper
doi
link
bibtex
abstract
@article{zheng_quiklite_2019, title = {{QuikLitE}, a {Framework} for {Quick} {Literacy} {Evaluation} in {Medicine}: {Development} and {Validation}}, volume = {21}, copyright = {Unless stated otherwise, all articles are open-access distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work (}, shorttitle = {{QuikLitE}, a {Framework} for {Quick} {Literacy} {Evaluation} in {Medicine}}, url = {https://www.jmir.org/2019/2/e12525/}, doi = {10.2196/jmir.12525}, abstract = {Background: A plethora of health literacy instruments was developed over the decades. They usually start with experts curating passages of text or word lists, followed by psychometric validation and revision based on test results obtained from a sample population. This process is costly and it is difficult to customize for new usage scenarios. Objective: This study aimed to develop and evaluate a framework for dynamically creating test instruments that can provide a focused assessment of patients’ health literacy. Methods: A health literacy framework and scoring method were extended from the vocabulary knowledge test to accommodate a wide range of item difficulties and various degrees of uncertainty in the participant’s answer. Web-based tests from Amazon Mechanical Turk users were used to assess reliability and validity. Results: Parallel forms of our tests showed high reliability (correlation=.78; 95\% CI 0.69-0.85). Validity measured as correlation with an electronic health record comprehension instrument was higher (.47-.61 among 3 groups) than 2 existing tools (Short Assessment of Health Literacy-English, .38-.43; Short Test of Functional Health Literacy in Adults, .34-.46). Our framework is able to distinguish higher literacy levels that are often not measured by other instruments. It is also flexible, allowing customizations to the test the designer’s focus on a particular interest in a subject matter or domain. The framework is among the fastest health literacy instrument to administer. Conclusions: We proposed a valid and highly reliable framework to dynamically create health literacy instruments, alleviating the need to repeat a time-consuming process when a new use scenario arises. This framework can be customized to a specific need on demand and can measure skills beyond the basic level. [J Med Internet Res 2019;21(2):e12525]}, language = {en}, number = {2}, urldate = {2019-02-22}, journal = {Journal of Medical Internet Research}, author = {Zheng, Jiaping and Yu, Hong}, year = {2019}, pmid = {30794206 PMCID: 6406229}, pages = {e12525}, }
Background: A plethora of health literacy instruments was developed over the decades. They usually start with experts curating passages of text or word lists, followed by psychometric validation and revision based on test results obtained from a sample population. This process is costly and it is difficult to customize for new usage scenarios. Objective: This study aimed to develop and evaluate a framework for dynamically creating test instruments that can provide a focused assessment of patients’ health literacy. Methods: A health literacy framework and scoring method were extended from the vocabulary knowledge test to accommodate a wide range of item difficulties and various degrees of uncertainty in the participant’s answer. Web-based tests from Amazon Mechanical Turk users were used to assess reliability and validity. Results: Parallel forms of our tests showed high reliability (correlation=.78; 95% CI 0.69-0.85). Validity measured as correlation with an electronic health record comprehension instrument was higher (.47-.61 among 3 groups) than 2 existing tools (Short Assessment of Health Literacy-English, .38-.43; Short Test of Functional Health Literacy in Adults, .34-.46). Our framework is able to distinguish higher literacy levels that are often not measured by other instruments. It is also flexible, allowing customizations to the test the designer’s focus on a particular interest in a subject matter or domain. The framework is among the fastest health literacy instrument to administer. Conclusions: We proposed a valid and highly reliable framework to dynamically create health literacy instruments, alleviating the need to repeat a time-consuming process when a new use scenario arises. This framework can be customized to a specific need on demand and can measure skills beyond the basic level. [J Med Internet Res 2019;21(2):e12525]
Towards Drug Safety Surveillance and Pharmacovigilance: Current Progress in Detecting Medication and Adverse Drug Events from Electronic Health Records.
Liu, F.; Jagannatha, A.; and Yu, H.
Drug Safety. January 2019.
Paper
doi
link
bibtex
@article{liu_towards_2019, title = {Towards {Drug} {Safety} {Surveillance} and {Pharmacovigilance}: {Current} {Progress} in {Detecting} {Medication} and {Adverse} {Drug} {Events} from {Electronic} {Health} {Records}}, issn = {1179-1942}, shorttitle = {Towards {Drug} {Safety} {Surveillance} and {Pharmacovigilance}}, url = {https://doi.org/10.1007/s40264-018-0766-8}, doi = {10.1007/s40264-018-0766-8}, language = {en}, urldate = {2019-01-31}, journal = {Drug Safety}, author = {Liu, Feifan and Jagannatha, Abhyuday and Yu, Hong}, month = jan, year = {2019}, pmid = {30649734}, }
2018
(13)
A natural language processing system that links medical terms in electronic health record notes to lay definitions: system development using physician reviews.
Chen, J.; Druhl, E.; Polepalli Ramesh, B.; Houston, T. K.; Brandt, C. A.; Zulman, D. M.; Vimalananda, V. G.; Malkani, S.; and Yu, H.
Journal of Medical Internet Research, 20(1): e26. January 2018.
doi link bibtex abstract
@article{chen_natural_2018, title = {A natural language processing system that links medical terms in electronic health record notes to lay definitions: system development using physician reviews}, volume = {20}, issn = {1438-8871}, shorttitle = {A natural language processing system that links medical terms in electronic health record notes to lay definitions}, doi = {10.2196/jmir.8669}, abstract = {BACKGROUND: Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. OBJECTIVE: The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. METHODS: NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. We developed innovative computational methods, including an adapted distant supervision algorithm to prioritize medical terms important for EHR comprehension to facilitate the effort of building CoDeMed. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. RESULTS: Physician feedback was mixed. Positive feedback on NoteAid included (1) Easy to use, (2) Good visual display, (3) Satisfactory system speed, and (4) Adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on different contexts, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid's user interface and a number of definitions, and added 4502 more definitions in CoDeMed. CONCLUSIONS: Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool that has the potential to improve patient EHR comprehension and experience using patient portals. Future ongoing work will develop algorithms to handle ambiguous medical terms and test and evaluate NoteAid with patients.}, language = {eng}, number = {1}, journal = {Journal of Medical Internet Research}, author = {Chen, Jinying and Druhl, Emily and Polepalli Ramesh, Balaji and Houston, Thomas K. and Brandt, Cynthia A. and Zulman, Donna M. and Vimalananda, Varsha G. and Malkani, Samir and Yu, Hong}, month = jan, year = {2018}, pmid = {29358159 PMCID: PMC5799720}, keywords = {computer software, consumer health informatics, electronic health records, natural language processing, usability testing}, pages = {e26}, }
BACKGROUND: Many health care systems now allow patients to access their electronic health record (EHR) notes online through patient portals. Medical jargon in EHR notes can confuse patients, which may interfere with potential benefits of patient access to EHR notes. OBJECTIVE: The aim of this study was to develop and evaluate the usability and content quality of NoteAid, a Web-based natural language processing system that links medical terms in EHR notes to lay definitions, that is, definitions easily understood by lay people. METHODS: NoteAid incorporates two core components: CoDeMed, a lexical resource of lay definitions for medical terms, and MedLink, a computational unit that links medical terms to lay definitions. We developed innovative computational methods, including an adapted distant supervision algorithm to prioritize medical terms important for EHR comprehension to facilitate the effort of building CoDeMed. Ten physician domain experts evaluated the user interface and content quality of NoteAid. The evaluation protocol included a cognitive walkthrough session and a postsession questionnaire. Physician feedback sessions were audio-recorded. We used standard content analysis methods to analyze qualitative data from these sessions. RESULTS: Physician feedback was mixed. Positive feedback on NoteAid included (1) Easy to use, (2) Good visual display, (3) Satisfactory system speed, and (4) Adequate lay definitions. Opportunities for improvement arising from evaluation sessions and feedback included (1) improving the display of definitions for partially matched terms, (2) including more medical terms in CoDeMed, (3) improving the handling of terms whose definitions vary depending on different contexts, and (4) standardizing the scope of definitions for medicines. On the basis of these results, we have improved NoteAid's user interface and a number of definitions, and added 4502 more definitions in CoDeMed. CONCLUSIONS: Physician evaluation yielded useful feedback for content validation and refinement of this innovative tool that has the potential to improve patient EHR comprehension and experience using patient portals. Future ongoing work will develop algorithms to handle ambiguous medical terms and test and evaluate NoteAid with patients.
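As a rough illustration of the term-linking step the abstract describes, the sketch below scans a note for lexicon terms (longest match wins) and attaches lay definitions. The mini-lexicon, names, and matching logic are hypothetical stand-ins, not NoteAid's actual CoDeMed or MedLink internals.

import re

# Hypothetical CoDeMed-style mini-lexicon (illustrative entries only).
LAY_DEFINITIONS = {
    "myocardial infarction": "a heart attack",
    "hypertension": "high blood pressure",
    "edema": "swelling caused by fluid buildup",
}

def link_terms(note):
    """Return (term, start, end, lay_definition) spans; longest match wins."""
    spans, claimed = [], set()
    for term in sorted(LAY_DEFINITIONS, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", note, re.IGNORECASE):
            positions = set(range(m.start(), m.end()))
            if not positions & claimed:  # skip character spans already claimed
                claimed |= positions
                spans.append((m.group(), m.start(), m.end(), LAY_DEFINITIONS[term]))
    return sorted(spans, key=lambda s: s[1])

note = "Pt with hypertension, prior myocardial infarction, 2+ pitting edema."
for term, start, end, definition in link_terms(note):
    print(f"{term} [{start}:{end}] -> {definition}")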
Clinical Relation Extraction Toward Drug Safety Surveillance Using Electronic Health Record Narratives: Classical Learning Versus Deep Learning.
Munkhdalai, T.; Liu, F.; and Yu, H.
JMIR public health and surveillance, 4(2): e29. April 2018.
doi link bibtex abstract
@article{munkhdalai_clinical_2018, title = {Clinical {Relation} {Extraction} {Toward} {Drug} {Safety} {Surveillance} {Using} {Electronic} {Health} {Record} {Narratives}: {Classical} {Learning} {Versus} {Deep} {Learning}}, volume = {4}, issn = {2369-2960}, shorttitle = {Clinical {Relation} {Extraction} {Toward} {Drug} {Safety} {Surveillance} {Using} {Electronic} {Health} {Record} {Narratives}}, doi = {10.2196/publichealth.9361}, abstract = {BACKGROUND: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS: Our results show that the SVM model achieved the best average F1-score of 89.1\% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72\%) as well as the rule induction baseline system (F1-score of 7.47\%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35\%. CONCLUSIONS: It shows that classical learning models (SVM) remains advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate a great potential of significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.}, language = {eng}, number = {2}, journal = {JMIR public health and surveillance}, author = {Munkhdalai, Tsendsuren and Liu, Feifan and Yu, Hong}, month = apr, year = {2018}, pmid = {29695376 PMCID: PMC5943628}, keywords = {drug-related side effects and adverse reactions, electronic health records, medical informatics applications, natural language processing, neural networks}, pages = {e29}, }
BACKGROUND: Medication and adverse drug event (ADE) information extracted from electronic health record (EHR) notes can be a rich resource for drug safety surveillance. Existing observational studies have mainly relied on structured EHR data to obtain ADE information; however, ADEs are often buried in the EHR narratives and not recorded in structured data. OBJECTIVE: To unlock ADE-related information from EHR narratives, there is a need to extract relevant entities and identify relations among them. In this study, we focus on relation identification. This study aimed to evaluate natural language processing and machine learning approaches using the expert-annotated medical entities and relations in the context of drug safety surveillance, and investigate how different learning approaches perform under different configurations. METHODS: We have manually annotated 791 EHR notes with 9 named entities (eg, medication, indication, severity, and ADEs) and 7 different types of relations (eg, medication-dosage, medication-ADE, and severity-ADE). Then, we explored 3 supervised machine learning systems for relation identification: (1) a support vector machines (SVM) system, (2) an end-to-end deep neural network system, and (3) a supervised descriptive rule induction baseline system. For the neural network system, we exploited the state-of-the-art recurrent neural network (RNN) and attention models. We report the performance by macro-averaged precision, recall, and F1-score across the relation types. RESULTS: Our results show that the SVM model achieved the best average F1-score of 89.1% on test data, outperforming the long short-term memory (LSTM) model with attention (F1-score of 65.72%) as well as the rule induction baseline system (F1-score of 7.47%) by a large margin. The bidirectional LSTM model with attention achieved the best performance among different RNN models. With the inclusion of additional features in the LSTM model, its performance can be boosted to an average F1-score of 77.35%. CONCLUSIONS: It shows that classical learning models (SVM) remains advantageous over deep learning models (RNN variants) for clinical relation identification, especially for long-distance intersentential relations. However, RNNs demonstrate a great potential of significant improvement if more training data become available. Our work is an important step toward mining EHRs to improve the efficacy of drug safety surveillance. Most importantly, the annotated data used in this study will be made publicly available, which will further promote drug safety research in the community.
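The paper reports macro-averaged precision, recall, and F1 across relation types. For readers unfamiliar with the metric, here is a minimal sketch of that computation over aligned gold/predicted labels; the relation names are illustrative stand-ins for the paper's seven types.

from collections import Counter

def macro_prf(gold, pred):
    """Unweighted mean of per-type precision, recall, and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    scores = []
    for t in set(gold) | set(pred):
        prec = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        rec = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores.append((prec, rec, f1))
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

gold = ["medication-ADE", "medication-dosage", "severity-ADE", "medication-ADE"]
pred = ["medication-ADE", "severity-ADE", "severity-ADE", "medication-ADE"]
print(macro_prf(gold, pred))  # macro precision, recall, F1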
A hybrid Neural Network Model for Joint Prediction of Presence and Period Assertions of Medical Events in Clinical Notes.
Li, R.; Jagannatha, A. N.; and Yu, H.
AMIA Annual Symposium Proceedings, 2017: 1149–1158. April 2018.
Paper
link
bibtex
abstract
@article{rumeng_hybrid_2018, title = {A hybrid {Neural} {Network} {Model} for {Joint} {Prediction} of {Presence} and {Period} {Assertions} of {Medical} {Events} in {Clinical} {Notes}}, volume = {2017}, issn = {1942-597X}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977733/}, abstract = {In this paper, we propose a novel neural network architecture for clinical text mining. We formulate this hybrid neural network model (HNN), composed of recurrent neural network and deep residual network, to jointly predict the presence and period assertion values associated with medical events in clinical texts. We evaluate the effectiveness of our model on a corpus of expert-annotated longitudinal Electronic Health Records (EHR) notes from Cancer patients. Our experiments show that HNN improves the joint assertion classification accuracy as compared to conventional baselines.}, urldate = {2018-10-01}, journal = {AMIA Annual Symposium Proceedings}, author = {Rumeng, Li and Abhyuday N, Jagannatha and Hong, Yu}, month = apr, year = {2018}, pmid = {29854183}, pmcid = {PMC5977733}, pages = {1149--1158}, }
In this paper, we propose a novel neural network architecture for clinical text mining. We formulate this hybrid neural network model (HNN), composed of a recurrent neural network and a deep residual network, to jointly predict the presence and period assertion values associated with medical events in clinical texts. We evaluate the effectiveness of our model on a corpus of expert-annotated longitudinal electronic health record (EHR) notes from cancer patients. Our experiments show that HNN improves the joint assertion classification accuracy as compared to conventional baselines.
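The joint-prediction setup is easy to picture as one shared encoder feeding two classification heads, one per assertion type. The sketch below shows that structure only; the GRU encoder, layer sizes, and label counts are assumptions, and the paper's actual HNN adds a deep residual component omitted here.

import torch
import torch.nn as nn

class JointAssertionModel(nn.Module):
    def __init__(self, vocab=5000, emb=64, hid=128, n_presence=3, n_period=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.presence_head = nn.Linear(hid, n_presence)  # e.g. present/absent/possible
        self.period_head = nn.Linear(hid, n_period)      # e.g. current/history/future

    def forward(self, token_ids):
        _, h = self.encoder(self.embed(token_ids))  # h: (1, batch, hid)
        shared = h.squeeze(0)
        return self.presence_head(shared), self.period_head(shared)

model = JointAssertionModel()
presence_logits, period_logits = model(torch.randint(0, 5000, (2, 20)))
# Joint training sums the two cross-entropy losses through the shared encoder.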
Assessing Readability of Medical Documents: A Ranking Approach.
Zheng, J.; and Yu, H.
JMIR Medical Informatics. March 2018.
doi link bibtex abstract
@article{zheng_assessing_2018, title = {Assessing {Readability} of {Medical} {Documents}: {A} {Ranking} {Approach}.}, doi = {DOI: 10.2196/medinform.8611}, abstract = {BACKGROUND: The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. OBJECTIVE: Our objective was to develop a machine learning-based system to assess readability levels of complex documents such as EHR notes. METHODS: We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. RESULTS: Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). CONCLUSIONS: We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. It is not constrained to a predefined set of readability levels, a common design in many machine learning-based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge.}, journal = {The Journal of Medical Internet Research Medical Informatics}, author = {Zheng, JP and Yu, H}, month = mar, year = {2018}, pmid = {29572199 PMCID: PMC5889493}, }
BACKGROUND: The use of electronic health record (EHR) systems with patient engagement capabilities, including viewing, downloading, and transmitting health information, has recently grown tremendously. However, using these resources to engage patients in managing their own health remains challenging due to the complex and technical nature of the EHR narratives. OBJECTIVE: Our objective was to develop a machine learning-based system to assess readability levels of complex documents such as EHR notes. METHODS: We collected difficulty ratings of EHR notes and Wikipedia articles using crowdsourcing from 90 readers. We built a supervised model to assess readability based on relative orders of text difficulty using both surface text features and word embeddings. We evaluated system performance using the Kendall coefficient of concordance against human ratings. RESULTS: Our system achieved significantly higher concordance (.734) with human annotators than did a baseline using the Flesch-Kincaid Grade Level, a widely adopted readability formula (.531). The improvement was also consistent across different disease topics. This method's concordance with an individual human user's ratings was also higher than the concordance between different human annotators (.658). CONCLUSIONS: We explored methods to automatically assess the readability levels of clinical narratives. Our ranking-based system using simple textual features and easy-to-learn word embeddings outperformed a widely used readability formula. Our ranking-based method can predict relative difficulties of medical documents. It is not constrained to a predefined set of readability levels, a common design in many machine learning-based systems. Furthermore, the feature set does not rely on complex processing of the documents. One potential application of our readability ranking is personalization, allowing patients to better accommodate their own background knowledge.
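Evaluation here rests on the Kendall coefficient of concordance (W), which measures how consistently several raters, or a system plus raters, order the same documents. A minimal implementation for untied rankings (ties need a correction term not shown):

import numpy as np

def kendalls_w(ranks):
    """ranks: (m_raters, n_items) array; each row ranks items 1..n."""
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m**2 * (n**3 - n))

# Three hypothetical raters ordering five notes from easiest (1) to hardest (5).
ratings = np.array([[1, 2, 3, 4, 5],
                    [2, 1, 3, 5, 4],
                    [1, 3, 2, 4, 5]])
print(round(kendalls_w(ratings), 3))  # 0.844: strong agreement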
Understanding Deep Learning Performance through an Examination of Test Set Difficulty: A Psychometric Case Study.
Lalor, J.; Wu, H.; Munkhdalai, T.; and Yu, H.
In EMNLP, 2018.
Paper
doi
link
bibtex
abstract
@inproceedings{lalor_understanding_2018, title = {Understanding {Deep} {Learning} {Performance} through an {Examination} of {Test} {Set} {Difficulty}: {A} {Psychometric} {Case} {Study}}, url = {https://arxiv.org/abs/1702.04811v3}, doi = {DOI: 10.18653/v1/D18-1500}, abstract = {Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.}, booktitle = {{EMNLP}}, author = {Lalor, John and Wu, Hao and Munkhdalai, Tsendsuren and Yu, Hong}, year = {2018}, }
Interpreting the performance of deep learning models beyond test set accuracy is challenging. Characteristics of individual data points are often not considered during evaluation, and each data point is treated equally. We examine the impact of a test set question's difficulty to determine if there is a relationship between difficulty and performance. We model difficulty using well-studied psychometric methods on human response patterns. Experiments on Natural Language Inference (NLI) and Sentiment Analysis (SA) show that the likelihood of answering a question correctly is impacted by the question's difficulty. As DNNs are trained with more data, easy examples are learned more quickly than hard examples.
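The psychometric machinery referenced here is item response theory. Under the common two-parameter logistic (2PL) model, the probability of answering an item correctly depends on respondent ability (theta), item difficulty (b), and discrimination (a); the parameter values below are illustrative, not estimates from the paper.

import math

def p_correct_2pl(theta, a, b):
    """P(correct | ability theta) under the 2PL item response model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for b in (-1.0, 0.0, 1.5):  # easy, medium, and hard items
    print(f"difficulty b={b:+.1f}: P(correct | theta=0) = "
          f"{p_correct_2pl(theta=0.0, a=1.0, b=b):.2f}")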
Soft Label Memorization-Generalization for Natural Language Inference.
Lalor, J.; Wu, H.; and Yu, H.
In 2018.
Paper
link
bibtex
abstract
@inproceedings{lalor_soft_2018, title = {Soft {Label} {Memorization}-{Generalization} for {Natural} {Language} {Inference}.}, url = {https://arxiv.org/abs/1702.08563v3}, abstract = {Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03\% of training set size) we can improve generalization performance over several baselines.}, author = {Lalor, John and Wu, Hao and Yu, Hong}, year = {2018}, }
Often when multiple labels are obtained for a training example it is assumed that there is an element of noise that must be accounted for. It has been shown that this disagreement can be considered signal instead of noise. In this work we investigate using soft labels for training data to improve generalization in machine learning models. However, using soft labels for training Deep Neural Networks (DNNs) is not practical due to the costs involved in obtaining multiple labels for large data sets. We propose soft label memorization-generalization (SLMG), a fine-tuning approach to using soft labels for training DNNs. We assume that differences in labels provided by human annotators represent ambiguity about the true label instead of noise. Experiments with SLMG demonstrate improved generalization performance on the Natural Language Inference (NLI) task. Our experiments show that by injecting a small percentage of soft label training data (0.03% of training set size) we can improve generalization performance over several baselines.
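The mechanics of soft-label training are compact: replace one-hot targets with annotator vote proportions and minimize cross-entropy against that distribution (equivalent to KL divergence up to a constant). A minimal PyTorch sketch with made-up vote counts:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3, requires_grad=True)  # 4 examples, 3 NLI classes
# Hypothetical annotator vote proportions; unanimous items stay effectively hard.
soft_targets = torch.tensor([[0.6, 0.2, 0.2],
                             [0.2, 0.6, 0.2],
                             [1.0, 0.0, 0.0],
                             [0.4, 0.4, 0.2]])

log_probs = F.log_softmax(logits, dim=-1)
loss = -(soft_targets * log_probs).sum(dim=-1).mean()
loss.backward()
print(loss.item())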
Sentence Simplification with Memory-Augmented Neural Networks.
Vu, T.; Hu, B.; Munkhdalai, T.; and Yu, H.
In North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.
doi link bibtex abstract
@inproceedings{vu_sentence_2018, title = {Sentence {Simplification} with {Memory}-{Augmented} {Neural} {Networks}}, doi = {DOI:10.18653/v1/N18-2013}, abstract = {Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.}, booktitle = {North {American} {Chapter} of the {Association} for {Computational} {Linguistics}: {Human} {Language} {Technologies}}, author = {Vu, Tu and Hu, Baotian and Munkhdalai, Tsendsuren and Yu, Hong}, year = {2018}, }
Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.
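What distinguishes a memory-augmented encoder from a plain RNN is the read step: at each position the model attends over a memory of token representations and folds the retrieved slot back into its state. The toy below renders only that read-and-fuse step; the fusion rule is an assumption, and Neural Semantic Encoders also write updated slots back to memory, which is omitted.

import torch
import torch.nn.functional as F

hidden = torch.randn(1, 64)   # current encoder state
memory = torch.randn(12, 64)  # one memory slot per source token

attn = F.softmax(hidden @ memory.T, dim=-1)  # (1, 12) read weights
read = attn @ memory                         # retrieved memory summary
fused = torch.tanh(hidden + read)            # toy fusion into the state
print(fused.shape)  # torch.Size([1, 64])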
Recent Trends In Oral Anticoagulant Use and Post-Discharge Complications Among Atrial Fibrillation Patients With Acute Myocardial Infarction.
Amartya Kundu; Kevin O'Day; Darleen M. Lessard; Joel M. Gore; Steven A. Lubitz; Hong Yu; Mohammed W. Akhter; Daniel Z. Fisher; Robert M. Hayward Jr.; Nils Henninger; Jane S. Saczynski; Allan J. Walkey; Alok Kapoor; Jorge Yarzebski; Robert J. Goldberg; and David D. McManus
Journal of Atrial Fibrillation, 2018.
doi link bibtex abstract
@inproceedings{amartya_kundu_recent_2018, title = {Recent {Trends} {In} {Oral} {Anticoagulant} {Use} and {Post}-{Discharge} {Complications} {Among} {Atrial} {Fibrillation} {Patients} {With} {Acute} {Myocardial} {Infarction}}, doi = {DOI: 10.4022/jafib.1749}, abstract = {BACKGROUND: Atrial fibrillation (AF) is a common complication of acute myocardial infarction (AMI).The CHA2DS2VAScand CHADS2risk scoresare used to identifypatients with AF at risk for strokeand to guide oral anticoagulants (OAC) use, including patients with AMI. However, the epidemiology of AF, further stratifiedaccording to patients' risk of stroke, has not been wellcharacterized among those hospitalized for AMI. METHODS: We examined trends in the frequency of AF, rates of discharge OAC use, and post-discharge outcomes among 6,627 residents of the Worcester, Massachusetts area who survived hospitalization for AMI at 11 medical centers between 1997 and 2011. RESULTS: A total of 1,050AMI patients had AF (16\%) andthe majority (91\%)had a CHA2DS2VAScscore {\textgreater}2.AF rates were highest among patients in the highest stroke risk group.In comparison to patients without AF, patients with AMI and AF in the highest stroke risk category had higher rates of post-discharge complications, including higher 30-day re-hospitalization [27 \% vs. 17 \%], 30-day post-discharge death [10 \% vs. 5\%], and 1-year post-discharge death [46 \% vs. 18 \%] (p {\textless} 0.001 for all). Notably, fewerthan half of guideline-eligible AF patientsreceived an OACprescription at discharge. Usage rates for other evidence-based therapiessuch as statins and beta-blockers,lagged in comparison to AMI patients free from AF. CONCLUSIONS: Our findings highlight the need to enhance efforts towards stroke prevention among AMI survivors with AF.}, publisher = {Journal of Atrial Fibrillation}, author = {{Amartya Kundu} and {Kevin O ’Day} and {Darleen M. Lessard} and {Joel M. Gore1} and {Steven A. Lubitz} and {Hong Yu} and {Mohammed W. Akhter} and {Daniel Z. Fisher} and {Robert M. Hayward Jr.} and {Nils Henninger} and {Jane S. Saczynski} and {Allan J. Walkey} and {Alok Kapoor} and {Jorge Yarzebski} and {Robert J. Goldberg} and {David D. McManus}}, year = {2018}, pmid = {29988239 PMCID: PMC6006973}, }
BACKGROUND: Atrial fibrillation (AF) is a common complication of acute myocardial infarction (AMI). The CHA2DS2-VASc and CHADS2 risk scores are used to identify patients with AF at risk for stroke and to guide oral anticoagulant (OAC) use, including in patients with AMI. However, the epidemiology of AF, further stratified according to patients' risk of stroke, has not been well characterized among those hospitalized for AMI. METHODS: We examined trends in the frequency of AF, rates of discharge OAC use, and post-discharge outcomes among 6,627 residents of the Worcester, Massachusetts area who survived hospitalization for AMI at 11 medical centers between 1997 and 2011. RESULTS: A total of 1,050 AMI patients had AF (16%) and the majority (91%) had a CHA2DS2-VASc score >2. AF rates were highest among patients in the highest stroke risk group. In comparison to patients without AF, patients with AMI and AF in the highest stroke risk category had higher rates of post-discharge complications, including higher 30-day re-hospitalization [27% vs. 17%], 30-day post-discharge death [10% vs. 5%], and 1-year post-discharge death [46% vs. 18%] (p < 0.001 for all). Notably, fewer than half of guideline-eligible AF patients received an OAC prescription at discharge. Usage rates for other evidence-based therapies, such as statins and beta-blockers, lagged in comparison to AMI patients free from AF. CONCLUSIONS: Our findings highlight the need to enhance efforts towards stroke prevention among AMI survivors with AF.
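The stroke-risk stratification in this study uses the CHA2DS2-VASc score. Below is a minimal scorer with the standard point assignments (heart failure, hypertension, age 75 or older doubled, diabetes, prior stroke/TIA doubled, vascular disease, age 65-74, female sex); the function and argument names are my own.

def cha2ds2_vasc(age, female, chf, htn, diabetes, stroke_tia, vascular):
    """Standard CHA2DS2-VASc point assignments; returns 0-9."""
    score = 1 if chf else 0
    score += 1 if htn else 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 1 if diabetes else 0
    score += 2 if stroke_tia else 0
    score += 1 if vascular else 0
    score += 1 if female else 0
    return score

# A 78-year-old woman with hypertension and diabetes: 2 + 1 + 1 + 1 = 5.
print(cha2ds2_vasc(age=78, female=True, chf=False, htn=True,
                   diabetes=True, stroke_tia=False, vascular=False))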
ComprehENotes: An Instrument to Assess Patient Reading Comprehension of Electronic Health Record Notes: Development and Validation.
Lalor, J.; Wu, H.; Chen, L.; Mazor, K.; and Yu, H.
The Journal of Medical Internet Research. April 2018.
doi link bibtex abstract
@article{lalor_comprehenotes:_2018, title = {{ComprehENotes}: {An} {Instrument} to {Assess} {Patient} {EHR} {Note} {Reading} {Comprehension} of {Electronic} {Health} {Record} {Notes}: {Development} and {Validation}}, doi = {DOI: 10.2196/jmir.9380}, abstract = {BACKGROUND: Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE: The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS: We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS: Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS: We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. The final set of questions is the first test of EHR note comprehension.}, journal = {The Journal of Medical Internet Research}, author = {Lalor, J and Wu, H and Chen, L and Mazor, K and Yu, H}, month = apr, year = {2018}, pmid = {29695372 PMCID: PMC5943623}, }
BACKGROUND: Patient portals are widely adopted in the United States and allow millions of patients access to their electronic health records (EHRs), including their EHR clinical notes. A patient's ability to understand the information in the EHR is dependent on their overall health literacy. Although many tests of health literacy exist, none specifically focuses on EHR note comprehension. OBJECTIVE: The aim of this paper was to develop an instrument to assess patients' EHR note comprehension. METHODS: We identified 6 common diseases or conditions (heart failure, diabetes, cancer, hypertension, chronic obstructive pulmonary disease, and liver failure) and selected 5 representative EHR notes for each disease or condition. One note that did not contain natural language text was removed. Questions were generated from these notes using Sentence Verification Technique and were analyzed using item response theory (IRT) to identify a set of questions that represent a good test of ability for EHR note comprehension. RESULTS: Using Sentence Verification Technique, 154 questions were generated from the 29 EHR notes initially obtained. Of these, 83 were manually selected for inclusion in the Amazon Mechanical Turk crowdsourcing tasks and 55 were ultimately retained following IRT analysis. A follow-up validation with a second Amazon Mechanical Turk task and IRT analysis confirmed that the 55 questions test a latent ability dimension for EHR note comprehension. A short test of 14 items was created along with the 55-item test. CONCLUSIONS: We developed ComprehENotes, an instrument for assessing EHR note comprehension from existing EHR notes, gathered responses using crowdsourcing, and used IRT to analyze those responses, thus resulting in a set of questions to measure EHR note comprehension. Crowdsourced responses from Amazon Mechanical Turk can be used to estimate item parameters and select a subset of items for inclusion in the test set using IRT. The final set of questions is the first test of EHR note comprehension.
Detecting Hypoglycemia Incidence from Patients’ Secure Messages.
Chen, J.; and Yu, H.
In 2018.
link bibtex
@inproceedings{chen_detecting_2018, title = {Detecting {Hypoglycemia} {Incidence} from {Patients}’ {Secure} {Messages}}, author = {Chen, J and Yu, H}, year = {2018}, }
Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning.
Li, F.; Liu, W.; and Yu, H.
JMIR medical informatics, 6(4): e12159. November 2018.
doi link bibtex abstract
@article{li_extraction_2018, title = {Extraction of {Information} {Related} to {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes}: {Design} of an {End}-to-{End} {Model} {Based} on {Deep} {Learning}}, volume = {6}, issn = {2291-9694}, shorttitle = {Extraction of {Information} {Related} to {Adverse} {Drug} {Events} from {Electronic} {Health} {Record} {Notes}}, doi = {10.2196/12159}, abstract = {BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps-named entity recognition and relation extraction-our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9\%), which is significantly higher than that (F1=61.7\%) of the best system in the MADE1.0 challenge. HardMTL further improved the F1 by 0.8\%, boosting the F1 to 66.7\%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.}, language = {eng}, number = {4}, journal = {JMIR medical informatics}, author = {Li, Fei and Liu, Weisong and Yu, Hong}, month = nov, year = {2018}, pmid = {30478023 PMCID: PMC6288593}, keywords = {adverse drug event, deep learning, multi-task learning, named entity recognition, natural language processing, relation extraction}, pages = {e12159}, }
BACKGROUND: Pharmacovigilance and drug-safety surveillance are crucial for monitoring adverse drug events (ADEs), but the main ADE-reporting systems such as Food and Drug Administration Adverse Event Reporting System face challenges such as underreporting. Therefore, as complementary surveillance, data on ADEs are extracted from electronic health record (EHR) notes via natural language processing (NLP). As NLP develops, many up-to-date machine-learning techniques are introduced in this field, such as deep learning and multi-task learning (MTL). However, only a few studies have focused on employing such techniques to extract ADEs. OBJECTIVE: We aimed to design a deep learning model for extracting ADEs and related information such as medications and indications. Since extraction of ADE-related information includes two steps-named entity recognition and relation extraction-our second objective was to improve the deep learning model using multi-task learning between the two steps. METHODS: We employed the dataset from the Medication, Indication and Adverse Drug Events (MADE) 1.0 challenge to train and test our models. This dataset consists of 1089 EHR notes of cancer patients and includes 9 entity types such as Medication, Indication, and ADE and 7 types of relations between these entities. To extract information from the dataset, we proposed a deep-learning model that uses a bidirectional long short-term memory (BiLSTM) conditional random field network to recognize entities and a BiLSTM-Attention network to extract relations. To further improve the deep-learning model, we employed three typical MTL methods, namely, hard parameter sharing, parameter regularization, and task relation learning, to build three MTL models, called HardMTL, RegMTL, and LearnMTL, respectively. RESULTS: Since extraction of ADE-related information is a two-step task, the result of the second step (ie, relation extraction) was used to compare all models. We used microaveraged precision, recall, and F1 as evaluation metrics. Our deep learning model achieved state-of-the-art results (F1=65.9%), which is significantly higher than that (F1=61.7%) of the best system in the MADE1.0 challenge. HardMTL further improved the F1 by 0.8%, boosting the F1 to 66.7%, whereas RegMTL and LearnMTL failed to boost the performance. CONCLUSIONS: Deep learning models can significantly improve the performance of ADE-related information extraction. MTL may be effective for named entity recognition and relation extraction, but it depends on the methods, data, and other factors. Our results can facilitate research on ADE detection, NLP, and machine learning.
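Of the three MTL variants compared, only hard parameter sharing helped, and it has a simple shape: the NER tagger and the relation classifier share one encoder and keep separate output layers. The sketch below shows just that skeleton; dimensions and label counts are assumptions, and the paper's full model adds a CRF layer and attention that are omitted here.

import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    def __init__(self, vocab=5000, emb=64, hid=128, n_tags=19, n_rels=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.shared = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hid, n_tags)  # per-token entity-tag logits
        self.rel_head = nn.Linear(2 * hid, n_rels)  # per-sentence relation logits

    def forward(self, token_ids):
        states, _ = self.shared(self.embed(token_ids))
        return self.ner_head(states), self.rel_head(states.mean(dim=1))

model = HardSharedMTL()
tag_logits, rel_logits = model(torch.randint(0, 5000, (2, 30)))
print(tag_logits.shape, rel_logits.shape)  # (2, 30, 19) and (2, 8)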
Reference Standard Development to Train Natural Language Processing Algorithms to Detect Problematic Buprenorphine-Naloxone Therapy.
Celena B Peters; Fran Cunningham; Adam Gordon; Hong Yu; Cedric Salone; Jessica Zacher; Ronald Carico; Jianwei Leng; Nikolh Durley; Weisong Liu; Chao-Chin Lu; Emily Druhl; Feifan Liu; and Brian C Sauer
In VA Pharmacy Informatics Conference 2018, 2018.
Paper
link
bibtex
@inproceedings{celena_b_peters_reference_2018, title = {Reference {Standard} {Development} to {Train} {Natural} {Language} {Processing} {Algorithms} to {Detect} {Problematic} {Buprenorphine}-{Naloxone} {Therapy}}, url = {https://vapharmacytraining.remote-learner.net/mod/resource/view.php?id=13218}, booktitle = {{VA} {Pharmacy} {Informatics} {Conference} 2018}, author = {{Celena B Peters} and {Fran Cunningham} and {Adam Gordon} and {Hong Yu} and {Cedric Salone} and {Jessica Zacher} and {Ronald Carico} and {Jianwei Leng} and {Nikolh Durley} and {Weisong Liu} and {Chao-Chin Lu} and {Emily Druhl} and {Feifan Liu} and {Brian C Sauer}}, year = {2018}, }
Inadequate diversity of information resources searched in US-affiliated systematic reviews and meta-analyses: 2005-2016.
Pradhan, R.; Garnick, K.; Barkondaj, B.; Jordan, H. S.; Ash, A.; and Yu, H.
Journal of Clinical Epidemiology, 102: 50–62. October 2018.
doi link bibtex abstract
@article{pradhan_inadequate_2018, title = {Inadequate diversity of information resources searched in {US}-affiliated systematic reviews and meta-analyses: 2005-2016}, volume = {102}, issn = {1878-5921}, shorttitle = {Inadequate diversity of information resources searched in {US}-affiliated systematic reviews and meta-analyses}, doi = {10.1016/j.jclinepi.2018.05.024}, abstract = {OBJECTIVE: Systematic reviews and meta-analyses (SRMAs) rely upon comprehensive searches into diverse resources that catalog primary studies. However, since what constitutes a comprehensive search is unclear, we examined trends in databases searched from 2005-2016, surrounding the publication of search guidelines in 2013, and associations between resources searched and evidence of publication bias in SRMAs involving human subjects. STUDY DESIGN: To ensure comparability of included SRMAs over the 12 years in the face of a near 100-fold increase of international SRMAs (mainly genetic studies from China) during this period, we focused on USA-affiliated SRMAs, manually reviewing 100 randomly selected SRMAs from those published in each year. After excluding articles (mainly for inadequate detail or out-of-scope methods), we identified factors associated with the databases searched, used network analysis to see which resources were simultaneously searched, and used logistic regression to link information sources searched with a lower chance of finding publication bias. RESULTS: Among 817 SRMA articles studied, the common resources used were Medline (95\%), EMBASE (44\%), and Cochrane (41\%). Methods journal SRMAs were most likely to use registries and grey literature resources. We found substantial co-searching of resources with only published materials, and not complemented by searches of registries and the grey literature. The 2013 guideline did not substantially increase searching of registries and grey literature resources to retrieve primary studies for the SRMAs. When used to augment Medline, Scopus (in all SRMAs) and ClinicalTrials.gov (in SRMAs with safety outcomes) were negatively associated with publication bias. CONCLUSIONS: Even SRMAs that search multiple sources tend to search similar resources. Our study supports searching Scopus and CTG in addition to Medline to reduce the chance of publication bias.}, language = {eng}, journal = {Journal of Clinical Epidemiology}, author = {Pradhan, Richeek and Garnick, Kyle and Barkondaj, Bikramjit and Jordan, Harmon S. and Ash, Arlene and Yu, Hong}, month = oct, year = {2018}, pmid = {29879464}, pmcid = {PMC6250602}, keywords = {Evidence synthesis, Grey literature, Literature databases, Meta-analysis, Publication bias, Systematic review, Trial registries}, pages = {50--62}, }
OBJECTIVE: Systematic reviews and meta-analyses (SRMAs) rely upon comprehensive searches into diverse resources that catalog primary studies. However, since what constitutes a comprehensive search is unclear, we examined trends in databases searched from 2005-2016, surrounding the publication of search guidelines in 2013, and associations between resources searched and evidence of publication bias in SRMAs involving human subjects. STUDY DESIGN: To ensure comparability of included SRMAs over the 12 years in the face of a near 100-fold increase of international SRMAs (mainly genetic studies from China) during this period, we focused on USA-affiliated SRMAs, manually reviewing 100 randomly selected SRMAs from those published in each year. After excluding articles (mainly for inadequate detail or out-of-scope methods), we identified factors associated with the databases searched, used network analysis to see which resources were simultaneously searched, and used logistic regression to link information sources searched with a lower chance of finding publication bias. RESULTS: Among 817 SRMA articles studied, the common resources used were Medline (95%), EMBASE (44%), and Cochrane (41%). Methods journal SRMAs were most likely to use registries and grey literature resources. We found substantial co-searching of resources with only published materials, and not complemented by searches of registries and the grey literature. The 2013 guideline did not substantially increase searching of registries and grey literature resources to retrieve primary studies for the SRMAs. When used to augment Medline, Scopus (in all SRMAs) and ClinicalTrials.gov (in SRMAs with safety outcomes) were negatively associated with publication bias. CONCLUSIONS: Even SRMAs that search multiple sources tend to search similar resources. Our study supports searching Scopus and CTG in addition to Medline to reduce the chance of publication bias.
2017
(10)
Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach.
Chen, J.; Jagannatha, A. N.; Fodeh, S. J.; and Yu, H.
JMIR medical informatics, 5(4): e42. October 2017.
doi link bibtex abstract
@article{chen_ranking_2017, title = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach}, volume = {5}, issn = {2291-9694}, shorttitle = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes}, doi = {10.2196/medinform.8531}, abstract = {BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation-that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P{\textless}.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.}, language = {eng}, number = {4}, journal = {JMIR medical informatics}, author = {Chen, Jinying and Jagannatha, Abhyuday N. and Fodeh, Samah J. and Yu, Hong}, month = oct, year = {2017}, pmid = {29089288}, pmcid = {PMC5686421}, keywords = {Information extraction, electronic health records, lexical entry selection, natural language processing, transfer learning}, pages = {e42}, }
BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first. OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation-that is, creating lay definitions for these terms. METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data. RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially. CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.
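One of the two transfer-learning algorithms evaluated, feature space augmentation, is Daumé III's "frustratingly easy" domain adaptation: every feature vector is copied into a shared block plus a block for its own domain, so a single linear model can learn shared and domain-specific weights simultaneously. A minimal sketch with toy feature values:

import numpy as np

def augment(x, domain):
    """Map features x to [shared | source-only | target-only] blocks."""
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros])
    return np.concatenate([x, zeros, x])  # target-domain example

x = np.array([1.0, 0.5, 2.0])
print(augment(x, "source"))  # [1.  0.5 2.  1.  0.5 2.  0.  0.  0. ]
print(augment(x, "target"))  # [1.  0.5 2.  0.  0.  0.  1.  0.5 2. ]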
Meta Networks.
Munkhdalai, T.; and Yu, H.
In ICML, volume 70, pages 2554–2563, Sydney, Australia, August 2017.
link bibtex abstract
@inproceedings{munkhdalai_meta_2017, address = {Sydney, Australia}, title = {Meta {Networks}}, volume = {70}, abstract = {Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6\% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.}, booktitle = {{ICML}}, author = {Munkhdalai, Tsendsuren and Yu, Hong}, month = aug, year = {2017}, pmid = {31106300; PMCID: PMC6519722}, pages = {2554--2563}, }
Neural networks have been successfully applied in applications with a large amount of labeled data. However, the task of rapid generalization on new concepts with small training data while preserving performances on previously learned ones still presents a significant challenge to neural network models. In this work, we introduce a novel meta learning method, Meta Networks (MetaNet), that learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. When evaluated on Omniglot and Mini-ImageNet benchmarks, our MetaNet models achieve a near human-level performance and outperform the baseline approaches by up to 6% accuracy. We demonstrate several appealing properties of MetaNet relating to generalization and continual learning.
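The fast-parameterization idea can be caricatured in a few lines: a meta learner maps the gradient from a support example into example-specific "fast" weights that are added to the slowly learned weights at prediction time. Everything below is a toy rendering of that mechanism, not MetaNet's actual architecture, which is considerably more elaborate.

import torch
import torch.nn.functional as F

slow_w = torch.randn(10, 2, requires_grad=True)  # slow, task-level weights
meta = torch.nn.Linear(20, 20)                   # maps gradients to fast weights

# One labeled support example drives the fast parameterization.
x_s, y_s = torch.randn(1, 10), torch.tensor([1])
loss_s = F.cross_entropy(x_s @ slow_w, y_s)
(grad,) = torch.autograd.grad(loss_s, slow_w, create_graph=True)
fast_w = meta(grad.reshape(1, -1)).reshape(10, 2)

# Query predictions combine slow and fast weights.
x_q = torch.randn(3, 10)
logits = x_q @ (slow_w + fast_w)
print(logits.shape)  # torch.Size([3, 2])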
Neural Semantic Encoders.
Munkhdalai, T.; and Yu, H.
In European Chapter of the Association for Computational Linguistics 2017 (EACL), volume 1, pages 397–407, April 2017.