
2023
(13)
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond.
Chang, H.; Yao, Z.; Gon, A.; Yu, H.; and McCallum, A.
July 2023.
ACL 2023, equal contribution from the first two authors.
@misc{chang_revisiting_2023, address = {Canada}, title = {Revisiting the {Architectures} like {Pointer} {Networks} to {Efficiently} {Improve} the {Next} {Word} {Distribution}, {Summarization} {Factuality}, and {Beyond}}, url = {http://arxiv.org/abs/2305.12289}, abstract = {Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant nowadays? In this study, we discover that the answers to both questions are no. This is because the softmax bottleneck sometimes prevents the LMs from predicting the desired distribution and the pointer networks can be used to break the bottleneck efficiently. Based on the finding, we propose several softmax alternatives by simplifying the pointer networks and accelerating the word-by-word rerankers. In GPT-2, our proposals are significantly better and more efficient than mixture of softmax, a state-of-the-art softmax alternative. In summarization experiments, without significantly decreasing its training/testing speed, our best method based on T5-Small improves factCC score by 2 points in CNN/DM and XSUM dataset, and improves MAUVE scores by 30\% in BookSum paragraph-level dataset.}, urldate = {2023-05-23}, publisher = {arXiv}, author = {Chang, Haw-Shiuan and Yao, Zonghai and Gon, Alolika and Yu, Hong and McCallum, Andrew}, month = jul, year = {2023}, note = {ACL 2023, equal contribution from the first two authors.}, keywords = {Computer Science - Computation and Language}, }
Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant nowadays? In this study, we discover that the answers to both questions are no. This is because the softmax bottleneck sometimes prevents the LMs from predicting the desired distribution and the pointer networks can be used to break the bottleneck efficiently. Based on the finding, we propose several softmax alternatives by simplifying the pointer networks and accelerating the word-by-word rerankers. In GPT-2, our proposals are significantly better and more efficient than mixture of softmax, a state-of-the-art softmax alternative. In summarization experiments, without significantly decreasing its training/testing speed, our best method based on T5-Small improves factCC score by 2 points in CNN/DM and XSUM dataset, and improves MAUVE scores by 30% in BookSum paragraph-level dataset.
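The abstract contrasts a single output softmax with pointer networks that copy words from the context. The sketch below is an illustration only, not the authors' proposed softmax alternatives. It shows the generic pointer-generator-style mixture, in which the softmax over the vocabulary is blended with attention weights scattered onto the vocabulary ids of the context tokens. The gate `p_gen`, the toy vocabulary, and the function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_softmax_and_pointer(vocab_logits, attn_scores, context_ids, p_gen):
    """Pointer-generator-style next-word distribution (illustrative only).

    p(w) = p_gen * softmax(vocab_logits)[w]
           + (1 - p_gen) * (sum of attention weights on context positions holding w)
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    p_vocab = softmax(vocab_logits)
    p_attn = softmax(attn_scores)  # one weight per context position
    mixed = [p_gen * p for p in p_vocab]
    for pos, token_id in enumerate(context_ids):
        mixed[token_id] += (1.0 - p_gen) * p_attn[pos]  # "copy" mass for that word
    return mixed

# Toy example: 5-word vocabulary, uniform softmax, context token ids [3, 1, 3].
probs = mix_softmax_and_pointer([0.0] * 5, [2.0, 0.0, 1.0], [3, 1, 3], p_gen=0.5)
print([round(p, 3) for p in probs])  # roughly [0.1, 0.145, 0.1, 0.555, 0.1]
```

The copy term adds probability mass directly to words that appear in the context, outside the single softmax. This is one simple way to move a next-word distribution that the softmax alone does not produce.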
PaniniQA: Enhancing Patient Education Through Interactive Question Answering.
Cai, P.; Yao, Z.; Liu, F.; Wang, D.; Reilly, M.; Zhou, H.; Li, L.; Cao, Y.; Kapoor, A.; Bajracharya, A.; Berlowitz, D.; and Yu, H.
Transactions of the Association for Computational Linguistics. 2023.
Accepted. Equal contribution from the first two authors.
@article{cai_paniniqa_2023, title = {{PaniniQA}: {Enhancing} {Patient} {Education} {Through} {Interactive} {Question} {Answering}}, journal = {Transactions of the Association for Computational Linguistics}, author = {Cai, Pengshan and Yao, Zonghai and Liu, Fei and Wang, Dakuo and Reilly, Meghan and Zhou, Huixue and Li, Lingxi and Cao, Yi and Kapoor, Alok and Bajracharya, Adarsha and Berlowitz, Dan and Yu, Hong}, year = {2023}, note = {Accepted. Equal contribution from the first two authors.}, }
Automated identification of eviction status from electronic health record notes.
Yao, Z.; Tsai, J.; Liu, W.; Levy, D. A; Druhl, E.; Reisman, J. I; and Yu, H.
Journal of the American Medical Informatics Association, ocad081. May 2023.
@article{yao_automated_2023, title = {Automated identification of eviction status from electronic health record notes}, issn = {1527-974X}, url = {https://doi.org/10.1093/jamia/ocad081}, doi = {10.1093/jamia/ocad081}, abstract = {Evictions are important social and behavioral determinants of health. Evictions are associated with a cascade of negative events that can lead to unemployment, housing insecurity/homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that has shown to substantially outperform other state-of-the-art models such as fine-tuning pretrained language models like BioBERT and Bio\_ClinicalBERT. Moreover, we designed a novel prompt to further improve the model performance by using the intrinsic connection between the 2 subtasks of eviction presence and period prediction. Finally, we used the Temperature Scaling-based Calibration on our KIRESH-Prompt method to avoid overconfidence issues arising from the imbalance dataset. KIRESH-Prompt substantially outperformed strong baseline models including fine-tuning the Bio\_ClinicalBERT model to achieve 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. KIRESH-Prompt has substantially improved eviction status classification.
We plan to deploy KIRESH-Prompt to the VHA EHRs as an eviction surveillance system to help address the US Veterans’ housing insecurity.}, urldate = {2023-05-19}, journal = {Journal of the American Medical Informatics Association}, author = {Yao, Zonghai and Tsai, Jack and Liu, Weisong and Levy, David A and Druhl, Emily and Reisman, Joel I and Yu, Hong}, month = may, year = {2023}, keywords = {Computer Science - Computation and Language}, pages = {ocad081}, }
Evictions are important social and behavioral determinants of health. Evictions are associated with a cascade of negative events that can lead to unemployment, housing insecurity/homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that has shown to substantially outperform other state-of-the-art models such as fine-tuning pretrained language models like BioBERT and Bio_ClinicalBERT. Moreover, we designed a novel prompt to further improve the model performance by using the intrinsic connection between the 2 subtasks of eviction presence and period prediction. Finally, we used the Temperature Scaling-based Calibration on our KIRESH-Prompt method to avoid overconfidence issues arising from the imbalance dataset. KIRESH-Prompt substantially outperformed strong baseline models including fine-tuning the Bio_ClinicalBERT model to achieve 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. KIRESH-Prompt has substantially improved eviction status classification. We plan to deploy KIRESH-Prompt to the VHA EHRs as an eviction surveillance system to help address the US Veterans’ housing insecurity.
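The abstract mentions temperature-scaling-based calibration to curb overconfidence. The sketch below shows the standard form of that technique; it is not the KIRESH code. Logits are divided by a single scalar T > 0, and T is chosen on held-out data to minimize negative log-likelihood. A plain grid search stands in for the gradient-based fit that is more common in practice, and the validation logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a scalar temperature T > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at temperature T."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Choose T on held-out data by minimizing mean NLL over a grid.

    Dividing by any T > 0 leaves the argmax prediction unchanged,
    so accuracy is preserved while confidence is rescaled.
    """
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))

# Hypothetical, deliberately overconfident validation logits (3 of 4 correct).
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
val_labels = [0, 1, 0, 1]
T = fit_temperature(val_logits, val_labels)
print(T, mean_nll(val_logits, val_labels, 1.0), mean_nll(val_logits, val_labels, T))
```

For overconfident logits like these, the fitted T comes out above 1. This softens the probabilities and lowers validation NLL without changing any predicted label.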
Buprenorphine use and courses of care for opioid use disorder treatment within the Veterans Health Administration.
Gordon, A. J.; Saxon, A. J.; Kertesz, S.; Wyse, J. J.; Manhapra, A.; Lin, L. A.; Chen, W.; Hansen, J.; Pinnell, D.; Huynh, T.; Baylis, J. D.; Cunningham, F. E.; Ghitza, U. E.; Bart, G.; Yu, H.; and Sauer, B. C.
Drug and Alcohol Dependence, 248: 109902. July 2023.
@article{gordon_buprenorphine_2023, title = {Buprenorphine use and courses of care for opioid use disorder treatment within the {Veterans} {Health} {Administration}}, volume = {248}, issn = {0376-8716}, url = {https://www.sciencedirect.com/science/article/pii/S0376871623001400}, doi = {10.1016/j.drugalcdep.2023.109902}, abstract = {Background Retention of patients in buprenorphine medication treatment for opioid use disorder (B-MOUD) reduces harms associated with opioid use disorder (OUD). We sought to characterize the patients receiving B-MOUD and courses of B-MOUD in a large healthcare system. Methods We conducted a retrospective, open cohort study of patients with OUD who either did or did not receive B-MOUD courses within the Veterans Health Administration (VHA) from January 2006 through July 2019, using VHA clinical data. We compared patients receiving or not receiving B-MOUD, characterized B-MOUD courses (e.g., length and doses), and examined persistence, across patient characteristics, over time. We used analyses for normally or non-normally distributed continuous variables, categorical data, and persistence over time (Kaplan-Meier persistence curves). Results We identified 255,726 Veterans with OUD; 40,431 (15.8\%) had received 63,929 B-MOUD courses. Compared to patients with OUD without B-MOUD, patients with B-MOUD were younger, more often of white race, and had more co-morbidities. The frequency of new B-MOUD starts and prevalent B-MOUD patients ranged from 1550 and 1989 in 2007 to 8146 and 16,505 in 2018, respectively. The median duration of B-MOUD was 157 (IQR: 37–537) days for all courses and 33.8\% patients had more than one course. The average proportion days covered was 90\% (SD: 0.15), and the average prescribed daily dose was 13.44 (SD: 6.5). Conclusions Within a VHA B-MOUD cohort, courses increased more than 10-fold from 2006 to 2016 with nearly half of patients experiencing multiple courses. 
Patient demographics seem to dictate the length of courses.}, language = {en}, urldate = {2023-05-15}, journal = {Drug and Alcohol Dependence}, author = {Gordon, Adam J. and Saxon, Andrew J. and Kertesz, Stefan and Wyse, Jessica J. and Manhapra, Ajay and Lin, Lewei A. and Chen, Wei and Hansen, Jared and Pinnell, Derek and Huynh, Tina and Baylis, Jacob D. and Cunningham, Francesca E. and Ghitza, Udi E. and Bart, Gavin and Yu, Hong and Sauer, Brian C.}, month = jul, year = {2023}, keywords = {Buprenorphine, Opioid-Related Disorders}, pages = {109902}, }
Background: Retention of patients in buprenorphine medication treatment for opioid use disorder (B-MOUD) reduces harms associated with opioid use disorder (OUD). We sought to characterize the patients receiving B-MOUD and courses of B-MOUD in a large healthcare system. Methods: We conducted a retrospective, open cohort study of patients with OUD who either did or did not receive B-MOUD courses within the Veterans Health Administration (VHA) from January 2006 through July 2019, using VHA clinical data. We compared patients receiving or not receiving B-MOUD, characterized B-MOUD courses (e.g., length and doses), and examined persistence, across patient characteristics, over time. We used analyses for normally or non-normally distributed continuous variables, categorical data, and persistence over time (Kaplan-Meier persistence curves). Results: We identified 255,726 Veterans with OUD; 40,431 (15.8%) had received 63,929 B-MOUD courses. Compared to patients with OUD without B-MOUD, patients with B-MOUD were younger, more often of white race, and had more co-morbidities. The frequency of new B-MOUD starts and prevalent B-MOUD patients ranged from 1550 and 1989 in 2007 to 8146 and 16,505 in 2018, respectively. The median duration of B-MOUD was 157 (IQR: 37–537) days for all courses and 33.8% patients had more than one course. The average proportion days covered was 90% (SD: 0.15), and the average prescribed daily dose was 13.44 (SD: 6.5). Conclusions: Within a VHA B-MOUD cohort, courses increased more than 10-fold from 2006 to 2016 with nearly half of patients experiencing multiple courses. Patient demographics seem to dictate the length of courses.
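The study summarizes persistence on treatment with Kaplan-Meier curves. Below is a minimal sketch of the product-limit estimator. It assumes that a course "event" is discontinuation and that a course still active at the end of follow-up is censored. The toy durations are hypothetical and are not taken from the VHA data.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Product-limit estimate of S(t), the probability of still being on treatment.

    durations: days each course was observed
    events:    1 if the course ended (discontinuation observed), 0 if censored
    Returns (time, S(time)) at each distinct time with at least one event.
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every course leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts[t]
        if d:
            # Courses censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical courses: days observed and whether the course ended.
curve = kaplan_meier([30, 60, 60, 90, 120], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(30, 0.8), (60, 0.6), (90, 0.3)]
```

On such a curve, the median duration is the first time at which S(t) drops to 0.5 or below.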
Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information.
Kwon, S.; Garodia, R.; Lee, M.; Yang, Z.; and Yu, H.
In Toronto, Canada, July 2023.
ACL 2023
@inproceedings{kwon_vision_2023, address = {Toronto, Canada}, title = {Vision {Meets} {Definitions}: {Unsupervised} {Visual} {Word} {Sense} {Disambiguation} {Incorporating} {Gloss} {Information}}, author = {Kwon, Sunjae and Garodia, Rishabh and Lee, Minhwa and Yang, Zhichao and Yu, Hong}, month = jul, year = {2023}, note = {ACL 2023}, }
Generating User-Engaging News Headlines.
Cai, P.; Song, K.; Cho, S.; Wang, H.; Wang, X.; Yu, H.; Liu, F.; and Yu, D.
In Toronto, Canada, July 2023.
ACL 2023
@inproceedings{cai_generating_2023, address = {Toronto, Canada}, title = {Generating {User}-{Engaging} {News} {Headlines}}, author = {Cai, Pengshan and Song, Kaiqiang and Cho, Sangwoo and Wang, Hongwei and Wang, Xiaoyang and Yu, Hong and Liu, Fei and Yu, Dong}, month = jul, year = {2023}, note = {ACL 2023}, }
Intentional Self-Harm among US Veterans with Traumatic Brain Injury and/or Posttraumatic Stress Disorder: A Retrospective Cohort Study, 2008-2017.
Rawat, B. P. S.; Reisman, J.; Pogoda, T. K.; Liu, W.; Rongali, S.; Aseltine, R.; Chen, K.; Tsai, J.; Berlowitz, D. R.; Yu, H.; and Carlson, K.
JMIR Public Health and Surveillance. 2023.
In Press
@article{rawat_intentional_2023, title = {Intentional {Self}-{Harm} among {US} {Veterans} with {Traumatic} {Brain} {Injury} and/or {Posttraumatic} {Stress} {Disorder}: {A} {Retrospective} {Cohort} {Study}, 2008-2017}, abstract = {BACKGROUND: Veterans with a history of traumatic brain injury (TBI) and/or posttraumatic stress disorder (PTSD) may be at increased risk of suicide attempts and other forms of intentional self-harm as compared to Veterans without TBI or PTSD. OBJECTIVE: Using administrative data from the United States (US) Veterans Health Administration (VHA), we studied associations between TBI and PTSD diagnoses, and subsequent diagnoses of intentional self-harm, among US Veterans who used VHA healthcare between 2008 and 2017. METHODS: All Veterans with encounters or hospitalizations for intentional self-harm were assigned "index dates" corresponding to the date of first related visit; among those without intentional self-harm, we randomly selected a date from among the Veteran's healthcare encounters to match the distribution of case index dates over the 10-year period. We then examined prevalence of TBI and PTSD diagnoses within the 5-year period prior to Veterans' index dates. TBI, PTSD, and intentional self-harm were identified using International Classification of Diseases (ICD) diagnosis and external cause of injury codes from inpatient and outpatient VHA encounters. We stratified analyses by Veterans' average yearly VHA utilization in the 5-year period before their index date (low, medium, or high). Variations in prevalence and odds of intentional self-harm diagnoses were compared by Veterans' prior TBI and PTSD diagnosis status (TBI-only, PTSD-only, and comorbid TBI/PTSD) for each VHA utilization stratum. Multivariable models adjusted for age, sex, race, ethnicity, marital status, Department of Veterans Affairs (VA) service-connection status, and Charlson Comorbidity Index scores. 
RESULTS: Across all three VHA utilization strata, prevalence of intentional self-harm diagnoses was higher among Veterans diagnosed with TBI, PTSD, or TBI/PTSD than among Veterans with neither diagnosis. The observed difference was most pronounced among Veterans in the high VHA utilization stratum. Prevalence of intentional self-harm was six times higher among those with comorbid TBI/PTSD (11.63\%) than among Veterans with neither TBI nor PTSD (1.92\%). Adjusted odds ratios (ORs) suggested that, after accounting for potential confounders, Veterans with TBI, PTSD, or comorbid TBI/PTSD had higher odds of self-harm compared to Veterans without these diagnoses. Among Veterans with high VHA utilization, those with comorbid TBI/PTSD were 4.26 times more likely to receive diagnoses for intentional self-harm than Veterans with neither diagnosis (95\% confidence interval: 4.15-4.38). This pattern was similar for Veterans with low and medium VHA utilization. CONCLUSIONS: Veterans with TBI and/or PTSD diagnoses, compared to those with neither diagnosis, were substantially more likely to be subsequently diagnosed with intentional self-harm between 2008 and 2017. These associations were most pronounced among Veterans who used VHA healthcare most frequently. These findings suggest a need for suicide prevention efforts targeted at Veterans with these diagnoses.}, journal = {JMIR Public Health and Surveillance}, author = {Rawat, Bhanu Pratap Singh and Reisman, Joel and Pogoda, Terri K. and Liu, Weisong and Rongali, Subendhu and Aseltine, Rob and Chen, Kun and Tsai, Jack and Berlowitz, Dan R. and Yu, Hong and Carlson, Kathleen}, year = {2023}, note = {In Press}, }
BACKGROUND: Veterans with a history of traumatic brain injury (TBI) and/or posttraumatic stress disorder (PTSD) may be at increased risk of suicide attempts and other forms of intentional self-harm as compared to Veterans without TBI or PTSD. OBJECTIVE: Using administrative data from the United States (US) Veterans Health Administration (VHA), we studied associations between TBI and PTSD diagnoses, and subsequent diagnoses of intentional self-harm, among US Veterans who used VHA healthcare between 2008 and 2017. METHODS: All Veterans with encounters or hospitalizations for intentional self-harm were assigned "index dates" corresponding to the date of first related visit; among those without intentional self-harm, we randomly selected a date from among the Veteran's healthcare encounters to match the distribution of case index dates over the 10-year period. We then examined prevalence of TBI and PTSD diagnoses within the 5-year period prior to Veterans' index dates. TBI, PTSD, and intentional self-harm were identified using International Classification of Diseases (ICD) diagnosis and external cause of injury codes from inpatient and outpatient VHA encounters. We stratified analyses by Veterans' average yearly VHA utilization in the 5-year period before their index date (low, medium, or high). Variations in prevalence and odds of intentional self-harm diagnoses were compared by Veterans' prior TBI and PTSD diagnosis status (TBI-only, PTSD-only, and comorbid TBI/PTSD) for each VHA utilization stratum. Multivariable models adjusted for age, sex, race, ethnicity, marital status, Department of Veterans Affairs (VA) service-connection status, and Charlson Comorbidity Index scores. RESULTS: Across all three VHA utilization strata, prevalence of intentional self-harm diagnoses was higher among Veterans diagnosed with TBI, PTSD, or TBI/PTSD than among Veterans with neither diagnosis. 
The observed difference was most pronounced among Veterans in the high VHA utilization stratum. Prevalence of intentional self-harm was six times higher among those with comorbid TBI/PTSD (11.63%) than among Veterans with neither TBI nor PTSD (1.92%). Adjusted odds ratios (ORs) suggested that, after accounting for potential confounders, Veterans with TBI, PTSD, or comorbid TBI/PTSD had higher odds of self-harm compared to Veterans without these diagnoses. Among Veterans with high VHA utilization, those with comorbid TBI/PTSD were 4.26 times more likely to receive diagnoses for intentional self-harm than Veterans with neither diagnosis (95% confidence interval: 4.15-4.38). This pattern was similar for Veterans with low and medium VHA utilization. CONCLUSIONS: Veterans with TBI and/or PTSD diagnoses, compared to those with neither diagnosis, were substantially more likely to be subsequently diagnosed with intentional self-harm between 2008 and 2017. These associations were most pronounced among Veterans who used VHA healthcare most frequently. These findings suggest a need for suicide prevention efforts targeted at Veterans with these diagnoses.
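The abstract reports a prevalence of 11.63% versus 1.92% ("six times higher") and, separately, an adjusted odds ratio of 4.26. The worked arithmetic below is only a consistency check on the reported figures. It computes the prevalence ratio and a crude (unadjusted) odds ratio from the two prevalences. The crude odds ratio is not the paper's adjusted estimate, which controls for confounders within a utilization stratum, so the two are not expected to agree.

```python
def prevalence_ratio(p_exposed, p_unexposed):
    """Ratio of two prevalences (proportions in [0, 1])."""
    return p_exposed / p_unexposed

def odds(p):
    return p / (1.0 - p)

def crude_odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio implied by two prevalences."""
    return odds(p_exposed) / odds(p_unexposed)

p_tbi_ptsd = 0.1163  # comorbid TBI/PTSD (reported)
p_neither = 0.0192   # neither TBI nor PTSD (reported)
print(round(prevalence_ratio(p_tbi_ptsd, p_neither), 2))  # 6.06
print(round(crude_odds_ratio(p_tbi_ptsd, p_neither), 2))  # 6.72
```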
Associations Between Natural Language Processing–Enriched Social Determinants of Health and Suicide Death Among US Veterans.
Mitra, A.; Pradhan, R.; Melamed, R. D.; Chen, K.; Hoaglin, D. C.; Tucker, K. L.; Reisman, J. I.; Yang, Z.; Liu, W.; Tsai, J.; and Yu, H.
JAMA Network Open, 6(3): e233079–e233079. March 2023.
_eprint: https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2802468/mitra_2023_oi_230126_1678209361.50297.pdf
@article{mitra_associations_2023, title = {Associations {Between} {Natural} {Language} {Processing}–{Enriched} {Social} {Determinants} of {Health} and {Suicide} {Death} {Among} {US} {Veterans}}, volume = {6}, issn = {2574-3805}, url = {https://doi.org/10.1001/jamanetworkopen.2023.3079}, doi = {10.1001/jamanetworkopen.2023.3079}, abstract = {Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing instability, legal problems, barriers to care, violence, transition of care, and food insecurity), and combining them yielded 9 SDOHs. Data were analyzed in May 2022. Occurrence of SDOHs over a maximum span of 2 years compared with no occurrence of SDOH. Cases of suicide death were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. Suicide was ascertained by National Death Index, and patients were followed up for up to 2 years after cohort entry with a study end date of September 30, 2015. Adjusted odds ratios (aORs) and 95\% CIs were estimated using conditional logistic regression. Of 6 122 785 veterans, 8821 committed suicide during 23 725 382 person-years of follow-up (incidence rate 37.18 per 100 000 person-years).
These 8821 veterans were matched with 35 284 control participants. The cohort was mostly male (42 540 [96.45\%]) and White (34 930 [79.20\%]), with 6227 (14.12\%) Black veterans. The mean (SD) age was 58.64 (17.41) years. Across the 5 common SDOHs, NLP-extracted SDOH, on average, retained 49.92\% of structured SDOHs and covered 80.03\% of all SDOH occurrences. SDOHs, obtained by structured data and/or NLP, were significantly associated with increased risk of suicide. The 3 SDOHs with the largest effect sizes were legal problems (aOR, 2.66; 95\% CI, 2.46-2.89), violence (aOR, 2.12; 95\% CI, 1.98-2.27), and nonspecific psychosocial needs (aOR, 2.07; 95\% CI, 1.92-2.23), when obtained by combining structured data and NLP. In this study, NLP-extracted SDOHs, with and without structured SDOHs, were associated with increased risk of suicide among veterans, suggesting the potential utility of NLP in public health studies.}, number = {3}, journal = {JAMA Network Open}, author = {Mitra, Avijit and Pradhan, Richeek and Melamed, Rachel D. and Chen, Kun and Hoaglin, David C. and Tucker, Katherine L. and Reisman, Joel I. and Yang, Zhichao and Liu, Weisong and Tsai, Jack and Yu, Hong}, month = mar, year = {2023}, note = {\_eprint: https://jamanetwork.com/journals/jamanetworkopen/articlepdf/2802468/mitra\_2023\_oi\_230126\_1678209361.50297.pdf}, pages = {e233079--e233079}, }
Social determinants of health (SDOHs) are known to be associated with increased risk of suicidal behaviors, but few studies use SDOHs from unstructured electronic health record notes. To investigate associations between veterans’ death by suicide and recent SDOHs, identified using structured and unstructured data. This nested case-control study included veterans who received care under the US Veterans Health Administration from October 1, 2010, to September 30, 2015. A natural language processing (NLP) system was developed to extract SDOHs from unstructured clinical notes. Structured data yielded 6 SDOHs (ie, social or familial problems, employment or financial problems, housing instability, legal problems, violence, and nonspecific psychosocial needs), NLP on unstructured data yielded 8 SDOHs (social isolation, job or financial insecurity, housing instability, legal problems, barriers to care, violence, transition of care, and food insecurity), and combining them yielded 9 SDOHs. Data were analyzed in May 2022. Occurrence of SDOHs over a maximum span of 2 years compared with no occurrence of SDOH. Cases of suicide death were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. Suicide was ascertained by National Death Index, and patients were followed up for up to 2 years after cohort entry with a study end date of September 30, 2015. Adjusted odds ratios (aORs) and 95% CIs were estimated using conditional logistic regression. Of 6 122 785 veterans, 8821 committed suicide during 23 725 382 person-years of follow-up (incidence rate 37.18 per 100 000 person-years). These 8821 veterans were matched with 35 284 control participants. The cohort was mostly male (42 540 [96.45%]) and White (34 930 [79.20%]), with 6227 (14.12%) Black veterans. The mean (SD) age was 58.64 (17.41) years. Across the 5 common SDOHs, NLP-extracted SDOH, on average, retained 49.92% of structured SDOHs and covered 80.03% of all SDOH occurrences.
SDOHs, obtained by structured data and/or NLP, were significantly associated with increased risk of suicide. The 3 SDOHs with the largest effect sizes were legal problems (aOR, 2.66; 95% CI, 2.46-2.89), violence (aOR, 2.12; 95% CI, 1.98-2.27), and nonspecific psychosocial needs (aOR, 2.07; 95% CI, 1.92-2.23), when obtained by combining structured data and NLP. In this study, NLP-extracted SDOHs, with and without structured SDOHs, were associated with increased risk of suicide among veterans, suggesting the potential utility of NLP in public health studies.
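The counts in this abstract can be cross-checked by hand. The worked arithmetic below recomputes three things from the reported numbers: the crude incidence rate per 100 000 person-years, the number of controls implied by 1:4 matching, and the cohort percentages. For the percentages, it assumes that the denominator is cases plus controls. The conditional logistic regression behind the adjusted odds ratios is not reproduced here.

```python
def incidence_rate(cases, person_years, per=100_000):
    """Crude incidence rate per `per` person-years."""
    return cases / person_years * per

def percent(part, whole):
    return 100.0 * part / whole

cases = 8821               # suicide deaths (reported)
person_years = 23_725_382  # follow-up (reported)
controls = 4 * cases       # 1:4 matching (reported design)
analytic_n = cases + controls  # assumed denominator for the cohort percentages

print(round(incidence_rate(cases, person_years), 2))  # 37.18
print(controls)                                       # 35284
for label, count in [("male", 42_540), ("White", 34_930), ("Black", 6227)]:
    print(label, round(percent(count, analytic_n), 2))  # 96.45, 79.2, 14.12
```

All three recomputed percentages match the reported 96.45%, 79.20%, and 14.12%. This is consistent with a matched sample of 44 105 veterans.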
Web Information Extraction for Social Good: Food Pantry Answering As an Example.
Chen, H.; and Yu, H.
In Austin, TX, May 2023. ACM.
The Web Conference 2023, Austin TX
@inproceedings{chen_web_2023, address = {Austin, TX}, title = {Web {Information} {Extraction} for {Social} {Good}: {Food} {Pantry} {Answering} {As} an {Example}}, doi = {10.1145/3543507.3583880}, abstract = {Social Determinants of Health (SDH) have more influence on health outcome than clinical care or the physical environment, namely food insecurity, housing instability, and health literacy. Many researchers design applications as a bridge to connect between resource providers and the deprived population. In this study, we take food pantries as a solution to mitigate food insecurity as an example to illustrate an automatic system combining location-aware information retrieval, web information extraction and domain-specific answering. To acquire the latest knowledge, our proposed framework first retrieves pantry candidates based on geolocation of the user, and utilizes structural information from markup language to extract semantic chunks related to six common requests. We use BERT and RoBERTa as information extraction models and compare three different web page segmentation methods in the experiments.}, publisher = {ACM}, author = {Chen, Huan-Yuan and Yu, Hong}, month = may, year = {2023}, note = {The Web Conference 2023, Austin TX}, }
Social Determinants of Health (SDH) have more influence on health outcome than clinical care or the physical environment, namely food insecurity, housing instability, and health literacy. Many researchers design applications as a bridge to connect between resource providers and the deprived population. In this study, we take food pantries as a solution to mitigate food insecurity as an example to illustrate an automatic system combining location-aware information retrieval, web information extraction and domain-specific answering. To acquire the latest knowledge, our proposed framework first retrieves pantry candidates based on geolocation of the user, and utilizes structural information from markup language to extract semantic chunks related to six common requests. We use BERT and RoBERTa as information extraction models and compare three different web page segmentation methods in the experiments.
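The framework's first stage retrieves pantry candidates based on the user's geolocation. The sketch below shows one simple way to do location-aware candidate retrieval, using great-circle (haversine) distance. The paper does not specify this method. The `Pantry` type, the pantry names, and the coordinates are all hypothetical, and the later extraction and answering stages (BERT/RoBERTa) are not shown.

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

class Pantry(NamedTuple):
    name: str
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_pantries(user_lat, user_lon, pantries, k=3, max_km=None):
    """Return up to k (name, distance_km) pairs, closest first, optionally within max_km."""
    scored = [(haversine_km(user_lat, user_lon, p.lat, p.lon), p) for p in pantries]
    if max_km is not None:
        scored = [(d, p) for d, p in scored if d <= max_km]
    scored.sort(key=lambda pair: pair[0])
    return [(p.name, round(d, 1)) for d, p in scored[:k]]

# Hypothetical candidates around a user location.
pantries = [
    Pantry("Pantry A", 42.37, -71.05),
    Pantry("Pantry B", 42.50, -71.20),
    Pantry("Pantry C", 42.36, -71.06),
]
print(nearest_pantries(42.36, -71.06, pantries, k=3))
```

In a full system, the ranked candidates would then have their web pages fetched and segmented for the extraction step.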
Evaluating the efficacy of NoteAid on EHR note comprehension among US Veterans through Amazon Mechanical Turk.
Lalor, J. P.; Wu, H.; Mazor, K. M.; and Yu, H.
International Journal of Medical Informatics, 172: 105006. April 2023.
@article{lalor_evaluating_2023, title = {Evaluating the efficacy of {NoteAid} on {EHR} note comprehension among {US} {Veterans} through {Amazon} {Mechanical} {Turk}}, volume = {172}, issn = {1386-5056}, url = {https://www.sciencedirect.com/science/article/pii/S1386505623000230}, doi = {10.1016/j.ijmedinf.2023.105006}, abstract = {Objective Low health literacy is a concern among US Veterans. In this study, we evaluated NoteAid, a system that provides lay definitions to medical jargon terms in EHR notes to help Veterans comprehend EHR notes. We expected that low initial scores for Veterans would be improved by using NoteAid. Materials and Methods We recruited Veterans from the Amazon Mechanical Turk crowd work platform (MTurk). We also recruited non-Veterans from MTurk as a control group for comparison. We randomly split recruited MTurk Veteran participants into control and intervention groups. We recruited non-Veteran participants into mutually exclusive control or intervention tasks on the MTurk platform. We showed participants de-identified EHR notes and asked them to answer comprehension questions related to the notes. We provided participants in the intervention group with EHR note content processed with NoteAid, while NoteAid was not available for participants in the control group. Results We recruited 94 Veterans and 181 non-Veterans. NoteAid leads to a significant improvement for non-Veterans but not for Veterans. Comparing Veterans recruited via MTurk with non-Veterans recruited via MTurk, we found that without NoteAid, Veterans have significantly higher raw scores than non-Veterans. This difference is not significant with NoteAid. Discussion That Veterans outperform a comparable population of non-Veterans is a surprising outcome. Without NoteAid, scores on the test are already high for Veterans, therefore, minimizing the ability of an intervention such as NoteAid to improve performance. 
With regards to Veterans, understanding the health literacy of Veterans has been an open question. We show here that Veterans score higher than a comparable, non-Veteran population. Conclusion Veterans on MTurk do not see improved scores when using NoteAid, but they already score high on the test, significantly higher than non-Veterans. When evaluating NoteAid, population specifics need to be considered, as performance may vary across groups. Future work investigating the effectiveness of NoteAid on improving comprehension with local Veterans and developing a more difficult test to assess groups with higher health literacy is needed.}, language = {en}, urldate = {2023-02-19}, journal = {International Journal of Medical Informatics}, author = {Lalor, John P. and Wu, Hao and Mazor, Kathleen M. and Yu, Hong}, month = apr, year = {2023}, keywords = {Electronic health records, Health information technology, Health literacy}, pages = {105006}, }
Context Variance Evaluation of Pretrained Language Models for Prompt-based Biomedical Knowledge Probing.
Yao, Z.; Cao, Y.; Yang, Z.; and Yu, H.
March 2023.
AMIA 2023 Informatics Summit, Seattle WA
@misc{yao_context_2023, address = {Seattle WA, USA}, title = {Context {Variance} {Evaluation} of {Pretrained} {Language} {Models} for {Prompt}-based {Biomedical} {Knowledge} {Probing}}, url = {http://arxiv.org/abs/2211.10265}, abstract = {Pretrained language models (PLMs) have motivated research on what kinds of knowledge these models learn. Fill-in-the-blanks problem (e.g., cloze tests) is a natural approach for gauging such knowledge. BioLAMA generates prompts for biomedical factual knowledge triples and uses the Top-k accuracy metric to evaluate different PLMs' knowledge. However, existing research has shown that such prompt-based knowledge probing methods can only probe a lower bound of knowledge. Many factors like prompt-based probing biases make the LAMA benchmark unreliable and unstable. This problem is more prominent in BioLAMA. The severe long-tailed distribution in vocabulary and large-N-M relation make the performance gap between LAMA and BioLAMA remain notable. To address these, we introduce context variance into the prompt generation and propose a new rank-change-based evaluation metric. Different from the previous known-unknown evaluation criteria, we propose the concept of "Misunderstand" in LAMA for the first time. Through experiments on 12 PLMs, our context variance prompts and Understand-Confuse-Misunderstand (UCM) metric makes BioLAMA more friendly to large-N-M relations and rare relations. We also conducted a set of control experiments to disentangle "understand" from just "read and copy".}, urldate = {2022-12-17}, publisher = {arXiv}, author = {Yao, Zonghai and Cao, Yi and Yang, Zhichao and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt.
Yang, Z.; Kwon, S.; Yao, Z.; and Yu, H.
February 2023.
AAAI 2023, Washington DC
@misc{yang_multi-label_2023, address = {Washington DC USA}, title = {Multi-label {Few}-shot {ICD} {Coding} as {Autoregressive} {Generation} with {Prompt}}, url = {http://arxiv.org/abs/2211.13813}, abstract = {Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note with an average of 3,000+ tokens. This task is challenging due to the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail challenge: many ICD codes are infrequently assigned, yet infrequent ICD codes are clinically important. This study addresses the long-tail challenge by transforming this multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective to generate free-text diagnoses and procedures using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, which are then used to infer ICD codes. Third, we designed a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model on the all-code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, which substantially outperforms the previous MIMIC-III-full SOTA model (macro F1 4.3) and the model specifically designed for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate previous SOTA and our best few-shot coding predictions. 
Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.}, urldate = {2022-12-18}, publisher = {arXiv}, author = {Yang, Zhichao and Kwon, Sunjae and Yao, Zonghai and Yu, Hong}, month = feb, year = {2023}, note = {AAAI 2023, Washington DC}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language}, }
H4H: A Comprehensive Repository of Housing Resources for Homelessness.
Osebe, S.; Tsai, J.; and Yu, H.
In Seattle WA, USA, March 2023.
AMIA 2023 Informatics Summit, Seattle WA
@inproceedings{osebe_h4h_2023, address = {Seattle WA, USA}, title = {{H4H}: {A} {Comprehensive} {Repository} of {Housing} {Resources} for {Homelessness}}, author = {Osebe, Samuel and Tsai, Jack and Yu, Hong}, month = mar, year = {2023}, note = {AMIA 2023 Informatics Summit, Seattle WA}, }
2022
(25)
Enhancing the prediction of disease outcomes using electronic health records and pretrained deep learning models.
Yang, Z.; Liu, W.; Berlowitz, D.; and Yu, H.
December 2022.
arXiv:2212.12067 [cs]
@misc{yang_enhancing_2022, title = {Enhancing the prediction of disease outcomes using electronic health records and pretrained deep learning models}, url = {http://arxiv.org/abs/2212.12067}, doi = {10.48550/arXiv.2212.12067}, abstract = {Question: Can an encoder-decoder architecture pretrained on a large dataset of longitudinal electronic health records improve patient outcome predictions? Findings: In this prognostic study of 6.8 million patients, our denoising sequence-to-sequence prediction model of multiple outcomes outperformed state-of-the-art models such as pretrained BERT on a broad range of patient outcomes, including intentional self-harm and pancreatic cancer. Meaning: Deep bidirectional and autoregressive representation improves patient outcome prediction.}, urldate = {2023-02-19}, publisher = {arXiv}, author = {Yang, Zhichao and Liu, Weisong and Berlowitz, Dan and Yu, Hong}, month = dec, year = {2022}, note = {arXiv:2212.12067 [cs]}, keywords = {Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning}, }
Geographic Disparities in Prevalence of Opioid Use Disorders in US Veterans.
Li, W.; Leon, C.; Liu, W.; Sung, M. L.; Kerns, R. D.; Becker, W. C.; and Yu, H.