The complex data found in unstructured physician notes in electronic medical records (EMRs) are a viable source for identifying systemic lupus erythematosus (SLE) flares, according to study results presented by Eli Lilly and Veradigm at the 2022 Canadian Rheumatology Association conference last week.
This retrospective study used the Veradigm Health Insights Ambulatory EHR (HIAE) Research Database linked with administrative claims to identify SLE patients who had newly initiated select immunosuppressants or advanced therapies. The EMR dataset consists of de-identified patient records sourced from ambulatory/outpatient primary care and specialty settings.
Natural language processing (NLP) was used to identify key flare-related words and phrases in unstructured EMR clinical notes and to assist in developing rules to categorize the notes. Clinicians then reviewed the categorized notes and rated each as indicative of a high-confidence flare, a probable flare, or not a flare.
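To make the idea of rule-based note categorization concrete, here is a minimal Python sketch. The phrase patterns, the function name `categorize_note`, and the precedence of the rules are all hypothetical and chosen for illustration only; they are not the rules developed in the study, which are not described here.

```python
import re

# Hypothetical phrase patterns for illustration only; NOT the study's rules.
# Negated mentions such as "no flare" or "denies any recent flares".
NEGATED = re.compile(r"\b(?:no|denies|without)\s+(?:\w+\s+){0,2}flares?\b")
# Explicit mentions that tie a flare to the disease.
HIGH_CONFIDENCE = re.compile(
    r"\b(?:lupus|sle)\s+flares?\b|\bflares?\s+of\s+(?:lupus|sle)\b"
)
# Looser mentions that suggest, but do not confirm, a flare.
PROBABLE = re.compile(r"\bflar(?:e|es|ing)\b|\bworsening\s+(?:lupus|sle)\b")


def categorize_note(text: str) -> str:
    """Assign one of three categories to a clinical note using toy rules."""
    lowered = text.lower()
    # Simplification: any negated mention overrides every other match.
    if NEGATED.search(lowered):
        return "not a flare"
    if HIGH_CONFIDENCE.search(lowered):
        return "high-confidence flare"
    if PROBABLE.search(lowered):
        return "probable flare"
    return "not a flare"


if __name__ == "__main__":
    for note in (
        "Patient presents with lupus flare, started prednisone.",
        "Joint pain is flaring again.",
        "Denies any recent flares.",
    ):
        print(categorize_note(note), "|", note)
```

A real pipeline would need far more careful handling of negation, context, and clinical vocabulary, which is one reason clinician review of the categorized notes matters.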
The study identified a total of 801 eligible patients who had initiated immunosuppressants during the study period. Of these, 20% were identified as having at least one high-confidence or probable flare during the 12 months after immunosuppressant initiation. Inter-rater agreement on flare status, or the degree of agreement among independent observers who assess the same phenomenon, was substantial (Fleiss' kappa = 0.68).
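For readers unfamiliar with the agreement statistic, the following Python sketch shows how Fleiss' kappa is computed from a table of rating counts. The function name `fleiss_kappa` and the example ratings are assumptions made for illustration; the data are invented and are not the study's ratings.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa, where counts[i][j] is the number of raters who
    assigned item i to category j. Every item needs the same rater count."""
    if not counts:
        raise ValueError("at least one rated item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, averaged over all items.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Invented example: 5 notes, 3 clinicians, with columns ordered as
    # (high-confidence flare, probable flare, not a flare).
    ratings = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 0, 3], [0, 1, 2]]
    print(f"Fleiss' kappa: {fleiss_kappa(ratings):.2f}")
```

A value of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance.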
Researchers have previously attempted to identify SLE flares by applying algorithms to large medical and pharmacy claims databases, but these attempts have not been clinically validated. Moreover, important diagnostic information may not be available in structured data, and details omitted under common coding practices may lead to false-negative classifications. Unstructured medical notes have the potential to provide the additional detail needed to accurately identify and classify flare events.
This study demonstrates that the rich but complex data in unstructured physician notes offer a viable way to use large datasets to assess flare occurrence in patients with SLE. Specifically, NLP combined with clinical review of unstructured notes was shown to be a feasible approach to identifying SLE flares in a large EMR database. Future work will investigate concordance between NLP and structured data approaches and will develop a machine learning algorithm to identify the presence and severity of flares.