Repustate is now a part of Sprout Social. Read the press release here


Finding the proverbial needle in thousands of haystacks

Unstructured, inconsistent and sporadic at best, progress notes entered into EMRs have long been a source of frustration. But using NLP based text analytics, one hospital decided to take action and classify, organize and search all progress notes in their entire system.

Start Your Free Trial

NLP in Healthcare & Pharma

What is Healthcare Big Data Analytics?

Healthcare creates a lot of Big Data in the form of electronic health records, or EHRs. Basically EHRs are digital versions of a patient's paper chart, and contain the medical and treatment histories of patients. EHRs are essential to the continuous digital evolution and transformation of the healthcare industry. Big data in healthcare refers to the voluminous quantities of data created by digital technologies that collect patients' information. This data can only be useful to medical practitioners and caregivers if it is structured, making it possible to easily retrieve, analyse and apply the insights within the data in order to easily and quickly improve patient treatment, satisfaction and experience.

Why is NLP critical for Big Data analysis in Healthcare?

One of the major challenges faced by healthcare analytics is that much of the electronic data it has is unstructured. The insights that sit within this unstructured data are buried and unavailable for use. This often makes it practically impossible to apply the insights to the real world needs and challenges within healthcare. This also means that discovering trends, patterns and patient voice in the data, an essential element of medical research and development, is more difficult and time consuming. Over the last few years, the healthcare industry has looked to Natural Language Processing to help it provide its data collection the necessary structure required to optimize its value to improve patient care.

How can Natural Language Processing, Text Analytics and Sentiment Analysis benefit Healthcare?

Natural language processing, or NLP for short, is focused on enabling computers to understand and communicate using human language. When we unpack the term, the meaning and significance of natural language processing becomes more obvious. “Natural” implies human, while “language” suggests a rule-based system of communication, or an exchange of meaningful symbols. “Processing” implies some form of software programming that manipulates or extracts data from stored, electronic files. Now the meaning of NLP begins to quickly surface. NLP is the software processing of human language for the purpose of manipulating and extracting data from electronic text files. NLP can be understood as a subset of artificial intelligence because it's goal is to ultimately have computers understand, mimic and perform various types of human language activities. It includes methods such text analytics, named entity recognition, sentiment analysis and semantic search.

This is why natural language processing in healthcare is a growing trend. NLP and its analytics applications provide the structure necessary to healthcare data making it more easily insightful, shareable and manageable by multiple care providers and organizations - such as laboratories, specialists, medical imaging facilities, pharmacies, emergency facilities, and school and workplace clinics. To gain analytical insights from all clinicians involved in patient care, healthcare organizations apply the benefits of NLP to their data.

Case study: NLP in Healthcare

The same scenario plays out across doctor's offices and hospitals countless times across the world every day. A doctor, nurse or other medical practitioners quickly jots down a few notes into an EMR, providing sparse details and little context, and quickly moves on to the next patient. It's up to someone else to decipher the notes at a later point down the road.

A better way

Rather than asking people to change their ways and possibly disrupt their workflows, one large hospital network in the northeast US decided to use text analytics to classify, organize and then make searchable all of the important information contained within an EMR.

The hospital's CTO asked Repustate to help him solve this problem:

"There's so much data buried in these records, I wish I could get it out. I wish could ask questions like "What was the result of prescribing a particular drug at various dosage levels?" And that's just one example, I have so many."

Applying Natural language processing in health care

Taking that question as a good first problem to tackle, Repustate applied its Natural language processing tools to each and every progress note in the EMR. Specifically, named entity recognition was applied to identify and annotate each of the following:

  • The primary reason the patient came (e.g. chest pain, shortness of breath)
  • Each and every medication prescribed and by any name, it might go by (e.g. Tylenol, paracetemol, acetaminophen, Panadol)
  • The dosage it was prescribed at (e.g. 2 pills twice a day, 40mg daily)
  • The length of time between successive visits (e.g. patient returned two weeks later)
  • And the ensuing result (e.g. symptoms went away, symptoms persisted)


With this information tagged and stored, each health record now contains a tremendous amount of value. Reports can be run to monitor the effectiveness of certain medications and/or dosage combinations. The outcomes that patients have with certain medications and even specific medical practitioners also becomes self-evident. Sentiment analysis can be used to determine efficacy by looking out for positive or negative terms and phrases.

More work remains as the hospital administrators, having seen the success of the scenario above, have numerous insights they want to glean from this new found structured data. With the hard part of structuring and annotating progress notes out of the way, the fun part begins.