"Data is the new oil. Like oil, data is valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity."
- Clive Humby
How do we measure the "value" of data?
Big Data was a healthcare buzzword for several years, until those unfamiliar with the industry realized that the challenges posed by data structure, security, standardization, storage, and exchange are tediously complex. As a clinical informaticist, I have always been passionate about data structure and standardization and how both influence data "quality."
I was relieved to see "big data" lose some of its luster as the industry realized that before we can leverage big data, we need to leverage small data in contextually appropriate ways. The size of one's data is less critical than identifying the specific data points that deliver valuable, meaningful, actionable insights when appropriately aggregated and manipulated. Knowing what to do with this information, and acting on those insights, is the key to unlocking value, considerable or otherwise. To realize a Return on Investment (ROI), care delivery processes must evolve to improve efficiency, coordination, and quality. However, before healthcare administrators or clinicians make a change, they must trust that their decisions are based on valid, usable, and relevant information.
How do we separate usable data from noise?
Data in and of itself is not valuable, and more data only sometimes equals better data. The opposite is often true. The amount and variety of data collected by today's healthcare organizations make deriving insights more complex. This additional complexity may seem counterintuitive to some, but unfortunately, all data sets contain "noise," which makes finding a signal difficult. The bigger the data, the louder the noise. But what is noise?
Noisy data is data that contains a significant amount of meaningless information. This unwanted data does nothing to enhance or clarify the relationship between the data and the question being asked. Addressing noise is especially critical in Artificial Intelligence, where noise can cause algorithms to miss patterns in the data.
Given these complexities, how should we approach defining and evaluating high-quality data? Which characteristics are important?
Identifying valid data
There are three characteristics used to evaluate the strength of "evidence" when trying to prove or disprove a hypothesis. These same principles can also be used to evaluate data:
accuracy
reliability
validity
Accuracy and reliability tell us how closely and how consistently the data represents the truth. Accuracy is how close a measurement comes to the true value, which is distinct from precision, the level of detail in which a measurement is recorded. For example, if a pencil is 19 cm long, a measurement of 19.1 cm is more accurate than a measurement of 21 cm.
Reliability is the degree to which repeated observations will produce the same result under identical circumstances. Your measurement is reliable if you get the same result or length every time you measure a pencil.
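To make the distinction concrete, here is a minimal Python sketch of the pencil example. The true length, the two sets of repeated measurements, and the function names are all hypothetical, chosen only for illustration. It treats accuracy as the gap between the average measurement and the true value, and reliability as the spread of repeated measurements; other definitions are possible.

```python
from statistics import mean, stdev

def accuracy_error(measurements, true_value):
    """Gap between the average measurement and the true value.
    Smaller means more accurate."""
    return abs(mean(measurements) - true_value)

def reliability_spread(measurements):
    """Sample standard deviation of repeated measurements.
    Smaller means more reliable (more repeatable)."""
    return stdev(measurements)

# Hypothetical values, for illustration only.
TRUE_LENGTH_CM = 19.0
ruler_a = [19.1, 18.9, 19.0, 19.0]  # close to the truth and consistent
ruler_b = [21.0, 21.0, 21.1, 20.9]  # consistent, but consistently wrong

for name, readings in (("ruler_a", ruler_a), ("ruler_b", ruler_b)):
    print(
        name,
        "accuracy error:", round(accuracy_error(readings, TRUE_LENGTH_CM), 2),
        "spread:", round(reliability_spread(readings), 2),
    )
```

In this sketch the second ruler is just as reliable as the first, yet far less accurate, which is why reliability alone is not enough to trust a data source.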
Validity, on the other hand, is a much more abstract concept. It pertains to the appropriateness, correctness, meaningfulness, and usefulness of the inferences that users draw from the data. Data validity tells us the extent to which the data represents what it was intended to represent. The simplest explanation I've come across for intent is the specific purpose or goal that you are trying to reach.
Identifying relevant data
Data is usable and trustworthy to the extent that it is valid, that is, to the extent that it accurately represents a phenomenon, be it care processes, treatment regimes, caregiver activity, or patient symptoms. Data points are therefore relevant only within a specific context or use case. The context of the data (who collected it, where, when, and how) determines whether the data is "fit for purpose" to answer a specific question. Given the breadth and complexity of the patient care and healthcare operations that generate the data, it is "no surprise that 51 percent of organizations do not truly know what kind of data – or how much of it – they need to archive or collect to generate actionable insights from their information" (Healthcare IT April 2015).
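One way to picture "fit for purpose" is as a check of each data point's context against the needs of a specific question. The Python sketch below is illustrative only: the `DataPoint` and `UseCase` structures, their field names, and the blood pressure example are assumptions, not part of any standard or product.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataPoint:
    value: float
    collected_by: str    # who collected it
    setting: str         # where it was collected
    collected_on: date   # when it was collected
    method: str          # how it was collected

@dataclass
class UseCase:
    """Hypothetical description of the context a question requires."""
    allowed_roles: set
    allowed_settings: set
    allowed_methods: set
    start: date
    end: date

def is_fit_for_purpose(point: DataPoint, use_case: UseCase) -> bool:
    """True only if who, where, when, and how all match the use case."""
    return (
        point.collected_by in use_case.allowed_roles
        and point.setting in use_case.allowed_settings
        and point.method in use_case.allowed_methods
        and use_case.start <= point.collected_on <= use_case.end
    )

# Hypothetical question: inpatient systolic blood pressure during 2023,
# taken by nursing staff with an automated cuff.
inpatient_bp = UseCase(
    allowed_roles={"nurse"},
    allowed_settings={"inpatient"},
    allowed_methods={"automated_cuff"},
    start=date(2023, 1, 1),
    end=date(2023, 12, 31),
)

ward_reading = DataPoint(128.0, "nurse", "inpatient", date(2023, 6, 1), "automated_cuff")
home_reading = DataPoint(128.0, "patient", "home", date(2023, 6, 1), "wrist_monitor")

print(is_fit_for_purpose(ward_reading, inpatient_bp))
print(is_fit_for_purpose(home_reading, inpatient_bp))
```

The two readings carry the same value, but only one of them answers the question being asked. The same home reading could be perfectly relevant to a different use case, such as remote monitoring.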
It takes years of experience working in the industry to understand patient care-related processes and operations. This type of knowledge and insight is critical to understanding the context of healthcare data (what, where, when, and how). A comprehensive understanding of the data context enables one to select better inputs, which leads to better outputs. Better outputs lead to better questions and answers. Better answers lead to knowledge and understanding.
Whether deciding what data to archive, what to include in a data warehouse, what to use in developing predictive analytics, or what to feed an analytic product, data relevance and validity are critical to realizing ROI for Health IT investments.
Please contact us at info@sos4hit.com if you'd like support identifying valid data for FHIR Resources, quality measurement, clinical decision support, or artificial intelligence.