Navigating the Health Data Ecosystem
Ian Eslick, Tuhin Sinha, Roger Magoulas, and Rob Rustad


Navigating the Health Data Ecosystem
by Ian Eslick, Tuhin Sinha, Roger Magoulas, and Rob Rustad
Copyright © 2015 O’Reilly Media, Inc. All rights reserved. Cover image © 2015 Mike Beauregard:
“Meandering in the Arctic.”
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles. For more information, contact our
corporate/institutional sales department: 800-998-9938.
May 2015: First Edition
Revision History for the First Edition
2015-05-05: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Navigating the Health Data
Ecosystem, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without limitation responsibility for damages
resulting from the use of or reliance on this work. Use of the information and instructions contained in
this work is at your own risk. If any code samples or other technology this work contains or describes
is subject to open source licenses or the intellectual property rights of others, it is your responsibility
to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-92720-5
[LSI]



Chapter 1. The “Six C’s”: Understanding
the Health Data Terrain in the Era of
Precision Medicine
Background
A few years ago, O’Reilly became interested in health topics, running the Strata Rx conference,
writing a report on “How Data Science is Transforming Health Care: Solving the Wanamaker
Dilemma,” and publishing Hacking Healthcare. Our social network grew to include people in the
health care space, informing our nascent thoughts about data in the age of the Affordable Care Act and
the problems and opportunities facing the health care industry. We had the notion that aggregating data
from traditional and new device-based sources could change much of what we understand about
medicine — thoughts now captured by the concept of “precision medicine.” From that early thinking,
we developed the framework for a grant with the Robert Wood Johnson Foundation (RWJF) to
explore the technical, organizational, legal, privacy, and other issues around aggregating health-related
data for research — to provide empirical lessons for other organizations interested in pushing for
data-driven health care initiatives. This report begins the process of sharing what we’ve learned.

Introduction
After decades of maturing in more aggressive industries, data-driven technologies are being adopted,
developed, funded, and deployed throughout the health care market at an unprecedented scale.
February 2015 marked the inaugural working group meeting of the newly announced NIH Precision
Medicine Initiative, designed to aggregate a million-person cohort of dense longitudinal
genotype/phenotype health data, in which donors provide researchers with the raw epidemiological
evidence to develop better decision-making, treatments, and potential cures for diseases like cancer. In the past
several years, many established companies and new start-ups have also started to apply collective
intelligence and “big data” platforms to health and health care problems. All these efforts encounter a
set of unique challenges that experts coming from other disciplines do not always fully appreciate.
In 2014, the Robert Wood Johnson Foundation funded the subject of this report, a research effort
called “Operationalizing Health Data,” a deep dive into the health care ecosystem focused on
understanding and advancing the integration of personalized health data in both clinical and research
organizations. RWJF encouraged the small group of data scientists, innovators and health researchers

working on the grant to find and prototype concrete solutions facing several partner organizations
trying to leverage the value of health data. The research intends to empirically inform innovation
teams, often coming from non-health-related industries, about the messy details of using and making
sense of data in the heavily regulated hospital IT environment.


This report describes key learnings identified by the project across six major facets of the health data
ecosystem: complexity, computing, context, culture, contracts, and commerce. In future reports, we
will focus on specific tactical challenges the project team addressed.

Complexity: Enormous Domain, Noisy Data, Not Designed
for Machine Consumption
In marked contrast to much of the data generated in enterprise or consumer markets, health care data
is exceedingly complex, and this complexity makes direct application of the techniques we’ve learned
in other industries surprisingly challenging. Underlying this is the simple fact that the human organism
has no closed-form solution. Despite thousands of years of study, our real-world comprehension of
human physiology is largely indirect and sparse. When coupled with the other challenges already
inherent in data-intensive applications, the fact that we don’t necessarily know the root causes for
measured chemical and biological changes makes health data analysis and analytics particularly
demanding.
Nearly all data derived from a biological system is messy, whether captured via device, blood test,
medical record, or survey. Working with health data requires understanding the innate challenges of
the data as well as managing many other difficulties, such as:
Measurements are not typically stable; there are many possible sources of variation.
Electronic Medical Record (EMR) discrete data is often entered by hand; even parsing can be
challenging.
The same underlying data can be encoded or labeled in multiple ways.
A vast system of legacy systems and protocols must often be navigated.
Personal health data tends to be dominated by longitudinal/time-series data; interpretation of this
data is not necessarily well understood by either researchers or clinicians in practice.

We can see examples of these challenges in work performed with a partner developing a personal
health app that presents a history of laboratory test results to a patient. Laboratory measurements such
as Serum Albumin, a measure of the blood concentration of a protein produced by the liver, provide
evidence of potential health problems or risk factors. The goal of the app is to enhance the clinical
visit and give patients agency by helping the patient reflect on the history of their test results along
with questions they might want to ask their clinician. It’s a simple concept to describe, but not so
simple to execute.
The value produced by a blood test for a single patient will vary from laboratory to laboratory as
well as periodically over time. To provide a point of comparison, laboratories provide reference
ranges with their test results. These ranges define what is a normal (in range) or abnormal (out of
range) result. Reference ranges typically tell you whether you fall within the same basic range as
80-95% of the population, but they do not typically tell you whether a given measure is significant
for you personally. In our test population, the lower threshold at which a Serum Albumin measurement
became “abnormal” was either 3.3 or 3.8 g/dL, depending on the laboratory used. Given that the mean
value of all samples together was 3.3 g/dL, these thresholds become central to determining when a
patient is at obvious risk.
These causes of variation confound our ability to directly aggregate across multiple patients and
laboratories. Moreover, no common convention exists to normalize laboratory results for aggregation,
prediction, or optimization. Do we aggregate the discrete interpretation of inside, above, or below
the reference range? Do we aggregate based on the standard deviation of a measure — does the data even
have a normal distribution? Do we just ignore the noise, aggregate the values, and rely on the law of
large numbers over a large population of patients? These decisions all require a fairly sophisticated
understanding of the inference you are trying to draw from the aggregate data set.
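
To make one of these choices concrete, the short Python sketch below normalizes each result against the reference range reported by the issuing laboratory, mapping values onto a unitless scale where 0 marks the lower bound and 1 the upper bound. This is only one possible convention, not an established standard, and the field names and upper-bound figures are illustrative assumptions rather than values from our data set.

    # Illustrative sketch only: normalize lab results against each lab's own
    # reference range. Field names and upper-bound figures are assumptions.

    def normalize_result(value, ref_low, ref_high):
        """Map a lab value onto its reference range: 0.0 is the lower bound,
        1.0 is the upper bound; values outside [0, 1] are out of range."""
        if ref_low is None or ref_high is None or ref_high <= ref_low:
            return None  # missing or malformed reference range; cannot normalize
        return (value - ref_low) / (ref_high - ref_low)

    # The same Serum Albumin value judged against two labs' reference ranges.
    results = [
        {"lab": "A", "value": 3.5, "ref_low": 3.3, "ref_high": 5.0},
        {"lab": "B", "value": 3.5, "ref_low": 3.8, "ref_high": 5.4},
    ]
    for r in results:
        pos = normalize_result(r["value"], r["ref_low"], r["ref_high"])
        print(r["lab"], round(pos, 2), "in range" if 0.0 <= pos <= 1.0 else "out of range")

The same raw value lands inside one laboratory's range and outside the other's, which is exactly the aggregation ambiguity described above.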
Perhaps the most interesting question when dealing with health data is what a specific measure means
for an individual. Clinicians do this for us all the time. For example, many people with naturally high
“bad” (LDL) cholesterol have compensating high levels of “good” (HDL) cholesterol. They shouldn’t
necessarily be on a statin, yet they can be well above the upper limit that research suggests is a
tipping point for increased risk of heart disease. The clinician knows to ignore these values for this
patient based on all the other factors; it’s not a straightforward computation for the machine.
Clinicians typically refer to population level results to guide individual decisions. However, we can
also use personal health data to capture a “baseline” so we can compare our health today to our
health in the past. Baselines help us answer critical questions about whether we are stable, how we
respond to therapies, etc. Personal health baseline measurements also enable much more precise
reasoning about the significance of a change when what is normal for us is not normal for everyone
else.
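
The baseline idea can be sketched in a few lines of Python, assuming nothing more than a list of one person's previous measurements. The window size and z-score threshold below are arbitrary illustrations, not clinical recommendations.

    # Illustrative sketch only: flag deviations from a personal baseline.
    # The window size and z-score threshold are arbitrary, not clinical guidance.
    from statistics import mean, stdev

    def deviates_from_baseline(history, new_value, window=10, z_threshold=2.0):
        """Return True if new_value is unusual relative to this person's recent history."""
        recent = history[-window:]
        if len(recent) < 3:
            return False  # not enough personal data to form a baseline
        baseline_mean = mean(recent)
        baseline_sd = stdev(recent)
        if baseline_sd == 0:
            return new_value != baseline_mean
        return abs((new_value - baseline_mean) / baseline_sd) > z_threshold

    # A value that is "normal" for the population can still be unusual for this person.
    personal_history = [3.9, 4.0, 4.1, 4.0, 3.9, 4.0, 4.1, 4.0]
    print(deviates_from_baseline(personal_history, 3.5))  # True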

Computing: Standards and Inter-System Exchangeability
Accessing and parsing data can also be a significant challenge. Most electronic medical records are
not much better than electronic paper — meaning that data is entered into them for purposes of
discoverability (to help all providers understand the patient case), documentation (a legal record of
what happened), regulatory compliance, and billing (documenting care for the payer). This point is
essential: data is entered into medical records primarily so that other people can find and review it. It
is not entered to enable automated or aggregate analysis. So, while EMRs are a giant leap forward,
they are not a panacea for machine learning and suffer from significant garbage-in, garbage-out
problems.
One of the hospital systems we talked to still receives all of its laboratory data by fax image and hand
transcribes the fax content into its EMR. The transcribed laboratory data is stored in particular
database fields in particular ways, but how the same laboratory test is encoded varies across both
technicians and laboratories. You will continue to find bugs in the data for weeks or months after
first starting to exchange data. These issues with EMR data make precision clinical medicine a
greater challenge than more established uses of EMR data, such as population management.
Another partner has spent nearly $20 million in grant money on a project over the past five years
building a standardized registry for a disease condition that simply makes a standard form available
in more than 40 centers so the data can be relayed into an open source registry system (Indivo) that is
used to perform analysis across more than 15,000 patients.

The bottom line is that there is no standard, interoperable schema for documenting human health in a
digital format — the way cars in a manufacturing system can be documented — and until some agreed-upon
methodology for doing that exists, teams working on both intra- and inter-hospital data aggregations
will struggle to generate apples-to-apples normalized results.
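
To make the mapping problem concrete, here is a small sketch that folds records from two hypothetical laboratory feeds, each with its own field names and unit conventions, into one common record shape. The source formats, field names, and the g/L conversion case are invented for illustration; in practice this adapter layer is where much of the integration effort goes.

    # Illustrative sketch only: map two hypothetical lab feeds into a common schema.
    # All source field names and the g/L -> g/dL conversion case are invented.

    def from_lab_a(rec):
        # Lab A sends, e.g., {"test": "ALB", "result": "3.5", "units": "g/dL"}
        return {"test": "serum_albumin", "value": float(rec["result"]), "unit": "g/dL"}

    def from_lab_b(rec):
        # Lab B sends, e.g., {"analyte": "Albumin, Serum", "val": 35.0, "uom": "g/L"}
        value = rec["val"] / 10.0 if rec["uom"] == "g/L" else rec["val"]
        return {"test": "serum_albumin", "value": value, "unit": "g/dL"}

    ADAPTERS = {"lab_a": from_lab_a, "lab_b": from_lab_b}

    def to_common_schema(source, rec):
        return ADAPTERS[source](rec)

    print(to_common_schema("lab_a", {"test": "ALB", "result": "3.5", "units": "g/dL"}))
    print(to_common_schema("lab_b", {"analyte": "Albumin, Serum", "val": 35.0, "uom": "g/L"}))

Both calls yield the same normalized record, which is the apples-to-apples form that downstream aggregation needs.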

Context: Critical Metadata for Accurate Interpretation
In addition to the testing variations described above, the values produced by a laboratory blood test
are subject to tremendous variation due to contextual factors such as the time of day the blood was
drawn, what the patient was eating or drinking at the time, the handling of the specimen, the time
between draw and analysis, the specific method of analysis, etc. A high value for a given parameter
may or may not have clinical relevance, even if you are using a personal baseline. For example, if
you forgot to skip your morning coffee before taking a pre-diabetic blood sugar test, you can get a
false positive for high blood sugar.
Like blood tests, many medical data sets will have only limited machine-consumable metadata
describing what can be essential context for clinical and research analysis of provided data. One
leading researcher at the NIH we spoke with argued that the primary reason he is not interested in
patient-provided data is the lack of this critical contextual data.
The importance of context in interpreting data in health is one of the key barriers for those who seek
to augment or replace clinicians with analytics. An analytics system is only as good as the input, and
today, health care systems do not give us very good inputs. When we examined laboratory results, we
saw cases of missing reference ranges; miscoded data; and, for some measures, a great deal of noise.
For example, we saw records of “failed pulmonary function test” created for billing purposes that
had no specific and actionable information about the patient’s health status. The only way to tell
which tests failed was to read the free text notes associated with the test; no single regular expression
allowed for automatic filtering of these failed tests.
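
Instead, this kind of filtering tends to require a layered heuristic: several patterns plus a manual-review fallback for notes that match none of them. The sketch below, with invented example phrases rather than the actual note text, illustrates the shape of such an approach.

    # Illustrative sketch only: the phrases below are invented examples, not
    # the actual note text from the data set described above.
    import re

    FAILED_PATTERNS = [
        re.compile(r"unable to (complete|perform)", re.IGNORECASE),
        re.compile(r"patient (refused|could not tolerate)", re.IGNORECASE),
        re.compile(r"test (aborted|not performed)", re.IGNORECASE),
    ]

    def looks_failed(note):
        """True if any known failure pattern matches; notes matching nothing
        still need manual review rather than being treated as valid results."""
        return any(pattern.search(note) for pattern in FAILED_PATTERNS)

    print(looks_failed("Unable to complete spirometry; patient coughing."))  # True
    print(looks_failed("FVC and FEV1 within normal limits."))                # False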
These challenges arise in personal health data as well. A motion sensor tells us when we have
activity, but it doesn’t tell us whether inactivity is because the sensor is sitting in a purse or on a
table, or because we’re actually sitting at a desk or on the couch. If you are using this signal to
assess the actual activity level of a single person, it might be insufficient for clinical use. However,
the opportunity with personal health data is that we can triangulate across several signals to assess
the actual context, and it is often possible to engage the user periodically to fill in critical blanks.
Personal health data, properly managed, can make a powerful contribution to the health care system
by augmenting the impoverished context clinicians currently get during patient visits. Evidence shows
that patients interact more honestly with machines and that devices assess patient behavior more
accurately than subjective self-report. New mobile phone health data frameworks from Apple and
Google, along with many independent phone/sensor-based applications, can provide a rich source of
contextual clarity. The Health Data Exploration Project — based at the California Institute for
Telecommunications and Information Technology (Calit2) and supported by the Robert Wood
Johnson Foundation — is sponsoring what they call “agile research projects” designed to accelerate
the understanding of how health care can best embrace these rapidly evolving personal health data
technologies.
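
A minimal sketch of the triangulation idea described earlier in this section, using invented signal names: when the accelerometer reports no movement, other cheap signals (screen activity, charging state, an occasional user prompt) help distinguish a phone left on a table from a person sitting still. The rules below are illustrative assumptions, not a validated algorithm.

    # Illustrative sketch only: triangulate cheap signals to label inactivity.
    # Signal names and rules are invented; a real system would need validation.

    def label_inactivity(sample):
        """sample: dict with step_count, screen_on, charging, and user_reply (or None)."""
        if sample["step_count"] > 0:
            return "active"
        if sample["user_reply"] is not None:
            return sample["user_reply"]      # trust an explicit user answer
        if sample["charging"] and not sample["screen_on"]:
            return "device_idle"             # likely on a table or in a bag
        if sample["screen_on"]:
            return "sedentary"               # person present but not moving
        return "unknown"                     # a good candidate for a user prompt

    print(label_inactivity({"step_count": 0, "screen_on": True,
                            "charging": False, "user_reply": None}))  # sedentary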

Culture: Lean Start-Up Difficulties in Hospital Ecosystems
Clinicians have developed a strong culture of skepticism about why a product, algorithm, or process
is beneficial. Three elements of the health care provider culture should be noted in particular: risk
aversion, evidence orientation, and workflow sensitivity.
Most clinicians make decisions on a daily basis that, if wrong, will profoundly and negatively impact
a patient’s life. If you have something that works, that you feel helps people broadly, then
innovation becomes a two-edged sword — you might be able to do better, but you also risk doing
worse and hurting people. The false negative in health care is intolerable; it is literally forsworn in
the Hippocratic oath.
Given the challenges of building systems that present information or promote specific actions based
on health data, and the costs and consequences of being wrong, health care professionals have an
exceedingly high bar for utility and reliability. Most clinicians (and researchers) will only take a new
product or intervention seriously when you’ve proven its value in a peer-reviewed, published trial,
conducted according to medical-ethical standards.
For example, showing patients laboratory results and prompting them to talk with their clinician about
the questions “the system” thinks are important is unlikely to be adopted until testing shows that a
population of more than 100 patients using the process has fewer errors, requires less provider time,
and/or experiences better outcomes than a population of more than 100 patients who don’t.
While this kind of research is far cheaper in digital health than in medical devices and biological
therapies, trials create a heavy burden for start-ups seeking to enter the health care space. Just getting
inside-the-institution access to clinical environments is a significant hurdle for most digital health
start-ups, much less providing randomized, placebo-controlled, double-blind study results of their
proposed solution. Proving the efficacy of a process on a population is a difficult and time-consuming
task compared to the lean start-up strategy used to such great effect in consumer and enterprise
applications.
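
To give a sense of why even a comparison as simple as the one above is expensive, here is a back-of-the-envelope sample-size sketch using the statsmodels library. The assumed improvement (an error rate falling from 20% to 10%) is invented purely for illustration; at conventional 5% significance and 80% power it already implies on the order of a hundred patients per arm, before accounting for dropout or recruitment overhead.

    # Back-of-the-envelope sample size for a two-arm comparison of proportions.
    # The 20% -> 10% error-rate improvement is an invented illustration.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    effect = proportion_effectsize(0.20, 0.10)   # Cohen's h for 20% vs. 10%
    n_per_arm = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
    )
    print(round(n_per_arm))   # roughly 100 patients in each arm, before dropout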
Finally, even if you can show that an intervention is better, it has to fit naturally into the hospital’s
workflow. The care process in the medical environment is already packed with complexity: multiple
procedures; safety considerations; instruments; devices; constantly changing schedules; delays; a
multitude of contributing specialists, insurance, and paperwork constraints; and, in general, very
poorly designed IT systems to help you manage all this. Adding a new tool or process into this
environment requires that it induce only a small change in current practice, be an absolute must-have,
or provide significant time savings to compensate for the new cognitive load and costs, such as
training, inventory management, installation, etc.

Contracts: Navigating IRB, HIPAA, and EULA Frameworks
Combining research data, clinical data, and consumer personal health data involves the interaction of
multiple legal and regulatory frameworks. The most common frameworks are: Institutional Review
Board (IRB), Health Insurance Portability and Accountability Act / Health Information Technology
for Economic and Clinical Health (HIPAA/HITECH), and an mHealth application’s EULA. These
frameworks involve a very complicated set of constraints on how data can be stored, transported,
analyzed, or repurposed. Most importantly, these issues can be a source of significant friction for
people who want to work with health data.
Research data is governed by a patient consent process that ensures participants understand the risks
and rewards of sharing their health data. These consent forms and the entire research process are
reviewed by institutional review boards (IRBs) to ensure that the research is ethical. While digital
health start-ups should feel compelled to get IRB approval for any work exposing humans to research,
many don’t. If you have a hospital partner helping with your medical research, they generally need to
have their participation governed by an IRB-approved protocol (by law, research on human subjects
supported by government funds must be approved by an IRB). Typically, institutions only trust their
own IRB, so a multi-institutional study can take three to six months just to get the procedures
approved. If you want to change a form or adapt the study in any way, you need the IRB to approve
the proposed change, a process that can take anywhere from one to eight weeks depending on the
institution and the concerns of the examiner. A single examiner who doesn’t agree with your approach
can cost you weeks or months of delay. Despite these risks, companies that want to use published
evidence as part of documenting their value proposition are well advised to consider engaging with a
partner or a third-party IRB to facilitate publication of results from populations using their product.
Any data a patient provides to a hospital as part of the ordinary process of clinical care is regulated
by the HIPAA and HITECH acts. These acts describe what a “covered entity” (a hospital or other
provider service) can do with the data you have to provide to them in order to receive medical care.
These regulations typically describe access controls, auditing provisions, data security policies,
training programs, etc., that have to be followed. Breaches, or leakage of health data, require
notification to your partners, the patients, and the government. Breaches can result in civil fines, and
breaches caused by willful negligence can trigger much higher fines, easily reaching into the millions.
Beyond your own behavior, you are also responsible for ensuring the same behavior from all your
subcontractors who come in contact with personal health data (your cloud provider, customer support
system, email system, etc.). This chain of responsibility is formalized in a Business Associate
Agreement (BAA), a rigorous legal contract you adopt if you work with a covered entity. The terms
of such documents are typically standardized per institution and non-negotiable.


Any product that interacts with the health care system must also pass an extremely conservative
security review by the networking team, and a risk management review by the legal team. As
innovation is not a prime directive of these teams, you are subject to the most conservative
interpretation of risk, and changing their focus to strategic goals often involves convincing the most
senior members of the institution. That said, having a deployment at several major centers can
significantly reduce the perceived risk. Successful start-ups often emerge out of existing institutions
that are motivated to deploy and help prove that a concept works.
In the case of the grant work, we hoped to start analyzing a data set from a major research institution
at the start of the project. Instead, we ran into a maze of organizational bureaucracy as the various
players wrestled with the process described above. The contractor was unfamiliar with the hospital’s
process, and many people at the hospital had to pass judgment on the project for it to move ahead.
We didn’t face a lack of effort, as everyone involved worked hard to get access to the data approved
— there’s just a lot to contend with.

Commerce: How Do Digital Health Start-Ups Get Paid?
In most markets, if you create a differentiated solution to a clear problem that has economic impact
for your customers, you can find some model by which you get compensated for that solution. This is
simply not the case in health care. We have seen multiple projects that add real, tangible value to
patients and/or hospitals that, for structural reasons, fail to get any traction in the market. This failure
to launch can involve:
Payment and billing models
Pace of adoption
Evidence barriers
Integration friction and costs
It is common for new innovations in digital health to encounter “perverse incentives” in health care,
where the existing market creates artificial barriers to entry and adoption. What might seem a
common-sense solution built on fundamental market-driven principles in another sector might fail
completely in health care for obscure reasons. Payment models are heavily regulated and slow to
change in health care, so instead you are well advised to identify a value you can get paid for, then
work to develop defensible barriers to compete for the revenue. We have observed that investors with
significant experience in health care focus on the business and payment models before the technology
and typically prefer not to fund companies until they’ve achieved $1 million to $2 million in recurring
revenues. The message is that in health care, the only proof of scalable value comes in the form of a
recurring revenue stream.

In digital health, payment models come in several flavors:
Billing codes
Efficiencies tied to billing codes
Outcomes
Population management
On the expenses side, there are health-care-specific costs to be reckoned with as well. For a start-up,
HIPAA insurance is relatively affordable at small scale, but fees can go into the tens of thousands of
dollars as your patient volume grows. Plan on spending $5 per patient, declining to perhaps 10 cents
per patient at volume, just on HIPAA liability insurance.
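
As a rough planning aid, the sketch below interpolates a per-patient cost curve between the two figures quoted above ($5 per patient at small scale, about 10 cents per patient at volume). The anchor volumes and the log-linear shape of the curve are our own assumptions for illustration; actual insurance quotes will vary.

    # Rough planning sketch: interpolate per-patient HIPAA insurance cost between
    # the two quoted figures. Anchor volumes and the log-linear curve are assumptions.
    import math

    LOW_N, LOW_COST = 1_000, 5.00        # assumed "small scale" anchor: $5 per patient
    HIGH_N, HIGH_COST = 1_000_000, 0.10  # assumed "at volume" anchor: $0.10 per patient

    def per_patient_cost(n_patients):
        n = min(max(n_patients, LOW_N), HIGH_N)
        frac = (math.log10(n) - math.log10(LOW_N)) / (math.log10(HIGH_N) - math.log10(LOW_N))
        log_cost = math.log10(LOW_COST) + frac * (math.log10(HIGH_COST) - math.log10(LOW_COST))
        return 10 ** log_cost

    for n in (1_000, 10_000, 100_000):
        print(n, round(per_patient_cost(n), 2), round(n * per_patient_cost(n)))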

Summary
Health care innovation contains more hurdles and requires more finesse than many start-ups anticipate
— particularly for teams that come out of social-media-, consumer-, or enterprise-focused
companies. However, by paying attention to the lessons from our six C’s (complexity, computing,
context, culture, contracts, and commerce), teams can anticipate more of what they face, and plan,
resource, and organize funding accordingly. As the CTO of one medical center told us, innovation
comes “down to change management — it’s not the technology; it’s the adoption, training, and new
roles and jobs required to take advantage of new technology.”
While all of these barriers, in aggregate, might seem insurmountable, the chasm between the worlds
of open tech innovation and closed, proprietary health care has never been narrower. According to
Rock Health, the $4.1 billion of venture funding for digital health start-ups in 2014 was nearly
equivalent to the prior three years (2011-2013) combined. Companies that can identify a solution that
successfully navigates all of these constraints can not only create extraordinary financial returns, but
they can have a real impact on the lives of millions of people. Despite the challenges, we feel there
are many practical innovations that remain to be discovered and deployed at the intersection of
personal health data and the traditional health care and clinical research ecosystems.
In future reports, we will explore the results of three of the deep-dive projects we pursued in the course
of the RWJF grant, as well as introduce a new company that emerged from a meeting of the minds
among some of our contributors and collaborators. These projects include a secure analytics platform
that facilitates cross-institutional collaboration at a distance, a model and open source API concept
for digital health SaaS platforms, and an exploration into how to extract insights from personal health
time-series data.


