



Navigating the Health
Data Ecosystem

Ian Eslick, Tuhin Sinha,
Roger Magoulas, and Rob Rustad



Navigating the Health Data Ecosystem
by Ian Eslick, Tuhin Sinha, Roger Magoulas, and Rob Rustad
Copyright © 2015 O’Reilly Media, Inc. All rights reserved. Cover image © 2015 Mike
Beauregard: “Meandering in the Arctic.”
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (). For
more information, contact our corporate/institutional sales department:
800-998-9938 or
May 2015: First Edition

Revision History for the First Edition
2015-05-05: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Navigating the
Health Data Ecosystem, the cover image, and related trade dress are trademarks of
O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.


978-1-491-92720-5
[LSI]


Table of Contents

The “Six C’s”: Understanding the Health Data Terrain in the Era of Precision Medicine
Background
Introduction
Complexity: Enormous Domain, Noisy Data, Not Designed for Machine Consumption
Computing: Standards and Inter-System Exchangeability
Context: Critical Metadata for Accurate Interpretation
Culture: Lean Start-Up Difficulties in Hospital Ecosystems
Contracts: Navigating IRB, HIPAA, and EULA Frameworks
Commerce: How Do Digital Health Start-Ups Get Paid?
Summary



The “Six C’s”: Understanding the Health Data Terrain in the Era of Precision Medicine

Background
A few years ago, O’Reilly became interested in health topics, running the Strata RX conference, writing a report on “How Data Science is Transforming Health Care: Solving the Wanamaker Dilemma,” and publishing Hacking Healthcare. Our social network grew to include people in the health care space, informing our nascent thoughts about data in the age of the Affordable Care Act and the problems and opportunities facing the health care industry. We had the notion that aggregating data from traditional and new device-based sources could change much of what we understand about medicine, thoughts now captured by the concept of “precision medicine.” From that early thinking, we developed the framework for a grant with the Robert Wood Johnson Foundation (RWJF) to explore the technical, organizational, legal, privacy, and other issues around aggregating health-related data for research, and to provide empirical lessons for other organizations pushing for data-driven health care initiatives. This report begins the process of sharing what we’ve learned.

Introduction
After decades of maturing in faster-moving industries, data-driven technologies are being adopted, developed, funded, and deployed throughout the health care market at an unprecedented scale. February 2015 marked the inaugural working group meeting of the newly announced NIH Precision Medicine Initiative, designed to aggregate a million-person cohort of genotype- and phenotype-dense longitudinal health data, in which donors provide researchers with the raw epidemiological evidence to develop better decision-making, treatments, and potential cures for diseases like cancer. In the past several years, many established companies and new start-ups have also started to apply collective intelligence and “big data” platforms to health and health care problems. All these efforts encounter a set of unique challenges that experts coming from other disciplines do not always fully appreciate.
In 2014, the Robert Wood Johnson Foundation funded the subject of this report, a research effort called “Operationalizing Health Data”: a deep dive into the health care ecosystem focused on understanding and advancing the integration of personalized health data in both clinical and research organizations. RWJF encouraged the small group of data scientists, innovators, and health researchers working on the grant to find and prototype concrete solutions to problems facing several partner organizations trying to leverage the value of health data. The research is intended to empirically inform innovation teams, often coming from non-health-related industries, about the messy details of using and making sense of data in the heavily regulated hospital IT environment.
This report describes key lessons identified by the project across six major facets of the health data ecosystem: complexity, computing, context, culture, contracts, and commerce. In future reports, we will focus on specific tactical challenges the project team addressed.


Complexity: Enormous Domain, Noisy Data, Not Designed for Machine Consumption
In marked contrast to much of the data generated in enterprise or consumer markets, health care data is exceedingly complex, and this complexity makes direct application of the techniques we’ve learned in other industries surprisingly challenging. Underlying this is the simple fact that the human organism has no closed-form solution. Despite thousands of years of study, our real-world comprehension of human physiology is largely indirect and sparse. Combine the challenges already inherent in data-intensive applications with the fact that we don’t necessarily know the root causes of measured chemical and biological changes, and health data analysis becomes particularly demanding.
Nearly all data derived from a biological system is messy, whether captured via device, blood test, medical record, or survey. Working with health data requires understanding the innate challenges of the data as well as managing many other difficulties, such as:
• Measurements are not typically stable; there are many possible sources of variation.
• Electronic Medical Record (EMR) discrete data is often entered by hand; even parsing can be challenging.
• The same underlying data can be encoded or labeled in multiple ways.
• A vast array of legacy systems and protocols must often be navigated.
• Personal health data tends to be dominated by longitudinal/time-series data; interpretation of this data is not necessarily well understood by either researchers or clinicians in practice.
We can see examples of these challenges in work performed with a partner developing a personal health app that presents a history of laboratory test results to a patient. Laboratory measurements such as Serum Albumin, a measure of the blood concentration of a protein produced by the liver, provide evidence of potential health problems or risk factors. The goal of the app is to enhance the clinical visit and give patients agency by helping the patient reflect on the history of their test results along with questions they might want to ask their clinician. It’s a simple concept to describe, but not so simple to execute.
The value produced by a blood test for a single patient will vary from laboratory to laboratory as well as over time. To provide a point of comparison, laboratories provide reference ranges with their test results. These ranges define what counts as a normal (in range) or abnormal (out of range) result. Reference ranges typically tell you whether you are in the same basic range as 80 to 95% of the population, but they do not typically tell you whether a given measure is significant for you personally. In our test population, the lower threshold at which a Serum Albumin measure became “abnormal” was 3.3 or 3.8 g/dL, depending on the laboratory used. Given that the mean value of all samples together was 3.3 g/dL, these thresholds become central to determining when a patient is at obvious risk.
These causes of variation confound our ability to directly aggregate across multiple patients and laboratories. Moreover, no common convention exists to normalize laboratory results for aggregation, prediction, or optimization. Do we aggregate the discrete interpretation of whether a value is inside, above, or below the reference range? Do we aggregate based on the standard deviation of a measure, and does the data even have a normal distribution? Do we just ignore the noise, aggregate the values, and rely on the law of large numbers over a large population of patients? These decisions all require a fairly sophisticated understanding of the inference you are trying to draw from the aggregate data set.
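To make these three aggregation choices concrete, here is a minimal Python sketch; it is an illustration, not a method used in the project. It assumes each result carries its own laboratory’s reference range. The `LabResult` type, the function names, the upper reference limits, and all values are hypothetical; only the 3.3 and 3.8 g/dL lower limits come from the text.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class LabResult:
    value: float     # measured value, here serum albumin in g/dL
    ref_low: float   # the reporting laboratory's lower reference limit
    ref_high: float  # the reporting laboratory's upper reference limit (hypothetical)


def interpret(r: LabResult) -> str:
    """Option 1: discrete interpretation against the lab's own reference range."""
    if r.value < r.ref_low:
        return "below"
    if r.value > r.ref_high:
        return "above"
    return "in range"


def range_position(r: LabResult) -> float:
    """Option 2: rescale so the lab's lower limit maps to 0.0 and its upper limit to 1.0."""
    return (r.value - r.ref_low) / (r.ref_high - r.ref_low)


def pooled_z_scores(results: list[LabResult]) -> list[float]:
    """Option 3: z-scores of the raw values pooled across labs, ignoring lab differences."""
    values = [r.value for r in results]
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical results from two labs; only the lower limits come from the text.
results = [
    LabResult(3.5, ref_low=3.3, ref_high=5.0),  # lab A
    LabResult(3.5, ref_low=3.8, ref_high=5.4),  # lab B, same raw value
    LabResult(4.2, ref_low=3.8, ref_high=5.4),  # lab B
]
for r, z in zip(results, pooled_z_scores(results)):
    print(interpret(r), round(range_position(r), 2), round(z, 2))
```

The same 3.5 g/dL value is “in range” at lab A and “below” at lab B, while the pooled z-score treats the two results identically. Which view is appropriate depends on the inference being drawn.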
Perhaps the most interesting question when dealing with health data is what a specific measure means for an individual. Clinicians make this judgment for us all the time. For example, many people with naturally high “bad” (LDL) cholesterol have compensating high levels of “good” (HDL) cholesterol. They shouldn’t necessarily be on a statin, yet they can be well above the upper limit that research suggests is a tipping point for increased risk of heart disease. The clinician knows to discount these values for this patient based on all the other factors; it’s not a straightforward computation for a machine.
Clinicians typically refer to population-level results to guide individual decisions. However, we can also use personal health data to capture a “baseline” so we can compare our health today to our health in the past. Baselines help us answer critical questions about whether we are stable, how we respond to therapies, and so on. Personal health baseline measurements also enable much more precise reasoning about the significance of a change when what is normal for us is not normal for everyone else.
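A personal baseline can be as simple as the mean and standard deviation of a patient’s own prior results. The Python sketch below flags a new result by how far it departs from that person’s history rather than from a population range. The two-standard-deviation cutoff, the function names, and the albumin values are hypothetical assumptions chosen for illustration.

```python
from statistics import mean, stdev


def deviation_from_baseline(history: list[float], new_value: float) -> float:
    """How many personal standard deviations new_value lies from the baseline mean."""
    if len(history) < 2:
        raise ValueError("need at least two prior results to estimate a baseline")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        raise ValueError("history has no variation; deviation is undefined")
    return (new_value - mu) / sigma


def is_notable_change(history: list[float], new_value: float, k: float = 2.0) -> bool:
    """Flag a result more than k personal standard deviations from baseline (k is assumed)."""
    return abs(deviation_from_baseline(history, new_value)) > k


# Hypothetical serum albumin history for one patient, in g/dL.
history = [4.1, 4.2, 4.0, 4.1]
print(is_notable_change(history, 3.7))   # True: a large drop for this person
print(is_notable_change(history, 4.15))  # False: ordinary fluctuation
```

A 3.7 g/dL result would be in range at a laboratory whose lower limit is 3.3 g/dL, yet it is a large change relative to this particular patient’s baseline.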

Computing: Standards and Inter-System Exchangeability
Accessing and parsing data can also be a significant challenge. Most electronic medical records are not much better than electronic paper: the data entered into them is entered for purposes of discoverability (to help all providers understand the patient case), documentation (what happened, for legal reasons), regulatory compliance, and billing (documenting care for the payer). This point is essential: data is entered into medical records primarily so that other people can find and review it. It is not entered to enable automated or aggregate analysis. So, while EMRs are a giant leap forward, they are not a panacea for machine learning and suffer from significant garbage-in, garbage-out problems.
One of the hospital systems we talked to still receives all of its laboratory data as fax images and hand-transcribes the fax content into its EMR. The EMR typically stores laboratory data in specific database fields in specific ways, but how the data is encoded varies across both technicians and laboratories for the same laboratory test. You will continue to find bugs in the data for weeks or months after first starting to exchange data. These issues with EMR data make precision clinical medicine a greater challenge than more established uses of EMR data, such as population management.
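Much of this cleanup amounts to mapping each laboratory’s labels and units onto one canonical form. The Python sketch below assumes a hand-maintained synonym table and unit-conversion table; the label variants, table names, and `normalize` function are hypothetical, though the conversion itself (10 g/L equals 1 g/dL) is standard. A production system would more likely map to a shared code system such as LOINC. Here, unmapped labels or units raise errors rather than being guessed at.

```python
# Hypothetical synonym table: raw labels seen from different labs -> canonical test name.
LABEL_SYNONYMS = {
    "alb": "serum_albumin",
    "albumin": "serum_albumin",
    "albumin, serum": "serum_albumin",
    "serum albumin": "serum_albumin",
}

# Factors that convert a (test, unit) pair into the canonical unit (g/dL for albumin).
UNIT_FACTORS = {
    ("serum_albumin", "g/dl"): 1.0,
    ("serum_albumin", "g/l"): 0.1,  # 10 g/L == 1 g/dL
}


def normalize(raw_label: str, raw_value: str, raw_unit: str) -> tuple[str, float]:
    """Return (canonical test name, value in canonical unit) or raise KeyError."""
    label = " ".join(raw_label.lower().split())  # fold case and collapse whitespace
    test = LABEL_SYNONYMS.get(label)
    if test is None:
        raise KeyError(f"unmapped lab label: {raw_label!r}")
    unit = raw_unit.lower().replace(" ", "")
    factor = UNIT_FACTORS.get((test, unit))
    if factor is None:
        raise KeyError(f"no conversion for {test} reported in {raw_unit!r}")
    return test, float(raw_value.strip()) * factor


print(normalize("Albumin, Serum", " 3.8 ", "g/dL"))
print(normalize("ALB", "38", "g/L"))
```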
Another partner has spent nearly $20 million in grant money over the past five years on a project building a standardized registry for one disease. The registry simply makes a standard form available in more than 40 centers so the data can be relayed into an open source registry system (Indivo), which is used to perform analysis across more than 15,000 patients.
The bottom line is that there is no standard, interoperable schema for documenting human health in a digital format, in the way that cars in a manufacturing system can be documented, and until some agreed-upon methodology for doing that exists, teams working on both intra- and inter-hospital data aggregation will struggle to generate apples-to-apples normalized results.

Context: Critical Metadata for Accurate Interpretation
In addition to the testing variations described above, the values produced by a laboratory blood test are subject to tremendous variation due to contextual factors such as the time of day the blood was drawn, what the patient was eating or drinking at the time, the handling of the specimen, the time between draw and analysis, the specific method of analysis, etc. A high value for a given parameter may or may not have clinical relevance, even if you are using a personal baseline. For example, if you forget to skip your morning coffee before a fasting blood sugar test for pre-diabetes, you can get a false positive for high blood sugar.
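One way to preserve this context is to store it as explicit, machine-readable fields alongside each value and to treat a missing field as “unknown” rather than assuming the best case. The Python sketch below is illustrative only: the `GlucoseResult` record, its fields, the `partition_by_context` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class GlucoseResult:
    value_mg_dl: float
    drawn_at: datetime
    fasting: Optional[bool]  # None: fasting status was never recorded


def partition_by_context(results: list[GlucoseResult]) -> dict[str, list[GlucoseResult]]:
    """Split results into those usable for a fasting screen, non-fasting, and unknown."""
    groups: dict[str, list[GlucoseResult]] = {"fasting": [], "not_fasting": [], "unknown": []}
    for r in results:
        if r.fasting is None:
            groups["unknown"].append(r)
        elif r.fasting:
            groups["fasting"].append(r)
        else:
            groups["not_fasting"].append(r)
    return groups


# Hypothetical results for one patient.
results = [
    GlucoseResult(92.0, datetime(2015, 3, 2, 7, 45), fasting=True),
    GlucoseResult(131.0, datetime(2015, 3, 9, 8, 10), fasting=False),  # coffee before the draw
    GlucoseResult(104.0, datetime(2015, 3, 16, 14, 30), fasting=None),
]
groups = partition_by_context(results)
print({name: len(rs) for name, rs in groups.items()})
# {'fasting': 1, 'not_fasting': 1, 'unknown': 1}
```

Only the first result should feed a fasting screen. The third needs its context filled in before it can be interpreted at all.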
Like blood tests, many medical data sets have only limited machine-consumable metadata describing what can be essential context for clinical and research analysis. One leading researcher at the NIH we spoke with argued that the primary reason he is not interested in patient-provided data is the lack of this critical contextual data.
The importance of context in interpreting data in health is one of
the key barriers for those who seek to augment or replace clinicians
with analytics. An analytics system is only as good as the input, and
today, health care systems do not give us very good inputs. When we
examined laboratory results, we saw cases of missing reference
ranges; miscoded data; and, for some measures, a great deal of noise.
For example, we saw records of “failed pulmonary function test”
created for billing purposes that had no specific and actionable
information about the patient’s health status. The only way to tell
which tests failed was to read the free text notes associated with the
test; no single regular expression allowed for automatic filtering of
these failed tests.
These challenges arise in personal health data as well. A motion sensor tells us when we have activity, but it doesn’t tell us whether inactivity is due to the sensor being located in a purse or on a table, or because we’re actually sitting at a desk or on the couch. If you are using this signal to assess the actual activity level of a single person, it might be insufficient for clinical use. However, the opportunity with personal health data is that we can triangulate across several signals to assess the actual context, and it is often possible to engage the user periodically to fill in critical blanks.
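As an illustration of that triangulation, the Python sketch below labels an interval with no recorded steps by consulting other signals and falling back to prompting the user. The signals (an on-body heart-rate reading, phone screen use, a self-report), the `Interval` type, and the decision order are hypothetical assumptions, not a validated classifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interval:
    steps: int                         # step count from the motion sensor
    heart_rate_detected: bool          # hypothetical signal that a wearable is on the body
    phone_in_use: bool                 # hypothetical signal of screen interaction
    self_report: Optional[str] = None  # user's answer to a periodic prompt, if any


def label_interval(iv: Interval) -> str:
    """Assign a context label to one time interval using several signals."""
    if iv.steps > 0:
        return "active"
    if iv.self_report is not None:
        return iv.self_report      # the user filled in the blank directly
    if iv.heart_rate_detected or iv.phone_in_use:
        return "sedentary"         # someone is present but not moving
    return "unknown: prompt user"  # sensor may be in a purse or on a table


print(label_interval(Interval(steps=0, heart_rate_detected=True, phone_in_use=False)))   # sedentary
print(label_interval(Interval(steps=0, heart_rate_detected=False, phone_in_use=False)))  # unknown: prompt user
```

Checking the self-report before the passive signals reflects an assumption that the user’s own answer is the most trustworthy context; other orderings are equally defensible.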
Personal health data, properly managed, can make a powerful contribution to the health care system by augmenting the impoverished context clinicians currently get during patient visits. Evidence shows that patients interact more honestly with machines and that devices assess patient behavior more accurately than subjective self-report. New mobile phone health data frameworks from Apple and Google, along with many independent phone- and sensor-based applications, can provide a rich source of contextual clarity. The Health Data Exploration Project, based at the California Institute for Telecommunications and Information Technology (Calit2) and supported by the Robert Wood Johnson Foundation, is sponsoring what it calls “agile research projects” designed to accelerate the understanding of how health care can best embrace these rapidly evolving personal health data technologies.

Culture: Lean Start-Up Difficulties in Hospital Ecosystems
Clinicians have developed a strong culture of skepticism about claims that a product, algorithm, or process is beneficial. Three elements of the health care provider culture should be noted in particular: risk aversion, evidence orientation, and workflow sensitivity.
Most clinicians make decisions on a daily basis that, if wrong, will profoundly and negatively impact a patient’s life. If you have something that works, and that you feel helps people in aggregate, then innovation becomes a two-edged sword: you might be able to do better, but you also risk doing worse and hurting people. The false negative in health care is intolerable; it is literally forsworn in the Hippocratic Oath.
Given the challenges of building systems that present information or promote specific actions based on health data, and the costs and consequences of being wrong, health care professionals have an exceedingly high bar for utility and reliability. Most clinicians (and researchers) will only take a new product or intervention seriously when you’ve proven its value in a peer-reviewed, published trial conducted according to medical-ethical standards.
For example, showing patients laboratory results and prompting
them to talk with their clinician about the questions “the system”
thinks are important is unlikely to be adopted until testing shows
that a population of more than 100 patients using the process have
fewer errors, require less provider time, and/or experience better
outcomes than a population of more than 100 patients who don’t.
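At its simplest, a comparison like this one (fewer errors in one group of patients than in another) is a difference between two proportions. The Python sketch below applies a standard pooled two-proportion z-test to made-up counts for two hypothetical groups of 120 patients. It is only a rough first look; a real trial would follow a pre-registered design and analysis plan rather than this calculation alone.

```python
from math import erf, sqrt


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return z, 2 * (1 - normal_cdf)


# Hypothetical counts: patients with at least one error, control vs. intervention.
z, p = two_proportion_z_test(events_a=30, n_a=120, events_b=15, n_b=120)
print(f"z = {z:.2f}, p = {p:.3f}")
```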
While this kind of research is far cheaper in digital health than for medical devices and biological therapies, trials create a heavy burden for start-ups seeking to enter the health care space. Just getting inside-the-institution access to clinical environments is a significant hurdle for most digital health start-ups, much less producing randomized, placebo-controlled, double-blind study results for their proposed solution. Proving the efficacy of a process on a population is a difficult and time-consuming task compared to the lean start-up strategy used to such great effect in consumer and enterprise applications.
Finally, even if you can show that an intervention is better, it has to fit naturally into the hospital’s workflow. The care process in the medical environment is already packed with complexity: multiple procedures; safety considerations; instruments; devices; constantly changing schedules; delays; a multitude of contributing specialists; insurance and paperwork constraints; and, in general, very poorly designed IT systems to help you manage all this. Adding a new tool or process into this environment requires that it induce only a small change in current practice, be an absolute must-have, or provide significant time savings to compensate for the new cognitive load and costs, such as training, inventory management, installation, etc.

Contracts: Navigating IRB, HIPAA, and EULA Frameworks
Combining research data, clinical data, and consumer personal health data involves the interaction of multiple legal and regulatory frameworks. The most common frameworks are the Institutional Review Board (IRB) process, the Health Insurance Portability and Accountability Act and the Health Information Technology for Economic and Clinical Health Act (HIPAA/HITECH), and an mHealth application’s end-user license agreement (EULA).

These frameworks involve a very complicated set of constraints on
how data can be stored, transported, analyzed, or repurposed. Most
importantly, these issues can be a source of significant friction for
people who want to work with health data.
Research data is governed by a patient consent process that ensures participants understand the risks and rewards of sharing their health data. These consent forms and the entire research process are reviewed by institutional review boards (IRBs) to ensure that the research is ethical. While digital health start-ups should feel compelled to get IRB approval for any work exposing humans to research, many don’t. If you have a hospital partner helping with your medical research, the partner generally needs to have its participation governed by an IRB-approved protocol (by law, research on human subjects supported by government funds must be approved by an IRB). Typically, institutions only trust their own IRB, so a multi-institutional study can take three to six months just to get the procedures approved. If you want to change a form or adapt the study in any way, you need the IRB to approve the proposed change, a process that can take anywhere from one to eight weeks depending on the institution and the concerns of the examiner. A single examiner who doesn’t agree with your approach can cost you weeks or months of delay. Despite these risks, companies that want to use published evidence as part of documenting their value proposition are well advised to consider engaging with a partner or a third-party IRB to facilitate publication of results from populations using their product.
Any data a patient provides to a hospital as part of the ordinary process of clinical care is regulated by the HIPAA and HITECH acts. These acts describe what a “covered entity” (a hospital or other provider service) can do with the data patients have to provide in order to receive medical care. These regulations typically describe access controls, auditing provisions, data security policies, training programs, etc., that have to be followed. Breaches, or leakage of health data, require notification to your partners, the patients, and the government. Breaches can result in civil fines, and breaches caused by willful neglect can trigger much higher fines, easily reaching into the millions of dollars.
Beyond your own behavior, you are also responsible for ensuring the same behavior from all your subcontractors who come in contact with personal health data (your cloud provider, customer support system, email system, etc.). This chain of responsibility is formalized in a Business Associate Agreement (BAA), a rigorous legal contract you adopt if you work with a covered entity. The terms of such documents are typically standardized per institution and non-negotiable.
Any product that interacts with the health care system must also pass an extremely conservative security review by the networking team and a risk management review by the legal team. As innovation is not a prime directive of these teams, you are subject to the most conservative interpretation of risk, and changing their focus to strategic goals often involves convincing the most senior members of the institution. That said, having a deployment at several major centers can significantly reduce the perceived risk. Successful start-ups often emerge out of existing institutions that are motivated to deploy and help prove that a concept works.
In the case of the grant work, we hoped to start analyzing a data set from a major research institution at the start of the project. Instead, we ran into a maze of organizational bureaucracy as the various players wrestled with the process described above. The contractor was unfamiliar with the hospital’s process, and many people at the hospital had to pass judgment on the project for it to move ahead. We didn’t face a lack of effort, as everyone involved worked hard to get access to the data approved; there’s just a lot to contend with.

Commerce: How Do Digital Health Start-Ups Get Paid?
In most markets, if you create a differentiated solution to a clear problem that has economic impact for your customers, you can find some model by which you get compensated for that solution. This is simply not the case in health care. We have seen multiple projects that add real, tangible value to patients and/or hospitals but, for structural reasons, fail to get any traction in the market. The causes of this failure to launch can involve:
• Payment and billing models
• Pace of adoption
• Evidence barriers
• Integration friction and costs
It is common for new innovations in digital health to encounter “perverse incentives” in health care, where the existing market creates artificial barriers to entry and adoption. A solution that would seem like common sense in another sector, grounded in fundamental market-driven principles, might fail completely in health care for obscure reasons. Payment models are heavily regulated and slow to change in health care, so rather than trying to change them, you are well advised to identify a value you can get paid for, then work to develop defensive barriers to compete for that revenue. We have observed that investors with significant experience in health care focus on the business and payment models before the technology and typically prefer not to fund companies until they’ve achieved $1 million to $2 million in recurring revenues. The message is that in health care, the only proof of scalable value comes in the form of a recurring revenue stream.
In digital health, payment models come in several flavors:
• Billing codes
• Efficiencies tied to billing codes
• Outcomes
• Population management
On the expenses side, there are health-care-specific costs to be reckoned with as well. For a start-up, HIPAA insurance is relatively affordable at small scale, but fees can go into the tens of thousands of dollars as your patient volume grows. Plan on spending $5 per patient, declining to perhaps 10 cents per patient at volume, just on HIPAA liability insurance.

Summary
Health care innovation contains more hurdles and requires more finesse than many start-ups anticipate, particularly for teams that come out of social-media-, consumer-, or enterprise-focused companies. However, by paying attention to the lessons from our six C’s
(complexity, computing, context, culture, contracts, and commerce),
teams can anticipate more of what they face, and plan, resource, and
organize funding accordingly. As the CTO of one medical center
told us, innovation comes “down to change management — it’s not
the technology; it’s the adoption, training, and new roles and jobs
required to take advantage of new technology.”
While all of these barriers, in aggregate, might seem insurmountable, the chasm between the worlds of open tech innovation and closed, proprietary health care has never been narrower. According to Rock Health, the $4.1 billion of venture funding for digital health start-ups in 2014 was nearly equal to that of the prior three years (2011 to 2013) combined. Companies that can identify a solution that successfully navigates all of these constraints can not only create extraordinary financial returns, but they can also have a real impact on the lives of millions of people. Despite the challenges, we feel there are many practical innovations that remain to be discovered and deployed at the intersection of personal health data and the traditional health care and clinical research ecosystems.

In future posts, we will explore the results of three of the deep-dive projects we pursued in the course of the RWJF grant, as well as introduce a new company that emerged from a meeting of the minds among some of our contributors and collaborators. These projects include a secure analytics platform that facilitates cross-institutional collaboration at a distance, a model and open source API concept for digital health SaaS platforms, and an exploration into how to extract insights from personal health time-series data.
