Oil, Gas, and Data
High-Performance Data Tools in the Production of Industrial Power
Daniel Cowles


Oil, Gas, and Data
by Daniel Cowles
Copyright © 2015 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editions are also available for most titles (). For more information,
contact our corporate/institutional sales department: 800-998-9938 or
Editor: Tim McGovern
Production Editor: Kara Ebrahim
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
April 2015: First Edition
Revision History for the First Edition
2015-04-10: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Oil, Gas, and Data, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without limitation responsibility for damages
resulting from the use of or reliance on this work. Use of the information and instructions contained in
this work is at your own risk. If any code samples or other technology this work contains or describes
is subject to open source licenses or the intellectual property rights of others, it is your responsibility
to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-92289-7
[LSI]


Chapter 1. Oil, Gas, and Data
Introduction
When you hear “innovation in oil and gas,” your first thoughts might go to hardware—bigger, faster,
deeper drilling; more powerful pumping equipment; and bigger transport—or to the “shale
revolution”—unconventional wells, hydraulic fracturing, horizontal drilling, and other enhanced oil
recovery (EOR) techniques. But, just like any other industry where optimization is important—and
due to large capital investment and high cost of error, it’s perhaps even more important in oil and gas
than in most other industries—the potential benefits of predictive analytics, data science, and machine
learning, along with rapid increases in computer processing power and speed, greater and cheaper
storage, and advances in digital imaging and processing, have driven innovation and created a rich
and disruptive movement among oil and gas companies and their suppliers.
The truth is, the oil and gas industry has been dealing with large amounts of data longer than most,
some even calling it the “original big data industry.”1,2 Large increases in the quantity, resolution, and
frequency of seismic data, and advances in “Internet-of-Things”-like network-attached sensors,
devices, and appliances, are being combined with large amounts of historical data—both digital and
physical—to create one of the most complex data science problems out there, and a new industry is
developing to help solve it.
In oil and gas more than in almost any other industry, efficiency and accuracy are highly valued, and
small improvements in efficiency and productivity can make a significant economic difference. When
a typical well can cost upwards of ten million dollars—and often much more—the cost of error is
great, and managing cost versus benefit can mean the difference between profitability and loss. And
not unlike a tech startup, where a meaningful investment upfront is required before knowing how
much the return will be—if any—you may have to dig many holes to find a successful well.
Obviously, the more certainty you can have, the better, and incrementally increasing certainty is a
place where data science and predictive analytics promise to help. The payoff from analytics isn’t
limited to exploration: once a well has been successfully drilled, production efficiency and
optimization remains important in the lifetime ultimate recovery of a well.
In addition, given crude price fluctuations and many other unpredictable outside variables, capital
project planning itself is rife with uncertainty, and large-scale projects often face significant
overages. “In 2011, upstream offshore oil and gas projects…around 28% had a cost blowout of more
than 50% and the root cause of that is…they got the numbers wrong,” says Dominic Thasarathar, who
watches the energy sector for the Thought Leadership team at Autodesk. “Their costs have gone up,
they’re dealing in everything from frontier environments to difficulties raising finance.” According to
the International Energy Agency, capital investments in energy projects have more than doubled since
2000, and are expected to reach $2 trillion annually by 2035, so accurately predicting cost versus
benefit is extremely important.3 “Where we see big data fitting in,” continues Thasarathar, “is…if you
look at the performance for those big projects, it’s pretty much a horror story in terms of how it’s
dropped off over the last 15–20 years, and the root cause of that is, there’s so much that project teams
need to understand and assimilate in terms of information to make the right decision.”
But exploration and production aren’t the only areas that can benefit from innovative data and data
science driven solutions. From health, safety, and environmental, to cyber security, to transportation
and manufacturing—opportunities to create greater efficiencies exist throughout the entire
hydrocarbon production and delivery cycle.

Overview
The oil and gas industry is traditionally broken down into three broad categories: upstream, which
includes exploration, discovery, and both land and sea drilling and production; midstream, which
includes transportation, wholesale markets, and manufacturing and refinement of crude; and
downstream, which is primarily concerned with the delivery of refined products to the consumer. The
majority of big and fast data related innovation is found upstream, in the discovery and exploration
phase, where risk and uncertainty are high, conditions can be—to put it mildly—challenging, and
where failure is very expensive.

The industry is a mature and unique one, built on experience and hard-won knowledge, and employing
the world’s leading geological scientists and engineers. They’re very good at what they do, and
they’ve been doing it for a long time, but there is an imperative to add more big data and data science
skills like machine learning and predictive analytics into the mix, skills that oil companies haven’t
traditionally and broadly had in-house. According to Boaz Nur, former VP of Energy at data science
startup Kaggle, energy analysts think big data and analytics are the next frontier in oil and gas, but
they’re only now in the early adoption phase. “They [oil and gas companies] don’t shy away from
technology, they’re just careful,” Nur says. “A lot of snake oil has been sold to the oil and gas
companies over the years. They’ve also historically done a pretty good job of producing oil. They’re
[already] doing OK; what we’re proposing will help them take it up to the next level.” Adds Nur:
“They’re cautious but they’re optimistic.”
Halliburton is using big data and data science techniques to try to solve a variety of problems in the E
& P (exploration and production) upstream phase. “We are looking at trying to optimize seismic
space, trying to optimize drilling space, well planning,” says Dr. Satyam Priyadarshy, Halliburton’s
recently hired Chief Data Scientist. Priyadarshy is bringing some big data techniques to the space:
“For example, we are looking at how to optimize in the seismic world through distributed computing
[techniques] because it takes a long time to process the data.” But Priyadarshy says that it’s a mistake
to think that data science methods and techniques are new to oil and gas. “They’ve actually been using
machine learning for many years,” he says. “People have been using neural networks, fuzzy logic,
SVM, SVRs—pretty much any algorithm you want to talk about in machine learning, they have been
using it. But, they have been using these in limited cases, to limited value, and the goal is now for
people like us (data scientists) to build this into a more valuable product.” He says that the oil and
gas industry is unique in terms of the complexity of the data and models, and that turnkey solutions
from other traditional big data industries can’t be easily applied here. “It’s a complex challenge. It’s
not the same as the other big data players,” says Priyadarshy, who has worked widely on big data
projects in the news, media, Internet, and insurance spaces. “The complexity in the oil and gas
industry outweighs any other.”
Because of that complexity, Priyadarshy stresses the need for domain area expertise when dealing
with petrotechnical data, and he has his own definition of the skills a data scientist should have for
the space. “You need a person who has domain expertise, a person who is a computer scientist, and a
business person—these three actually form a real domain data scientist” for the oil and gas space.
Another complication is legacy and historical data: some is digital, but much is still found in binder
and paper form. From a predictive modeling standpoint, there’s value to be had, but dealing with old
systems and documents, often at isolated physical properties, or—as often happens in the industry—
inherited through acquisitions and neglected, makes integrating these pieces into your model
challenging.
Remote standalone locations and physical records and manuals also hamper efforts to digitally
connect a company’s systems and assets—the much discussed “digital oilfield” idea, where systems
are integrated and automated to tune and optimize operations across the breadth of the production
cycle. “The move to digital operations is increasing steadily, but there’s an awful lot of legacy out
there, things going back decades, where the drawings were done with, literally, pen and paper,” says
Neale Stidolph, Head of Information Management at Lockheed Martin and based in Aberdeen, where
he primarily deals with North Sea oil fields, including many older legacy wells. “A large part of the
industry is very much tied to documents and records. So, there’s still a need to maintain vast physical
archives…of boxes full of old information. And there’s a need to analyze and strip that to get more
value.” And since many of the physical sites involved are isolated, supplying their own power,
without modern communication networks, there are additional barriers to fully digitizing operations.
“One of the factors the rigs have to cope with is what they call a black start,” says Stidolph. “If your
rig goes down, it means you’ve lost everything: you’ve lost all power generation, all connectivity, all
systems of every type. You need a flashlight and you need a manual to be able to see how to get this
thing operational again.” Many of these rigs are in hazardous and remote environments, so off-the-shelf
connectivity solutions aren’t typically sufficient.
But, challenges and cultural resistance aside, big data methods are changing how the industry does
business, and these changes will ultimately result in a changed oil and gas industry.

Upstream
As previously mentioned, oil and gas has long been familiar with large and diverse datasets, and
improvements in technology and methodology are driving an exponential increase in the amount of
data being collected.

In the exploration space, for example, due to advances in seismic acquisition methodology, storage
capabilities, and processing power, data gathered via offshore seismic acquisition has gotten both
bigger—due to increased resolution—and faster, due to increases in frequency and rate of acquisition.


The result is 4D data (x/y/z space, and time) at a far higher resolution, providing far better
understanding of subsurface deposits and reservoirs than previously possible. Wide azimuth towed
streamer acquisition (WATS)—seismic exploration using multiple ships deploying a miles-wide
array of acoustic equipment—allows companies like Chevron and BP to create high-resolution
topographic maps under the earth and beneath salt canopies, and locate new oil fields that may not
have been found otherwise.4 Time-lapse seismic data acquisition also allows them to see how
reservoirs are behaving as oil begins to flow, allowing them to optimize production once it begins.
As the world’s energy demands continue to grow, and exploration efforts move farther offshore and
into deeper waters, the ability to accurately visualize deep, complex, subsurface topography is
essential. Recent deepwater discoveries in the Gulf of Mexico have been greatly aided by new
seismic techniques, and there is a direct relationship between improvements in data storage and data
processing, and improvements in seismically generated image resolution, which in turn results in new
and better understood hydrocarbon discoveries. And there is still room for improvement in seismic
acquisition image resolution: “Even at very high resolution, the images we can make today still have
gaps bigger than the size of a conference room,” says BP’s John Etgen.

Well Optimization and Mature Wells
Although a lot of recent press and activity focus on the “shale boom” and other unconventional
extraction techniques, according to Halliburton, 70% of the world’s oil and gas comes from mature
wells.5 A mature well is usually defined as one where peak production levels have been reached, and
extraction rates are on the decline, or when the majority of the relatively “easy to get” hydrocarbons
that the well will ultimately deliver have been extracted. Typically, the early oil and gas in a well is
easier and cheaper to extract, and the industry hasn’t been enthusiastic about optimizing extraction,
holding a common belief that there is an “economic limit” where it costs more to get the resources out
of the ground than they’re worth on the market. However, modern EOR techniques have become more
efficient, and sensor data and predictive models play a part in that. These wells have a wide range of
factors that make them more complicated—poor flow, poor rock formations, bore cracks, complex
geological conditions—but they still have a lot to offer in terms of hydrocarbons. In an industry where
small margins mean large sums of money, getting the most from mature and end-of-life wells at the
lowest cost is another area where improved use of data can have a significant impact on results.
Again according to Halliburton, a 1% increase in production from the mature fields currently active
would add two years to the world’s oil and gas supply.

Remote Sensors and Network-Attached Devices/IoT
There is already considerable application of, and ongoing interest in, network-attached devices, appliances,
and Internet of Things-like connected devices in the oil and gas space. Halliburton’s Priyadarshy
prefers the term “emerging technology devices” to the “Internet of Things” label, which causes some
confusion and resistance in the industry. In any event, remoteness, geographic breadth of facilities and
pipelines, hazardous environments, and inaccessibility of many aspects of the oil and gas production
cycle make it well suited to automation and remote monitoring and optimization. Remotely
monitored and controlled devices can help lower cost, effort, and error in resource tracking, and can
decrease workforce overhead, improve logistics, and drive well and operations automation and
optimization. It’s a big piece of the “digital oilfield” concept, and one that the industry has already
embraced.
Sensors of all kinds are already used throughout the detection, production, and manufacturing cycle to
better understand and monitor processes and gather data. Sensors can capture fluid pressure, velocity
and flow, temperature, radiation levels (gamma ray energy is a useful indicator in hydrocarbon
discovery), relative orientation and position, as well as chemical and biological make-up of physical
materials. Trending toward cheaper, smaller, and connected arrays, newer microsensors can
communicate with each other and with external networks.
From exploration to the gas pump, there are opportunities to use networked devices. Offshore,
submersible devices that gather information can be remotely controlled and are safer alternatives to
human-piloted crafts. Pumps can be remotely monitored and adjusted, which can be far more
economical than manual maintenance. Midstream, in the transportation phase, networked devices can
help track resources through the many and various stages and handoffs that happen throughout the
crude transport process. Pipelines and remote equipment can be monitored and even maintained
remotely. Biomonitoring of workers could increase safety. Gartner has predicted as many as 30 billion
connected devices by 2020, with 15% of those in the manufacturing sector.6
The data gathered from all these devices will be valuable for predictive analytics and other
applications: from well sensor data that can be analyzed to help optimize productivity, to operations
data that can monitor and calibrate operational systems, to transportation data that can help identify
bottlenecks and inefficiencies, to workforce data that can help drive safety. But, as Halliburton’s
Priyadarshy points out, with those benefits also come some new challenges; for example, sensor data
veracity in different physical environments: “Imagine a situation where you build a sensor for Texas
weather. If you were to take it to some Middle Eastern country like Kuwait, where temperatures are
[significantly] higher…if the sensor starts sending data and you are trying to predict based on what
you know from Texas, then you may be in deep trouble.”
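Priyadarshy’s Texas-versus-Kuwait example is, at bottom, a data-drift problem, and a basic guard against
it can be automated. The sketch below is a minimal illustration of that idea rather than anything from
Halliburton: it assumes you already have an array of the readings the model was trained on and a window
of readings from the new deployment (the names, threshold, and window size are all hypothetical
placeholders), and it simply flags when the two distributions diverge enough that predictions should be
treated with suspicion.

    # Minimal data-drift check for a redeployed sensor model (illustrative sketch only).
    # `training_temps` and `live_temps` are hypothetical 1-D arrays of temperature readings.
    import numpy as np
    from scipy.stats import ks_2samp

    def drift_detected(training_temps, live_temps, p_threshold=0.01):
        """Return True if live readings look statistically unlike the training data."""
        statistic, p_value = ks_2samp(training_temps, live_temps)
        return p_value < p_threshold

    # Toy example: a model trained on Texas summer temperatures, then deployed somewhere hotter.
    rng = np.random.default_rng(0)
    training_temps = rng.normal(loc=35.0, scale=4.0, size=5_000)   # roughly Texas summer, in °C
    live_temps = rng.normal(loc=47.0, scale=3.0, size=500)         # roughly Kuwait summer, in °C

    if drift_detected(training_temps, live_temps):
        print("Sensor readings have drifted; recalibrate or retrain before trusting predictions.")

A check like this does not fix the model, but it tells you when the environment has moved out from under it.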

Security
As discussed, there is significant pressure to lower costs and optimize, and remote-controlled and
network-attached devices of all types are a means to that end. The downside is, the more connected
you are, the more vulnerable you are to network intrusions, intentional or otherwise. And while
remote monitoring is also crucial to improved security, it can open holes itself. “There are gains from
the automation; you can get more protection, you can do better sensing of what’s happening along your
line, there’s lots and lots of opportunities for managing and monitoring the line using automation,”
says industrial and oil and gas cyber-security expert Eric Byres, “but your automation system, which
is supposed to be protecting your pipeline, [can become] the problem.”
A series of events and attacks have made the oil and gas industry keenly aware of the need to
dramatically improve their cyber-security. After 9/11, the industry became more concerned about
intentional and coordinated attacks, but it wasn’t until the Stuxnet worm attack in 2010 that they
started to really address the problem. Stuxnet hit an Iranian nuclear facility, causing the
failure of uranium-enriching centrifuges. Written specifically to exploit Microsoft and Siemens
vulnerabilities, Stuxnet was the first prominent attack against the PLC/SCADA (programmable logic
controller/supervisory control and data acquisition) systems used by industrial plants of all types,
including the oil and gas industry, and previously assumed to be safe from cyber attack. To make
things even scarier, Stuxnet—widely reported to be a joint Israeli/US-made cyber-weapon—found its
way into the Natanz enrichment facility, which was not connected to the Internet, via sneakernet, on USB
drives. In the case of Stuxnet, the collateral damage—and what might even be called friendly fire in
this new battlefield—spread into the wider industrial ecosystem, infecting Chevron, with
unconfirmed reports of at least three other major oil companies being affected as well.7,8
Given the sociopolitically charged nature of the industry, oil companies were justifiably worried by
Stuxnet. Suddenly, the ability for remote and unaffiliated parties to influence operational and safety
systems was very real: spills, blowouts, explosions, and the potential for loss of life. In addition to
wells and refineries, pipelines, trains, and other transportation methods are vulnerable to attack, and
beyond any human disaster, the repercussions could be environmentally catastrophic as well as
disruptive to business.
Since Stuxnet, there have been other attacks, including the Shamoon virus that hit Saudi Aramco in
2012. Initiated by a “disgruntled insider,” Shamoon wiped out the contents of between 30,000 and
55,000 Saudi Aramco workstations. These attacks, coupled with environmental and PR disasters like
Deepwater Horizon, have given the industry all the motivation it needs to get serious about security.
“I do think the oil and gas industry is ahead of all the other companies [in terms of security],”
continues Byres. “There is a real serious attempt to try and get security under control…that’s the good
news.” But while the majors like Shell, Exxon, Chevron, Total, and in particular BP (where Paul
Dorey was an early and vocal security advocate) have become very serious about security, you’re
only as strong as your weakest link, and the industry is dependent on and tightly integrated with
suppliers, contractors, and vendors, many with less sophisticated approaches to security. “That
terrifies the guys at BP, that’s why they started becoming evangelists in 2006, 2005—because they
realized they could do a good job on their site, and gain nothing because of the integration to all the
other companies around them. The other companies were so insecure.”
And it’s not just production that is threatened. The Night Dragon attacks—thought to be started in
China—targeted intellectual property.9 In the PR space, a Sony-type attack on internal email and
proprietary information systems could also have huge ramifications in a competitive and secretive
industry.
While some facilities remain off the net by virtue of being old and isolated, and instances of air-gapped
systems may persist, in general the digitally attached genie is out of the bottle: the industry is
moving rapidly toward digital openness, and it won’t be going back. As Byres notes, “The reality is,
modern networks in the oil and gas industry need a steady diet of data. Data going in and out; security
patches, lab results, remote maintenance, [and] interactions with customers. So, there’s no way you
can isolate a refinery anymore. There’s just too much need for data on the plant floor now with the
way we’ve built our systems.”
Technology might not always be the best solution in an industry as fundamentally physical as this one.
Byres relates a story of how one Nigerian delta oil company battled the theft of sections of pipe that
were being taken and sold as scrap metal. They started making them heavy enough to sink the boats
that were used to carry them off, and the thefts stopped. But anecdotes aside, security is now primary
for oil and gas IT, and while prevention is still important, most now agree that 100% impenetrability
is unlikely, and rapid detection is the most important security tool. This is an area where data science
and threat analytics can possibly help. Applying machine learning and pattern recognition to noisy and
ever-larger data streams can preemptively detect anomalies and identify attacks. But Byres thinks the
complexity of the problem means the industry is a ways off from really leveraging big data and data
science solutions in the security space: “There is an opportunity, there’s no question…but we’re still
a few years away before anyone uses it effectively.”
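The kind of anomaly detection Byres alludes to can at least be prototyped with off-the-shelf tools. The
sketch below is a hedged illustration, not a production SCADA monitoring system: it assumes a feature
matrix has already been extracted from operational or network telemetry (packet rates, setpoint changes,
and valve commands per window are invented placeholder features), and it trains an unsupervised isolation
forest on the historical baseline so that readings that deviate from it can be flagged for review.

    # Unsupervised anomaly detection on operational telemetry (illustrative sketch only).
    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical features per time window: [packets_per_sec, setpoint_changes, valve_commands]
    rng = np.random.default_rng(42)
    baseline = rng.normal(loc=[120.0, 2.0, 5.0], scale=[10.0, 1.0, 2.0], size=(2_000, 3))

    # Train on normal operations; the contamination rate is a tunable guess, not a known figure.
    detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
    detector.fit(baseline)

    # Score a new window of readings; a label of -1 means "looks anomalous, investigate".
    new_window = np.array([[450.0, 30.0, 80.0]])   # e.g., a burst of unexpected control commands
    if (detector.predict(new_window) == -1).any():
        print("Anomalous telemetry detected; flag for operator review.")

As Byres suggests, the hard part is less the algorithm than the noise: separating a real intrusion from an
ordinary operational upset in streams like these is where the industry still has work to do.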

Health, Safety, and Environment
There is also a lot of optimism around the ways big data can help in the health, safety, and
environmental space (HSE), and around the ways that predictive analytics and machine learning can
be applied to anticipate well and manufacturing downtimes, malfunctions, accidents, and spills. As
increasing energy demand pushes oil and gas production into untapped frontiers and deeper waters,
with ever harsher and unpredictable environments, the potential for ecological, human, economic, and
public relations disaster increases. So, companies are highly incentivized to do everything they can to
anticipate and proactively address potential problems. This is a space where historical data can be
analyzed to predict future issues, and where models can also tie in new data sources, like weather.
In addition, unconventional resource plays have introduced a whole new set of environmental and
safety concerns, from water, air, and soil pollution to earthquakes. There is thought to be a significant
opportunity to tune and improve all aspects of unconventional drilling to reduce the harmful side
effects. Data science and predictive models can help drive optimization of fluid injection and more
accurate drilling, and by incorporating ever more underground sensor data into the model, further
improvements can be made.

High-Performance Computing and Beyond
To handle the massive increase in the amount of data—for example, in 2013 BP stated their
computing needs were 20,000 times greater than in 1999¹⁰—companies have turned to expensive
high-performance computing centers, and are building out data science expertise in-house, or
engaging data science partners. Some of the largest private supercomputing facilities in the world are
now run by oil companies, with Italian energy company Eni, France-based Total Group, and now BP
all recently building HPC centers capable of greater than 2 petaflops.11 Eni—which utilizes a
CPU/GPU cluster—claims upward of 3 petaflops, while BP—whose facility cost upwards of $100
million—claims 3.8 petaflops of computing power and 23.5 petabytes of disk space, all geared
toward processing seismic imaging and hydrocarbon exploration data.12 GPU processing is now
commonplace for seismic data-crunching, and Intel, with their Xeon Phi processor, claims similar or
better cost/benefit performance.
As falling crude prices impact IT spending, and with cloud computing prices dropping almost as fast
as the price of crude itself, it seems that open-source distributed data management computing, in the cloud or on
commodity hardware, could be poised to become a real presence in the oil and gas space,
particularly for smaller companies who can’t afford their own HPC center, or skunkworks projects
where resources are scarce. But it’s a cautious IT culture, with many companies waiting to see what
others do before them. “It’s usually cultural problems that get in the way more than technical
capabilities,” says Stidolph, about adopting new technology solutions. “We often call it ‘the race to
be second.’ In most industries, people want to innovate, to be the leader, which means you take
certain risks, and you make certain investments. In oil and gas, everybody kind of queues up to see
who steps out of line to make that investment and take that risk,” and follows suit only once it’s proved
successful. But, while crunching seismic data may continue to live in the HPC world, there are many
other use cases where open source and NoSQL distributed data management systems like Hadoop
could provide cost-effective alternatives to HPC. Hadoop providers like Hortonworks have begun
working with the industry, and they see opportunities throughout the exploration-to-delivery petro
cycle (read more about Hortonworks next). Meanwhile, Cloudera has developed a “Seismic
Hadoop” project to demonstrate “how to store and process seismic data in a Hadoop cluster” on
commodity hardware.13
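To make the commodity-cluster idea concrete, here is a minimal sketch of the kind of embarrassingly
parallel work such systems handle well. It is not Cloudera’s Seismic Hadoop code: it assumes seismic traces
have already been decoded from their acquisition format (for example, SEG-Y) into (trace_id, samples)
pairs, and it simply computes a root-mean-square amplitude per trace across the cluster using PySpark.

    # Per-trace RMS amplitude on a Spark cluster (illustrative sketch only).
    import math
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("seismic-rms-sketch").getOrCreate()
    sc = spark.sparkContext

    # Stand-in for traces decoded from SEG-Y: (trace_id, list of amplitude samples).
    # In practice these would be read from HDFS or object storage, not hardcoded.
    traces = sc.parallelize([
        ("line01_trace0001", [0.02, -0.15, 0.40, -0.33, 0.07]),
        ("line01_trace0002", [0.01, -0.02, 0.05, -0.04, 0.02]),
    ])

    def rms(samples):
        """Root-mean-square amplitude of one trace."""
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    # Each trace is scored independently, so the job scales out across commodity nodes.
    for trace_id, value in traces.mapValues(rms).collect():
        print(trace_id, round(value, 4))

    spark.stop()

Full migration imaging will likely stay on HPC hardware, but trace-level statistics, quality control, and
data management tasks like this are a natural fit for Hadoop- or Spark-style clusters.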

More Cloud and Mobile
Cloud-based processing of large datasets is also driving innovation and disruption in the supply
chain, and allowing for an untethered workforce. Products like Autodesk’s ReCap allow customers to
create large point-cloud datasets (billions of points) and quickly render them as 3D models on mobile
devices. In the oil and gas manufacturing space, this can mean visualizing wells or facilities on-site
via tablet or mobile device. “It’s advances like the cloud that allow things like ReCap to be able to
crunch those numbers and stitch photographs together,” says Autodesk’s Thasarathar. And using the
cloud to crunch those datasets is becoming attractive to oil and gas companies for a few different
reasons. Not only does it allow them to lessen their capital investment in soon-to-be-obsolete
hardware, but they can also move the infrastructure cost/benefit burden to the cloud provider. “The
fact that they can do it on demand, and you pay for what you use…that consumption-based business
model is incredibly attractive to the industry,” says Thasarathar.

Midstream and Downstream
Primarily because there is less uncertainty, and the cost of failure is lower, there is less innovative
data science activity in the midstream and downstream sectors, but there is opportunity there as well.
Many of the same principles and techniques apply, especially in midstream activities like crude
transport, pipeline security and safety, refinery maintenance and failure monitoring, logistics, and
people and resource management.

Emerging Tech
Things are changing within the sector, where cluster compute platforms, massive and affordable
storage, and new techniques have enabled companies to evolve their existing tools and methods. Data
science as a practice is being adopted within the industry, but many companies lack the needed
internal data science resources. While they have abundant expertise in geosciences and engineering,
among other things, they don’t typically have in-house expertise in big and unstructured data, machine
learning, predictive analytics, artificial intelligence, or other data science specialties. “They’re recognizing that
they have a lot of data, both historical as well as new, that they aren’t getting everything they can out
of,” says Kaggle’s Nur. So oil and gas related companies are taking different approaches, building
out data science teams internally or turning to outside companies for expertise. Let’s take a look at
some of these outside companies.

Hortonworks
Hortonworks is a leading provider of Hadoop solutions, well known in the tech sector, but relatively
new to oil and gas. They bring technical expertise and provide solutions with a toolset that oil and
gas isn’t familiar with, and they bring an open-source approach to a sector that isn’t known for its
openness or its willingness to share. But that’s changing as the industry starts to understand the
potential in data science and predictive analytics. “They all want to get into a modern data
architecture, and they realize Hadoop is a cornerstone for that,” says Ofer Mendelevitch,
Hortonworks’ Director of Data Science. Hortonworks sees opportunities to provide insight
throughout the upstream sector, from using predictive analytics to improve production optimization by
providing a better sense of when a well might go down, to being better able to predict and
proactively handle safety and environmental hazards, to providing a broader and more
multidimensional dashboard, including services like weather and social feeds.
Mendelevitch also sees potential in niche cases, like automatically processing LAS (Log ASCII
Standard) files, using algorithms and fitting curves to identify redundancy and greatly reduce work
currently done manually. In addition, with a lot of buzz around data security and Internet of Things,
they see companies adjusting their IT processes to collect more and new data, and becoming more in
touch with their social media streams and presence. And there are opportunities midstream and
downstream as well, in areas like equipment failure prediction, safety analytics, and portfolio
analysis.
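Mendelevitch’s LAS example lends itself to a small illustration. The following sketch is hypothetical
rather than Hortonworks’ implementation: it assumes the log curves have already been parsed out of a LAS
file into a pandas DataFrame (one column per curve, indexed by depth; the column names are invented), and
it flags near-duplicate curves by checking how well one fits the other via simple correlation—the sort of
redundancy screening that is otherwise done by hand.

    # Flag near-redundant well-log curves (illustrative sketch only; curve names are invented).
    import numpy as np
    import pandas as pd

    def redundant_curve_pairs(curves: pd.DataFrame, r_threshold: float = 0.98):
        """Return pairs of log curves whose readings are nearly linearly redundant."""
        pairs = []
        names = list(curves.columns)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                both = curves[[a, b]].dropna()
                if len(both) < 10:
                    continue                               # not enough overlap to judge
                r = np.corrcoef(both[a], both[b])[0, 1]    # Pearson correlation
                if abs(r) >= r_threshold:
                    pairs.append((a, b, round(r, 3)))
        return pairs

    # Toy stand-in for curves parsed from a LAS file.
    depth = np.arange(1000.0, 1100.0, 0.5)
    gr = np.random.default_rng(1).normal(80, 10, len(depth))
    logs = pd.DataFrame({
        "GR_RUN1": gr,
        "GR_RUN2": gr * 1.01 + 0.5,   # a re-logged pass, nearly identical to the first
        "RESISTIVITY": np.random.default_rng(2).lognormal(1.0, 0.3, len(depth)),
    }, index=depth)

    print(redundant_curve_pairs(logs))   # expect the two gamma ray runs to be flagged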

Kaggle
Kaggle, a startup with roots as an analytics competition platform, is another tech sector company to
have brought data science expertise to the oil and gas space. Kaggle took a different approach, hoping
to serve the oil and gas sector by leveraging its large data science competition
community and platform to provide expertise to an industry that might not always have the data
science skills in-house to find the solution. Though this business model ultimately did not pan out,
Kaggle did achieve successes using data science and well logs, production data, and completion data
to optimize drilling parameters like well spacing, orientation, length, and more. “Of course all these
decisions they have to make have an economic cost component to them,” said Boaz Nur, former VP of
Energy at Kaggle. “The longer you drill the well, the more it costs; the more proppant14 you use, the
more it costs; the more fluid you use, the more it costs…there is some sort of optimal solution,” Nur
continues. “We ingest all the data…and we basically come up with that optimal solution. By helping
guide them, they’re able to think about the parameters that are most important. Our strategy is to find
where data science solutions add the most value, where the challenging problems are, and where data
science is the most applicable solution.”
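The cost-versus-recovery trade-off Nur describes can be framed as a simple optimization once a predictive
model exists. The sketch below is a hypothetical illustration, not Kaggle’s method: it stands in for a
model fitted on historical well, production, and completion data with a toy recovery function, attaches
rough per-unit costs to lateral length and proppant (every number here is invented), and grid-searches for
the design with the best expected profit.

    # Grid-search a completion design against a toy recovery model (illustrative sketch only).
    from itertools import product

    def predicted_recovery(lateral_ft, proppant_lb):
        """Stand-in for a fitted model: estimated ultimate recovery in barrels (invented formula)."""
        return 2_900 * lateral_ft ** 0.5 + 15 * proppant_lb ** 0.5

    OIL_PRICE = 50.0          # $/bbl, assumed
    COST_PER_FT = 800.0       # drilling and completion cost per lateral foot, assumed
    COST_PER_LB = 0.15        # proppant cost per pound, assumed
    FIXED_COST = 3_000_000.0  # pad, permitting, and surface facilities, assumed

    def expected_profit(lateral_ft, proppant_lb):
        revenue = OIL_PRICE * predicted_recovery(lateral_ft, proppant_lb)
        cost = FIXED_COST + COST_PER_FT * lateral_ft + COST_PER_LB * proppant_lb
        return revenue - cost

    # A real workflow would use finer grids or a proper optimizer; the shape of the problem is the point.
    laterals = range(4_000, 12_001, 1_000)
    proppants = range(2_000_000, 10_000_001, 1_000_000)
    best = max(product(laterals, proppants), key=lambda design: expected_profit(*design))
    print("Best design (lateral ft, proppant lb):", best)
    print("Expected profit: $%.0f" % expected_profit(*best))

Longer laterals and more proppant raise predicted recovery but with diminishing returns, so past some point
the extra cost outweighs the extra barrels—exactly the kind of economic balance Nur describes.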

SparkBeyond
Unlike some pure data science oriented startups, SparkBeyond uses domain area experts to work with
their data scientist team, to help ask the right questions and pick the right inputs for their models and
machine learning. They emphasize the value of expertise, and stress sound methodology when
building complex models. They use Apache Spark, yes, but many other tools as well, and apply a
broad multidisciplinary approach and diverse datasets to their oil and gas sector work, a sector full
of uncertainty throughout the entire production chain. In addition to standard seismic, production,
operational, and log data, they pull from a variety of other sources. “You need to incorporate weather
data and APIs with geological data, financial data with news articles to see how geopolitical events
can affect (production) cycles,” says Sagie Davidovich, SparkBeyond’s CEO. They also incorporate
data from other energy sectors, since it has a direct impact on oil and gas cost/benefit models.
Currently most of the work they do is in the unconventional well space, which makes building models
challenging because of the relatively small and incomplete sample set for wells of the shale boom
era. “Wells drilled since, say 2009, 2010…are very different from the wells drilled 20 years ago,”
says Meir Maor, SparkBeyond’s Chief Architect. “So, there’s actually less relevant data to learn
from, and most of these wells have not completed their lifetime, [which makes ultimate recovery
predictions difficult]. We’re looking for the areas where there is a lot of uncertainty in the exploration
process…how much oil is going to be produced, how fast it’s going to come out,” says Maor, “and
we’re also looking for places that decisions can be made, that if we can manage to lower that
uncertainty with a predictive model, it will be actionable.” And with so many new techniques and
methods emerging in the unconventional oil space, how one drills is becoming as much of an issue as
if and where to drill. “When you are talking about extracting hydrocarbons from solid rock, it
becomes exponentially more difficult. The techniques have advanced, so there are many different
ways [to drill], which can behave differently…so the decision space is wide and there is a lot of
money at stake and a lot of uncertainty as to what will come out.”
Given the high cost of error in the industry, trust and adoption of new technologies doesn’t come
easily. Even if you have sound predictive models, actual economic success can take years to prove.
So, in the meantime, SparkBeyond works hard to remain transparent and to build models that are easy
to understand. Says Davidovich: “What we learned is that being 30–40% more successful than our
competitors is only the first step to get in the door, then you go through other steps.”
But they’re seeing things change, partly due to external forces. “If you think about it, the big data hype
actually creates a lot of pressure on companies to introduce predictive analytics…and the oil and gas
industry is no different,” says Davidovich, and that’s an exciting prospect to him. “There are so many
new undiscovered opportunities to bring more certainty to this space, which affects every aspect of
our lives.” Adds Maor: “What’s even more interesting is, when our client actually acts on this…when
the insights we deliver drive action to change the world in a meaningful way.”


WellWiki
Joel Gehman started the WellWiki project when he was a grad student, at a time when the
Pennsylvania Marcellus Formation and debates around fracking first appeared in mainstream
consciousness. Hoping to become the Wikipedia for wells, WellWiki scrapes public databases to
compile a wiki of North American well info. They then combine the database feed with contributions
from the community to create a structured dataset tied to user-driven narrative content. The goal is to
have information on all the wells—4 million by Gehman’s estimate—drilled in North America since
the Drake Well in 1859. Neither a watchdog nor industry-backed entity, WellWiki remains neutral
while trying to provide information and bring transparency to a fragmented and controversial space.
“I think of it as giving wells biographies,” Gehman says. “Every well has a story.”
While the data was generally publicly available, Gehman found it difficult to consume in its
nonstandardized form. He wants to harmonize the data and information, which is regulated and
recorded inconsistently by state and province in the United States and Canada. Landowners,
community members, citizens, journalists, and attorneys have used the site.
But maybe the real power of WellWiki is what happens behind the scenes, where all the data that has
been collected, scrubbed, and normalized can then be joined to other datasets using standardized and
unique keys, including parent company financial reports and other business data, to help researchers,
academics, journalists, and others understand and report on the industry.
There are other well-monitoring organizations, some working to keep an eye on the industry’s
activities. FracTracker is another data-driven organization, providing maps and analysis to “shine a
light” on the impact of fracking and other oil and gas development projects. SkyTruth is a nonprofit
that uses satellite and aerial remote sensor data and imagery to identify and quantify the effects of oil
and gas production on the environment. According to their site, “SkyTruth was the first to publicly
challenge BP’s inaccurate reports of the rate of oil spilling into the Gulf.”

Other Disruptors
There are other relatively new technologies beginning to be adopted that may impact niche segments
of the industry. 3D printing has the potential to bring disruption to the supply chain. Drones are
beginning to be used to acquire aerial images and remote-sensor data. Crowd-funding platforms like
crudefunders.com allow individuals to participate directly in the oil business. DIY spill cleanup and
monitoring organizations have sprung up in the aftermath of the Deepwater Horizon spill.

Summary
“It’s early times, but one of the important things to remember is this is iterative, [it’s] low hanging
fruit at first, but over time you can really dial in models to become really good at predicting, safety,
or maintenance,” says Hortonworks’ Mendelevitch, and the capabilities of these new tools and
platforms are in turn changing the way oil and gas does business; for example, cheaper storage allows
them to change retention policies, and keep more data longer. “In the past, a lot of this data was
thrown away pretty quickly, because the cost of storing it was very high. That’s the disruptive part of
Hadoop—data storage is so inexpensive.”
“It’s really important to understand the industry well and the pain points of the industry in order to
develop the appropriate solutions,” says Kaggle’s Nur, adding that cultural differences don’t matter if
you deliver. “If you have a solution that creates value, and is proven, then clients will use it.”
Despite its massive size, the oil and gas industry operates on small margins, and depends on
efficiency and optimization for profitability. Given extraordinary capital costs, a wide and deep array
of risk, and high cost of error, machine learning and predictive analytics—driven by faster,
distributed cluster computing and larger, cheaper storage—become increasingly important factors
in achieving the efficiency and optimization required to extract profits along with hydrocarbons.

Innovation in Tough Economic Times
Recently, of course, oil prices have plummeted. The question is, how will lower crude prices affect
investments in innovation and data science? At a time when margins are being squeezed even further,
and vendors up and down the supply chain are being asked to cut back, will resources be found for
new investment? The consensus seems to be: yes, at first, projects will be cut. But then, innovation
becomes a necessity. “The first reaction is usually, we’ll look at our suppliers, and we’ll look at our
staff, and we’ll look at our freelance contractors, and we’ll ask them all to take a haircut,” says
Lockheed’s Stidolph, “and then they’ll look and see ‘Have we got any projects we can suspend or
defer?’…but then they have to look at how they can be smarter, and collaborate more, share drilling
rigs, move the data more efficiently, mine it harder.” Autodesk’s Thasarathar also sees opportunity in
collaboration, and shared IP, as well as a chance to get leaner and smarter: “Once the dust has settled
on the initial kneejerk of ‘cut capital spending by 20%’ or whatever it might be…I think they’re going
to look to the supply chains to deliver those costs that are going to be cut…and one way to do that is
through innovation and technology.” But the current drop in prices may require more than a “haircut,”
as falling prices are causing a steady decline in the number of rigs open for business, with over 500
US rigs having closed in the past year; Halliburton, Schlumberger, and Baker Hughes have all
signaled significant layoffs to come.
While some people might see downtimes as an opportunity to innovate, others become even more risk
averse. But SparkBeyond’s Maor sees even more reason to embrace data solutions in that case:
“When you become risk averse…uncertainty becomes even more of an issue. You want to have a
really good idea of how much oil is going to come out. They aren’t going to stop drilling…and they
still have great uncertainty. Our solution has proven to dramatically reduce the amount of
uncertainty.”
“The cost of error is higher now [and] you cannot make as many trial and errors as you could before,
because you do need to slow down some of your activities,” adds Davidovich. “But this also creates
a higher pressure to innovate. It just takes the inherent challenges of applying predictive analytics in
such a risk averse space, and it makes them even more acute.”
Halliburton’s Priyadarshy sees the commitment to big data as a long-term play: “Data projects are an
investment, it’s not like you can get the return tomorrow. You have to invest in it, you have to build a
team, and you have to come up with a plan,” he says. “Think of it like building a startup within a
company.”

Post-Mortem
The falling price of crude is already having an impact on data science forays into the industry. Kaggle
—mentioned earlier in this article—recently eliminated their energy-industry consulting business.15 In
addition to falling prices, it’s possible that petro companies were uncomfortable with Kaggle’s
“competition-based” business model, which could require them to share their private data with the
data science community. While Kaggle’s innovative algorithms and predictive expertise may very
well have contributed insights and improved efficiencies to the century-old effort of supplying
petroleum to the industrial world, it seems that not enough companies were willing to make that leap
quite yet, particularly given the current environment. In the data era, however, we look forward to
seeing more experiments—and successes—in this high-stakes industry.
1. Karren, Charles. “Insight Report: Data Centre Developments Get up Close and Personal.” OFFCOM News. CTLD Publishing Ltd. Web. 20 Mar. 2015.
2. Boman, Karen. “What Upstream Oil, Gas Can Learn About Big Data from Social Media.” Rigzone News. Dice Holdings, Inc., 10 Dec. 2014. Web. 20 Mar. 2015.
3. “World Energy Investment Outlook 2014 Factsheet – Overview.” International Energy Agency. OECD/IEA, 1 Jan. 2014. Web. 20 Mar. 2015.
4. “Marine Seismic Imaging.” BP.com. British Petroleum. Web. 20 Mar. 2015.
5. “Maximizing the Value of Mature Fields.” Halliburton.com. Halliburton, 1 May 2012. Web. 20 Mar. 2015.
6. van der Meulen, Rob. “Gartner Says Personal Worlds and the Internet of Everything Are Colliding to Create New Markets.” Gartner.com. Gartner, Inc., 11 Nov. 2013. Web. 20 Mar. 2015.
7. Sale, Richard. “Stuxnet Hit 4 Oil Companies.” Isssource.com. Industrial Safety and Security Source, 15 Nov. 2012. Web. 20 Mar. 2015.
8. King, Rachael. “Virus Aimed at Iran Infected Chevron Network.” Wall Street Journal. Dow Jones and Company, Inc., 9 Nov. 2012. Web. 20 Mar. 2015.
9. Kirk, Jeremy. “‘Night Dragon’ Attacks from China Strike Energy Companies.” PCWorld.com. IDG, 12 Feb. 2011. Web. 20 Mar. 2015.
10. “BP Opens New Facility in Houston to House the World’s Largest Supercomputer for Commercial Research.” BP.com. British Petroleum, 22 Oct. 2013. Web. 20 Mar. 2015.
11. Trader, Tiffany. “Eni Joins Oil and Gas Petaflop Club.” HPCWire.com. Tabor Communications, Inc., 20 Nov. 2013. Web. 20 Mar. 2015.
12. “Number Crunching with Big Data.” BP.com. British Petroleum, 22 Dec. 2014. Web. 20 Mar. 2015.
13. Wills, Josh. “Seismic Data Science: Reflection Seismology and Hadoop.” Cloudera.com. 25 Jan. 2012. Web. 20 Feb. 2015.
14. “Hydraulic Fracturing Proppants.” Wikipedia.com. Wikimedia Foundation, 2 Feb. 2015. Web. 20 Mar. 2015.
15. McMillan, Robert. “Data-Science Darling Kaggle Cuts a Third of Its Staff.” Wired.com. Condé Nast, 9 Feb. 2015. Web. 20 Mar. 2015.


About the Author
Dan Cowles is a writer, filmmaker, and data geek who has worked in the tech sector for the last 20+
years. Dan is interested in human beings and their stories, and all manner and method of telling them.
He lives in Berkeley, California, with his wife and son.


