Oil, Gas, and Data
High-Performance Data Tools in the Production of Industrial Power

Daniel Cowles
ISBN: 978-1-491-92289-7


Oil, Gas, and Data
by Daniel Cowles
Copyright © 2015 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (). For
more information, contact our corporate/institutional sales department:
800-998-9938 or

Editor: Tim McGovern
Production Editor: Kara Ebrahim

Interior Designer: David Futato
Cover Designer: Ellie Volckhausen

April 2015: First Edition

Revision History for the First Edition
2015-04-10: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Oil, Gas, and
Data, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the author disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If
any code samples or other technology this work contains or describes is subject to
open source licenses or the intellectual property rights of others, it is your
responsibility to ensure that your use thereof complies with such licenses and/or rights.


Table of Contents

Oil, Gas, and Data
    Introduction
    Overview
    Upstream
    Well Optimization and Mature Wells
    Remote Sensors and Network Attached Devices/I of T
    Security
    Health, Safety, and Environment
    High-Performance Computing and Beyond
    More Cloud and Mobile
    Midstream and Downstream
    Emerging Tech
    Summary
    Innovation in Tough Economic Times
    Post-Mortem

Oil, Gas, and Data

Introduction

When you hear “innovation in oil and gas,” your first thoughts
might go to hardware (bigger, faster, deeper drilling; more powerful
pumping equipment; and bigger transport) or to the “shale revolution”
(unconventional wells, hydraulic fracturing, horizontal drilling, and
other enhanced oil recovery (EOR) techniques). But optimization matters
here as in any other industry, and, given the large capital investment
and high cost of error, perhaps more than in most. The potential
benefits of predictive analytics, data science, and machine learning,
along with rapid increases in computer processing power and speed,
greater and cheaper storage, and advances in digital imaging and
processing, have driven innovation and created a rich and disruptive
movement among oil and gas companies and their suppliers.
The truth is, the oil and gas industry has been dealing with large
amounts of data longer than most, and some even call it the “original
big data industry.”¹,² Large increases in the quantity, resolution, and
frequency of seismic data, and advances in “Internet-of-Things”-like
network-attached sensors, devices, and appliances, are being combined
with large amounts of historical data, both digital and physical, to
create one of the most complex data science problems out there, and a
new industry is developing to help solve it.

1 Karren, Charles. “Insight Report: Data Centre Developments Get up Close and
Personal.” OFFCOM News. CTLD Publishing Ltd. Web. 20 Mar. 2015.

2 Boman, Karen. “What Upstream Oil, Gas Can Learn About Big Data from Social
Media.” Rigzone News. Dice Holdings, Inc., 10 Dec. 2014. Web. 20 Mar. 2015.

In oil and gas more than in almost any other industry, efficiency and
accuracy are highly valued, and small improvements in efficiency and
productivity can make a significant economic difference. When a
typical well can cost upwards of ten million dollars, and often
much more, the cost of error is great, and managing cost versus
benefit can mean the difference between profitability and loss. And
not unlike a tech startup, where a meaningful upfront investment is
required before knowing how much the return will be, if any, you
may have to dig many holes to find a successful well. Obviously, the
more certainty you can have, the better, and incrementally increasing
certainty is a place where data science and predictive analytics
promise to help. The payoff from analytics isn’t limited to
exploration: once a well has been successfully drilled, production
efficiency and optimization remain important to the ultimate lifetime
recovery of the well.
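To make the value of incremental certainty concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes each exploratory well succeeds independently with the same probability, so the expected number of wells drilled per discovery is 1/p. The ten-million-dollar well cost comes from the text above; the 20% and 25% success rates are hypothetical figures chosen only for illustration, not industry data.

```python
def expected_cost_per_discovery(well_cost: float, p_success: float) -> float:
    """Expected spend to reach one successful well.

    Assumes independent wells with a constant success probability, so the
    number of wells drilled until the first success is geometric with
    mean 1 / p_success.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return well_cost / p_success


WELL_COST = 10_000_000  # dollars per well, the "typical" figure quoted in the text

# Hypothetical success rates before and after better predictive modeling.
baseline = expected_cost_per_discovery(WELL_COST, 0.20)
improved = expected_cost_per_discovery(WELL_COST, 0.25)
savings = baseline - improved

print(f"baseline: ${baseline:,.0f}  improved: ${improved:,.0f}  savings: ${savings:,.0f}")
```

Under these assumed numbers, a five-point gain in success probability lowers the expected spend per discovery from $50 million to $40 million, which is why even modest improvements in certainty attract attention.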
In addition, given crude price fluctuations and many other unpredictable
outside variables, capital project planning itself is rife with
uncertainty, and large-scale projects often face significant overages.
“In 2011, upstream offshore oil and gas projects…around 28% had a
cost blowout of more than 50% and the root cause of that is…they
got the numbers wrong,” says Dominic Thasarathar, who watches
the energy sector for the Thought Leadership team at Autodesk.
“Their costs have gone up, they’re dealing in everything from frontier
environments to difficulties raising finance.” According to the
International Energy Agency, capital investments in energy projects
have more than doubled since 2000, and are expected to grow to about
$2 trillion annually by 2035, so accurately predicting cost versus benefit
is extremely important.³ “Where we see big data fitting in,” continues
Thasarathar, “is…if you look at the performance for those big
projects, it’s pretty much a horror story in terms of how it’s dropped
off over the last 15–20 years, and the root cause of that is, there’s so
much that project teams need to understand and assimilate in terms
of information to make the right decision.”
But exploration and production aren’t the only areas that can benefit
from innovative data- and data-science-driven solutions. From
health, safety, and environment to cybersecurity to transportation
and manufacturing, opportunities to create greater efficiencies
exist throughout the entire hydrocarbon production and delivery
cycle.

3 “WORLD ENERGY INVESTMENT OUTLOOK 2014 FACTSHEET - OVERVIEW.”
International Energy Agency. OECD/IEA, 1 Jan. 2014. Web. 20 Mar. 2015.

Overview
The oil and gas industry is traditionally broken down into three
broad categories: upstream, which includes exploration, discovery,
and both land and sea drilling and production; midstream, which
includes transportation, wholesale markets, and manufacturing and
refinement of crude; and downstream, which is primarily concerned
with the delivery of refined products to the consumer. The majority
of big- and fast-data-related innovation is found upstream, in the
discovery and exploration phase, where risk and uncertainty are high,
conditions can be (to put it mildly) challenging, and failure
is very expensive.
The industry is a mature and unique one, built on experience and
hard-won knowledge, and employing the world’s leading geological
scientists and engineers. They’re very good at what they do, and
they’ve been doing it for a long time, but there is an imperative to
add more big data and data science skills like machine learning and
predictive analytics into the mix, skills that oil companies haven’t
traditionally had in-house on a broad scale. According to Boaz Nur,
former VP of Energy at data science startup Kaggle, energy analysts
think big data and analytics are the next frontier in oil and gas, but
they’re only now in the early adoption phase. “They [oil and gas
companies] don’t shy away from technology, they’re just careful,”
Nur says. “A lot of snake oil has been sold to the oil and gas companies
over the years. They’ve also historically done a pretty good job
of producing oil. They’re [already] doing OK; what we’re proposing
will help them take it up to the next level.” Adds Nur: “They’re
cautious but they’re optimistic.”
Halliburton is using big data and data science techniques to try to
solve a variety of problems in the E&P (exploration and production)
upstream phase. “We are looking at trying to optimize seismic
space, trying to optimize drilling space, well planning,” says Dr.
Satyam Priyadarshy, Halliburton’s recently hired Chief Data Scientist.
Priyadarshy is bringing some big data techniques to the space:
“For example, we are looking at how to optimize in the seismic
world through distributed computing [techniques] because it takes a
long time to process the data.” But Priyadarshy says that it’s a mistake
to think that data science methods and techniques are new to
oil and gas. “They’ve actually been using machine learning for many
years,” he says. “People have been using neural networks, fuzzy logic,
SVM, SVRs; pretty much any algorithm you want to talk about in
machine learning, they have been using it. But, they have been using
these in limited cases, to limited value, and the goal is now for people
like us (data scientists) to build this into a more valuable product.”
He says that the oil and gas industry is unique in terms of the
complexity of the data and models, and that turnkey solutions from
other traditional big data industries can’t be easily applied here. “It’s
a complex challenge. It’s not the same as the other big data players,”
says Priyadarshy, who has worked widely on big data projects in the
news, media, Internet, and insurance spaces. “The complexity in the
oil and gas industry outweighs any other.”
Because of that complexity, Priyadarshy stresses the need for
domain area expertise when dealing with petrotechnical data, and
he has his own definition of the skills a data scientist should have for
the space. “You need a person who has domain expertise, a person
who is a computer scientist, and a business person; these three
actually form a real domain data scientist” for the oil and gas space.
Another complication is legacy and historical data: some is digital,
but much is still found in binder and paper form. From a predictive
modeling standpoint, there’s value to be had, but these old systems
and documents are often kept at isolated physical properties or, as
often happens in the industry, inherited through acquisitions and
then neglected, which makes integrating them into your model
challenging.
Remote standalone locations and physical records and manuals also
hamper efforts to digitally connect a company’s systems and assets:
the much discussed “digital oilfield” idea, where systems are integrated
and automated to tune and optimize operations across the
breadth of the production cycle. “The move to digital operations is
increasing steadily, but there’s an awful lot of legacy out there, things
going back decades, where the drawings were done with, literally,
pen and paper,” says Neale Stidolph, Head of Information Management
at Lockheed Martin and based in Aberdeen, where he primarily
deals with North Sea oil fields, including many older legacy wells.
“A large part of the industry is very much tied to documents and
records. So, there’s still a need to maintain vast physical archives…of
boxes full of old information. And there’s a need to analyze and strip
that to get more value.” And since many of the physical sites
involved are isolated, supplying their own power, without modern
communication networks, there are additional barriers to fully
digitizing operations. “One of the factors the rigs have to cope with is
what they call a black start,” says Stidolph. “If your rig goes down, it
means you’ve lost everything: you’ve lost all power generation, all
connectivity, all systems of every type. You need a flashlight and you
need a manual to be able to see how to get this thing operational
again.” Many of these rigs are in hazardous and remote environments,
so off-the-shelf connectivity solutions aren’t typically sufficient.
But, challenges and cultural resistance aside, big data methods are
changing how the industry does business, and these changes will
ultimately result in a changed oil and gas industry.

Upstream
As previously mentioned, oil and gas has long been familiar with
large and diverse datasets, and improvements in technology and
methodology are driving an exponential increase in the amount of
data being collected.
In the exploration space, for example, due to advances in seismic
acquisition methodology, storage capabilities, and processing power,
data gathered via offshore seismic acquisition has gotten both bigger,
due to increased resolution, and faster, due to increases in the
frequency and rate of acquisition. The result is 4D data (x/y/z space,
and time) at a far higher resolution, providing far better understanding
of subsurface deposits and reservoirs than previously possible.
Wide azimuth towed streamer acquisition (WATS), which is seismic
exploration using multiple ships deploying a miles-wide array of
acoustic equipment, allows companies like Chevron and BP to create
high-resolution topographic maps under the earth and beneath
salt canopies, and locate new oil fields that may not have been found
otherwise.⁴ Time-lapse seismic data acquisition also allows them to
see how reservoirs are behaving as oil begins to flow, allowing them
to optimize production once it begins. As the world’s energy
demands continue to grow, and exploration efforts move farther
offshore and into deeper waters, the ability to accurately visualize
deep, complex, subsurface topography is essential. Recent deepwater
discoveries in the Gulf of Mexico have been greatly aided by new
seismic techniques, and there is a direct relationship between
improvements in data storage and data processing, and improvements
in seismically generated image resolution, which in turn
results in new and better understood hydrocarbon discoveries. And
there is still room for improvement in seismic acquisition image
resolution: “Even at very high resolution, the images we can make
today still have gaps bigger than the size of a conference room,” says
BP’s John Etgen.

4 “Marine Seismic Imaging.” BP.COM. British Petroleum. Web. 20 Mar. 2015.

Well Optimization and Mature Wells
Although a lot of recent press and activity focus on the “shale boom”
and other unconventional extraction techniques, according to
Halliburton, 70% of the world’s oil and gas comes from mature wells.⁵ A
mature well is usually defined as one where peak production levels
have been reached and extraction rates are on the decline, or where
the majority of the relatively “easy to get” hydrocarbons that the well
will ultimately deliver have been extracted. Typically, the early oil
and gas in a well is easier and cheaper to extract, and the industry
hasn’t been enthusiastic about optimizing extraction, holding a common
belief that there is an “economic limit” where it costs more to
get the resources out of the ground than they’re worth on the market.
However, modern EOR techniques have become more efficient,
and sensor data and predictive models play a part in that. These
wells have a wide range of factors that make them more complicated
(poor flow, poor rock formations, bore cracks, complex geological
conditions), but they still have a lot to offer in terms of hydrocarbons.
In an industry where small margins mean large sums of
money, getting the most from mature and end-of-life wells at the
lowest cost is another area where improved use of data can have a
significant impact on results. Again according to Halliburton, a 1%
increase in production from the mature fields currently active would
add two years to the world’s oil and gas supply.
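The “economic limit” idea can be sketched with a simple decline-curve calculation. The Python below assumes exponential decline, which is one common simplification and not a method attributed to Halliburton or any operator named here. All rates, costs, and prices are hypothetical, and the model ignores taxes, royalties, and price changes.

```python
import math


def rate_at(initial_rate: float, annual_decline: float, years: float) -> float:
    """Production rate (bbl/day) after `years`, assuming exponential decline."""
    return initial_rate * math.exp(-annual_decline * years)


def economic_limit_rate(opex_per_day: float, price_per_bbl: float) -> float:
    """Rate (bbl/day) at which daily revenue just equals daily operating cost."""
    return opex_per_day / price_per_bbl


def years_to_economic_limit(initial_rate: float, annual_decline: float,
                            limit_rate: float) -> float:
    """Years until the declining rate falls to the economic limit."""
    if limit_rate >= initial_rate:
        return 0.0  # already uneconomic at the starting rate
    return math.log(initial_rate / limit_rate) / annual_decline


# Hypothetical mature well: 500 bbl/day today, declining 15% per year.
QI, DECLINE, PRICE = 500.0, 0.15, 50.0

limit_before = economic_limit_rate(opex_per_day=1500.0, price_per_bbl=PRICE)
limit_after = economic_limit_rate(opex_per_day=1200.0, price_per_bbl=PRICE)

life_before = years_to_economic_limit(QI, DECLINE, limit_before)
life_after = years_to_economic_limit(QI, DECLINE, limit_after)

print(f"economic life: {life_before:.1f} years -> {life_after:.1f} years")
```

In this toy model, anything that lowers operating cost (for example, remote monitoring in place of site visits) lowers the limit rate and extends the well’s economic life, which is the lever the paragraph above describes.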

5 “Maximizing the Value of Mature Fields.” Halliburton.com. Halliburton, 1 May 2012.
Web. 20 Mar. 2015.


Remote Sensors and Network Attached Devices/I of T

There is already a lot of application and ongoing interest in
network-attached devices, appliances, and Internet of Things-like
connected devices in the oil and gas space. Halliburton’s Priyadarshy
prefers the term “emerging technology devices” to the “Internet of
Things” label, which causes some confusion and resistance in the
industry. In any event, remoteness, geographic breadth of facilities
and pipelines, hazardous environments, and inaccessibility of many
aspects of the oil and gas production cycle make it highly disposed
to automation and remote monitoring and optimization. Remotely
monitored and controlled devices can help lower cost, effort, and
error in resource tracking, and can decrease workforce overhead,
improve logistics, and drive well and operations automation and
optimization. It’s a big piece of the “digital oilfield” concept, and one
that the industry has already embraced.
Sensors of all kinds are already used throughout the detection,
production, and manufacturing cycle to better understand and monitor
processes and gather data. Sensors can capture fluid pressure, velocity
and flow, temperature, radiation levels (gamma ray energy is a
useful indication in hydrocarbon discovery), relative orientation and
position, as well as chemical and biological make-up of physical
materials. Trending toward cheaper, smaller, and connected arrays,
newer microsensors can communicate with each other and with
external networks.
From exploration to the gas pump, there are opportunities to use
networked devices. Offshore, submersible devices that gather
information can be remotely controlled and are safer alternatives to
human-piloted craft. Pumps can be remotely monitored and adjusted,
which can be far more economical than manual maintenance.
Midstream in the transportation phase, networked devices can help
track resources through the many and various stages and handoffs
that happen throughout the crude transport process. Pipelines and
remote equipment can be monitored and even maintained remotely.
Biomonitoring workers could increase safety. Gartner has predicted
as many as 30 billion connected devices by 2020, with 15% of those
in the manufacturing sector.⁶
The data gathered from all these devices will be valuable for predictive
analytics and other applications: from well sensor data that can
be analyzed to help optimize productivity, to operations data that
can monitor and calibrate operational systems, to transportation
data that can help identify bottlenecks and inefficiencies, to workforce
data that can help drive safety. But, as Halliburton’s Priyadarshy
points out, with those benefits also come some new challenges;
for example, sensor data veracity in different physical environments:
“Imagine a situation where you build a sensor for Texas weather. If
you were to take it to some Middle Eastern country like Kuwait,
where temperatures are [significantly] higher…if the sensor starts
sending data and you are trying to predict based on what you know
from Texas, then you may be in deep trouble.”
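One simple guard against the problem Priyadarshy describes is to check whether new readings fall inside the range a model was trained on before trusting its predictions. The sketch below is an illustration only, not a technique attributed to Halliburton. The temperature values are invented, and the three-standard-deviation envelope is an assumed rule of thumb.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Sequence


@dataclass
class TrainingEnvelope:
    """Range of values that the training data can be said to cover."""
    low: float
    high: float


def fit_envelope(training_values: Sequence[float], k: float = 3.0) -> TrainingEnvelope:
    """Build an envelope of mean +/- k sample standard deviations."""
    center = mean(training_values)
    spread = stdev(training_values)
    return TrainingEnvelope(center - k * spread, center + k * spread)


def out_of_envelope(readings: Sequence[float], env: TrainingEnvelope) -> List[float]:
    """Return the readings the model has effectively never seen before."""
    return [r for r in readings if not env.low <= r <= env.high]


# Invented ambient temperatures in degrees Celsius, for illustration only.
texas_training = [28, 31, 33, 35, 36, 34, 30, 29]
kuwait_readings = [44, 47, 49, 51]

envelope = fit_envelope(texas_training)
suspect = out_of_envelope(kuwait_readings, envelope)

print(f"trained on {envelope.low:.1f}..{envelope.high:.1f} C; suspect readings: {suspect}")
```

Readings flagged this way are not necessarily wrong. They simply mark conditions where predictions built on the original environment should not be trusted without retraining or recalibration.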

Security
As discussed, there is significant pressure to lower costs and optimize,
and remote-controlled and network-attached devices of all
types are a means to that end. The downside is, the more connected
you are, the more vulnerable you are to network intrusions, intentional
or otherwise. And while remote monitoring is also crucial to
improved security, it can open holes itself. “There are gains from the
automation; you can get more protection, you can do better sensing
of what’s happening along your line, there’s lots and lots of opportunities
for managing and monitoring the line using automation,” says
industrial and oil and gas cyber-security expert Eric Byres, “but your
automation system, which is supposed to be protecting your pipeline,
[can become] the problem.”
A series of events and attacks has made the oil and gas industry
keenly aware of the need to dramatically improve its cybersecurity.
After 9/11, the industry became more concerned about
intentional and coordinated attacks, but it wasn’t until the Stuxnet
worm attack in 2010 that it started to really address the problem.
Stuxnet hit an Iranian nuclear facility, causing the failure of
uranium-enriching centrifuges. Written specifically to exploit
Microsoft and Siemens vulnerabilities, Stuxnet was the first prominent
attack against the PLC/SCADA (programmable logic controller/supervisory
control and data acquisition) systems that are used by industrial
plants of all types, including those in the oil and gas industry, and
that were previously assumed to be safe from cyber attack. To make
things even scarier, Stuxnet (widely reported to be a joint
Israeli/US-made cyber-weapon) found its way into the Natanz enrichment
facility, which was not connected to the Internet, via sneakernet on
USB drives. In the case of Stuxnet, the collateral damage, which might
even be called friendly fire in this new battlefield, spread into the wider
industrial ecosystem, infecting Chevron, with unconfirmed reports
of at least three other major oil companies being affected as well.⁷,⁸

6 van der Meulen, Rob. “Gartner Says Personal Worlds and the Internet of Everything
Are Colliding to Create New Markets.” Gartner.com. Gartner, Inc., 11 Nov. 2013. Web.
20 Mar. 2015.

Given the sociopolitically charged nature of the industry, oil companies
were justifiably worried by Stuxnet. Suddenly, the ability for
remote and unaffiliated parties to influence operational and safety
systems was very real: spills, blowouts, explosions, and the potential
for loss of life. In addition to wells and refineries, pipelines, trains,
and other transportation methods are vulnerable to attack, and
beyond any human disaster, the repercussions could be environmentally
catastrophic as well as disruptive to business.
Since Stuxnet, there have been other attacks, including the Shamoon
virus that hit Saudi Aramco in 2012. Initiated by a “disgruntled
insider,” Shamoon wiped out the contents of between 30,000 and
55,000 Saudi Aramco workstations. These attacks, coupled with
environmental and PR disasters like Deepwater Horizon, have given
the industry all the motivation it needs to get serious about security.
“I do think the oil and gas industry is ahead of all the other companies
[in terms of security],” continues Byres. “There is a real serious
attempt to try and get security under control…that’s the good news.”
But while the majors like Shell, Exxon, Chevron, Total, and in particular
BP (where Paul Dorey was an early and vocal security advocate)
have become very serious about security, you’re only as strong
as your weakest link, and the industry is dependent on and tightly
integrated with suppliers, contractors, and vendors, many with less
sophisticated approaches to security. “That terrifies the guys at BP,
that’s why they started becoming evangelists in 2006, 2005, because
they realized they could do a good job on their site, and gain nothing
because of the integration to all the other companies around
them. The other companies were so insecure.”

7 Sale, Richard. “Stuxnet Hit 4 Oil Companies.” Isssource.com. Industrial Safety and
Security Source, 15 Nov. 2012. Web. 20 Mar. 2015.

8 King, Rachael. “Virus Aimed at Iran Infected Chevron Network.” Wall Street Journal.
Dow Jones and Company, Inc., 9 Nov. 2012. Web. 20 Mar. 2015.

And it’s not just production that is threatened. The Night Dragon
attacks, thought to have originated in China, targeted intellectual
property.⁹ In the PR space, a Sony-type attack on internal email and
proprietary information systems could also have huge ramifications in a
competitive and secretive industry.
While some facilities remain off the net by virtue of being old and
isolated, and instances of air-gapped systems may persist, in general
the digitally attached genie is out of the bottle: the industry is moving
rapidly toward digital openness, and it won’t be going back. As
Byres notes, “The reality is, modern networks in the oil and gas
industry need a steady diet of data. Data going in and out; security
patches, lab results, remote maintenance, [and] interactions with
customers. So, there’s no way you can isolate a refinery anymore.
There’s just too much need for data on the plant floor now with the
way we’ve built our systems.”
Technology might not always be the best solution in an industry as
fundamentally physical as this one. Byres relates a story of how one
Nigerian delta oil company battled the theft of sections of pipe that
were being taken and sold as scrap metal. They started making them
heavy enough to sink the boats that were used to carry them off, and
the thefts stopped. But anecdotes aside, security is now primary for
oil and gas IT, and while prevention is still important, most now
agree that 100% impenetrability is unlikely, and rapid detection is
the most important security tool. This is an area where data science
and threat analytics can possibly help. Applying machine learning
and pattern recognition to noisy and ever-larger data streams can
preemptively detect anomalies and identify attacks. But Byres thinks
the complexity of the problem means the industry is a ways off from
really leveraging big data and data science solutions in the security
space: “There is an opportunity, there’s no question…but we’re still a
few years away before anyone uses it effectively.”
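As a toy illustration of what pattern recognition on a noisy stream can mean, the sketch below flags values that sit far from the recent rolling average. It is a deliberately minimal stand-in for real threat analytics, not a description of any vendor’s product. The window size, the threshold, and the synthetic traffic values are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def rolling_zscore_anomalies(stream: Iterable[float], window: int = 20,
                             threshold: float = 3.0) -> List[Tuple[int, float]]:
    """Flag (index, value) pairs lying more than `threshold` standard
    deviations from the mean of the previous `window` values."""
    history: deque = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(stream):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                anomalies.append((i, x))
        # Note: flagged values still enter the window, which inflates sigma
        # for a while afterward; a production system would handle this.
        history.append(x)
    return anomalies


# Synthetic "packets per second" on a control network, with one injected spike.
traffic = [100 + (i % 5) for i in range(40)]
traffic[30] = 500

alerts = rolling_zscore_anomalies(traffic)
print(alerts)
```

A detector this simple would drown in false alarms on real plant-floor traffic, which is consistent with Byres’s caution that effective use is still some way off.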

9 Kirk, Jeremy. “‘Night Dragon’ Attacks from China Strike Energy Companies.”
PCWorld.com. IDG, 12 Feb. 2011. Web. 20 Mar. 2015.

Health, Safety, and Environment
There is also a lot of optimism around the ways big data can help in
the health, safety, and environmental space (HSE), and around the
ways that predictive analytics and machine learning can be applied
to anticipate well and manufacturing downtimes, malfunctions,
accidents, and spills. As increasing energy demand pushes oil and
gas production into untapped frontiers and deeper waters, with ever
harsher and unpredictable environments, the potential for ecological,
human, economic, and public relations disaster increases. So,
companies are highly incentivized to do everything they can to
anticipate and proactively address potential problems. This is a
space where historical data can be analyzed to predict future issues,
and where models can also tie in new data sources, like weather.
In addition, unconventional resource plays have introduced a whole
new set of environmental and safety concerns, from water, air, and
soil pollution to earthquakes. There is thought to be a significant
opportunity to tune and improve all aspects of unconventional drilling
to reduce the harmful side effects. Data science and predictive
models can help drive optimization of fluid injection and more
accurate drilling, and by incorporating ever more underground sensor
data into the model, further improvements can be made.

High-Performance Computing and Beyond
To handle the massive increase in the amount of data (in 2013, for
example, BP stated that its computing needs were 20,000 times greater
than in 1999¹⁰), companies have turned to expensive high-performance
computing centers, and are building out data science
expertise in-house or engaging data science partners. Some of the
largest private supercomputing facilities in the world are now run by
oil companies, with Italian energy company Eni, France-based Total
Group, and now BP all recently building HPC centers capable of
greater than 2 petaflops.¹¹ Eni, which utilizes a CPU/GPU cluster,
claims upward of 3 petaflops, while BP, whose facility cost upwards
of $100 million, claims 3.8 petaflops of computing power and 23.5
petabytes of disk space, all geared toward processing seismic imaging
and hydrocarbon exploration data.¹² GPU processing is now
commonplace for seismic data-crunching, and Intel, with its Xeon
Phi processor, claims similar or better cost/benefit performance.

10 “BP Opens New Facility in Houston to House the World’s Largest Supercomputer for
Commercial Research.” BP.com. British Petroleum, 22 Oct. 2013. Web. 20 Mar. 2015.

11 Trader, Tiffany. “Eni Joins Oil and Gas Petaflop Club.” HPCWire.com. Tabor
Communications, Inc., 20 Nov. 2013. Web. 20 Mar. 2015.

As falling crude prices impact IT spending, and with cloud computing
prices dropping almost as fast as that of crude, it seems that
open-source distributed data management computing, in the cloud
or on commodity hardware, could be poised to become a real presence
in the oil and gas space, particularly for smaller companies that
can’t afford their own HPC center, or for skunkworks projects where
resources are scarce. But it’s a cautious IT culture, with many companies
waiting to see what others do before them. “It’s usually cultural
problems that get in the way more than technical capabilities,”
says Stidolph, about adopting new technology solutions. “We often
call it ‘the race to be second.’ In most industries, people want to
innovate, to be the leader, which means you take certain risks, and
you make certain investments. In oil and gas, everybody kind of
queues up to see who steps out of line to make that investment and
take that risk” and then follows suit only once it has proved successful.
But, while crunching seismic data may continue to live in the HPC
world, there are many other use cases where open source and
NoSQL distributed data management systems like Hadoop could
provide cost-effective alternatives to HPC. Hadoop providers like
Hortonworks have begun working with the industry, and they see
opportunities throughout the exploration-to-delivery petro cycle
(more on Hortonworks below). Meanwhile, Cloudera has
developed a “Seismic Hadoop” project to demonstrate “how to store
and process seismic data in a Hadoop cluster” on commodity
hardware.¹³
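For readers unfamiliar with the programming model behind Hadoop, the pattern is map, group by key, then reduce. The pure-Python sketch below imitates that pattern on a single machine by averaging ("stacking") seismic traces that share a gather ID. It does not use Hadoop and does not reproduce Cloudera’s Seismic Hadoop code. The trace layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Trace = Tuple[int, List[float]]  # (gather_id, amplitude samples); assumed layout


def map_trace(trace: Trace) -> Tuple[int, List[float]]:
    """Map step: emit a (key, value) pair keyed by gather ID."""
    gather_id, samples = trace
    return gather_id, samples


def shuffle(pairs: Iterable[Tuple[int, List[float]]]) -> Dict[int, List[List[float]]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[int, List[List[float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stack(traces: List[List[float]]) -> List[float]:
    """Reduce step: average the traces sample by sample."""
    n = len(traces)
    return [sum(column) / n for column in zip(*traces)]


def stack_gathers(traces: Iterable[Trace]) -> Dict[int, List[float]]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    groups = shuffle(map(map_trace, traces))
    return {gather_id: reduce_stack(ts) for gather_id, ts in groups.items()}


survey = [(1, [1.0, 2.0]), (1, [3.0, 4.0]), (2, [5.0, 6.0])]
stacked = stack_gathers(survey)
print(stacked)
```

Because each map call and each per-key reduce is independent, a framework can spread them across many commodity machines, which is the property that makes the approach attractive for large trace volumes.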

More Cloud and Mobile
Cloud-based processing of large datasets is also driving innovation
and disruption in the supply chain, and allowing for an untethered
workforce. Products like Autodesk’s ReCap allow customers to
create large (many billion) point-cloud datasets, and render them as
3D models to mobile devices quickly. In the oil and gas manufacturing
space, this can mean visualizing wells or facilities on-site via tablet
or mobile device. “It’s advances like the cloud that allow things
like ReCap to be able to crunch those numbers and stitch photographs
together,” says Autodesk’s Thasarathar. And using the cloud
to crunch those datasets is becoming attractive to oil and gas companies
for a few different reasons. Not only does it allow them to
lessen their capital investment in soon-to-be-obsolete hardware, but
they can also move the infrastructure cost/benefit burden to the
cloud provider. “The fact that they can do it on demand, and you
pay for what you use…that consumption-based business model is
incredibly attractive to the industry,” says Thasarathar.

12 “Number Crunching with Big Data.” BP.com. British Petroleum, 22 Dec. 2014. Web. 20
Mar. 2015.

13 Wills, Josh. “Seismic Data Science: Reflection Seismology and Hadoop.” Cloudera.com.
25 Jan. 2012. Web. 20 Feb. 2015.

Midstream and Downstream
Primarily because there is less uncertainty, and the cost of failure is
lower, there is less innovative data science activity in the midstream
and downstream sectors, but there is opportunity there as well.
Many of the same principles and techniques apply, especially in
midstream activities like crude transport, pipeline security and
safety, refinery maintenance and failure monitoring, logistics, and
people and resource management.

Emerging Tech
Things are changing within the sector, where cluster compute plat‐
forms, massive and affordable storage, and new techniques have
enabled companies to evolve their existing tools and methods. Data
science as a practice is being adopted within the industry, but many
companies lack the needed internal data science resources. While
they have abundant expertise in geosciences and engineering,
among other things, they typically lack expertise in big and
unstructured data, machine learning, predictive analytics, artificial
intelligence, and other data science specialties. “They’re recognizing
that they have a lot of data, both historical as well as new, that they
aren’t getting everything they can out of,” says Kaggle’s Nur. So oil- and
gas-related companies are taking different approaches, building out
data science teams internally or turning to outside companies for
expertise. Let’s take a look at some of these outside companies.

Hortonworks
Hortonworks is a leading provider of Hadoop solutions, well known
in the tech sector, but relatively new to oil and gas. They bring technical
expertise and provide solutions with a toolset that oil and gas
isn’t familiar with, and they bring an open-source approach to a sector
that isn’t known for its openness or its willingness to share. But
that’s changing as the industry starts to understand the potential in
data science and predictive analytics. “They all want to get into a
modern data architecture, and they realize Hadoop is a cornerstone
for that,” says Ofer Mendelevitch, Hortonworks’ Director of Data
Science. Hortonworks sees opportunities to provide insight
throughout the upstream sector, from using predictive analytics to
improve production optimization by providing a better sense of
when a well might go down, to being better able to predict and pro‐
actively handle safety and environmental hazards, to providing a
broader and more multidimensional dashboard, including services
like weather and social feeds.
Mendelevitch also sees potential in niche cases, like automatically
processing LAS (Log ASCII Standard) files, using algorithms and
fitting curves to identify redundancy and greatly reduce work currently
done manually. In addition, with a lot of buzz around data
security and the Internet of Things, they see companies adjusting their
IT processes to collect more and new data, and becoming more in
touch with their social media streams and presence. And there are
opportunities midstream and downstream as well, in areas like
equipment failure prediction, safety analytics, and portfolio analysis.
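The report does not say which algorithms Mendelevitch has in mind for LAS files, so the following is only a minimal sketch under stated assumptions. It parses a tiny, made-up LAS 2.0-style string (unwrapped, with no null-value handling) and flags pairs of curves whose Pearson correlation passes a threshold as candidates for redundancy. The curve names, the sample values, and the 0.99 threshold are all hypothetical.

```python
import math

# Hypothetical, simplified LAS 2.0-style content (invented values).
SAMPLE_LAS = """\
~Version Information
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP.   NO  : ONE LINE PER DEPTH STEP
~Curve Information
 DEPT.M      : Depth
 GR  .GAPI   : Gamma ray
 GR2 .GAPI   : Gamma ray, repeat pass
 RHOB.G/C3   : Bulk density
~ASCII Log Data
 1000.0  45.0  45.2  2.45
 1000.5  60.0  60.1  2.40
 1001.0  80.0  79.8  2.52
 1001.5  55.0  55.3  2.31
 1002.0  30.0  30.2  2.48
"""


def parse_las(text):
    """Return {mnemonic: [values]} from a minimal, unwrapped LAS string."""
    section = None
    names, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.startswith("~"):
            section = line[1].upper()  # 'V', 'C', 'A', ...
            continue
        if section == "C":
            # The mnemonic is the text before the first period.
            names.append(line.split(".", 1)[0].strip())
        elif section == "A":
            rows.append([float(v) for v in line.split()])
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def redundant_pairs(curves, index="DEPT", threshold=0.99):
    """List curve pairs (excluding the index curve) with |r| >= threshold."""
    names = [n for n in curves if n != index]
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(curves[a], curves[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 4)))
    return pairs


if __name__ == "__main__":
    for a, b, r in redundant_pairs(parse_las(SAMPLE_LAS)):
        print(f"{a} and {b} look redundant (r = {r})")
```

On this sample the repeat gamma-ray pass is flagged against the original, while bulk density is not. Real log files would also need handling for wrapped data, null values, and depth misalignment, which this sketch leaves out.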

Kaggle
Kaggle, a startup with roots as an analytics competition platform, is
another tech sector company to have brought data science expertise
to the oil and gas space. Kaggle took a different approach, leveraging
its large data science competition community and platform to provide
expertise to an industry that might not always have the data science
skills in-house to find the solution. Though this business model ultimately
did not pan out, Kaggle did achieve successes applying data science to
well logs, production data, and completion data to optimize drilling
parameters like well spacing, orientation, length, and more. “Of
course all these decisions they have to make have an economic cost
component to them,” said Boaz Nur, former VP of Energy at Kaggle.
“The longer you drill the well, the more it costs; the more proppant14
you use, the more it costs; the more fluid you use, the more it
costs…there is some sort of optimal solution,” Nur continues. “We
ingest all the data…and we basically come up with that optimal sol‐
ution. By helping guide them, they’re able to think about the param‐
eters that are most important. Our strategy is to find where data sci‐
ence solutions add the most value, where the challenging problems
are, and where data science is the most applicable solution.”
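Nur’s point about an “optimal solution” can be illustrated with a toy trade-off. The sketch below is not Kaggle’s method: it assumes a made-up revenue curve with diminishing returns in lateral length, proppant, and fluid, plus made-up linear costs, and then picks the best design from a small grid. Every coefficient and function name here is hypothetical.

```python
import math
from itertools import product


def expected_revenue(length_ft, proppant_lb, fluid_bbl):
    """Hypothetical saturating response: each extra unit helps less."""
    return (40e6
            * (1 - math.exp(-length_ft / 6_000))
            * (1 - math.exp(-proppant_lb / 4e6))
            * (1 - math.exp(-fluid_bbl / 150_000)))


def cost(length_ft, proppant_lb, fluid_bbl):
    """Hypothetical fixed cost plus linear cost per foot, pound, and barrel."""
    return 2e6 + 600 * length_ft + 0.10 * proppant_lb + 4.0 * fluid_bbl


def net_value(design):
    """Revenue minus cost for a (length, proppant, fluid) design."""
    return expected_revenue(*design) - cost(*design)


def best_design(lengths, proppants, fluids):
    """Exhaustive grid search for the design with the highest net value."""
    return max(product(lengths, proppants, fluids), key=net_value)


# Candidate designs (invented ranges).
LENGTHS = range(4_000, 12_001, 2_000)              # lateral length, ft
PROPPANTS = range(2_000_000, 10_000_001, 2_000_000)  # proppant, lb
FLUIDS = range(100_000, 400_001, 100_000)          # fluid, bbl

if __name__ == "__main__":
    best = best_design(LENGTHS, PROPPANTS, FLUIDS)
    print("best design:", best, "net value:", round(net_value(best)))
```

In practice the response surface would be learned from well logs, production data, and completion data rather than written down by hand; the grid search only shows why rising costs and flattening returns imply an optimum.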

SparkBeyond
Unlike some purely data-science-oriented startups, SparkBeyond pairs
domain experts with its data science team to help
ask the right questions and pick the right inputs for their models
and machine learning. They emphasize the value of expertise, and
stress sound methodology when building complex models. They use
Apache Spark, yes, but many other tools as well, and apply a broad
multidisciplinary approach and diverse datasets to their work in the
oil and gas sector, which is full of uncertainty throughout the entire
production chain. In addition to standard seismic, production,
operational, and log data, they pull from a variety of other sources. “You
need to incorporate weather data and APIs with geological data,
financial data with news articles to see how geopolitical events can
affect (production) cycles,” says Sagie Davidovich, SparkBeyond’s
CEO. They also incorporate data from other energy sectors, since it
has a direct impact on oil and gas cost/benefit models.
Currently most of the work they do is in the unconventional well
space, which makes building models challenging because of the rela‐
tively small and incomplete sample set for wells of the shale boom
era. “Wells drilled since, say 2009, 2010…are very different from the
wells drilled 20 years ago,” says Meir Maor, SparkBeyond’s Chief
Architect. “So, there’s actually less relevant data to learn from, and
most of these wells have not completed their lifetime, [which makes
ultimate recovery predictions difficult]. We’re looking for the areas
where there is a lot of uncertainty in the exploration process…how
much oil is going to be produced, how fast it’s going to come out,”
says Maor, “and we’re also looking for places that decisions can be
made, that if we can manage to lower that uncertainty with a
predictive model, it will be actionable.” And with so many new
techniques and methods emerging in the unconventional oil space, how
one drills is becoming as much of an issue as if and where to drill.
“When you are talking about extracting hydrocarbons from solid
rock, it becomes exponentially more difficult. The techniques have
advanced, so there are many different ways [to drill], which can
behave differently…so the decision space is wide and there is a lot of
money at stake and a lot of uncertainty as to what will come out.”

14 “Hydraulic Fracturing Proppants.” Wikipedia.com. Wiki Foundation, 2 Feb. 2015. Web. 20 Mar. 2015.

Given the high cost of error in the industry, trust in and adoption of
new technologies don’t come easily. Even if you have sound predictive
models, actual economic success can take years to prove. So,
in the meantime, SparkBeyond works hard to remain transparent
and to build models that are easy to understand. Says Davidovich:
“What we learned is that being 30–40% more successful than our
competitors is only the first step to get in the door, then you go
through other steps.”
But they’re seeing things change, partly due to external forces. “If
you think about it, the big data hype actually creates a lot of pressure
on companies to introduce predictive analytics…and the oil and gas
industry is no different,” says Davidovich, and that’s an exciting
prospect to him. “There are so many new undiscovered opportuni‐
ties to bring more certainty to this space, which affects every aspect
of our lives.” Adds Maor: “What’s even more interesting is, when our
client actually acts on this…when the insights we deliver drive
action to change the world in a meaningful way.”

WellWiki
Joel Gehman started the WellWiki project when he was a grad student,
at a time when the Pennsylvania Marcellus Formation and
debates around fracking first appeared in mainstream consciousness.
Hoping to become the Wikipedia for wells, WellWiki scrapes
public databases to compile a wiki of North American well info.
They then combine the database feed with contributions from the
community to create a structured dataset tied to user-driven narra‐
tive content. The goal is to have information on all the wells—4 mil‐
lion by Gehman’s estimate—drilled in North America since the
Drake Well in 1859. Neither a watchdog nor industry-backed entity,
WellWiki remains neutral while trying to provide information and
bring transparency to a fragmented and controversial space. “I think
of it as giving wells biographies,” Gehman says. “Every well has a
story.”
While the data was generally publicly available, Gehman found it
difficult to consume in its nonstandardized form. He wants to harmonize
the data and information, which is regulated and recorded
inconsistently by state and province in the United States and Canada.
Landowners, community members, citizens, journalists, and
attorneys have used the site.
But maybe the real power of WellWiki is what happens behind the
scenes, where all the data that has been collected, scrubbed, and
normalized can then be joined to other datasets using standardized and
unique keys, including parent company financial reports and other
business data, to help researchers, academics, journalists, and others
understand and report on the industry.
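The idea of joining normalized well records to other datasets on standardized keys can be sketched as follows. This is an illustration only and does not reflect WellWiki’s actual schema or code: the records, the field names, and the `normalize_key` rule (uppercase, alphanumeric characters only) are all assumptions made for the example.

```python
def normalize_key(raw):
    """Collapse case, whitespace, and punctuation differences in an identifier."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


# Invented well records, as they might arrive from differently formatted sources.
WELLS = [
    {"well_id": "37-125-00001", "operator": "Acme Energy, LLC", "state": "PA"},
    {"well_id": "37 125 00002", "operator": "ACME ENERGY LLC", "state": "PA"},
    {"well_id": "34-013-00003", "operator": "Example Oil Co.", "state": "OH"},
]

# Invented business data, keyed by the normalized operator name.
FINANCIALS = {
    "ACMEENERGYLLC": {"parent": "Acme Holdings", "revenue_musd": 120.0},
}


def join_wells_to_financials(wells, financials):
    """Left-join each well to its operator's business data via a normalized key."""
    missing = {"parent": None, "revenue_musd": None}
    joined = []
    for well in wells:
        key = normalize_key(well["operator"])
        record = dict(well, well_id=normalize_key(well["well_id"]), operator_key=key)
        record.update(financials.get(key, missing))
        joined.append(record)
    return joined


if __name__ == "__main__":
    for row in join_wells_to_financials(WELLS, FINANCIALS):
        print(row["well_id"], row["operator_key"], row["parent"])
```

The two differently spelled Acme records resolve to the same key and pick up the same parent company, while the unmatched operator is kept with empty business fields rather than dropped.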
There are other well-monitoring organizations, some working to
keep an eye on the industry’s activities. FracTracker is another
data-driven organization, providing maps and analysis to “shine a light”
on the impact of fracking and other oil and gas development
projects. SkyTruth is a nonprofit that uses satellite and aerial remote
sensor data and imagery to identify and quantify the effects of oil
and gas production on the environment. According to their site,
“SkyTruth was the first to publicly challenge BP’s inaccurate reports
of the rate of oil spilling into the Gulf.”

Other Disruptors
There are other relatively new technologies beginning to be adopted
that may impact niche segments of the industry. 3D printing has the
potential to bring disruption to the supply chain. Drones are begin‐
ning to be used to acquire aerial images and remote-sensor data.
Crowd-funding platforms like crudefunders.com allow individuals
to participate directly in the oil business. DIY spill cleanup and
monitoring organizations have sprung up in the aftermath of the
Deepwater Horizon spill.

Summary
“It’s early times, but one of the important things to remember is this
is iterative, [it’s] low hanging fruit at first, but over time you can
really dial in models to become really good at predicting safety or
maintenance,” says Hortonworks’ Mendelevitch. The capabilities
of these new tools and platforms are in turn changing the way oil
and gas does business; for example, cheaper storage allows companies to
change retention policies and keep more data longer. “In the past, a
lot of this data was thrown away pretty quickly, because the cost of
storing it was very high. That’s the disruptive part of Hadoop—data
storage is so inexpensive.”
“It’s really important to understand the industry well and the pain
points of the industry in order to develop the appropriate solutions,”
says Kaggle’s Nur, adding that cultural differences don’t matter if you
deliver. “If you have a solution that creates value, and is proven, then
clients will use it.”
Despite its massive size, the oil and gas industry operates on small
margins, and depends on efficiency and optimization for profitability.
Given extraordinary capital costs, a wide and deep array of risk,
and high cost of error, machine learning and predictive analytics,
driven by faster, distributed cluster computing and larger, cheaper
storage, are becoming increasingly important in achieving the
efficiency and optimization required to extract profits along with
hydrocarbons.

Innovation in Tough Economic Times
Recently, of course, oil prices have plummeted. The question is, how
will lower crude prices affect investments in innovation and data
science? At a time when margins are being squeezed even further,
and vendors up and down the supply chain are being asked to cut
back, will resources be found for new investment? The consensus
seems to be: yes, at first, projects will be cut. But then, innovation
becomes a necessity. “The first reaction is usually, we’ll look at our
suppliers, and we’ll look at our staff, and we’ll look at our freelance
contractors, and we’ll ask them all to take a haircut,” says Lockheed’s
Stidolph, “and then they’ll look and see ‘Have we got any projects we
can suspend or defer?’…but then they have to look at how they can
be smarter, and collaborate more, share drilling rigs, move the data
more efficiently, mine it harder.” Autodesk’s Thasarathar also sees
opportunity in collaboration, and shared IP, as well as a chance to
get leaner and smarter: “Once the dust has settled on the initial
kneejerk of ‘cut capital spending by 20%’ or whatever it might be…I
think they’re going to look to the supply chains to deliver those costs
that are going to be cut…and one way to do that is through innovation
and technology.” But the current drop in prices may require
more than a “haircut”: falling prices are causing a steady decline
in the number of rigs open for business, with over 500 US rigs having
closed in the past year, and Halliburton, Schlumberger, and
Baker Hughes have all signaled significant layoffs to come.
While some people might see downturns as an opportunity to innovate,
others become even more risk averse. But SparkBeyond’s Maor
sees even more reason to embrace data solutions in that case:
“When you become risk averse…uncertainty becomes even more of
an issue. You want to have a really good idea of how much oil is
going to come out. They aren’t going to stop drilling…and they still
have great uncertainty. Our solution has proven to dramatically
reduce the amount of uncertainty.”
“The cost of error is higher now [and] you cannot make as many
trial and errors as you could before, because you do need to slow
down some of your activities,” adds Davidovich. “But this also cre‐
ates a higher pressure to innovate. It just takes the inherent chal‐
lenges of applying predictive analytics in such a risk averse space,
and it makes them even more acute.”
Halliburton’s Priyadarshy sees the commitment to big data as a long-term
play: “Data projects are an investment, it’s not like you can get
the return tomorrow. You have to invest in it, you have to build a
team, and you have to come up with a plan,” he says. “Think of it
like building a startup within a company.”

Post-Mortem
The falling price of crude is already having an impact on data sci‐
ence forays into the industry. Kaggle—mentioned earlier in this arti‐
cle—recently eliminated their energy-industry consulting business.15

15 McMillan, Robert. “DATA-SCIENCE DARLING KAGGLE CUTS A THIRD OF ITS STAFF.” Wired.com. Condé Nast, 9 Feb. 2015. Web. 20 Mar. 2015.
