Drivers of imbalance in machine learning uptake across geology and geophysics

Our Data Science Director, Samuel Fielding, and Managing Director, Lorin Davies, recently authored an article for the Machine Learning special issue of First Break, published this September. In it, they use machine learning to interrogate the factors driving its uptake across geoscience domains.

September 2019

Topics covered include visualizing discipline-specific drivers of machine learning uptake (using machine learning), and exploring different ways to facilitate machine learning in undersaturated disciplines within the geosciences.

Machine learning augments our acquisition, preparation, and interpretation of data. Our next great challenge is to unlock these advancements and realise the next leap in productivity in data analysis.

EAGE members can read the full article here or anyone can read the abstract here.


Data science for better living

I have a confession to make. I’m lazy. Most of my working day is dedicated to finding neat ways of making my life easier. That can’t be a bad thing, right? Being lazy seems to go hand in hand with a desire to learn computer programming. And programming can certainly make your life easier, often by several orders of magnitude!

As Data Science Director at Petryx, my mission is to lighten the load; to let you breeze through some of the everyday things that are necessary but, let’s face it, often stressful, repetitive, or dull. Humans have so much more to offer than being stuck doing the repetitive tasks a computer could easily perform. Also, life’s too short, and we want you to relax a little and enjoy it.

We want to help you work smart, not hard, by:

Making the things you need to do easier and faster. Do boring but important things in a fraction of the time, then do something more important, or more fun, instead.
Helping the things you do to be more reproducible. Make it easy to perform exactly the same process over and over again.
Improving the usability and accuracy of results. Make your analyses more useful and allow them to stand the test of time.

At Petryx, we are striving to address some key problems faced by Oil & Gas and other data-reliant industries, starting with…

“Where is the data and why is it in such a mess?”

Our Petryx Database is designed from the ground up to be a single point of entry to cleaned and standardised data that were previously poorly connected and poorly constrained. These data can be combined from both non-exclusive and proprietary sources, so they can be used by academic institutions and businesses alike. The Petryx Database seeks to address one of the largest problems in many industries today by bringing our clients data that are ready for their needs, in a clean, easy-to-digest format.

The Petryx Data Lens web tool has been developed to let users more easily access, query, and analyse that data. It is a natural complement to the underlying database and provides access for users not wanting to query data programmatically (e.g. using SQL).

Simply having all of your data in the same place and in a ready-to-use format is a necessity before you can start automating any analysis or other downstream data-centric tasks. With the Petryx Database and Data Lens we aim to provide this essential but often overlooked platform.

Talking of automating tasks brings us to the next problem we want to address…

“I have to do that how many times!?”

Almost all the tasks we perform throughout the working day – moving data around, making a chart, writing a report, etc. – have a repetitive element which could be done more easily by a computer. To someone who isn’t tuned to thinking like a computer these programmable elements may not be readily apparent. However, not only could a computer perform any of these tasks for you, it can often perform them almost instantaneously.

Performing tasks manually has another downside – it’s not easily reproducible. If you want someone else to pick up where you left off, you’d likely write down some instructions – more manual work that we could do without! Writing code so that a computer can perform these tasks for you can almost completely eliminate this problem.
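
As a minimal sketch of what that might look like, the Python function below works out the average of one column for every CSV file in a folder. The folder layout and column name are assumptions made purely for illustration; they do not describe any Petryx product.

```python
import csv
from pathlib import Path
from statistics import mean


def summarise_folder(folder, column):
    """Return {file name: mean of `column`} for every CSV file in `folder`."""
    summary = {}
    # Sort the paths so the files are always processed in the same order.
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            values = [float(row[column]) for row in csv.DictReader(handle)]
        if values:  # skip files that have a header but no data rows
            summary[path.name] = mean(values)
    return summary
```

Calling summarise_folder("well_logs", "porosity") (both names are hypothetical) repeats exactly the same steps on every file, and the script itself doubles as the instructions for whoever picks up the work next.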

At Petryx we build products and offer services that make everyday tasks more efficient and more reproducible. We want to allow the computer to pick up some of the slack, removing headaches and bottlenecks from your workflows.

The philosophy of using computing power to take on the burden of manual tasks is central to how we run our business, and to the skills we develop and promote. We use programming languages like R, Python and SQL every day to make our workflows more efficient and more reproducible, and we want to help others do the same.

Finally, once you’ve gathered all your data into the same place and automated all of your repetitive processes, you’re going to want to analyse that data. This brings us to the third problem…

“I don’t have a program that can do that”

Many of us don’t have the tools to get the most out of our data. For instance, if you don’t have access to the most appropriate statistical or charting techniques, and instead have to rely on analysing your data in spreadsheets, you are at a significant disadvantage. Even a simple statistical analysis (e.g. a t-test) can be remarkably awkward to set up, check, and repeat using just a spreadsheet. Don’t get me wrong, spreadsheets are great for many things, but for doing data analysis they are only one step up from a pen, paper, and calculator. Having to use suboptimal methods can lead to erroneous or poorly defined conclusions. This can result not only in direct financial loss but may also force you to redo analyses. More time wasted doing things none of us should have to do!

As regular users of open source programming languages R & Python, Petryx has access to the cutting edge in statistical analysis, data manipulation, data visualisation, and machine learning techniques. For example, many statistical libraries are only available in R or Python. Having powerful tools at our disposal ensures any analysis and interpretation we do maximises the value we get out of our data. It also allows us to generate the most accurate reflection of reality possible with the data available.
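
To make the t-test mentioned above concrete, here is a minimal sketch in Python using only the standard library. It computes the statistic and degrees of freedom for Welch's version of the test (the variant that does not assume equal variances); the function name and the porosity values are made up for illustration. In practice a library would usually do this for you in a single call, for example SciPy's scipy.stats.ttest_ind with equal_var=False, which also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up porosity measurements from two hypothetical wells.
well_a = [0.21, 0.19, 0.24, 0.22, 0.20]
well_b = [0.17, 0.18, 0.16, 0.20, 0.15]
t_stat, df = welch_t(well_a, well_b)
print(f"t = {t_stat:.2f}, degrees of freedom = {df:.1f}")
```

Because the whole analysis lives in a few lines of code, it can be rerun unchanged whenever new measurements arrive.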

Not only are we using tools like R to help us internally with the work we do for others, but we’re also always on the lookout for ways to help you make the most of these tools as well.

The future

Over the last 10 years the Oil & Gas and related industries have certainly moved further down the road to enlightenment. Data are now often stored in databases rather than in scattered files. Computing power is preferred over human power for mundane, repetitive tasks.

However, we still have a way to go.

Petryx are ready to help our partners take the next step in that journey. By using tools at the cutting edge of data storage, processing, visualisation and analysis, we want to give humans more time to do what they’re good at.

By Sam Fielding, Data Science Director