Our Data Science Director, Samuel Fielding, and Managing Director, Lorin Davies, recently authored an article for the Machine Learning special issue of First Break this September. In it they use machine learning to interrogate the factors driving its uptake in various geoscience domains.
Topics covered include visualizing discipline-specific drivers of machine learning uptake (using machine learning), and exploring different ways to facilitate machine learning in undersaturated disciplines within the geosciences.
Machine Learning augments our acquisition, preparation, and interpretation of the data, so our next great challenge is to unlock these advancements to realise the next great productivity leap in data analysis.
EAGE members can read the full article here or anyone can read the abstract here.
Past industry experience has shown that when geoscientists start to think about sediment provenance, their first thoughts often go straight to traditional methods such as petrographic point counting and heavy mineral analyses. However, it has long been known that the resistance of heavy minerals to weathering and transportation is highly variable and can therefore alter provenance signals.
“Perhaps the most general problem is that of heavy-mineral resistance to both mechanical and chemical processes…The co-operation of all sedimentary petrologists is needed in solving these major problems.” – Sindowski, 1949.
The complexity of heavy mineral analyses means that only a small minority of specialist labs still focus on the method whilst fully taking into account all possible sources of bias (e.g. hydraulic sorting effects on grain size, and chemical and mechanical weathering). Automated mineralogy is becoming increasingly popular but still brings its own set of problems. Just as traditional point-counting relies on the experience of the operator, automated mineralogy is highly dependent on the dictionaries used to calibrate the software. However, the reproducibility and the number of samples that can be run mean that more data (some might say more noise) can be generated and more samples can be analysed in-situ, removing potential mineral separation bias.
Current academic provenance studies tend to focus more on robust single-grain geochronological techniques, whole-rock radiogenic isotopes or thermochronology. U-Pb zircon geochronology in particular continues to gain popularity when it comes to detrital provenance studies (Spencer et al., 2016).
As the popularity of zircon studies continues to rise, an increasing number of studies are also highlighting how the diagenesis of heavy mineral assemblages under burial can severely alter provenance signatures (e.g. Morton and Hallsworth, 2007; Milliken, 2007; Garzanti et al., 2010; Andò et al., 2012; Garzanti et al., 2018). Unstable minerals are rapidly leached out down-section whilst moderately stable minerals increase their relative abundance, giving a skewed representation of the original heavy mineral assemblages associated with a given source area.
“Interpretation of provenance using heavy-mineral data from sandstones likely to have suffered burial diagenesis must carefully consider the possibility that some heavy-mineral species have been eliminated through dissolution.” – Morton and Hallsworth, 2007.
Geoscientists in industry and academia alike are becoming more aware of this source of bias and are approaching the method with caution. Heavy mineral laboratories such as that at the University of Milano-Bicocca specialise in untangling the complexities of heavy mineral analyses whilst others incorporate the technique into studies using an integrated, multi-disciplinary approach.
Diagenesis aside, the method by which petrographic and heavy mineral data are arrived at has recently come under scrutiny. Dr István Dunkl from the University of Göttingen presented the findings of an Inter-laboratory Comparison for Heavy Mineral Analysis at this year’s Working Group on Sediment Generation (WGSG) in Dublin. The aim of the Heavy Mineral Round Robin was to find a common language when reporting point-counted heavy mineral data. This required each participant to point count two synthetic heavy mineral mixtures to compare the identification of heavy mineral species and quantify their proportions. Both trained-operator and automated mineralogy techniques were used for comparison, and the results varied, with the automated methods proving to be much more accurate and reproducible. Several explanations were discussed as to why this could be:
1. Counting statistics varied across all laboratories.
2. Aliquot separation techniques. Numerous methods were reported when describing the preparation of the aliquot.
3. Mineral identification vs operator experience. Mineral identification was inconsistent regardless of operator experience. In many cases, some operators did not detect all 8 mineral phases and occasionally added up to 5 or 6 phases which were not present in the sample at all.
The presentation was certainly an eye-opener and I believe the intention is for the findings to be submitted to Sedimentary Geology this autumn. There was talk of a second phase of comparisons where it was suggested that the samples be pre-processed to reduce variation in results based on aliquot separation techniques.
This great study highlights issues not only within the method of heavy mineral point counting but also biases that may occur within other provenance techniques. It also reinforces the need for standardisation when it comes to recording heavy mineral point counts. Wouldn’t it be easier to compare like-for-like if counts were published as points as well as percentages? In the past I have attempted to amalgamate large provenance datasets and have found petrographic and heavy mineral counts to be the most difficult method to standardise (that and fission track!).
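As a rough illustration of why publishing raw points matters, here is a minimal Python sketch. The counts are made up and `count_summary` is a hypothetical helper, not anything used in the round robin. It reports each mineral as points, a percentage, and a 95% Wilson score interval, on the assumption that a simple binomial model is an acceptable approximation for point counts.

```python
import math


def count_summary(counts, z=1.96):
    """Return {mineral: (points, percent, ci_low, ci_high)}.

    The interval is a Wilson score interval on the proportion,
    expressed in percent (z=1.96 gives roughly 95% coverage).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no grains counted")
    summary = {}
    for mineral, n in counts.items():
        p = n / total
        denom = 1 + z ** 2 / total
        centre = (p + z ** 2 / (2 * total)) / denom
        half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
        summary[mineral] = (n, 100 * p, 100 * (centre - half), 100 * (centre + half))
    return summary


# Hypothetical counts: the same 30% zircon from 200 grains and from 50 grains.
large = count_summary({"zircon": 60, "garnet": 90, "apatite": 30, "rutile": 20})
small = count_summary({"zircon": 15, "other": 35})

for label, result in (("200 grains", large), ("50 grains", small)):
    n, pct, low, high = result["zircon"]
    print(f"{label}: zircon {n} points, {pct:.1f}% (95% CI {low:.1f}-{high:.1f}%)")
```

Both counts give 30% zircon, but the interval from 50 grains is roughly twice as wide as the one from 200 grains. That information disappears when only percentages are published.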
Let’s not forget that many heavy mineral studies do work well when specific problems are addressed with a thorough understanding of sources of bias (e.g. Kilhams et al., 2013; Morton and Milne, 2012). Heavy mineral analysis of the Clair Group (Morton and Milne, 2012) has been very successful and enables high-resolution correlation between wells. This is likely due to factors such as no operator or laboratory variability, and a well understood reservoir, where heavy minerals are a proven discriminator.
There is no one ‘silver-bullet’ for provenance studies and the multi-disciplinary approach is key when it comes to accurately recording the evolution of a source-to-sink system.
“Detangling the various interacting factors controlling mineralogical and chemical compositional variability is a fundamental pre-requisite to improve decisively not only on our ability to unravel provenance, but also to understand much about climatic, hydraulic, and diagenetic processes.” – Garzanti et al., 2010.
For the purposes of this article I have focussed primarily on petrography and heavy mineral analyses. However, surely all other provenance techniques can also be subjected to this kind of bias and alteration? Perhaps the new LinkedIn “Source to Sink” Group could be used as a platform to discuss other sources of bias such as:
– The controls of mineral distribution on radiogenic isotope concentrations (e.g. Garçon et al., 2014).
Andò, S., Garzanti, E., Padoan, M. and Limonta, M., 2012. Corrosion of heavy minerals during weathering and diagenesis: a catalog for optical analysis. Sedimentary Geology, 280, pp.165-178.
Cascalho, J. and Fradique, C., 2007. The sources and hydraulic sorting of heavy minerals on the northern Portuguese continental margin. Developments in Sedimentology, 58, pp.75-110.
Fielding, L., Najman, Y., Millar, I., Butterworth, P., Andò, S., Padoan, M., Barfod, D. and Kneller, B., 2017. A detrital record of the Nile River and its catchment. Journal of the Geological Society, 174(2), pp.301-317.
Garçon, M., Chauvel, C., France-Lanord, C., Limonta, M. and Garzanti, E., 2014. Which minerals control the Nd–Hf–Sr–Pb isotopic compositions of river sediments? Chemical Geology, 364, pp.42-55.
Garzanti, E., Andò, S., Limonta, M., Fielding, L. and Najman, Y., 2018. Diagenetic control on mineralogical suites in sand, silt, and mud (Cenozoic Nile Delta): Implications for provenance reconstructions. Earth-Science Reviews, 185, pp.122-139.
Kilhams, B., Morton, A., Borella, R., Wilkins, A. and Hurst, A., 2013. Understanding the provenance and reservoir quality of the Sele Formation sandstones of the UK Central Graben utilizing detrital garnet suites. Geological Society, London, Special Publications, 386, pp.SP386-16.
Milliken, K.L., 2007. Provenance and diagenesis of heavy minerals, Cenozoic units of the northwestern Gulf of Mexico sedimentary basin. Developments in Sedimentology, 58, pp.247-261.
Morton, A.C. and Hallsworth, C., 2007. Stability of detrital heavy minerals during burial diagenesis. Developments in Sedimentology, 58, pp.215-245.
Morton, A. and Milne, A., 2012. Heavy mineral stratigraphic analysis on the Clair Field, UK, west of Shetlands: a unique real-time solution for red-bed correlation while drilling. Petroleum Geoscience, 18, pp.115-128.
Nesbitt, H.W., Young, G.M., McLennan, S.M. and Keays, R.R., 1996. Effects of chemical weathering and sorting on the petrogenesis of siliciclastic sediments, with implications for provenance studies. The Journal of Geology, 104(5), pp.525-542.
Sindowski, F.K.H., 1949. Results and problems of heavy mineral analysis in Germany; a review of sedimentary-petrological papers, 1936-1948. Journal of Sedimentary Research, 19(1), pp.3-25.
Spencer, C.J., Kirkland, C.L. and Taylor, R.J., 2016. Strategies towards statistically robust interpretations of in situ U–Pb zircon geochronology. Geoscience Frontiers, 7(4), pp.581-589.
Over the last few weeks we have been spending lots of time talking to clients about some of the datasets offered by Petryx. In a few cases we have been talking to sedimentologists and geoscientists who are experienced with manipulating and interrogating datasets like ours. However, even experienced geoscientists still have questions about how to get the most out of these datasets, particularly when doing source-to-sink work. This blog explains the approach we take at Petryx when puzzling out sediment routing and prediction of reservoir quality. We outline a generalised approach, followed by some of the outputs you would expect to see from a workflow like this.
Source to sink: Detailed study of processes and products relating to clastic sediments, from their origins in the hinterland, to the sedimentary basin.
Normally, the quickest route to good interpretation is through good scientific method. In source-to-sink questions this means observing the data, be that subsurface and/or hinterland data, and formulating a question: “where and when are the good reservoir sands being deposited”, or “does my block contain good quality sand?”. Maybe we have questions about seal capacity, or maybe a group of people have a bunch of questions which we could really do with some quantifiable answers to.
The next step is to build a hypothesis based on the data we have observed. This may mean integrating drainage-basin polygons with hinterland geology and geochemical data. We can take thermochronological data and further refine our drainage basins, or adjust the expected sediment generation based upon these inputs. Paleocurrent data help us to verify where clastic systems were flowing, and further refine the drainage story. We bring in any climatic data we have and use this along with total drainage length to quantify sediment breakdown, and therefore the cleaning potential of our sands. Plate models and paleogeographic interpretations can be significant, but throughout this process it is paramount to be able to segregate the interpretative inputs from the hard data, and weight accordingly. Once we have mashed together all these data and interpretations we can make a prediction: “Given what I have observed, I think that there is lots of good quality sand going from A to B at this time”. If we have enough data, and time to work with it, we might even quantify this hypothesis with some degree of certainty (see this great book for an explanation of why you should be sceptical of predictions that people won’t put a number on).
Now the fun bit! We take some data which we have held in reserve and test our hypotheses. A great game if you are asking a service provider to give you a bespoke source-to-sink interpretation is to keep a few petrographic analyses in reserve to test some of their predictions. We sometimes get a bit upset about not getting all the data, but it is a fantastic way of testing the veracity of the predictions being made. After all, it’s better to test the hypothesis before you drill the well, rather than afterwards.
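To make the held-in-reserve idea concrete, here is a minimal Python sketch. The predicted composition, the two held-back samples, the `mean_abs_error` helper and the 5-point tolerance are all hypothetical, and a plain mean absolute difference is only one crude way of scoring a prediction against compositional data.

```python
def mean_abs_error(predicted, observed):
    """Mean absolute difference, in percentage points, across all phases."""
    phases = sorted(set(predicted) | set(observed))
    diffs = [abs(predicted.get(p, 0.0) - observed.get(p, 0.0)) for p in phases]
    return sum(diffs) / len(phases)


# Hypothetical model prediction for one sediment package (percent).
predicted = {"quartz": 55.0, "plagioclase": 25.0, "k_feldspar": 20.0}

# Hypothetical petrographic analyses that were kept back from the modeller.
held_back = {
    "sample_1": {"quartz": 58.0, "plagioclase": 22.0, "k_feldspar": 20.0},
    "sample_2": {"quartz": 40.0, "plagioclase": 35.0, "k_feldspar": 25.0},
}

tolerance = 5.0  # assumed acceptable misfit, in percentage points

results = {name: mean_abs_error(predicted, obs) for name, obs in held_back.items()}
for name, err in results.items():
    verdict = "consistent with model" if err <= tolerance else "revisit the model"
    print(f"{name}: mean misfit {err:.1f} points -> {verdict}")
```

The point of the exercise is less the metric itself than agreeing on it, and on the tolerance, before the reserved data are revealed.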
Above: One output of a hypothetical source-to-sink workflow, with mineral composition predictions from given hinterland zones. Multiple datasets can be used to give a qualitative estimate of reservoir potential, and then mixed into sediment packages. Image credit: Petryx Ltd (CC BY-SA 4.0)
Now we can start to iterate. It is OK to get it wrong as long as we find out why. Often the predictions will need to be refined and improved to reflect new data. Once we have a model which we are happy with we can take our outputs and make some quantitative statements about the subsurface. Often a report or study will stop here and not fully translate a working model into useful quantitative data, as if it is enough to say, “We have a model which explains the data well”. Let’s take that hypothetical model and share our insights with our colleagues:
Thermochronological data suggest up to 6 km of exhumed lithosphere over our target interval.
Uplift rates around our target interval suggest a significant influx of clastic material into the area around our block (15 km³) derived from hinterland A (65% certainty).
Siliciclastic packages in package α are likely to be composed of 55% quartz, 25% plagioclase and 20% K-feldspar.
Package β, whilst relatively significant in the sediment pile, is likely to be largely (85%) derived from hinterland C, so could be problematic. This may mean that where it is interleaved with package α it risks washing in diagenetic fluids.
Package γ appears to be a distal equivalent of package α, but also contains chemical sediments. It is likely to contain up to 6% chert.
Package ζ is a regional seal with an expected clay content of 30%.
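Statements like these can come from something as simple as a weighted mix of hinterland compositions. The Python sketch below shows the idea under loud assumptions: the end-member compositions are invented, the `mix` helper is hypothetical, and simple linear mixing ignores the weathering, sorting and diagenetic effects a real workflow would have to account for.

```python
def mix(end_members, contributions):
    """Linearly mix end-member compositions by relative sediment contribution."""
    total = sum(contributions.values())
    weights = {name: value / total for name, value in contributions.items()}
    phases = sorted({p for comp in end_members.values() for p in comp})
    return {
        phase: sum(weights[name] * end_members[name].get(phase, 0.0) for name in weights)
        for phase in phases
    }


# Hypothetical mineral compositions (percent) shed by two hinterlands.
end_members = {
    "hinterland_A": {"quartz": 60.0, "plagioclase": 25.0, "k_feldspar": 15.0},
    "hinterland_C": {"quartz": 30.0, "plagioclase": 50.0, "k_feldspar": 20.0},
}

# A package that is 85% derived from hinterland C, as in the example above;
# the remaining 15% is assumed here to come from hinterland A.
contributions = {"hinterland_A": 15.0, "hinterland_C": 85.0}

package = mix(end_members, contributions)
for phase, pct in package.items():
    print(f"{phase}: {pct:.2f}%")
```

Keeping the end-member compositions (hard data) separate from the contribution weights (interpretation) makes it easy to change one without touching the other when the model is iterated.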
Finally, we want to say loud and clear what the sensitivities and testable parts of the model are. So if someone finds some data which doesn’t fit, we can iterate our model further.
Paleoclimatic conditions indicate high potential for sand cleaning and improved reservoir quality. Lower weathering rates and stable climatic conditions will result in lower chemical abrasion and reduced sand cleaning.
Little compositional data was available for the upper part of our target zone, resulting in a heavy reliance on U-Pb data, which can be blind to mafic hinterlands.
We would expect few or no metamorphic minerals in package α.
To quote statistician George Box, who we think had it about right: “All models are wrong, but some are useful”. Hopefully this idealised workflow can help you get closer to that useful model and know what these datasets should be able to tell us. If you are interested in source-to-sink modelling, or want to discuss more, drop a comment below or head over to a short survey which we will be promoting in the coming weeks. We think we have some great workflows, but your opinions can really help us make them more relevant to you.
Welcome to the third instalment of the Petryx Blog series! After @SamFielding confessed that his programming know-how and obsession with automation were fuelled by his laziness, I will be discussing how programming isn’t just a way to save time, but a fantastic tool when it comes to making sense of large unwieldy compilations of geoscience data.
Large, multi-technique datasets (notice how I didn’t use the term big data, such a wildly overused and little understood phrase in our book) are becoming more common throughout geoscience. This is partly due to the integration of an ever-increasing number of disciplines, but mostly due to the sheer amount of raw data available to us when carrying out analyses using several different techniques on a single project.
So why exactly do we need multi-technique studies? Sometimes a single dataset will give you the specific answer you require. However, if you have a small sample size or if you are faced with a particularly complex problem, you’ll likely need to incorporate data from other methods to meet these requirements. This is where a multi-technique approach can really add value.
Let’s take provenance studies for example. In my own work I have often found U-Pb zircon geochronology to be invaluable due to zircon’s robust nature and ubiquity regardless of tectonic setting. However, when I was faced with a reduced sample size, multiple episodes of recycling and the need to record multiple geothermal events, this method on its own was not enough. The same is true for other methods too, like heavy mineral analyses or fission track, which, while great under certain circumstances, can both prove insufficient depending on what information you are trying to glean from your data.
Using a multi-technique dataset can be an art form, but like any other art, it’s quality, not quantity, that counts. More is not necessarily better in all cases. Choosing which techniques to use and in which combination to use them requires careful consideration and knowledge of what each technique brings to the table; something we at Petryx pride ourselves on. When done right this delicate balancing act can reveal vital information and shed light on areas that would have been otherwise obscured.
Working on multi-disciplinary projects in both academia and industry over the past 10 years has exposed me to an ever-changing array of data generated from numerous different techniques. Dealing with this constant stream of datasets often means folders full of spreadsheets which must be organised and worked through before any useful analyses can take place or any meaningful conclusions can be drawn. Simply handling multi-technique data can prove a mammoth and intimidating task which gets in the way of what really needs doing.
Here at Petryx we want to make analysing data, and getting the answers you need, easier. Our Petryx Database provides an extensive, ever-growing, compilation of data which can be easily accessed and queried from the Petryx Data Lens web front-end. Designed and developed by geoscientists and data science experts, the Database and Data Lens provide valuable tools which aim to minimise the amount of time our clients need to spend on data management. The Data Lens allows users to query, visualise, and pull together their own data alongside those from peer-reviewed datasets. Most importantly, our Data Lens isn’t a generic data handling platform, but is instead a specialist tool for geoscientists and the specific applications and problems they encounter.
In the early stages of any project, knowing what types of data are available for your area of interest can help you to plan more effectively. You may have a penchant for thermochronology, but if fission track data is looking a bit thin on the ground in your study area, and you have a limited budget, then you might want to consider either tapping into some other methods that have better coverage or enquiring about our data acquisition service.
Perhaps you are further along in your research and find yourself needing to compare the results of several different datasets to a multitude of signatures from potential source areas. Instead of laboriously plotting them individually using different tools, why not check out the different signatures on the fly or filter through thousands of rows of data in seconds to find similar samples that could be the source of your clastic reservoir at that crucial interval? Here at Petryx, we can help you do it. By having a small team of geoscientists and data scientists working closely together, we’ve created a product that takes the best bits from both worlds and puts you in control without all the hard work, allowing you to make the most out of your all-encompassing multi-technique dataset.
I have a confession to make. I’m lazy. Most of my working day is dedicated to finding neat ways of making my life easier. That can’t be a bad thing, right? Being lazy seems to go hand in hand with a desire to learn computer programming. And programming can certainly make your life easier, often by several orders of magnitude!
As Data Science Director at Petryx my mission is to lighten the load; to let you breeze through some of the everyday things that are necessary but, let’s face it, often stressful, repetitive, or dull. Humans have so much more to offer than being stuck doing the repetitive tasks a computer could easily perform. Also, life’s too short, and we want you to relax a little and enjoy it.
We want to help you work smart, not hard, by:
Making the things you need to do easier and faster. Do boring but important things in a fraction of the time, then do something more important, or more fun, instead.
Helping the things you do to be more reproducible. Make it easy to perform exactly the same process over and over again.
Improving the usability and accuracy of results. Make your analyses more useful and allow them to stand the test of time.
At Petryx, we are striving to address some key problems faced by Oil & Gas and other data-reliant industries, starting with…
“Where is the data and why is it in such a mess?”
Our Petryx Database is designed from the ground up to be a single point of entry to cleaned and standardised data that was previously poorly connected and poorly constrained. These data can be mashed up from a mix of both non-exclusive and proprietary sources so can be used by academic institutions and businesses alike. The creation of the Petryx Database seeks to address one of the largest problems in many industries today by bringing our clients data that is ready for their needs and in a clean, easy-to-digest format.
The Petryx Data Lens web tool has been developed to let users more easily access, query, and analyse that data. It is a natural complement to the underlying database and provides access for users not wanting to query data programmatically (e.g. using SQL).
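For readers wondering what querying data programmatically looks like, here is a small self-contained Python sketch using the standard-library `sqlite3` module. The `analyses` table, its columns and its rows are invented purely for illustration and are not the Petryx Database schema.

```python
import sqlite3

# In-memory database with a hypothetical table of analyses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analyses (sample_id TEXT, technique TEXT, basin TEXT)")
conn.executemany(
    "INSERT INTO analyses VALUES (?, ?, ?)",
    [
        ("S1", "U-Pb zircon", "Basin X"),
        ("S2", "U-Pb zircon", "Basin X"),
        ("S3", "fission track", "Basin X"),
        ("S4", "heavy minerals", "Basin Y"),
    ],
)

# How many analyses of each technique exist for one area of interest?
rows = conn.execute(
    """
    SELECT technique, COUNT(*) AS n
    FROM analyses
    WHERE basin = ?
    GROUP BY technique
    ORDER BY n DESC, technique
    """,
    ("Basin X",),
).fetchall()

for technique, n in rows:
    print(f"{technique}: {n}")

conn.close()
```

A query like this answers the "what coverage do I have in my study area?" question in a few lines, which is the kind of task a web front-end can wrap for users who would rather not write SQL.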
Simply having all of your data in the same place and in a ready-to-use format is a necessity before you can start automating any analysis or other downstream data-centric tasks. With the Petryx Database and Data Lens we aim to provide this essential but often overlooked platform.
Talking of automating tasks brings us to the next problem we want to address…
“I have to do that how many times!?”
Almost all the tasks we perform throughout the working day – moving data around, making a chart, writing a report, etc. – have a repetitive element which could be done more easily by a computer. To someone who isn’t tuned to thinking like a computer these programmable elements may not be readily apparent. However, not only could a computer perform any of these tasks for you, it can often perform them almost instantaneously.
Performing tasks manually has another downside – it’s not easily reproducible. If you want someone else to pick up where you left off, you’d likely write down some instructions – more manual work that we could do without! Writing code so that a computer can perform these tasks for you can almost completely eliminate this problem.
At Petryx we build products and offer services that make everyday tasks more efficient and more reproducible. We want to allow the computer to pick up some of the slack, removing headaches and bottlenecks from your workflows.
The philosophy of using computing power to take the burden off performing manual tasks is central to how we run our business, and to the skills we develop and promote. We use programming languages like R, Python and SQL to help make our workflows more efficient and more reproducible on a daily basis, and we want to help others do the same.
Finally, even if you’ve gathered all your data into the same place, and automated all of your repetitive processes, you’re going to want to analyse it. This brings us to the third problem…
“I don’t have a program that can do that”
Often, many of us don’t have the tools to get the most out of our data. For instance, if you don’t have access to the most appropriate statistical or charting techniques, and instead have to rely on analysing your data in spreadsheets, you are at a significant disadvantage. Even a simple statistical analysis (e.g. a t-test) is remarkably difficult using just a spreadsheet. Don’t get me wrong, spreadsheets are great for many things, but for doing data analysis they are only one step up from a pen, paper, and calculator. Having to use suboptimal methods can lead to erroneous or poorly defined conclusions. This can result not only in direct financial loss but may also force you to redo analyses. More time wasted doing things none of us should have to do!
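By way of comparison, here is a rough Python sketch of a Welch's t-test (the unequal-variance form of the two-sample t-test) using only the standard library. The two sets of measurements are made up and `welch_t` is a hypothetical helper. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs the t distribution, which a library call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` handles for you.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements from two groups of samples.
group_a = [12.1, 11.8, 12.5, 12.0, 12.3]
group_b = [11.2, 11.5, 11.0, 11.4, 11.1]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

Because the whole analysis lives in a few lines of code, it can be rerun unchanged on the next dataset, which is much harder to guarantee with a chain of spreadsheet formulas.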
As regular users of open source programming languages R & Python, Petryx has access to the cutting edge in statistical analysis, data manipulation, data visualisation, and machine learning techniques. For example, many statistical libraries are only available in R or Python. Having powerful tools at our disposal ensures any analysis and interpretation we do maximises the value we get out of our data. It also allows us to generate the most accurate reflection of reality possible with the data available.
Not only are we using tools like R to help us internally with the work we do for others, but we’re also always on the lookout for ways to help you make the most of these tools as well.
Over the last 10 years the Oil & Gas and related industries have certainly moved further down the road to enlightenment. Databases are now often used instead of storing data in files. Computing power is preferred over human power for mundane, repetitive tasks.
However, we still have a way to go.
Petryx are ready to help our partners take the next step in that journey. By using tools at the cutting edge of data storage, processing, visualisation and analysis, we want to give humans more time to do what they’re good at.
This week we launched Petryx Ltd, a new company with a new vision for digital services for the Oil & Gas industry. This blog post is my introduction to the company and an opportunity for me to articulate what we are doing. It will be followed in the coming days by blog posts from my fellow directors Laura Fielding and Sam Fielding who will be giving their perspectives on what we have to offer the geoscience domain.
After a challenging few years, many Oil & Gas companies are emerging from the downturn as smaller, more responsive organisations. Whereas before a team of dozens or hundreds of explorers would have worked up prospects across the globe, now just a few skilled geoscientists must assess where opportunities lie, and work to understand the subsurface. Whilst all of this has been happening, products offered by the service industry have moved ever closer to long-term subscription models. These streamlined enterprises face many challenges, and we at Petryx are here to meet them.
In order to do the job of exploration, operators therefore have to work differently, and whilst much of this change in pace and tactics must come from within, we believe that it is the role of service companies to reflect the new face of geoscience exploration in the solutions we deliver. Episodic projects have taken precedence over multi-year programmes, and now more than ever our clients must answer their technical challenges in less time. As a company, Petryx offers a portfolio of innovative new data solutions and services, which, quite simply, help our clients to get to the answers they need faster.
In a few days’ time Geoscience Director at Petryx Laura Fielding will be discussing how multidisciplinary datasets are invaluable in cracking subsurface puzzles, and this is exactly what we will be helping our clients to do with our first commercial offering. We are designing a single, connected, multidisciplinary database of geoscience data, the Petryx Database. Hosted in the cloud, this database may be mirrored by our clients directly into their private data clouds, so they can leverage the data themselves; or, if they prefer, accessed through a built-for-purpose web application which allows the data to be queried and interpreted on the fly. Our clients will be able to cross-examine their own data directly against the Petryx Database, using the web application as an interpretative tool, or upload newly acquired datasets directly into their own instance of the database.
Perhaps most importantly, the web applications and interpretative tools offered with the Petryx Database are free, and the data in our database is offered as a one-off purchase rather than on a subscription basis, so that our clients have more control and are better able to manage their budgets, focusing resources on the areas which count. We understand the needs of our clients and believe that the tools we offer and the infrastructure for serving Petryx data are our responsibility, just like a restaurant is responsible for paying the rent and providing diners with a knife and fork. As long as our customers keep buying meals, we’ll keep our side of the bargain and keep the lights on.
Petryx Ltd is a new company set up to meet the growing need for modern data science capabilities in the Oil & Gas industry. The company is based in North Wales in the UK and serves Oil & Gas operators worldwide.
Dr Lorin Davies, Managing Director of Petryx Ltd, said: “The Oil & Gas industry has fundamentally changed since the heyday of the 1990s and 2000s. Our customers are modern organisations who look to the service sector for innovative answers to their challenges. We launched Petryx to meet these challenges and to bring technologies developed for the internet software industry to the upstream industry.
The products and services in development at Petryx are truly exciting and will allow our clients to better understand their own data, as well as the answers buried in academic research.”
Petryx offers built-for-purpose data products and services. Petryx data products are designed to be light, responsive and perfectly optimised for clients’ needs. As scientists in the Oil & Gas industry tackle new hurdles with more marginal fields, in deeper water and with smaller teams, the skills and expertise offered by Petryx will help them to make better decisions.