FAIR data brings focus to industries drowning in complexity
Every year, process industry experts note the accelerating growth in their systems’ data and complexity. Much of this is due to the race to ‘get to market faster’ while building smarter, more optimized and multifaceted products and services.
To make sense of this growing intricacy, experts invest in industrial AI, the comprehensive digital twin and the metaverse. It is easy, however, to overlook the primary ingredient of these tools: FAIR data (findable, accessible, interoperable and reusable data). Making this error is a bit like building ‘smart’ digital assets on pillars of salt and sand.

AI, digital twins and the metaverse need a solid foundation of FAIR data
First, it is important to stress that the tools discussed are of an industrial grade. They are not AI bots people ask to redraw photos in the style of a famous artist. Nor is this a metaverse using virtual reality (VR) headsets to walk fans through the castles of a fantasy novel. And the digital twin in question does not control something as simple as a hobby drone.
These are extraordinarily complex industrial tools built to run, simulate, model and optimize processes that affect billion-dollar budgets and millions of people. Thus, they require a lot of high-quality data as organizations build and train them to maintain high-risk industry systems.
When organizations build digital systems like AI, the comprehensive digital twin and the metaverse using bad, uncontextualized data, those systems are set up to make simple mistakes. And as the processes they oversee grow in complexity and collect more data, the risk only increases.
For these tools to work as expected, and bring value, they must train, run, function and optimize using data that can be easily located, retrieved, used and validated by digital systems—hence FAIR data.
Why is FAIR data critical to production environments?
The purpose of FAIR data is to ensure digital systems can detect, acquire, use and reuse data with little to no human input. As the volume and complexity of data and systems continue to grow, FAIR data practices contextualize and supplement the added information in ways that facilitate long-term data stewardship.
A 2016 paper in Nature’s Scientific Data[1] first outlined and codified FAIR data concepts. The document explained that for data to be FAIR it must adhere to the following principles:
- Findable: Data and metadata are assigned persistent and globally unique identifiers. The metadata describes the data and explicitly includes the identifier of the data it describes. Both data and metadata are registered or indexed in a searchable repository.
- Accessible: Data and metadata are retrievable by their identifier via a standardized communication protocol that is open, free and universally implementable, and that allows authentication and authorization procedures where necessary. Metadata remains accessible even if the data it describes is lost, confidential or restricted to certain individuals or systems.
- Interoperable: Data and metadata are expressed in a formal, accessible, shared and broadly applicable language for knowledge representation. They use vocabularies that themselves follow FAIR principles, and they include qualified references to other data and metadata.
- Reusable: Data and metadata are richly described with accurate and relevant attributes, released under a clear and accessible data usage license, associated with detailed provenance and aligned with the community standards of their domain.
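As a rough illustration of how these principles could translate into an automated check, the minimal Python sketch below flags gaps in a metadata record. The MetadataRecord class, its field names and the fair_gaps function are hypothetical and chosen only for this example; they are not part of the FAIR principles or of any specific tool, and a real assessment would cover far more than these few fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical metadata record for one dataset (field names are illustrative)."""
    identifier: Optional[str] = None       # persistent, globally unique ID of this record
    data_identifier: Optional[str] = None  # ID of the data this record describes
    indexed: bool = False                  # registered in a searchable repository?
    access_url: Optional[str] = None       # address where the data can be retrieved
    vocabulary: Optional[str] = None       # shared vocabulary or schema in use
    usage_license: Optional[str] = None    # data usage license
    provenance: Optional[str] = None       # background on how the data was produced


def fair_gaps(record: MetadataRecord) -> Dict[str, List[str]]:
    """Return, per FAIR principle, the simple checks this record fails."""
    gaps: Dict[str, List[str]] = {
        "findable": [], "accessible": [], "interoperable": [], "reusable": [],
    }
    if not record.identifier:
        gaps["findable"].append("missing persistent identifier")
    if not record.data_identifier:
        gaps["findable"].append("metadata does not name the data it describes")
    if not record.indexed:
        gaps["findable"].append("not indexed in a searchable repository")
    # Assumption for this sketch: HTTP(S) stands in for an open, free protocol.
    if not (record.access_url or "").startswith(("http://", "https://")):
        gaps["accessible"].append("no retrieval address over an open protocol")
    if not record.vocabulary:
        gaps["interoperable"].append("no shared vocabulary declared")
    if not record.usage_license:
        gaps["reusable"].append("no usage license")
    if not record.provenance:
        gaps["reusable"].append("no provenance information")
    return gaps


if __name__ == "__main__":
    # Made-up record: identifiers and URL are placeholders, not real resources.
    record = MetadataRecord(
        identifier="example-record-001",
        data_identifier="example-dataset-001",
        access_url="https://example.org/datasets/001",
    )
    for principle, problems in fair_gaps(record).items():
        print(principle, "->", problems or "ok")
```

A check like this only confirms that the fields exist. Whether an identifier is truly persistent or a license is truly clear still needs review by people or by more specialized tooling.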
Following these guidelines does not just benefit machines; it also enables people to better track their systems. So, organizations with future aspirations in AI, the metaverse and the comprehensive digital twin can benefit immediately from adopting these practices. And those that start today will find it easier to produce those digital tools; so much so that they might adopt them far sooner than anticipated.
The concept of FAIR data has already proven itself so useful for large, complex, industrial applications that large organizations without industrial goals now adopt it to maintain vast data lakes. This includes global and government bodies like UNESCO, the European Commission, the NIH, Statistics Canada and more.
To learn more about FAIR data and its role in producing AI models, the comprehensive digital twin and the industrial metaverse, listen to the podcast: The 5 best FAIR data secrets to empower process engineers.
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18


