Creating replicas of resources and processes has always made sense. Architects build cardboard models of buildings before constructing them. Back in the Apollo days, NASA kept a replica of the lunar lander so engineers could simulate problems and test fixes. Digital twins are just that and more. They create a virtual representation of resources, processes, and even people. They allow you to test scenarios (what happens if I exchange this component?), run queries (which machine parameters optimize my yield?), and make forecasts (what will the load on my production line be in half an hour?). Many large companies are starting to adopt digital twins for parts of their processes, but even the best adopters still have plenty of room to improve. One of the main reasons for slow adoption is the set of myths that has built up over the last 20 years, during which both hype and poor technology have hampered progress.
Myth 1: I need a CAD model of my process
Many digital twins were born out of highly accurate simulations. Among the pioneers in this area were the aerospace and automotive industries in the 1990s. Powerful supercomputers ran fluid-dynamics simulations that could be used to minimize drag, and designers could visualize their designs. The fact remains, however, that most systems and processes can be described in terms of inputs and outputs. This means that CAD simulations are not the only way to go. A data-driven approach often allows much more accurate simulations, because it is based on real measurements rather than on the simplifying assumptions of the physical models in CAD. Today even the aerospace and automotive industries are increasingly using data-driven approaches. One of the biggest success stories is modern flight simulators, which are now based purely on data and do not use traditional model-based approaches. So the answer is clearly no: you don’t need CAD models, and if you have gone down this route you might be heading into a blind alley.
Myth 2: I need an army of statisticians to model my process
Data-driven approaches have historically been heavy to implement. Traditionally, you needed skilled statisticians to model system components, choose sensor noise models, and decide which variables depend on which. Thanks to modern machine learning algorithms such as deep learning, this is no longer necessary. The behaviour of the different process components can now be modelled efficiently from historical data. Because deep learning takes care of modelling the system components, you can concentrate your efforts on generating insights instead of getting bogged down in multivariate statistics. As a result, you get results faster, more cheaply, and more flexibly, without having to maintain large amounts of process-specific code.
Myth 3: Machine learning algorithms require large amounts of data
A common misconception is that you need bucketloads of data. Some machine learning algorithms do require large amounts of data to reach a good solution. Modern methods like deep learning, however, can be made very robust with fairly small datasets from your process. It is also worth pointing out that deep learning is not synonymous with big data. You don’t need big data infrastructure like Hadoop to get started. The two overlap only in that deep learning can scale to arbitrarily large datasets, but large datasets are far from a requirement. We have successfully trained systems with only a few hundred data samples, and have also handled multi-gigabyte datasets with the same software.