48. Mix and match: The five pillars of data science and AI

There are five pillars of data science and AI. Three, in combination, make up the foundational disciplines – mathematics, statistics and computer science; the fourth is the data – ‘big data’ as it now is; and the fifth is a many-stranded pillar – domain knowledge. The mathematicians use data to calibrate and test models and theories; the statisticians also calibrate models and seek to infer findings from data; the computer scientists develop the intelligent infrastructure (cf. Blog 47). Above all, the three combine in the development of machine learning – the heart of contemporary AI and its applications. Is this already a new discipline? Not yet, I suspect – it is not marked by undergraduate degrees in AI (unlike, say, biochemistry). The three foundational disciplines can be thought of as enabling disciplines, and this helps us to unpick the strands of the fifth pillar: both scientists and engineers are users, as are applied domains such as medicine, economics and finance, law, transport and so on. As the field develops, AI and data science knowledge will be internalised in many of these areas – in part meeting the Mike Lynch challenge (see Blog 46) of incorporating prior knowledge into machine learning.

Even this brief introduction demonstrates that we are in a relatively new interdisciplinary field. It is interesting to continue the exploration by connecting to previous drivers of interdisciplinarity – to see how these persist and ‘add’ to our agenda; and then to examine examples of new interdisciplinary challenges.

It has been argued in earlier posts that the concept of a system of interest drives interdisciplinarity, and this is very much the case here in the domains for which the AI toolkit is now valuable. More recently, complexity science was an important driver, with challenges articulated through Weaver’s notion of ‘systems of organised complexity’. This emphasises both the high dimensionality of systems of interest and the nonlinear dynamics which drive their evolution. There are challenges here for the applications of AI in various domains. Handling ‘big data’ also drives us towards high dimensionality. I once estimated the number of variables I would like to have to describe a city of a million people at a relatively coarse grain, and the answer came out as 10^13! This raises new challenges for the topologists within mathematics: how to identify structures within the corresponding data sets – a very sophisticated form of clustering! These kinds of system can be described through conditional probability distributions, again with large numbers of variables – high-dimensional challenges for Bayesian statisticians. One way to proceed with mathematical models that are high-dimensional and hence intractable is to run them as simulations. The outputs of these models can then be treated as ‘data’ and, to my knowledge, there is an as-yet untouched research challenge: to apply unsupervised machine learning algorithms to these outputs to identify structures in a high-dimensional nonlinear space.
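The idea of treating simulation outputs as ‘data’ can be sketched at toy scale. What follows is only an illustration under stated assumptions, not a method taken from this post: the logistic map stands in for a real simulation model, each run is summarised by two hand-picked features, and a plain k-means routine stands in for whatever unsupervised algorithm would suit a genuinely high-dimensional case. The names simulate, features and kmeans are hypothetical helpers.

```python
# Toy illustration only: the logistic map is an assumed stand-in for a
# nonlinear simulation model; a real application would have far more variables.

def simulate(r, steps=500, x0=0.2):
    """Run the logistic map x -> r * x * (1 - x) and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def features(trajectory, tail=50):
    """Summarise long-run behaviour: mean and spread of the last `tail` values."""
    last = trajectory[-tail:]
    return (sum(last) / len(last), max(last) - min(last))

def kmeans(points, k=2, iterations=20):
    """Plain k-means (k >= 2) with a deterministic start: centres are taken
    at evenly spaced positions in the input list."""
    centres = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centres[c])))
            for pt in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Run the 'model' for several parameter values and treat the outputs as data.
r_values = [2.5, 2.6, 2.7, 2.8, 3.2, 3.3, 3.4]
points = [features(simulate(r)) for r in r_values]
labels = kmeans(points, k=2)

for r, label in zip(r_values, labels):
    print(f"r = {r}: cluster {label}")
```

With these parameter values the runs that settle to a steady state fall into one cluster and the runs that settle into an oscillation fall into the other, without the algorithm being told which is which. The research challenge described above is the same move in a space with vastly more dimensions, where hand-picked features and simple k-means are unlikely to be enough.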

We begin to reveal many research challenges across both foundational and, especially, applied domains. (Indeed, a conjecture is that the most interesting foundational challenges emerge from these applied domains.) We can then make another connection – to Brian Arthur’s argument in his book The Nature of Technology. A discovery in one domain can, sometimes after a long period, be transferred into other domains: these are opportunities we should look out for.

Can we optimise how we do research in data science and AI? We have starting points in the ideas of systems analysis and complexity science: define a system of interest and recognise the challenges of complexity. Seek the data to contribute to scientific and applied challenges – not the other way round – and that, perhaps, will lead to new opportunities. But above all, seek to build teams which combine the skills of mathematics, statistics and computer science, integrated through both systems and methods foci. This is non-trivial, not least due to the shortage of these skills. In the projects in the Turing Institute funded by the UKRI Strategic Priorities Fund – AI for Science and Government (ASG) and Living with Machines (LWM) – we are trying to do just this. Early days and yet to be tested. Watch this space!

Alan Wilson
