There are
five pillars of data science and AI. Three make up, in combination, the
foundational disciplines – mathematics, statistics and computer science; the
fourth is the data – ‘big data’ as it now is; and the fifth is a many-stranded
pillar – domain knowledge. The mathematicians use data to calibrate and test
models and theories; the statisticians also calibrate models and seek to infer
findings from data; the computer scientists develop the intelligent
infrastructure (cf Blog 47). Above all, the three combine in the development of
machine learning – the heart of contemporary AI and its applications. Is this
already a new discipline? Not yet, I suspect – not marked by undergraduate
degrees in AI (unlike, say, biochemistry). These three disciplines can be
thought of as *enabling* disciplines and this helps us to unpick the
strands of the fifth pillar: both scientists and engineers are users, as are
the applied domains such as medicine,
economics and finance, law, transport and so on. As the field develops, AI
and data science knowledge will be internalised in many of these areas – in
part meeting the Mike Lynch challenge (see Blog 46) of incorporating prior
knowledge into machine learning.

Even this brief introduction demonstrates that we are in a relatively new interdisciplinary field. It is interesting to continue the exploration by connecting to previous drivers of interdisciplinarity – to see how these persist and ‘add’ to our agenda; and then to examine examples of new interdisciplinary challenges.

It has been
argued in earlier posts that the concept of a *system of interest* drives
interdisciplinarity, and this is very much the case here in the domains for
which the AI toolkit is now valuable. More recently, *complexity science*
has been an important driver, with challenges articulated through Weaver’s notion of
‘systems of organised complexity’. This emphasises both the high dimensionality
of systems of interest and the nonlinear dynamics which drive their evolution.
There are challenges here for the applications of AI in various domains.
Handling ‘big data’ also drives us towards high dimensionality. I once
estimated the number of variables I would like to have to describe a city of a
million people at a relatively coarse grain, and the answer came out as 10¹³!
This raises new challenges for the topologists within mathematics: how to
identify structures within the corresponding data sets – a very sophisticated
form of clustering! These kinds of system can be described through conditional
probability distributions, again with large numbers of variables – high-dimensional
challenges for Bayesian statisticians. One way to proceed with
mathematical models that are high dimensional and hence intractable is to run
them as simulations. The outputs of these models can then be treated as ‘data’
and, to my knowledge, there is an as-yet untouched research challenge: to apply
unsupervised machine learning algorithms to these outputs to identify structures
in a high-dimensional nonlinear space.
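That last suggestion (run a model as a simulation, then mine its outputs with unsupervised learning) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not a method from any of the projects mentioned here: the "model" is the one-dimensional logistic map, a textbook nonlinear system standing in for a genuinely high-dimensional urban model; each run is summarised by a hypothetical 10-bin histogram of the states it visits; and plain k-means stands in for whatever unsupervised method a real study would choose. All function names are illustrative.

```python
import math


def simulate_logistic(r, x0=0.2, burn_in=1000, steps=1000):
    """Run the logistic map x -> r*x*(1-x) and return the post-transient trajectory.

    Hypothetical stand-in for a much larger nonlinear simulation model.
    """
    x = x0
    for _ in range(burn_in):  # discard transients
        x = r * x * (1.0 - x)
    trajectory = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


def histogram_features(trajectory, bins=10):
    """Summarise a run as the fraction of time spent in each of `bins` equal slices of [0, 1]."""
    counts = [0] * bins
    for x in trajectory:
        counts[min(int(x * bins), bins - 1)] += 1
    n = len(trajectory)
    return [c / n for c in counts]


def kmeans(points, k, iterations=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Illustrative parameter sweep: treat each simulation run as one "data point".
rates = [2.6, 2.7, 2.8, 3.1, 3.14, 3.18, 3.9, 3.95, 3.99]
features = [histogram_features(simulate_logistic(r)) for r in rates]
labels = kmeans(features, k=3)
for r, lab in zip(rates, labels):
    print(f"r = {r}: cluster {lab}")
```

In this toy setting the three groups of runs (settling to a fixed point, cycling between two values, and wandering over most of the interval) should fall into three separate clusters without the algorithm being told anything about the dynamics. A real application would need far richer descriptors of each run than a 10-bin histogram, and probably methods better suited to nonlinear structure than k-means.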

We begin to
reveal many research challenges across both foundational and, especially,
applied domains. (Indeed, one conjecture is that the most interesting
foundational challenges emerge from these domains.) We can then make another
connection – to Brian Arthur’s argument in his book *The Nature of
Technology*. A discovery in one domain can, sometimes after a long
delay, be transferred into other domains: opportunities we should look out
for.

Can we
optimise how we do research in data science and AI? We have starting points in
the ideas of systems analysis and complexity science: define a system of
interest and recognise the challenges of complexity. Seek the data to
contribute to scientific and applied challenges – not the other way round – and
that will lead to new opportunities? But perhaps above all, seek to build teams
which combine the skills of mathematics, statistics and computer science,
integrated through both systems and methods foci. This is non-trivial, not least
due to the shortage of these skills. In the Turing Institute projects
funded by the UKRI Strategic Priorities Fund – *AI for Science and Government*
(ASG) and *Living with Machines* (LWM) – we are trying to do just this.
Early days and yet to be tested. Watch this space!

Alan Wilson