Impact of Multi-Agent Learning in AI Research at DeepMind

By andy

Last Friday, we attended the most insightful Turing lecture so far, delivered by Thore Graepel, Research Lead at Google DeepMind and Professor of Computer Science, UCL.

Here are the key themes we took away, along with our opinions on the future of AI governance.


If we want AI systems to have generality (that is, to be modularised and/or standardised), we need to increase the weighting of simple intelligence requirements. In other words, deliver the simpler tasks first before tackling the more complex ones. So many of the articles I read appear to focus on the latter, and therefore emphasise the overwhelming challenges.


While universal intelligence is a by-product of machine learning, how can we use multi-agent architecture to link the governance of AI applications, so that the standardisation and legal communities have an agreed framework?

Having such a framework in place will give much-needed maturity to what is currently an unstructured ecosystem. The problem of diversity of universal intelligence refers to the difficulty of designing business models for AI, given the multitude and complexity of applications in a connected world that lacks commonly accepted or emerging standards.

This creates significant challenges, such as:

·     Global goals don’t support local actions or business models;

·     Incentivisation to adopt AI applications is limited; and

·     Learning while learning doesn’t produce results that rival what humans can deliver.

Value & Policy Networks

AI does not happen in isolation. It is a cumulative culture where we enable it to compete, co-operate and co-ordinate. In unstructured environments, it sets itself an automatic curriculum of ever more challenging tasks through value and policy networks.

If we can train value networks in AI to narrow down the possible options for reaching a certain position or decision, this can significantly contribute to the success of the respective application. If a policy network is established, then we can train AI according to change probabilities (policies defined by humans).

By intuitively reducing the depth of the search with value networks and its breadth with policy networks, we can potentially deliver a more effective neural network training pipeline (i.e. learning to learn).
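
To make the depth and breadth idea concrete, here is a minimal Python sketch. It is our own illustration under stated assumptions, not code from the lecture or from DeepMind. The game is a toy (players take one to three stones from a pile, and whoever takes the last stone wins), and `toy_value` and `toy_policy` are hypothetical hand-written stand-ins for trained value and policy networks.

```python
from typing import List, Tuple

MOVES = (1, 2, 3)  # toy game: take 1-3 stones; whoever takes the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [m for m in MOVES if m <= stones]


def toy_value(stones: int) -> float:
    """Hypothetical stand-in for a value network: score for the player to move."""
    return -1.0 if stones % 4 == 0 else 1.0


def toy_policy(stones: int) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for a policy network: (move, probability), best first."""
    moves = legal_moves(stones)
    # Favour moves that leave the opponent on a multiple of four.
    weights = [3.0 if (stones - m) % 4 == 0 else 1.0 for m in moves]
    total = sum(weights)
    pairs = [(m, w / total) for m, w in zip(moves, weights)]
    return sorted(pairs, key=lambda p: -p[1])


def search(stones: int, depth: int, top_k: int, counter: List[int]) -> float:
    """Negamax search; returns the value for the player to move."""
    counter[0] += 1  # count every position visited
    if stones == 0:
        return -1.0  # the opponent took the last stone, so the player to move lost
    if depth == 0:
        return toy_value(stones)  # value estimate replaces deeper search
    candidates = toy_policy(stones)[:top_k]  # policy keeps only the likeliest moves
    return max(-search(stones - m, depth - 1, top_k, counter) for m, _ in candidates)


def best_move(stones: int, depth: int, top_k: int) -> Tuple[int, int]:
    """Return (chosen move, number of positions visited below the root)."""
    counter = [0]
    scored = [
        (-search(stones - m, depth - 1, top_k, counter), m)
        for m, _ in toy_policy(stones)[:top_k]
    ]
    _, move = max(scored)
    return move, counter[0]
```

Calling `best_move(10, depth=10, top_k=3)` searches the whole game tree, while `best_move(10, depth=2, top_k=1)` leans on the two stand-ins instead. In this toy game both calls pick the same move, and the second visits far fewer positions. A real system would learn the value and policy functions rather than hard-code them.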


Value and policy networks can help us to define AI safety and ethical frameworks, so let’s invest in research to further develop these principles, which support the definition of a generic AI planning architecture. Safety and ethics will inevitably enable more powerful technology that can transform and improve within a global governance and legal framework, especially where we (and AI) need to challenge and prevent actions that pose a risk to cyber security.