The Artificial Intelligence (AI) systems that have recently permeated our lives have a serious problem: they are built in a way that makes them very hard - and sometimes impossible - to understand or interpret. Luckily, our team is tackling this problem, and we’ve just published a paper that covers the issue in detail.
It turns out that the lack of explainability in machine learning (ML) models, such as ChatGPT or Claude, stems from the way these systems are built. Their underlying architecture (a neural network) lacks coherent structure. While neural networks can be trained to solve certain tasks effectively, how they do so is largely (and, from a practical standpoint, almost wholly) inaccessible. This absence of interpretability is an increasingly serious concern in sensitive areas where accountability is required, such as finance, healthcare, and the pharmaceutical sector. The “interpretability problem in AI” is therefore a matter of serious concern for large swathes of the enterprise sector, regulators, lawmakers, and the general public.
These concerns have given rise to the field of eXplainable AI, or XAI, which attempts to solve the interpretability problem through so-called ‘post-hoc’ techniques: one takes a trained AI model and aims to explain either its overall behavior or its individual outputs. This approach, while still evolving, has its own issues due to the approximate nature and fundamental limitations of post-hoc techniques.
A second approach to the interpretability problem is to employ new ML models that are inherently interpretable by design. Such a model comes with explicit structure that is meaningful to us “from the outside”. Realizing this in the tech we use every day means completely redesigning how machines learn - creating a new paradigm in AI. As Sean Tull, one of the authors of the paper, stated: “In the best case, such intrinsically interpretable models would no longer even require XAI methods, serving instead as their own explanation, and one of a deeper kind.”
At Quantinuum, we’re continuing to develop new paradigms in AI while also sharpening the theoretical and foundational tools that allow us to assess the interpretability of a given model. In the paper, we present a new theoretical framework for both defining AI models and analyzing their interpretability. With this framework, we show how advantageous it is for an AI model to have explicit and meaningful compositional structure.
The idea of composition is explored rigorously using a branch of mathematics called “category theory”, a language for describing processes and how they compose. This categorical approach to interpretability can be carried out via a graphical calculus - developed in part by Quantinuum scientists - which is finding use cases in everything from gravity to quantum computing.
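To give a flavour of what “processes and their composition” means, here is a minimal sketch of processes as composable boxes, in the spirit of string diagrams. This is a toy illustration, not the paper’s formalism: a `Process` is a box with a number of input and output wires, and the only ways to build larger models are wiring boxes in sequence or in parallel.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Process:
    """A box with typed input/output wires, as in a string diagram."""
    n_in: int                       # number of input wires
    n_out: int                      # number of output wires
    fn: Callable[[tuple], tuple]    # what the box computes

    def __rshift__(self, other):
        # Sequential composition f >> g: plug f's outputs into g's inputs.
        assert self.n_out == other.n_in, "wires must match up"
        return Process(self.n_in, other.n_out,
                       lambda xs: other.fn(self.fn(xs)))

    def __matmul__(self, other):
        # Parallel composition f @ g: the boxes sit side by side.
        return Process(self.n_in + other.n_in,
                       self.n_out + other.n_out,
                       lambda xs: self.fn(xs[:self.n_in]) + other.fn(xs[self.n_in:]))

double = Process(1, 1, lambda xs: (2 * xs[0],))
inc    = Process(1, 1, lambda xs: (xs[0] + 1,))
add    = Process(2, 1, lambda xs: (xs[0] + xs[1],))

# The diagram (double ⊗ inc) followed by add:
model = (double @ inc) >> add
result = model.fn((3, 4))   # 2*3 + (4+1) = (11,)
```

The payoff of this discipline is that the overall model is literally built out of named, inspectable parts - which is what gives a compositional model its claim to interpretability.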
A fundamental problem in the field of XAI has been that many of its terms have not been rigorously defined, making it difficult to study - let alone discuss - interpretability in AI. Our paper presents the first known theoretical framework for assessing the compositional interpretability of AI models. With this work, we now have a precise, mathematical definition of interpretability that allows us to have these critical conversations.
After developing the framework, our team used it to analyze a broad spectrum of ML approaches. We started with Transformers (the “T” in ChatGPT), which we found are not interpretable - pointing to a serious issue in some of the world’s most widely used ML tools. This stands in contrast with (sparse) linear models and decision trees, which we found are indeed inherently interpretable, as they are usually described.
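The sense in which a decision tree is “its own explanation” can be seen in a small sketch: every prediction comes packaged with the exact rule path that produced it. The feature names, thresholds, and decisions below are invented purely for illustration.

```python
# A toy decision tree: (condition, subtree_if_true, subtree_if_false),
# with string leaves as decisions. All names/thresholds are hypothetical.
TREE = ("age <= 40",
        ("income <= 50000", "deny", "approve"),
        "approve")

def predict(tree, features, path=None):
    """Return the decision plus the human-readable rules that led to it."""
    path = [] if path is None else path
    if isinstance(tree, str):                    # leaf: a decision
        return tree, path
    condition, if_true, if_false = tree
    name, _, threshold = condition.split()
    taken = features[name] <= float(threshold)
    path.append(condition if taken else f"not ({condition})")
    return predict(if_true if taken else if_false, features, path)

decision, path = predict(TREE, {"age": 35, "income": 80000})
# decision is "approve"; path lists the rules actually applied:
# ["age <= 40", "not (income <= 50000)"]
```

Nothing analogous falls out of a Transformer: its billions of weights admit no such readable rule path, which is the contrast the analysis above makes precise.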
Our team was also able to make precise the sense in which other ML models are what we call ‘compositionally interpretable’. These include causal models and other models already studied by our own scientists.
Many of the models discussed in the paper are classical, but category theory and string diagrams are also very well suited to analyzing quantum machine learning models. In addition to helping the broader field accurately assess the interpretability of various ML models, this work will help us develop AI systems that are interpretable by design.
This work is part of our broader AI strategy, which includes using the tools of category theory and compositionality to help us better understand AI.