CfAA Director responds to review of AI safety practices
The latest findings from the Future of Life Institute’s review into the safety practices of six leading AI developers showed low scores across the board. Our Director, Professor John McDermid explains why this result is not unexpected and what we need to consider to enable the safe deployment of Frontier AI models.
At the Seoul Summit in May 2024 sixteen companies signed-up to Frontier AI Safety Commitments (FAISC). Since then, several other companies have expressed their commitments, and a conference was held at the University of California at Berkeley in November this year, under the leadership of the UK AI Safety Institute (AISI), on progress against those commitments. The conference included presentations on the frameworks the major AI suppliers are developing to meet their commitments.
The review evaluating the safety practices uses a US Grade Point Average (GPA) scheme. No company scored higher than a C on the review with all the safety frameworks scoring D+ or below.
Based on the discussions in Berkeley, these findings are not surprising. The Frontier AI models are very complex, and the frameworks emphasise “post development testing” not “design for safety” so it is hard to see how they could achieve high scores. But that is to oversimplify, and we need to consider two distinct cases when developing and assessing AI safety practices.
Where the Frontier models are used on-line, as general problem-solving tools, then the only realistic approach is to evaluate them for broad harms, e.g. whether they produce offensive outputs, and to guard against that. The UK AISI (along with sister organisations internationally) is rightly focusing on such issues including seeking to identify a better “science of evaluations”, to underpin these frameworks. It may be that experience from traditional safety engineering on how to “design for safety” would be relevant here, but this is a research issue, and it would likely need collaboration between Frontier AI model developers and experts in the adaptation of safety engineering to complex systems to see whether this is a viable approach. In the interim, care should be taken in the use of such models for many reasons, including the fact that they can be highly confident in their outputs, even where they are wrong.
For embodied AI, i.e. the use of AI in, or controlling, physical systems, where behaviour or malfunction could lead to injury or loss of life, a stronger approach is needed. Pragmatically, for use of Frontier AI in regulated industries, e.g. aviation or self-driving vehicles, the approach needs to be aligned with established assurance processes. For example, traditional safety engineering uses layers of protection for critical capabilities so that no single failure can give rise to serious adverse consequence, e.g. loss of life. Experience in safety engineering has also led to defining hierarchies of controls where the stronger mechanisms, e.g. verifiable engineered controls, are used first, backed up by weaker mechanisms, e.g. human monitoring, if the primary controls are not deemed sufficiently capable by themselves. By providing sufficient layers of protection (that do not themselves rely on Frontier AI) then the benefit of such models can be achieved whilst also containing the risks.
At the Berkeley FAISC conference we presented such an approach to so-called AI Guardrails which we think is a step towards enabling Frontier AI to be used in safety-related applications. A practical example of such an approach is that being developed by SAIF Systems, who have demonstrated the use of conventionally certifiable risk controls, e.g. to avoid no-fly zones, for unmanned aerial vehicles (UAVs) controlled using large language models (LLMs).
In summary, the challenges of evaluating safety when using Frontier AI models shouldn’t be a surprise, nor should it be underestimated. The work of bodies such as the AISI is to be commended, but it is vital to strengthen the underlying “science of evaluation” both to improve evaluations and to give a better understanding of the strengths and limitations of any evaluations. Where systems can pose risk to human life and health, then experience from traditional safety critical industries needs to be brought to bear – albeit with adaptation to work with Frontier AI. This is one of the areas that the Centre for Assuring Autonomy will be working on in 2025 and beyond.