AI safety research is increasingly global, yet much of it remains concentrated in a small number of institutions in the US and Europe. On 27 March, the Policy Innovation Lab and AI Safety South Africa co-hosted a workshop at Stellenbosch University to strengthen South Africa’s role in this space. The event brought together researchers to share work on AI safety, cooperative AI and AI governance.
The Policy Innovation Lab, based at Stellenbosch University, focuses on digital transformation and AI governance. AI Safety South Africa works on safer, more aligned AI systems and broader AI preparedness across the continent. With support from mentors at leading research groups such as the Stanford AI Lab and Google DeepMind, the organisation is currently hosting its cooperative AI fellows, who are studying the cooperative intelligence of advanced AI systems.
Participants from Stellenbosch University and AI Safety South Africa engaged with research from both organisations and took part in discussions throughout the afternoon. Dr Gray Manicom, a researcher in digital transformation at the Policy Innovation Lab, presented his work on mechanistic interpretability for AI safety, comparing its focus on causal insights with more traditional, correlation-based methods. He specifically investigates how AI models can be localised to African contexts using mechanistic interpretability techniques.
This presentation was followed by Prof Willem Fourie, chair of the Policy Innovation Lab, and Isabel Ray, data science and policy innovation analyst, who introduced ASIF, a new theoretical framework for reducing value underspecification. Their research aims to shift AI alignment towards explicit, auditable governance by factorising reward models into interpretable moral components.
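The general idea of a factorised reward model can be sketched in a few lines: instead of producing a single opaque scalar, the model exposes named component scores combined through auditable weights, so each moral consideration's contribution can be inspected separately. The component names and weights below are purely illustrative assumptions and are not details of ASIF itself.

```python
from dataclasses import dataclass

# Hypothetical interpretable moral components; the actual
# factorisation used by ASIF is not specified in this post.
COMPONENTS = ("honesty", "harm_avoidance", "fairness")


@dataclass
class FactorisedReward:
    """A reward expressed as a weighted sum of named component scores,
    so each contribution is explicit and auditable."""
    weights: dict  # component name -> weight (assumed, inspectable)

    def score(self, component_scores: dict) -> float:
        # Total reward is a weighted sum over explicit components,
        # rather than a single opaque scalar from a black-box model.
        return sum(self.weights[c] * component_scores[c] for c in COMPONENTS)

    def explain(self, component_scores: dict) -> dict:
        # Per-component contributions make the reward auditable.
        return {c: self.weights[c] * component_scores[c] for c in COMPONENTS}


reward = FactorisedReward(
    weights={"honesty": 0.5, "harm_avoidance": 0.3, "fairness": 0.2}
)
scores = {"honesty": 0.8, "harm_avoidance": 1.0, "fairness": 0.5}
total = reward.score(scores)  # weighted sum, approximately 0.8
breakdown = reward.explain(scores)
```

The point of the factorisation is the `explain` step: a governance process can audit which component drove a given reward, something a monolithic reward model does not expose.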
The remaining sessions provided a look at the diverse challenges facing the field of cooperative AI, including Omer Ebead’s work on adversarial exploitation in multi-agent systems and Yves Bicker’s investigation into pro-social dispositions using reinforcement learning. These stimulating presentations also addressed the technical and safety implications of delegating complex deliberation tasks to AI agents, as explored by Joseph Low and Oscar Duys. Further research, such as that presented by Akash Kundu, examined whether perceived similarity influences cooperative behaviour between models, while the wider group discussed improving the auditability and governance of large language models by forcing them to be morally explicit.
Perhaps the most successful part of the event was the vibrant energy in the room and the buzzing discussions that continued during breaks and after the workshop had ended. Workshops such as this one position the Policy Innovation Lab as a facilitator of useful, intellectually stimulating and collaborative discussions surrounding the safer development and use of AI systems. By fostering conversations between researchers across organisations, the Lab is contributing towards more governable frontier technologies that remain aligned with human values and societal needs. Through these local partnerships and international mentorships, the initiative helps ensure that South Africa remains part of the global AI safety discourse, bridging the gap between theoretical alignment research and practical policy implementation.
