The Goofy Game: an Approach to Medical AI Misalignment

Puccio, B; Castagna, F; Tucker, A; Veltri, P

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33521

Title:	The Goofy Game: an Approach to Medical AI Misalignment
Authors:	Puccio, B; Castagna, F; Tucker, A; Veltri, P
Keywords:	jailbreak;large language models;healthcare;misalignment;role-playing
Issue Date:	3-Jun-2026
Publisher:	Elsevier
Citation:	Puccio, B. et al. (2026) 'The Goofy Game: an Approach to Medical AI Misalignment', Journal of Information and Intelligence, 0 (in press, pre-proof), pp. 1–13. doi: 10.1016/j.jiixd.2026.05.007.
Abstract:	While Large Language Models (LLMs) offer transformative potential across domains, often outperforming human benchmarks in various tasks, they remain vulnerable to exploitation by users aiming to override their safety protocols. Despite the progress achieved through red teaming methodologies in uncovering and mitigating such vulnerabilities, one notably persistent technique, referred to here as the “Goofy Game”, which leverages role-playing strategies, continues to bypass many existing safeguards. This technique can elicit unsafe responses from LLMs that, although seemingly benign in isolation, could lead to severe consequences when the models are deployed within high-stakes environments such as clinical decision-making or patient communication. In this study, we build on the insights from our previous exploratory experiments and analyse how a malicious user, even without technical knowledge of the internal architecture and parameters of generative AI models, could create a role-playing prompt that coerces an LLM into generating incorrect and potentially harmful clinical suggestions. Our objective is to elucidate a particular vulnerability scenario and provide insights that will contribute to future advancements in the development of secure and reliable AI systems.
URI:	http://bura.brunel.ac.uk/handle/2438/33521
DOI:	http://dx.doi.org/10.1016/j.jiixd.2026.05.007
ISSN:	2097-2849
Appears in Collections:	Department of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2026 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under a Creative Commons license (https://creativecommons.org/licenses/by/4.0/).	1.46 MB	Adobe PDF
