The Goofy Game: an Approach to Medical AI Misalignment

Puccio, B; Castagna, F; Tucker, A; Veltri, P

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33521

Full metadata record

DC Field	Value	Language
dc.contributor.author	Puccio, B	-
dc.contributor.author	Castagna, F	-
dc.contributor.author	Tucker, A	-
dc.contributor.author	Veltri, P	-
dc.date.accessioned	2026-06-26T13:51:18Z	-
dc.date.available	2026-06-26T13:51:18Z	-
dc.date.issued	2026-06-03	-
dc.identifier.citation	Puccio, B. et al. (2026) 'The Goofy Game: an Approach to Medical AI Misalignment', Journal of Information and Intelligence, 0 (in press, pre-proof), pp. 1–13. doi: 10.1016/j.jiixd.2026.05.007.	en_US
dc.identifier.issn	2097-2849	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/33521	-
dc.description.abstract	While Large Language Models (LLMs) offer transformative potential across domains, often outperforming human benchmarks in various tasks, they remain vulnerable to exploitation by users aiming to override their safety protocols. Despite the progress achieved through red teaming methodologies in uncovering and mitigating such vulnerabilities, one notably persistent technique, referred to here as the “Goofy Game”, which leverages role-playing strategies, continues to bypass many existing safeguards. This technique can elicit unsafe responses from LLMs, which, although seemingly benign in isolation, could lead to severe consequences when deployed within high-stakes environments such as clinical decision-making or patient communication. In this study, we build on the insights from our previous exploratory experiments and analyse how a malicious user, even without technical knowledge of the internal architecture and parameters of generative AI models, could create a role-playing prompt that coerces an LLM into generating incorrect and potentially harmful clinical suggestions. Our objective is to elucidate a particular vulnerability scenario and provide insights that will contribute to future advancements in the development of secure and reliable AI systems.	en_US
dc.description.sponsorship	on behalf of KeAi Communications Co. Ltd.	en_US
dc.format.extent	1 - 13	-
dc.language	English	-
dc.language.iso	en_US	en_US
dc.publisher	Elsevier	en_US
dc.subject	jailbreak	en_US
dc.subject	large language models	en_US
dc.subject	healthcare	en_US
dc.subject	misalignment	en_US
dc.subject	role-playing	en_US
dc.title	The Goofy Game: an Approach to Medical AI Misalignment	en_US
dc.type	Article	en_US
dc.date.dateAccepted	2026-05-20	-
dc.identifier.doi	http://dx.doi.org/10.1016/j.jiixd.2026.05.007	-
dc.relation.isPartOf	Journal of Information and Intelligence	-
pubs.issue	in press, pre-proof	-
pubs.publication-status	Published	-
pubs.volume	0	-
dc.identifier.eissn	2949-7159	-
dcterms.dateAccepted	2026-05-20	-
Appears in Collections:	Department of Computer Science Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2026 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under a Creative Commons license (https://creativecommons.org/licenses/by/4.0/).	1.46 MB	Adobe PDF