Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/33521
Full metadata record
DC Field | Value | Language
dc.contributor.author | Puccio, B | -
dc.contributor.author | Castagna, F | -
dc.contributor.author | Tucker, A | -
dc.contributor.author | Veltri, P | -
dc.date.accessioned | 2026-06-26T13:51:18Z | -
dc.date.available | 2026-06-26T13:51:18Z | -
dc.date.issued | 2026-06-03 | -
dc.identifier.citation | Puccio, B. et al. (2026) 'The Goofy Game: an Approach to Medical AI Misalignment', Journal of Information and Intelligence, 0 (in press, pre-proof), pp. 1–13. doi: 10.1016/j.jiixd.2026.05.007. | en_US
dc.identifier.issn | 2097-2849 | -
dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/33521 | -
dc.description.abstract | While Large Language Models (LLMs) offer transformative potential across domains, often outperforming human benchmarks in various tasks, they remain vulnerable to exploitation by users aiming to override their safety protocols. Despite the progress achieved through red teaming methodologies in uncovering and mitigating such vulnerabilities, one notably persistent technique, referred to here as the “Goofy Game”, which leverages role-playing strategies, continues to bypass many existing safeguards. This technique can elicit unsafe responses from LLMs, which, although seemingly benign in isolation, could lead to severe consequences when deployed within high-stakes environments such as clinical decision-making or patient communication. In this study, we build on the insights from our previous exploratory experiments and analyse how a malicious user, even without technical knowledge of the internal architecture and parameters of generative AI models, could create a role-playing prompt that coerces a language model (LLM) into generating incorrect and potentially harmful clinical suggestions. Our objective is to elucidate a particular vulnerability scenario and provide insights that will contribute to future advancements in the development of secure and reliable AI systems. | en_US
dc.description.sponsorship | on behalf of KeAi Communications Co. Ltd. | en_US
dc.format.extent | 1 - 13 | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Elsevier | en_US
dc.subject | jailbreak | en_US
dc.subject | large language models | en_US
dc.subject | healthcare | en_US
dc.subject | misalignment | en_US
dc.subject | role-playing | en_US
dc.title | The Goofy Game: an Approach to Medical AI Misalignment | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2026-05-20 | -
dc.identifier.doi | http://dx.doi.org/10.1016/j.jiixd.2026.05.007 | -
dc.relation.isPartOf | Journal of Information and Intelligence | -
pubs.issue | in press, pre-proof | -
pubs.publication-status | Published | -
pubs.volume | 0 | -
dc.identifier.eissn | 2949-7159 | -
dcterms.dateAccepted | 2026-05-20 | -
Appears in Collections: Department of Computer Science Research Papers

Files in This Item:
File | Description | Size | Format
FullText.pdf | Copyright © 2026 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). | 1.46 MB | Adobe PDF

