Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/33521

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Puccio, B | - |
| dc.contributor.author | Castagna, F | - |
| dc.contributor.author | Tucker, A | - |
| dc.contributor.author | Veltri, P | - |
| dc.date.accessioned | 2026-06-26T13:51:18Z | - |
| dc.date.available | 2026-06-26T13:51:18Z | - |
| dc.date.issued | 2026-06-03 | - |
| dc.identifier.citation | Puccio, B. et al. (2026) 'The Goofy Game: an Approach to Medical AI Misalignment', Journal of Information and Intelligence, 0 (in press, pre-proof), pp. 1–13. doi: 10.1016/j.jiixd.2026.05.007. | en_US |
| dc.identifier.issn | 2097-2849 | - |
| dc.identifier.uri | http://bura.brunel.ac.uk/handle/2438/33521 | - |
| dc.description.abstract | While Large Language Models (LLMs) offer transformative potential across domains, often outperforming human benchmarks in various tasks, they remain vulnerable to exploitation by users aiming to override their safety protocols. Despite the progress achieved through red teaming methodologies in uncovering and mitigating such vulnerabilities, one notably persistent technique, referred to here as the “Goofy Game”, which leverages role-playing strategies, continues to bypass many existing safeguards. This technique can elicit unsafe responses from LLMs, which, although seemingly benign in isolation, could lead to severe consequences when deployed within high-stakes environments such as clinical decision-making or patient communication. In this study, we build on the insights from our previous exploratory experiments and analyse how a malicious user, even without technical knowledge of the internal architecture and parameters of generative AI models, could create a role-playing prompt that coerces a language model (LLM) into generating incorrect and potentially harmful clinical suggestions. Our objective is to elucidate a particular vulnerability scenario and provide insights that will contribute to future advancements in the development of secure and reliable AI systems. | en_US |
| dc.description.sponsorship | on behalf of KeAi Communications Co. Ltd. | en_US |
| dc.format.extent | 1 - 13 | - |
| dc.language | English | - |
| dc.language.iso | en_US | en_US |
| dc.publisher | Elsevier | en_US |
| dc.subject | jailbreak | en_US |
| dc.subject | large language models | en_US |
| dc.subject | healthcare | en_US |
| dc.subject | misalignment | en_US |
| dc.subject | role-playing | en_US |
| dc.title | The Goofy Game: an Approach to Medical AI Misalignment | en_US |
| dc.type | Article | en_US |
| dc.date.dateAccepted | 2026-05-20 | - |
| dc.identifier.doi | http://dx.doi.org/10.1016/j.jiixd.2026.05.007 | - |
| dc.relation.isPartOf | Journal of Information and Intelligence | - |
| pubs.issue | in press, pre-proof | - |
| pubs.publication-status | Published | - |
| pubs.volume | 0 | - |
| dc.identifier.eissn | 2949-7159 | - |
| dcterms.dateAccepted | 2026-05-20 | - |

Appears in Collections: Department of Computer Science Research Papers

Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| FullText.pdf | Copyright © 2026 The Authors. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under a Creative Commons license (https://creativecommons.org/licenses/by/4.0/). | 1.46 MB | Adobe PDF | View/Open |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.