Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29014
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Kirchenbauer, J
dc.contributor.author: Geiping, J
dc.contributor.author: Wen, Y
dc.contributor.author: Shu, M
dc.contributor.author: Saifullah, K
dc.contributor.author: Kong, K
dc.contributor.author: Fernando, K
dc.contributor.author: Saha, A
dc.contributor.author: Goldblum, M
dc.contributor.author: Goldstein, T
dc.date.accessioned: 2024-05-15T16:20:26Z
dc.date.available: 2024-05-15T16:20:26Z
dc.date.issued: 2024-05-07
dc.identifier: ORCiD: Kasun Fernando https://orcid.org/0000-0003-1489-9566
dc.identifier: arXiv:2306.04634v4 [cs.LG]
dc.identifier.citation: Kirchenbauer, J. et al. (2024) 'On the Reliability of Watermarks for Large Language Models', Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7-11 May, pp. 1-9. doi: 10.48550/arXiv.2306.04634 [Available at: https://arxiv.org/abs/2306.04634v4 (Accessed: 15 May 2024)]. [en_US]
dc.identifier.uri: https://bura.brunel.ac.uk/handle/2438/29014
dc.description: This is the accepted version of the conference paper archived online at arXiv:2306.04634v4 [cs.LG], https://arxiv.org/abs/2306.04634v4. Comments: 9 pages in the main body. Published at ICLR 2024 (https://iclr.cc/virtual/2024/poster/19147). Code is available at https://github.com/jwkirchenbauer/lm-watermarking [en_US]
dc.description.abstract: As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors. [en_US]
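
As context for the detection numbers quoted in the abstract: in this line of work, detection reduces to a one-sided z-test on the count of "green-list" tokens in a passage. The sketch below is a minimal illustration of that test, not the authors' implementation (which is linked above); in particular, is_green is a hypothetical stand-in for the hash-seeded vocabulary partition, and GAMMA = 0.25 is an assumed setting.

```python
import math
import random

GAMMA = 0.25  # assumed fraction of the vocabulary placed on the green list


def is_green(prev_token: int, token: int) -> bool:
    """Hypothetical stand-in for the watermark's pseudo-random partition:
    seed a PRNG from the (previous token, candidate token) pair and put the
    candidate on the green list with probability GAMMA."""
    rng = random.Random(prev_token * 2654435761 + token)
    return rng.random() < GAMMA


def detection_z_score(tokens: list[int]) -> float:
    """One-sided z-statistic for the green-token count. Under the null
    hypothesis (human-written text) each scored token is green with
    probability GAMMA, so z is approximately standard normal."""
    T = len(tokens) - 1  # every token after the first is scored
    if T <= 0:
        return 0.0
    greens = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return (greens - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))


# Usage: flag a passage when the test clears a high-confidence threshold.
# A one-sided threshold of z > ~4.27 corresponds to the 1e-5 false positive
# rate quoted in the abstract.
```

Because z grows with the number of scored tokens, a diluted watermark (e.g., after paraphrasing) can still cross the threshold once enough tokens are observed, which is the mechanism behind the "detectable after 800 tokens on average" figure. The span-sensitive detectors studied in the paper apply this style of test over windows of a longer document rather than over the document as a whole.
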
dc.description.sponsorship: This work was made possible by the ONR MURI program, DARPA GARD (HR00112020007), the Office of Naval Research (N000142112557), and the AFOSR MURI program. Commercial support was provided by Capital One Bank, the Amazon Research Award program, and Open Philanthropy. Further support was provided by the National Science Foundation (IIS-2212182) and by the NSF TRAILS Institute (2229885). [en_US]
dc.format.extent: 1-9 + appendices
dc.format.medium: Electronic
dc.language.iso: en [en_US]
dc.publisher: ICLR [en_US]
dc.relation.uri: http://arxiv.org/abs/2306.04634v4
dc.relation.uri: https://github.com/jwkirchenbauer/lm-watermarking
dc.relation.uri: https://iclr.cc/virtual/2024/poster/19147
dc.rights: Copyright © 2024 The Authors. The submitter granted the following license to arXiv.org on submission of an article: I grant arXiv.org a perpetual, non-exclusive license to distribute this article. I certify that I have the right to grant this license. I understand that submissions cannot be completely removed once accepted. I understand that arXiv.org reserves the right to reclassify or reject any submission.
dc.rights.uri: https://arxiv.org/licenses/nonexclusive-distrib/1.0/
dc.subject: machine learning (cs.LG) [en_US]
dc.subject: computation and language (cs.CL) [en_US]
dc.subject: cryptography and security (cs.CR) [en_US]
dc.title: On the Reliability of Watermarks for Large Language Models [en_US]
dc.type: Conference Paper [en_US]
dc.identifier.doi: https://doi.org/10.48550/arXiv.2306.04634
pubs.notes: 9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking
dc.identifier.eissn: 2331-8422
dc.rights.holder: The Authors
Appears in Collections: Dept of Mathematics Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © 2024 The Authors (arXiv non-exclusive distribution license; see dc.rights above)
Size: 3.04 MB
Format: Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.