On the Reliability of Watermarks for Large Language Models

Kirchenbauer, J; Geiping, J; Wen, Y; Shu, M; Saifullah, K; Kong, K; Fernando, K; Saha, A; Goldblum, M; Goldstein, T

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29014

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kirchenbauer, J	-
dc.contributor.author	Geiping, J	-
dc.contributor.author	Wen, Y	-
dc.contributor.author	Shu, M	-
dc.contributor.author	Saifullah, K	-
dc.contributor.author	Kong, K	-
dc.contributor.author	Fernando, K	-
dc.contributor.author	Saha, A	-
dc.contributor.author	Goldblum, M	-
dc.contributor.author	Goldstein, T	-
dc.date.accessioned	2024-05-15T16:20:26Z	-
dc.date.available	2024-05-15T16:20:26Z	-
dc.date.issued	2024-05-07	-
dc.identifier	ORCiD: Kasun Fernando https://orcid.org/0000-0003-1489-9566	-
dc.identifier	arXiv:2306.04634v4 [cs.LG]	-
dc.identifier.citation	Kirchenbauer, J. et al. (2024) 'On the Reliability of Watermarks for Large Language Models', Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7-11 May, pp. 1-9. doi: 10.48550/arXiv.2306.04634 [Available at: https://arxiv.org/abs/2306.04634v4 (Accessed: 15 May 2024)].	en_US
dc.identifier.uri	https://bura.brunel.ac.uk/handle/2438/29014	-
dc.description	This is the accepted version of the conference paper archived online at arXiv:2306.04634v4 [cs.LG], https://arxiv.org/abs/2306.04634v4. Comments: 9 pages in the main body. Published at ICLR 2024 (https://iclr.cc/virtual/2024/poster/19147). Code is available at https://github.com/jwkirchenbauer/lm-watermarking	en_US
dc.description.abstract	As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a 1e-5 false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.	en_US
dc.description.sponsorship	This work was made possible by the ONR MURI program, DARPA GARD (HR00112020007), the Office of Naval Research (N000142112557), and the AFOSR MURI program. Commercial support was provided by Capital One Bank, the Amazon Research Award program, and Open Philanthropy. Further support was provided by the National Science Foundation (IIS-2212182), and by the NSF TRAILS Institute (2229885).	en_US
dc.format.extent	pp. 1-9 + appendices	-
dc.format.medium	Electronic	-
dc.language.iso	en	en_US
dc.publisher	ICLR	en_US
dc.relation.uri	http://arxiv.org/abs/2306.04634v4	-
dc.relation.uri	https://github.com/jwkirchenbauer/lm-watermarking	-
dc.relation.uri	https://iclr.cc/virtual/2024/poster/19147	-
dc.rights	Copyright © 2024 The Authors. The submitter granted the following license to arXiv.org on submission of an article: I grant arXiv.org a perpetual, non-exclusive license to distribute this article. I certify that I have the right to grant this license. I understand that submissions cannot be completely removed once accepted. I understand that arXiv.org reserves the right to reclassify or reject any submission.	-
dc.rights.uri	https://arxiv.org/licenses/nonexclusive-distrib/1.0/	-
dc.subject	machine learning cs.LG	en_US
dc.subject	computation and language cs.CL	en_US
dc.subject	cryptography and security cs.CR	en_US
dc.title	On the Reliability of Watermarks for Large Language Models	en_US
dc.type	Conference Paper	en_US
dc.identifier.doi	https://doi.org/10.48550/arXiv.2306.04634	-
pubs.notes	9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking	-
dc.identifier.eissn	2331-8422	-
dc.rights.holder	The Authors	-
Appears in Collections:	Dept of Mathematics Research Papers

Files in This Item:

File	Description	Size	Format
FullText.pdf	Copyright © 2024 The Authors.	3.04 MB	Adobe PDF
