Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/29014
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kirchenbauer, J | - |
dc.contributor.author | Geiping, J | - |
dc.contributor.author | Wen, Y | - |
dc.contributor.author | Shu, M | - |
dc.contributor.author | Saifullah, K | - |
dc.contributor.author | Kong, K | - |
dc.contributor.author | Fernando, K | - |
dc.contributor.author | Saha, A | - |
dc.contributor.author | Goldblum, M | - |
dc.contributor.author | Goldstein, T | - |
dc.date.accessioned | 2024-05-15T16:20:26Z | - |
dc.date.available | 2024-05-15T16:20:26Z | - |
dc.date.issued | 2024-05-07 | - |
dc.identifier | ORCiD: Kasun Fernando https://orcid.org/0000-0003-1489-9566 | - |
dc.identifier | arXiv:2306.04634v4 [cs.LG] | - |
dc.identifier.citation | Kirchenbauer, J. et al. (2024) 'On the Reliability of Watermarks for Large Language Models', Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7-11 May, pp. 1-9. doi: 10.48550/arXiv.2306.04634 [Available at: https://arxiv.org/abs/2306.04634v4 (Accessed: 15 May 2024)]. | en_US |
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/29014 | - |
dc.description | This is the accepted version of the conference paper archived online at arXiv:2306.04634v4 [cs.LG], https://arxiv.org/abs/2306.04634v4. Comments: 9 pages in the main body. Published at ICLR 2024 (https://iclr.cc/virtual/2024/poster/19147). Code is available at https://github.com/jwkirchenbauer/lm-watermarking | en_US |
dc.description.abstract | As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. Watermarking is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is rewritten by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, at a false positive rate of 1e-5. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors. | en_US |
dc.description.sponsorship | This work was made possible by the ONR MURI program, DARPA GARD (HR00112020007), the Office of Naval Research (N000142112557), and the AFOSR MURI program. Commercial support was provided by Capital One Bank, the Amazon Research Award program, and Open Philanthropy. Further support was provided by the National Science Foundation (IIS-2212182), and by the NSF TRAILS Institute (2229885). | en_US |
dc.format.extent | 1-9 + appendices | - |
dc.format.medium | Electronic | - |
dc.language.iso | en | en_US |
dc.publisher | ICLR | en_US |
dc.relation.uri | http://arxiv.org/abs/2306.04634v4 | - |
dc.relation.uri | https://github.com/jwkirchenbauer/lm-watermarking | - |
dc.relation.uri | https://iclr.cc/virtual/2024/poster/19147 | - |
dc.rights | Copyright © 2024 The Authors. The submitter granted the following license to arXiv.org on submission of an article: I grant arXiv.org a perpetual, non-exclusive license to distribute this article. I certify that I have the right to grant this license. I understand that submissions cannot be completely removed once accepted. I understand that arXiv.org reserves the right to reclassify or reject any submission. | - |
dc.rights.uri | https://arxiv.org/licenses/nonexclusive-distrib/1.0/ | - |
dc.subject | machine learning cs.LG | en_US |
dc.subject | computation and language cs.CL | en_US |
dc.subject | cryptography and security cs.CR | en_US |
dc.title | On the Reliability of Watermarks for Large Language Models | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.doi | https://doi.org/10.48550/arXiv.2306.04634 | - |
pubs.notes | 9 pages in the main body. Published at ICLR 2024. Code is available at https://github.com/jwkirchenbauer/lm-watermarking | - |
dc.identifier.eissn | 2331-8422 | - |
dc.rights.holder | The Authors | - |
Appears in Collections: Dept of Mathematics Research Papers
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
FullText.pdf | Copyright © 2024 The Authors. The submitter granted the following license to arXiv.org on submission of an article: I grant arXiv.org a perpetual, non-exclusive license to distribute this article. I certify that I have the right to grant this license. I understand that submissions cannot be completely removed once accepted. I understand that arXiv.org reserves the right to reclassify or reject any submission. | 3.04 MB | Adobe PDF |
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.