
☢️ Watermarking Makes Language Models Radioactive ☢️

This repository contains the code and additional resources for the paper, published at NeurIPS 2024 (spotlight). For more detailed information, please refer to:

[arXiv] [Webpage]

Code

The code is adapted from https://github.com/facebookresearch/three_bricks, which is licensed under the Creative Commons Attribution-NonCommercial 4.0 International Public License.

Requirements

We use CUDA 11.7. Please adapt the following commands to install the PyTorch version appropriate for your setup.

conda create -n "radioactive_watermark" python=3.8
conda activate radioactive_watermark
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

In the following, "<your_model_path>" corresponds to the path of the model you want to test radioactivity on. You can, for instance, put the path to a Llama model. The seed is used when generating text, in combination with the hashing key: two different seeds give two different watermarking schemes.
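To make this concrete, here is a toy illustration (not this repository's exact implementation) of how the seed and the watermark window determine the green list in a Kirchenbauer-style watermark. The hash combination below is a hypothetical stand-in:

```python
import random

def greenlist(prev_tokens, seed, vocab_size, gamma=0.25):
    # Hypothetical hash: mix the seed / hashing key with the previous tokens
    # (the watermark window). A different seed yields a different partition
    # of the vocabulary, hence a different watermarking scheme.
    h = seed
    for t in prev_tokens:
        h = (h * 1_000_003 + t) % (1 << 61)
    rng = random.Random(h)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])  # gamma fraction is "green"
```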

Data and models

  • A subset of instruction/answer pairs watermarked with the method of Kirchenbauer et al., with watermark window size 2, $\delta = 3$, $\gamma = 0.25$, and seed 3, can be found in data/maryland_ngram2_seed3.jsonl. We also provide a corresponding filter, computed by saving the k-tuples and their frequencies, in data/maryland_ngram2_seed3_filter.pkl.
  • A similar dataset, in a format compatible with the "reading mode", is available in data/maryland_ngram4_seed0.jsonl, this time for a watermark window size of 4 and seed 0.
  • We store radioactive outputs of a model trained on 5% of watermarked data in output_closed_supervised_0p05/results.jsonl (window size 2, $\delta = 3$, $\gamma = 0.25$, seed 0), as well as some of the training data in data/used_maryland_ngram2_seed0.jsonl and a filter computed from similar watermarked data in data/used_maryland_ngram2_seed0_filter.pkl.
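All of these datasets are standard jsonl files (one JSON object per line). A minimal way to inspect one; the field names depend on the file, so check the schema before relying on them:

```python
import json

with open("data/maryland_ngram2_seed3.jsonl") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(sorted(record.keys()))  # inspect the schema before use
        if i == 2:
            break
```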

Usage - Closed model setting

For example, the following command analyses the radioactive outputs in output_closed_supervised_0p05/results.jsonl by concatenating the results and applying the deduplication proposed in Section 4. It corresponds to the closed-model/supervised setting with 5% of watermarked data in Figure 5.

Note that your model (in <your_model_path>) is not used here: the script notices the presence of outputs in output_closed_supervised_0p05/, so it does not generate any answers and only scores the ones that are already present.

python main_watermark.py \
    --model_name <your_model_path> \
    --prompt_path "data/used_maryland_ngram2_seed0.jsonl" \
    --method none --method_detect maryland \
    --ngram 2 --scoring_method v2 \
    --nsamples 10000 --batch_size 16 \
    --output_dir output_closed_supervised_0p05/ \
    --filter_path "data/used_maryland_ngram2_seed0_filter.pkl" 

Output

The previous script generates result_chunked.jsonl, which contains the following important fields (taking the watermark of Kirchenbauer et al. as an example):

| Field      | Description                                        |
| ---------- | -------------------------------------------------- |
| score      | Proportion of green-list tokens up to that point   |
| num_token  | Number of analyzed tokens in the text              |
| num_scored | Number of scored tokens in the text                |
| pvalue     | p-value of the detection test                      |
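For the watermark of Kirchenbauer et al., the detection test is a binomial tail: under the null hypothesis (no radioactivity), each scored token lands in the green list with probability $\gamma$. A minimal sketch of how the fields above relate, assuming this standard test:

```python
from scipy.stats import binom

def detection_pvalue(score, num_scored, gamma=0.25):
    greens = round(score * num_scored)  # `score` is the proportion of green tokens
    # P[X >= greens] for X ~ Binomial(num_scored, gamma)
    return binom.sf(greens - 1, num_scored, gamma)
```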

The resulting file should look similar to output_closed_supervised_0p05/result_chunked_expected.jsonl. The final p-value should be close to 1e-30.

Running the following command will instead generate outputs from Llama-2-7b-chat-hf on the watermarked prompts, leading to a file output_closed/results.jsonl similar to output_closed/results_expected.jsonl, and will then compute the radioactivity detection test, producing a result_chunked.jsonl similar to result_chunked_expected.jsonl. This time, since the model was not trained on watermarked data, the resulting p-value should be random. Without deduplication (no_dedup = 1), the model would falsely appear radioactive because the prompts themselves are watermarked. Results should be similar with or without the corresponding filter data/maryland_ngram2_seed3_filter.pkl.

python main_watermark.py \
    --model_name <your_model_path> \
    --prompt_path "data/maryland_ngram2_seed3.jsonl" \
    --method none --method_detect maryland \
    --ngram 2 --scoring_method v2 --seed 3 \
    --nsamples 1000 --batch_size 16 \
    --output_dir output_closed/ \
    --filter_path "maryland_ngram2_seed3_filter.pkl" 

Open model setting

The following command runs the reading mode with deduplication on data/reading_maryland_ngram4_seed0.jsonl:

python main_reed_wm.py \
    --model_name <your_model_path> \
    --dataset_path2 "data/reading_maryland_ngram4_seed0.jsonl" \
    --method_detect maryland \
    --nsamples 1000 \
    --batch_size 16 \
    --output_dir output_open/ \
    --ngram 4

If the model used is not radioactive, it should lead to a random p-value.
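Conceptually, the reading mode exploits open-model access: instead of sampling new generations, it runs a forward pass on watermarked text and scores the model's own next-token predictions. A simplified sketch of the idea (not the script's actual code), using Hugging Face transformers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("<your_model_path>")
model = AutoModelForCausalLM.from_pretrained("<your_model_path>")

ids = tok("some watermarked training text ...", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
preds = logits.argmax(-1)[0, :-1]  # greedy prediction for each next token
# Each predicted token is then scored against the watermark (green or not),
# with the same windowed hashing and deduplication as in the closed setting.
```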

Compute filter

To compute a filter on the scored tokens, follow the instructions in create_filter.ipynb.
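The notebook is authoritative; as a rough sketch of the idea (assuming the watermarked text lives under a hypothetical "text" field and using whitespace tokenization for brevity), a filter is just the k-tuples seen in the data together with their frequencies, pickled:

```python
import json
import pickle
from collections import Counter

k = 2  # watermark window size
counts = Counter()
with open("data/maryland_ngram2_seed3.jsonl") as f:
    for line in f:
        toks = json.loads(line)["text"].split()  # use the real tokenizer in practice
        counts.update(tuple(toks[i : i + k]) for i in range(len(toks) - k + 1))

with open("my_filter.pkl", "wb") as f:  # hypothetical output name
    pickle.dump(counts, f)
```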

Citation

If you find our work useful in your research, please consider giving a ⭐ and citing as:

@article{sander2024watermarking,
    title={Watermarking Makes Language Models Radioactive},
    author={Sander, Tom and Fernandez, Pierre and Durmus, Alain and Douze, Matthijs and Furon, Teddy},
    journal={NeurIPS},
    year={2024}
}
