This repo includes papers about watermarking methods for generative AI models. Watermarking is a method for embedding an imperceptible but recoverable signal (payload) into a digital asset (cover). With generative models, some approaches train the model itself so that every output carries the watermark and this behaviour is hard to disable. We refer to this as "Fingerprint Rooting" or just "Rooting".
- 1. Introduction
- 2. Image Domain
- 3. Audio Domain
- 4. Text Domain
- 5. Related News
- 6. Generative Model stealing Papers
- 7. Survey Papers
- 7.1 A Systematic Review on Model Watermarking for Neural Networks
- 8. Further Links
- Deep fake detection (Is a digital asset AI-generated?)
- Counter misinformation
- Prevent data crawlers from picking up generated content and causing "Self-Consuming Generative Models Go MAD"
- Deep fake attribution (By whom (which user of a model API) has it been generated?)
- Enhanced Model Fingerprinting (By which model has it been generated?)
- IP protection
- Protect valuable models
- Protect valuable training data (e.g. style)
- Tamper Localization (Where has an asset been doctored?)
- see "EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection"
- Fingerprint Rooting vs. Post-Hoc Watermarking
- Fingerprint Rooting
- The model is modified so that its normal generation process produces watermarked output
- This DOES protect open models
- Post-Hoc Watermarking
- The watermark is added after generation, in a separate step
- This does NOT protect open models, as the watermark embedding step can simply be disabled
- Fingerprint Rooting
- Static vs. Dynamic Watermarking
- Static watermarking
- "[...] specific pattern in its static content, such as a particular distribution of parameters" (see "Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process")
- Dynamic Watermarking
- "[...] specific pattern in model’s dynamic contents, such as its behavior.", e.g. trigger-prompt-watermark-backdoors
- Requires at least API access to the model, as the watermark is only present for specific trigger inputs (see the verification sketch below)
- Examples for introducing backdoors in general (not for watermarking specifically) into diffusion models:
- "[...] specific pattern in model’s dynamic contents, such as its behavior.", e.g. trigger-prompt-watermark-backdoors
- Static watermarking
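The following is a minimal, hypothetical sketch of such a trigger-based (dynamic) verification: only API access to the suspect model is needed, since the watermark surfaces in its behaviour. `generate`, `extract_bits`, `TRIGGER_PROMPT`, and `SECRET_BITS` are illustrative assumptions, not taken from any specific paper; static watermarks would instead be checked by inspecting the parameters directly (white-box access).

```python
import numpy as np

# Illustrative assumptions: a secret trigger prompt and the payload it should reveal.
TRIGGER_PROMPT = "a photo of a sks bucket on the moon"
SECRET_BITS = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

def verify_dynamic_watermark(generate, extract_bits, threshold: float = 0.9) -> bool:
    """Black-box verification: query the deployed model through its API,
    run a watermark extractor on the output, and compare against the secret payload."""
    image = generate(TRIGGER_PROMPT)           # ordinary API call to the suspect model
    decoded = np.asarray(extract_bits(image))  # assumed pretrained watermark decoder
    bit_accuracy = (decoded == SECRET_BITS).mean()
    return bit_accuracy >= threshold
```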
- Tuning mechanism
- Via Training data
- e.g. Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data -> "Plug-and-play", as it works on all architectures due to transferability
- Via joint fine-tuning of a model and a decoder (taken from an encoder/decoder pair) on a few samples (see the sketch after this list)
- e.g. The Stable Signature: Rooting Watermarks in Latent Diffusion Models (requires a latent generative model)
- Flexibility: how quickly a model can be rooted:
- Full training for each instance
- Fine-tuning for each instance
- Fine-tuning once
- e.g. using a message matrix as in Flexible and Secure Watermarking for Latent Diffusion Model
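A rough sketch of the joint fine-tuning idea (in the spirit of Stable Signature, but not the paper's actual recipe): only the latent image decoder is tuned against a frozen watermark extractor taken from a pretrained encoder/decoder pair (e.g. HiDDeN-like), while a perceptual loss keeps the outputs close to those of the original decoder. All names (`root_latent_decoder`, `wm_extractor`, `perceptual_loss`, ...) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def root_latent_decoder(decoder, ref_decoder, wm_extractor, latent_loader,
                        message_bits, perceptual_loss, steps=1000, lr=1e-4, lambda_wm=1.0):
    """Fine-tune the latent image decoder so every decoded image carries `message_bits`,
    recoverable by the frozen extractor, while staying close to the original outputs."""
    opt = torch.optim.AdamW(decoder.parameters(), lr=lr)
    wm_extractor.eval()
    message = message_bits.float()                  # e.g. torch.randint(0, 2, (48,))
    for _, z in zip(range(steps), latent_loader):   # latents of a few training images
        imgs = decoder(z)                           # output of the decoder being rooted
        with torch.no_grad():
            ref = ref_decoder(z)                    # output of the untouched decoder
        logits = wm_extractor(imgs)                 # predicted payload bits
        loss = perceptual_loss(imgs, ref) \
             + lambda_wm * F.binary_cross_entropy_with_logits(logits, message.expand_as(logits))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder
```

Because only a small decoder is tuned on a handful of samples, a fresh message can be rooted per model instance relatively quickly, which is what the flexibility axis above refers to.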
- Generated asset yes/no
- Identity of watermarking party
- Identifier of the asset in a provenance database (can replace perceptual hashing, mentioned in "RoSteALS: Robust Steganography using Autoencoder Latent Space"); a payload-encoding sketch follows below
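A minimal sketch of what such a multi-bit payload can look like in practice; the hashing scheme, payload length, and acceptance threshold are illustrative assumptions, and real systems typically add error-correcting codes and/or keep a lookup table.

```python
import hashlib

def party_id_to_payload(party_id: str, n_bits: int = 48) -> list[int]:
    """Map a user/party identifier to a fixed-length multi-bit payload."""
    digest = hashlib.sha256(party_id.encode("utf-8")).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n_bits)]

def attribute(decoded_bits: list[int], expected_bits: list[int], threshold: float = 0.9) -> bool:
    """Accept the attribution if enough decoded bits agree with the expected payload."""
    agree = sum(int(a == b) for a, b in zip(decoded_bits, expected_bits))
    return agree / len(expected_bits) >= threshold

payload = party_id_to_payload("api-user-1234")
print(attribute(payload, payload))  # True: a perfect extraction matches its own payload
```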
- Watermark removal
- Removing a watermark from a given digital asset
- Attacker goals
- A fake asset can be claimed to be real
- Disable IP claims
- Misinformation
- Sensor spoofing (weird, but it was mentioned in Responsible Disclosure of Generative Models Using Scalable Fingerprinting)
- Robustness property
- Removing the watermark should decrease the asset quality, which negates the usefulness of the asset for malicious goals (see the robustness sketch after this list)
- Watermark forgery (referred to as spoofing by Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks)
- Adding a watermark to a given digital asset
- Attacker goals
- False IP claims
- A real asset can be denounced as fake (Misinformation)
- Reputation loss after linking obscene content with model (mentioned in Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks)
- Model purification
- A watermarked model which should only produce watermarked output, even if distributed to untrusted parties (e.g. Stable Signature), is "purified" in a way that removes the watermarks in its output.
- Attacker goals
- Obtain a model which does not produce watermarked content
- Robustness property
- Removing the watermark functionality of the model should decrease the output quality. This negates the usefulness of the model for malicious goals
- Whitebox
- Attacker has full access to a generative AI model
- ... TODO
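As referenced in the robustness property above, one way to make the removal/quality trade-off concrete is to attack a watermarked asset with increasing strength and track the remaining bit accuracy against image quality. A minimal sketch using JPEG recompression as the attack; `decode_bits` and `true_bits` are assumed to come from whichever watermarking scheme is under test.

```python
import io
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB between two uint8 images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def jpeg_attack(img: Image.Image, quality: int) -> Image.Image:
    """Recompress the image at the given JPEG quality (a common removal baseline)."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def removal_tradeoff(img: Image.Image, decode_bits, true_bits, qualities=(90, 70, 50, 30, 10)):
    """Report remaining watermark bit accuracy vs. quality loss (PSNR) per attack strength.
    A robust scheme forces the attacker far down the quality axis before bits are lost."""
    ref = np.asarray(img.convert("RGB"))
    true_bits = np.asarray(true_bits)
    for q in qualities:
        attacked = jpeg_attack(img, q)
        acc = (np.asarray(decode_bits(attacked)) == true_bits).mean()
        print(f"JPEG q={q:3d}  bit acc={acc:.2f}  PSNR={psnr(ref, np.asarray(attacked)):.1f} dB")
```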
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
Watermarking is not Cryptography | IWDW | 2006 | - | Author webpage | - TODO |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data | ICCV | 2021 | - | Arxiv | - Roots GAN models by embedding a watermark into the training data to exploit transferability |
PTW: Pivotal Tuning Watermarking for Pre-Trained Image Generators | USENIX | 2023 | Github | Arxiv | - Focus on GANs, but latent diffusion models should work too |
The Stable Signature: Rooting Watermarks in Latent Diffusion Models | ICCV | 2023 | Github | Arxiv | - Meta/FAIR authors - Finetunes a model jointly with a decoder (from an encoder/decoder pair) so that it reveals a secret message in its output - Robust to watermark removal and model purification (quality deterioration) - Static watermarking |
Stable Signature is Unstable: Removing Image Watermark from Diffusion Models | - | 2024 | - | Arxiv | - Stable Signature model purification via finetuning |
Flexible and Secure Watermarking for Latent Diffusion Model | ACM MM | 2023 | - | - | - References Stable Signature and improves on it by allowing different messages to be embedded without finetuning |
A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion | - | 2024 | - | Arxiv | - TODO |
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | - TODO |
RoSteALS: Robust Steganography using Autoencoder Latent Space | CVPR Workshops (CVPRW) | 2023 | Github | Arxiv | - Post-hoc watermarking |
DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models | NeurIPS Workshop on Diffusion Models | 2023 | - | Arxiv | - Not about rooting - Data poisoning: protected images reproduce the watermark if used as training data for a diffusion model |
A Recipe for Watermarking Diffusion Models | - | 2023 | Github | Arxiv | - Framework for 1. small unconditional/class-conditional DMs via training from scratch on watermarked data and 2. text-to-image DMs via finetuning a backdoor-trigger-output - Lots of references on watermarking discriminative models - Static watermarking |
Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process | - | 2023 | - | Arxiv | - Threat model: Check ownership of model by having access to the model - Hard to read - Explains difference between static and dynamic watermarking with many references |
Securing Deep Generative Models with Universal Adversarial Signature | - | 2023 | Github | Arxiv | - 1. Find optimal signature for an image individually. - 2. Finetune a GenAI model on these images. |
Watermarking Diffusion Model | - | 2023 | - | Arxiv | - Finetuning a backdoor-trigger-output - Static watermarking - CISPA authors |
Catch You Everything Everywhere: Guarding Textual Inversion via Concept Watermarking | - | 2023 | - | Arxiv | - Guards concepts obtained through textual inversion (An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion) from abuse by allowing to identify concepts in generated images. - Very interesting references on company and government stances on watermarking |
Generative Watermarking Against Unauthorized Subject-Driven Image Synthesis | - | 2023 | - | Arxiv | - Different from Glaze in that style synthesis from protected source images is not prevented, but recognizable via watermarks - CISPA authors |
Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content | - | 2024 | - | OpenReview | - Watermark removal and forgery in one method, using GAN - References two types of watermarking: 1. Learn/finetune model to produce watermarked output and 2. post-hoc watermarking after the fact (static vs. dynamic, see "Intellectual Property Protection of Diffusion Models via the Watermark Diffusion Process") |
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks | ICLR | 2024 | Github | Arxiv | - They show that low budget watermarking methods are beaten by diffusion purification and propose an attack that can even remove high budget watermarks by model substitution |
A Transfer Attack to Image Watermarks | - | 2024 | - | Arxiv | - Watermark removal by "no-box"-attack on detectors (no access to detector-API, instead training classifier to distinguish watermarked and vanilla images) |
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection | CVPR | 2024 | Github | Arxiv | - Post-hoc watermarking with tamper localization |
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space | - | 2024 | - | Arxiv | - Discusses 3 categories for watermarks with references: before, during, and after generation |
Stable Messenger: Steganography for Message-Concealed Image Generation | - | 2023 | - | Arxiv | - Post-hoc watermarking - Watermark embedding during generation according to "Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space", but I think it is actually post-hoc. |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
StegaStamp: Invisible Hyperlinks in Physical Photographs | CVPR | 2020 | Github | Arxiv | - Watermark in physical images that can be captured from video stream - "Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content" speculates that Deepmind SynthID works similarly to this |
ChartStamp: Robust Chart Embedding for Real-World Applications | ACM MM | 2022 | Github | - | - Like StegaStamp, but it introduces less clutter in flat regions in images |
Unadversarial Examples: Designing Objects for Robust Vision | NeurIPS | 2021 | Github | Arxiv | - Perturbations to make detection easier |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees | - | 2024 | Github | Arxiv | - Withdrawn from arxiv |
PiGW: A Plug-in Generative Watermarking Framework | - | 2024 | Did not look for it yet | Arxiv | - Withdrawn from arxiv |
Benchmarking the Robustness of Image Watermarks (Wait for ICML source) | ICML | 2024 | Github | Arxiv | - TODO |
WMAdapter: Adding WaterMark Control to Latent Diffusion Models | - | 2024 | Did not look for it yet | Arxiv | - TODO |
Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious? | - | 2024 | Did not look for it yet | Arxiv | - TODO |
Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection | - | 2024 | Did not look for it yet | Arxiv | - TODO |
ProMark: Proactive Diffusion Watermarking for Causal Attribution | CVPR | 2024 | - | Arxiv | - TODO |
Watermarking Images in Self-Supervised Latent Spaces | ICASSP | 2022 | Github | Arxiv | - TODO |
Generative Autoencoders as Watermark Attackers: Analyses of Vulnerabilities and Threats | ICML Workshop DeployableGenerativeAI | 2023 | - | - | - Attack on pixel-watermarks using LDM autoencoders |
Invisible Image Watermarks Are Provably Removable Using Generative AI | - | 2023 | Github | Arxiv | - Is not about rooting a model, but removing watermarks with diffusion purification - Evaluates stable signature and Tree-Ring Watermarks. Tree-ring is robust against their attack. - Earlier Version of Generative Autoencoders as Watermark Attackers |
WaterDiff: Perceptual Image Watermarks Via Diffusion Model | IVMSP-P2 Workshop at ICASSP | 2024 | - | - | - TODO |
Squint Hard Enough: Attacking Perceptual Hashing with Adversarial Machine Learning | USENIX | 2022 | - | - | - Attacks on perceptual hashes |
Evading Watermark based Detection of AI-Generated Content | CCS | 2023 | Github | Arxiv | - Evaluation of robustness of image watermarks + Adversarial sample for evasion |
Diffusion Models for Adversarial Purification | ICML | 2022 | Github | Arxiv | - Defense against adversarial perturbations, including imperceptible watermarks in images |
Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions | AAAI | 2023 | Github | - | - Like HiDDeN, just a neural watermark encoder/extractor |
HiDDeN: Hiding Data With Deep Networks | ECCV | 2018 | Github | Arxiv | - Main tool used in Stable Signature - Contains differentiable approx. of JPEG compression - Dynamic watermarking |
Glaze: Protecting artists from style mimicry by text-to-image models | USENIX | 2023 | Github | Arxiv | - Is not about Rooting, but denying style stealing |
DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization | - | 2023 | - | Arxiv | - Seems similar to Glaze at first glance. Authors may have been unlucky to do parallel work |
Responsible Disclosure of Generative Models Using Scalable Fingerprinting | ICLR | 2022 | Github | Arxiv | - Rooting GAN models. Seems to have introduced the idea of scalably producing many models fast with large message space (TODO: check this later), similar to how Stable Signature did it later for stable diffusion. |
On Attribution of Deepfakes | - | 2020 | - | Arxiv | - They show that an image can be created that looks like it may have been generated by a targeted model. They also propose a framework how to achieve deniability for such cases. |
Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms | ACM MM | 2022 | Github | Arxiv | - Is not about rooting a model, but about attacking post-hoc watermarking of images - Lots of references on invertible NNs |
DocDiff: Document Enhancement via Residual Diffusion Models | ACM MM | 2023 | Github | Arxiv | - Is not about rooting a model, but about post-hoc watermarking of images - Includes classic watermark removal |
Warfare: Breaking the Watermark Protection of AI-Generated Content | - | 2023 | Did not look for it yet | Arxiv | - Is not about rooting a model, but about attacking post-hoc watermarking - Includes 1. watermark removal and 2. forging |
Leveraging Optimization for Adaptive Attacks on Image Watermarks | ICML (Poster) | 2024 | Did not look for it yet | Arxiv | - Is not about rooting a model, but about attacking post-hoc watermarking |
A Somewhat Robust Image Watermark against Diffusion-based Editing Models | - | 2023 | Did not look for it yet | Arxiv | - Is not about rooting a model, but about post-hoc watermarking of images - Takes watermarks literally and injects hidden images |
Hey That's Mine Imperceptible Watermarks are Preserved in Diffusion Generated Outputs | - | 2023 | - | Arxiv | - Is not about rooting a model. They show that watermarks in training data are recognizable in output and allow for intellectual property claims |
Benchmarking the Robustness of Image Watermarks | - | 2024 | Github | Arxiv | - Just a benchmark/framework for testing watermarks against |
Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks | ACM MM | 2023 | Did not look for it yet | Arxiv | - Is not about generative models, but discriminative models |
Adversarial Attack for Robust Watermark Protection Against Inpainting-based and Blind Watermark Removers | ACM MM | 2023 | Did not look for it yet | - | - Post-hoc watermark with enhanced robustness against inpainting |
A Novel Deep Video Watermarking Framework with Enhanced Robustness to H.264/AVC Compression | ACM MM | 2023 | Github | - | - Post-hoc watermark for videos |
Practical Deep Dispersed Watermarking with Synchronization and Fusion | ACM MM | 2023 | Did not look for it yet | Arxiv | - Post-hoc watermark for images with enhanced robustness to transformations |
Generalizable Synthetic Image Detection via Language-guided Contrastive Learning | - | 2023 | Github | Arxiv | - Is not about rooting, but GenAI image detection |
Enhancing the Robustness of Deep Learning Based Fingerprinting to Improve Deepfake Attribution | ACM MM-Asia | 2022 | - | - | - Is not about rooting, but transformation-robustness strategies for watermarks |
You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership | NeurIPS | 2021 | Github | Arxiv | - Watermarking the sparsity mask of winning lottery tickets |
Self-Consuming Generative Models Go MAD | ICLR (Poster) | 2024 | - | Arxiv | - Contains a reason why GenAI detection is important: Removing generated content from training sets |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
Proactive Detection of Voice Cloning with Localized Watermarking | - | 2024 | Github | Arxiv | - Meta/FAIR author |
MaskMark: Robust Neural Watermarking for Real and Synthetic Speech | ICASSP | 2024 | Audio samples | IEEExplore | - |
Collaborative Watermarking for Adversarial Speech Synthesis | ICASSP | 2024 | - | Arxiv | - Meta/FAIR author |
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | NeurIPS | 2020 | Github | Arxiv | - Very good GAN for Speech synthesis (TODO: Is this SotA?) - Can do live synthesis even on CPU - Quality is on par with autoregressive models |
Spoofed Training Data for Speech Spoofing Countermeasure Can Be Efficiently Created Using Neural Vocoders | ICASSP | 2023 | - | Arxiv | - Include vocoder generated training data to enhance detection capabilities for countermeasures |
AudioQR: Deep Neural Audio Watermarks For QR Code | IJCAI | 2023 | Github | - | - Imperceptible QR-codes in audio for the visually impaired |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
ASVspoof 2021 Challenge | - | 2021 | Github | Arxiv | - Challenge for audio spoofing detection |
ADD 2022: the first Audio Deep Synthesis Detection Challenge | ICASSP | 2022 | Github | Arxiv | - Official Chinese challenge website (NO HTTPS!) |
- For more (but older) datasets, see Awesome-DeepFake-Learning
- Github topics: Audio Synthesis
- Github topic: Audio Deepfake Detection
- Awesome Deepfakes Detection
- Awesome-DeepFake-Learning
- Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion (supposedly also very fast, like HiFi-GAN)
- Meta AI: VoiceBox
- SotA Speech model with many functionalities (noise removal (barking), style transfer, ...)
- TTS libraries (including speech synthesis)
- Coqui-AI library
- Mimic3
- Amphion-AI
- Tortoise TTS
- pet project by some student
- includes tools for detection
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models | - | 2023 | Github | Arxiv | - |
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding | S&P | 2021 | Github | Arxiv | - |
Resilient Watermarking for LLM-Generated Codes | - | 2024 | Github Appendix | Arxiv | - Code |
Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code | - | 2024 | - | Arxiv | - Error correction |
Provable Robust Watermarking for AI-Generated Text | ICLR | 2024 | Github | Arxiv | - Apparently good and robust LLM Watermarking |
Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs | ICLR | 2024 | Github | Arxiv | - TODO |
- Coalition for Content Provenance and Authenticity (C2PA)
- Is based on a trust model with signing authorities which certify the signer of a digital asset through a chain of trust, similar to internet PKI.
- C2PA Specifications
- Explainer
- "Provenance generally refers to the facts about the history of a piece of digital content assets (image, video, audio recording, document). C2PA enables the authors of provenance data to securely bind statements of provenance data to instances of content using their unique credentials. These provenance statements are called assertions by the C2PA. They may include assertions about who created the content and how, when, and where it was created. They may also include assertions about when and how it was edited throughout its life. The content author, and publisher (if authoring provenance data) always has control over whether to include provenance data as well as what assertions are included, such as whether to include identifying information (in order to allow for anonymous or pseudonymous assets). Included assertions can be removed in later edits without invalidating or removing all of the included provenance data in a process called redaction."
- "In the C2PA Specifications, trust decisions are made by the consumer of the asset based on the identity of the actor(s) who signed the provenance data along with the information in the assertions contained in the provenance. This signing takes place at each significant moment in an asset’s life (e.g., creation, editing, etc.) through the use of the actor’s unique credentials and ensures that the provenance data remains cryptographically bound to the newly created or updated asset."
- "Soft bindings are described using soft binding assertions such as via a perceptual hash computed from the digital content or a watermark embedded within the digital content. These soft bindings enable digital content to be matched even if the underlying bits differ, for example due to an asset rendition in a different resolution or encoding format. Additionally, should a C2PA manifest be removed from an asset, but a copy of that manifest remains in a provenance store elsewhere, the manifest and asset may be matched using available soft bindings."
- This includes ISCC - Content Codes
- -> allows authors and subsequent actors to sign assets and make this act publicly known to establish the asset's history.
- Explainer
- Open-source tools for content authenticity and provenance
- Uses manifests defined in C2PA Specifications
- Enables camera manufacturers to insert authenticity meta-data on-device at time of capture
- Deepmind SynthID
- Images (Vertex AI imagen)
- Audio (Google DeepMind's Lyria)
- Transfer artist's styles to audio prompts (style transfer)
- Most likely, the watermark will include the prompt used to generate audio (e.g. Artist name). This allows for copyright claims.
- In Beta (as of 26 Mar 2024)
- Closed off as can be (as of 26 Mar 2024)
- Google hosted a workshop in June 2023 (Identifying and Mitigating the Security Risks of Generative AI): "Watermarking was mentioned as a promising mitigation. They are robust when attacker has no access to detection algorithm"
- Watermarking is identified as a tool for establishing trust in a post-GenAI environment by big tech (OpenAI "Moving AI governance forward" statement, Google "Our commitment to advancing bold and responsible AI, together") and government (Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI)
- Watermarks can already be easily inserted into Stable Diffusion outputs: the invisible-watermark repo is referenced by the official Stable Diffusion repo. It is based on "Digital Watermarking and Steganography" (DwtDct and DwtDctSvd). Also see the watermark option in the Stable Diffusion repo https://github.com/CompVis/stable-diffusion/blob/main/scripts/txt2img.py#L69 (a usage sketch follows after this list).
- Stable Diffusion XL recommends using the invisible-watermark pip package. The supported algorithms are DwtDct, DwtDctSvd, and RivaGAN.
- China bans GenAI without Watermarks
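For reference, here is roughly how the invisible-watermark package mentioned above is used for post-hoc embedding and extraction (a usage sketch based on the package's documented API; the file names and the 4-byte payload are placeholders):

```python
# pip install invisible-watermark opencv-python
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

bgr = cv2.imread("generated.png")                 # any image produced by the model

encoder = WatermarkEncoder()
encoder.set_watermark("bytes", b"SDV2")           # 4-byte payload (placeholder)
bgr_wm = encoder.encode(bgr, "dwtDct")            # alternatives: "dwtDctSvd", "rivaGan"
cv2.imwrite("generated_wm.png", bgr_wm)

decoder = WatermarkDecoder("bytes", 32)           # payload length in bits
recovered = decoder.decode(cv2.imread("generated_wm.png"), "dwtDct")
print(recovered)                                  # b"SDV2" if the watermark survived
```

Since this is post-hoc watermarking applied after generation, it can simply be disabled in an open model's pipeline, which is exactly the limitation that rooting approaches try to address.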
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
Stealing Machine Learning Models: Attacks and Countermeasures for Generative Adversarial Networks | ACSAC | 2021 | - | Arxiv | - |
Model Extraction Attack and Defense on Deep Generative Models | Journal of Physics | 2022 | - | - | - |
Model Extraction and Defenses on Generative Adversarial Networks | - | 2021 | - | Arxiv | - |
Paper | Proceedings / Journal | Venue Year / Last Updated | Code | Alternative PDF Source | Notes |
---|---|---|---|---|---|
A Comprehensive Survey on Robust Image Watermarking | Neurocomputing | 2022 | - | Arxiv | - Not about model rooting |
A Systematic Review on Model Watermarking for Neural Networks | Frontiers in Big Data | 2021 | - | Arxiv | - Not about model rooting |
A Comprehensive Review on Digital Image Watermarking | - | 2022 | - | Arxiv | - Not about model rooting |
Copyright Protection in Generative AI: A Technical Perspective | - | 2024 | - | Arxiv | - About IP protection in GenAI in general |
Security and Privacy on Generative Data in AIGC: A Survey | - | 2023 | - | Arxiv | - About security aspects in GenAI in general |
Detecting Multimedia Generated by Large AI Models: A Survey | - | 2024 | - | Arxiv | - About detecting GenAI in general |
Audio Deepfake Detection: A Survey | - | 2023 | - | Arxiv | - Contains overview of spoofed audio datasets, spoofing methods, and detection methods - Very good survey |
Summary of the systematization given in this review:
- Embedding method
- Watermark in model parameters
- Trigger-Watermark-Backdoor
- Verification access
- Whitebox (access model parameters)
- Blackbox (access via API)
- Capacity
- Zero-bit (only detects whether a watermark exists)
- Multi-bit (watermark carries arbitrary information)
- Authentication
- Whether the model is watermarked
- By whom the model is watermarked
- Uniqueness
- All model instances carry same watermark
- Different model instances carry different watermarks
Goal | Explanation | Motivation |
---|---|---|
Fidelity | High prediction quality on original tasks | model performance shouldn't significantly degrade |
Robustness | Watermark should resist removal | protects against copyright evasion |
Reliability | Minimal false negatives | ensures rightful ownership is recognized |
Integrity | Minimal false positives | prevents wrongful accusations of theft |
Capacity | Supports large information amounts | allows comprehensive watermarks |
Secrecy | Watermark must be secret and undetectable | prevents unauthorized detection |
Efficiency | Fast watermark insertion and verification | avoids computational burden |
Generality | Independent of datasets and ML algorithms | facilitates widespread application |
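The Reliability/Integrity rows boil down to a detection threshold. A common way to reason about false positives is to compute how likely a non-watermarked asset is to match the payload by chance; a minimal sketch, assuming i.i.d. uniformly random bits under the null hypothesis:

```python
from math import comb

def detection_p_value(n_match: int, n_bits: int) -> float:
    """Probability that a non-watermarked asset matches at least `n_match` of `n_bits`
    payload bits purely by chance. Thresholding this value trades Integrity
    (false positives) against Reliability (false negatives)."""
    return sum(comb(n_bits, k) for k in range(n_match, n_bits + 1)) / 2 ** n_bits

print(detection_p_value(45, 48))  # matching 45 of 48 bits by chance: p ≈ 7e-11
```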
- Attacker Knowledge:
- existence of the watermark
- model and its parameters
- watermarking scheme used
- (parts of) the training data
- (parts of) the watermark itself or the trigger dataset
- Attacker Capabilities (irrelevant)
- passive (eavesdropping)
- active (interaction)
- Attacker Objectives
- For what is the model being used by the attacker? (rather unspecific)
- Watermark Detection (weakest)
- Watermark Suppression, i.e. avoid watermark verification
- e.g. dissimulating any presence of a watermark in the model parameters and behavior
- e.g. suppressing the reactions of the model to the original watermark trigger
- Watermark Forging
- Recovering the legitimate owner’s watermark and claiming ownership (if there is no binding between the watermark and the owner)
- Adding a new watermark that creates ambiguity concerning ownership
- Identifying a fake watermark within the model that coincidentally acts like a real watermark but actually is not
- Watermark Overwriting
- Adding a watermark to the model while deactivating the old one (strong)
- Adding a watermark to the model without deactivating the old one (weak)
- Watermark Removal
- depends on the presence of a watermark
- depends on the underlying watermarking scheme
- depends on availability of additional data, e.g. for fine-tuning or retraining
- Methods
- Fine-Tuning
- Pruning
- Quantization
- Distillation
- Transfer-Learning
- Backdoor Removal
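Several of the removal methods listed above are cheap to attempt. A minimal sketch of one of them, global magnitude pruning in PyTorch; whether it actually strips the watermark, and at what cost in model quality, depends entirely on the scheme under attack.

```python
import torch
import torch.nn.utils.prune as prune

def pruning_removal_attack(model: torch.nn.Module, amount: float = 0.3) -> torch.nn.Module:
    """Zero out the globally smallest weights in the hope that the watermark
    pattern/behaviour is carried by them. A sketch of the attack class, not a guarantee."""
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=amount)
    for module, name in targets:
        prune.remove(module, name)  # bake the pruning into the weights permanently
    return model
```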
- Methods
- Embedding Watermarks into Model Parameters
- Adding patterns into model which can be verified locally
- Using Pre-Defined Inputs as Triggers
- Adding behaviour that is triggered by special inputs
- Using Model Fingerprints to Identify Potentially Stolen Instances
- No additional action needed; a model is simply recognized based on some criteria
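For the first embedding method (watermarks in model parameters), a minimal sketch in the spirit of the classic parameter-regularizer approach (e.g. Uchida et al.); the layer choice, key shape, and threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def parameter_watermark_loss(weight: torch.Tensor, secret_key: torch.Tensor,
                             message: torch.Tensor) -> torch.Tensor:
    """Regularizer added to the task loss during training: a secret projection of the
    (channel-averaged) weights of a chosen layer should decode to `message`."""
    w = weight.mean(dim=0).flatten()   # e.g. a conv layer's weights, averaged over filters
    logits = secret_key @ w            # secret_key: (n_bits, w.numel())
    return F.binary_cross_entropy_with_logits(logits, message.float())

def verify_parameter_watermark(weight, secret_key, message, threshold: float = 0.95) -> bool:
    """White-box verification: repeat the secret projection and compare the decoded bits."""
    w = weight.mean(dim=0).flatten()
    decoded = (secret_key @ w > 0).float()
    return (decoded == message.float()).float().mean().item() >= threshold
```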
- Awesome GitHub repo with resources on neural IP protection: https://github.com/ZJZAC/awesome-deep-model-IP-protection