Data Masker

A flexible, configurable tool for masking sensitive information in advertising data exports while preserving analytics capabilities.

Overview

Data Masker anonymizes sensitive data in CSV files while keeping the files usable for analytics. It matches column names against configurable indicator terms to decide which columns to mask and which to preserve (such as numeric metrics and dates), then masks values using either reversible encryption or consistent salted hashing. Consistent masking preserves referential integrity across files, so masked data can still be analyzed across files.

Features

Multi-platform support for Google Ads, Facebook Ads, and other advertising platforms
Configurable masking rules via YAML or JSON configuration files
Smart column detection for numeric, date, and sensitive data
Reversible masking with private key encryption
Consistent masking with salted hashing to maintain relationships in data
Type-aware prefixes that indicate the type of masked data (e.g., Campaign_abc123)
Column name normalization to handle variations in column headers
Analysis mode to preview which columns will be masked before processing
Minimal dependencies - just Python with pandas, pyyaml, cryptography, and standard libraries

Installation

# Clone the repository
git clone https://github.com/revupp-ai/data-masker.git
cd data-masker

# Install dependencies
pip install pandas pyyaml cryptography

Usage

Basic Usage

Masking Data:

python masker.py mask input_file.csv

This will create a file named masked_input_file.csv with sensitive data masked using a hash-based approach.

Reversible Masking:

python masker.py mask input_file.csv --private-key "your-secret-key" --salt "your-salt"

This will create a masked file that can later be unmasked using the same private key and salt.

Unmasking Data:

python masker.py unmask masked_file.csv --private-key "your-secret-key" --salt "your-salt"

This will restore the original values in columns that were encrypted with reversible masking.

Command Line Options

Masking:

python masker.py mask input_file.csv [options]

Unmasking:

python masker.py unmask masked_file.csv --private-key KEY --salt SALT [options]

Option	Description
`--output`, `-o`	Specify the output file path
`--config`, `-c`	Path to a custom configuration file
`--salt`, `-s`	Salt string for consistent hashing/encryption
`--private-key`, `-k`	Private key for reversible encryption/decryption
`--save-config`	Save default config to specified path and exit
`--analyze-only`, `-a`	Only analyze the file without masking it

Examples

# Use a custom configuration file
python masker.py mask google_ads_report.csv --config google_ads_config.yaml

# Preview which columns will be masked without applying changes
python masker.py mask facebook_ads_report.csv --analyze-only

# Save the default configuration as a starting point
python masker.py --save-config my_config.yaml

# Specify an output file
python masker.py mask data.csv --output masked_data.csv

# Use a specific salt for consistent masking across runs
python masker.py mask data.csv --salt "my-salt-2023"

# Enable reversible masking with a private key
python masker.py mask data.csv --private-key "secret-key-2023" --salt "my-salt-2023"

# Unmask previously masked data
python masker.py unmask masked_data.csv --private-key "secret-key-2023" --salt "my-salt-2023"

Configuration

The masking behavior is controlled by a configuration file in YAML or JSON format. You can generate a default configuration file using the --save-config option.

Configuration Options

numeric_indicators: Terms that indicate a column contains numeric data (to be preserved)
date_indicators: Terms that indicate a column contains date information (to be preserved)
sensitive_indicators: Terms that indicate a column contains sensitive data (to be masked)
masking_patterns: Rules for determining the prefix of masked values based on content patterns
default_mask_prefix: Default prefix for masked values that don't match patterns
hash_length: Number of characters to use from the hash
normalize_column_names: Whether to normalize column names for consistent matching
reversible_masking: Whether to use reversible encryption when a private key is provided
iterations: Number of iterations for the key derivation function (higher values make brute-forcing the private key slower, at the cost of slower key derivation)
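
To make the indicator options more concrete, here is a minimal sketch of how a column header could be classified. The function names, the substring matching, the precedence of numeric and date indicators over sensitive ones, and the fall-through to "preserve" are all assumptions for illustration; the actual logic in masker.py may differ.

```python
import re

# Illustrative indicator lists; the tool's real defaults live in its config.
CONFIG = {
    "numeric_indicators": ["cost", "spend", "revenue"],
    "date_indicators": ["date", "day"],
    "sensitive_indicators": ["campaign", "ad set"],
}


def normalize(name: str) -> str:
    """Lowercase a header and collapse whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())


def classify_column(name: str, config: dict = CONFIG) -> str:
    """Return 'mask' or 'preserve' for one column header (hypothetical rules)."""
    normalized = normalize(name)
    # Assumption: numeric and date indicators win over sensitive ones.
    for key in ("numeric_indicators", "date_indicators"):
        if any(term in normalized for term in config[key]):
            return "preserve"
    if any(term in normalized for term in config["sensitive_indicators"]):
        return "mask"
    # Assumption: columns matching no indicator are left untouched.
    return "preserve"


for header in ["Campaign_Name", "Campaign Cost", "Ad-Set", "Day"]:
    print(header, "->", classify_column(header))
```

Under these assumed rules, a header such as "Campaign Cost" is preserved because the numeric indicator is checked first, which keeps metrics usable even when they mention a sensitive entity.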

Example Configuration

# Data Masking Configuration
numeric_indicators:
  # Financial metrics
  - cost
  - spend
  - revenue
  # ... more indicators

date_indicators:
  - date
  - day
  # ... more indicators

sensitive_indicators:
  # Campaign structure
  - campaign
  - ad set
  # ... more indicators

masking_patterns:
  - pattern: "^campaign"
    prefix: "Campaign"
  - pattern: "^ad set|^adset"
    prefix: "AdSet"
  # ... more patterns

default_mask_prefix: "Item"
hash_length: 8
normalize_column_names: true
reversible_masking: true
iterations: 100000
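
The following sketch shows one way the masking_patterns and default_mask_prefix settings from the example could select a type prefix. It assumes the patterns are regular expressions applied to the lowercased column name and that the first match wins; pick_prefix is a hypothetical helper, not a function from masker.py.

```python
import re

# Patterns copied from the example configuration above.
MASKING_PATTERNS = [
    {"pattern": "^campaign", "prefix": "Campaign"},
    {"pattern": "^ad set|^adset", "prefix": "AdSet"},
]
DEFAULT_MASK_PREFIX = "Item"


def pick_prefix(column_name: str) -> str:
    """Return the prefix of the first pattern matching the column name."""
    normalized = column_name.strip().lower()
    for rule in MASKING_PATTERNS:
        if re.search(rule["pattern"], normalized):
            return rule["prefix"]
    # No pattern matched, so fall back to the configured default prefix.
    return DEFAULT_MASK_PREFIX


for header in ["Campaign Name", "Ad Set ID", "adset_name", "Keyword"]:
    print(header, "->", pick_prefix(header))
```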

Masking Methods

Data Masker supports two masking methods:

Hash-based Masking (default): Uses a one-way hash with a salt to create consistent but irreversible masked values.
Reversible Encryption: When a private key is provided, uses the Fernet symmetric encryption algorithm to create reversible masked values. This allows data to be unmasked later using the same private key and salt.

The format of masked values differs:

Hash-based: Prefix_a1b2c3d4
Reversible: Prefix_enc:encrypted-data-in-base64
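
As a rough illustration of the hash-based format, the sketch below produces values shaped like Prefix_a1b2c3d4. The choice of SHA-256, the way salt and value are combined, and the hash_mask name are assumptions; the README only states that a salted one-way hash is truncated to hash_length characters.

```python
import hashlib


def hash_mask(value: str, salt: str, prefix: str = "Item", hash_length: int = 8) -> str:
    """Return a deterministic, irreversible token such as 'Item_1a2b3c4d'."""
    # Assumption: salt and value are joined with ':' before hashing.
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:hash_length]}"


first = hash_mask("Summer Sale", salt="my-salt-2023", prefix="Campaign")
second = hash_mask("Summer Sale", salt="my-salt-2023", prefix="Campaign")
other_salt = hash_mask("Summer Sale", salt="another-salt", prefix="Campaign")

print(first == second)      # same value and salt give the same token
print(first == other_salt)  # a different salt gives a different token
```

The same value and salt always yield the same token, which is what lets masked files be joined on masked columns. Note that 8 hexadecimal characters carry only 32 bits, so collisions become plausible once a column holds tens of thousands of distinct values; a larger hash_length reduces that risk.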

Security Considerations

When using reversible masking:

Keep your private key secure - anyone with the key and salt can unmask the data
Use a strong private key - longer, more complex keys are more secure
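Change the salt periodically for improved security, keeping in mind that values masked with different salts no longer match across files and that unmasking requires the salt used at masking time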
Limit access to masked files that contain reversibly masked data
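
To show why both the private key and the salt are needed for unmasking, here is a minimal sketch of deriving an encryption key from them. It assumes PBKDF2-HMAC-SHA256 as the key derivation function; the README mentions only an iterations setting and Fernet, so the actual derivation in masker.py may differ. The derive_fernet_key name is hypothetical.

```python
import base64
import hashlib


def derive_fernet_key(private_key: str, salt: str, iterations: int = 100000) -> bytes:
    """Derive a 32-byte key, URL-safe base64 encoded as Fernet expects."""
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        private_key.encode("utf-8"),
        salt.encode("utf-8"),
        iterations,
        dklen=32,
    )
    return base64.urlsafe_b64encode(raw)


key = derive_fernet_key("secret-key-2023", "my-salt-2023")
same_key = derive_fernet_key("secret-key-2023", "my-salt-2023")
wrong_salt_key = derive_fernet_key("secret-key-2023", "other-salt")

print(key == same_key)        # identical inputs reproduce the key
print(key == wrong_salt_key)  # a different salt yields a different key
```

A key in this format can be passed to cryptography.fernet.Fernet(key). Because the derived key depends on both inputs, losing either the private key or the salt makes reversibly masked values unrecoverable.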

Supported Advertising Platforms

The default configuration includes patterns for:

Google Ads - campaigns, ad groups, keywords, etc.
Facebook Ads - campaigns, ad sets, custom audiences, etc.
Other platforms - generic patterns that work across platforms

Programmatic Usage

You can also use the DataMasker class in your Python code:

import pandas as pd

from masker import DataMasker

# Initialize with default configuration
masker = DataMasker()

# Or with custom configuration
masker = DataMasker("config.yaml")

# Analyze a file without masking
analysis = masker.mask_file("data.csv", analyze_only=True)
print(f"Will mask {len(analysis['to_mask_columns'])} columns")

# Mask a DataFrame with reversible encryption
df = pd.read_csv("data.csv")
masked_df = masker.mask_dataframe(df, salt="my-salt", private_key="my-secret-key")

# Unmask a previously masked DataFrame
unmasked_df = masker.unmask_dataframe(masked_df, salt="my-salt", private_key="my-secret-key")

# Or mask a file directly
result = masker.mask_file("data.csv", "masked_data.csv", 
                         salt="my-salt", private_key="my-secret-key")
print(f"Masked {len(result['masked_columns'])} columns")

# Unmask a file
result = masker.unmask_file("masked_data.csv", "unmasked_data.csv", 
                           salt="my-salt", private_key="my-secret-key")
print(f"Unmasked {len(result['unmasked_columns'])} columns")

How It Works

Column names are normalized and analyzed against configuration patterns
The script identifies columns containing numeric data, dates, and sensitive information
For columns that need masking:
- With hash-based masking: values are hashed with a salt for consistency
- With reversible masking: values are encrypted with a private key and salt
Pattern matching determines appropriate type prefixes for masked values
The result maintains the structure of the original data but with sensitive info masked
For unmasking, the process is reversed using the same private key and salt
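
Since only reversibly masked values can be restored, an unmasking step presumably has to tell the two value formats apart. The sketch below does this by looking for the enc: marker described under Masking Methods. parse_masked_value is a hypothetical helper, and it assumes prefixes never contain an underscore.

```python
def parse_masked_value(masked: str) -> dict:
    """Split a masked token into prefix, masking method, and payload."""
    # Assumption: the first underscore separates the prefix from the rest.
    prefix, _, rest = masked.partition("_")
    if rest.startswith("enc:"):
        return {"prefix": prefix, "method": "reversible", "payload": rest[len("enc:"):]}
    return {"prefix": prefix, "method": "hash", "payload": rest}


print(parse_masked_value("Campaign_a1b2c3d4"))
print(parse_masked_value("AdSet_enc:gAAAAAB_x-y="))
```

Splitting on the first underscore only matters here because the encrypted payload is base64 text that may itself contain underscores.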

Dependencies

pandas - for data handling
pyyaml - for configuration file parsing
cryptography - for secure encryption/decryption (required for reversible masking)

License

MIT License
