A flexible, configurable tool for masking sensitive information in advertising data exports while preserving analytics capabilities.
Data Masker helps you anonymize sensitive data in CSV files while maintaining the ability to perform analytics. It intelligently identifies which columns need masking and which should be preserved (like numeric metrics and dates), using either reversible encryption or consistent hashing. The tool maintains referential integrity across files to enable cross-file analysis with masked data.
- Multi-platform support for Google Ads, Facebook Ads, and other advertising platforms
- Configurable masking rules via YAML or JSON configuration files
- Smart column detection for numeric, date, and sensitive data
- Reversible masking with private key encryption
- Consistent masking with salted hashing to maintain relationships in data
- Type-aware prefixes that indicate the type of masked data (e.g., Campaign_abc123)
- Column name normalization to handle variations in column headers
- Analysis mode to preview which columns will be masked before processing
- Minimal dependencies - just Python with pandas, pyyaml, cryptography, and standard libraries
# Clone the repository
git clone https://github.com/revupp-ai/data-masker.git
cd data-masker
# Install dependencies
pip install pandas pyyaml cryptography
Masking Data:
python masker.py mask input_file.csv
This will create a file named masked_input_file.csv with sensitive data masked using a hash-based approach.
Reversible Masking:
python masker.py mask input_file.csv --private-key "your-secret-key" --salt "your-salt"
This will create a masked file that can be later unmasked using the same private key and salt.
Unmasking Data:
python masker.py unmask masked_file.csv --private-key "your-secret-key" --salt "your-salt"
This will restore the original values in columns that were encrypted with reversible masking.
Masking:
python masker.py mask input_file.csv [options]
Unmasking:
python masker.py unmask masked_file.csv --private-key KEY --salt SALT [options]
Option | Description |
---|---|
--output, -o | Specify the output file path |
--config, -c | Path to a custom configuration file |
--salt, -s | Salt string for consistent hashing/encryption |
--private-key, -k | Private key for reversible encryption/decryption |
--save-config | Save default config to specified path and exit |
--analyze-only, -a | Only analyze the file without masking it |
# Use a custom configuration file
python masker.py mask google_ads_report.csv --config google_ads_config.yaml
# Preview which columns will be masked without applying changes
python masker.py mask facebook_ads_report.csv --analyze-only
# Save the default configuration as a starting point
python masker.py --save-config my_config.yaml
# Specify an output file
python masker.py mask data.csv --output masked_data.csv
# Use a specific salt for consistent masking across runs
python masker.py mask data.csv --salt "my-salt-2023"
# Enable reversible masking with a private key
python masker.py mask data.csv --private-key "secret-key-2023" --salt "my-salt-2023"
# Unmask previously masked data
python masker.py unmask masked_data.csv --private-key "secret-key-2023" --salt "my-salt-2023"
The masking behavior is controlled by a configuration file in YAML or JSON format. You can generate a default configuration file using the --save-config option.
- numeric_indicators: Terms that indicate a column contains numeric data (to be preserved)
- date_indicators: Terms that indicate a column contains date information (to be preserved)
- sensitive_indicators: Terms that indicate a column contains sensitive data (to be masked)
- masking_patterns: Rules for determining the prefix of masked values based on content patterns
- default_mask_prefix: Default prefix for masked values that don't match patterns
- hash_length: Number of characters to use from the hash
- normalize_column_names: Whether to normalize column names for consistent matching
- reversible_masking: Whether to use reversible encryption when a private key is provided
- iterations: Number of iterations for key derivation function (higher is more secure)
# Data Masking Configuration
numeric_indicators:
# Financial metrics
- cost
- spend
- revenue
# ... more indicators
date_indicators:
- date
- day
# ... more indicators
sensitive_indicators:
# Campaign structure
- campaign
- ad set
# ... more indicators
masking_patterns:
- pattern: "^campaign"
  prefix: "Campaign"
- pattern: "^ad set|^adset"
  prefix: "AdSet"
# ... more patterns
default_mask_prefix: "Item"
hash_length: 8
normalize_column_names: true
reversible_masking: true
iterations: 100000
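To make the role of the indicator lists concrete, here is a minimal sketch of how a column header might be classified against them. It is not the tool's actual code: the `normalize` and `classify_column` helpers, the substring matching, and the precedence order (numeric, then date, then sensitive) are all assumptions made for illustration.

```python
import re

# Hypothetical subset of the configuration above, written as a Python dict.
CONFIG = {
    "numeric_indicators": ["cost", "spend", "revenue"],
    "date_indicators": ["date", "day"],
    "sensitive_indicators": ["campaign", "ad set"],
}


def normalize(name: str) -> str:
    """Lowercase the header and collapse underscores, hyphens, and whitespace."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())


def classify_column(name: str, config: dict = CONFIG) -> str:
    """Return 'numeric', 'date', 'sensitive', or 'unknown' for a column header.

    Assumption: indicators are matched as substrings of the normalized
    header, and preserved types win over sensitive ones.
    """
    normalized = normalize(name)
    for kind in ("numeric", "date", "sensitive"):
        if any(term in normalized for term in config[f"{kind}_indicators"]):
            return kind
    return "unknown"


for header in ["Total Cost", "Report Date", "Ad_Set-Name", "Clicks"]:
    print(header, "->", classify_column(header))
```

Normalizing before matching is what lets headers such as `Ad_Set-Name` and `ad set name` fall under the same `ad set` indicator.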
Data Masker supports two masking methods:
- Hash-based Masking (default): Uses a one-way hash with a salt to create consistent but irreversible masked values.
- Reversible Encryption: When a private key is provided, uses the Fernet symmetric encryption algorithm to create reversible masked values. This allows data to be unmasked later using the same private key and salt.
The format of masked values differs:
- Hash-based: Prefix_a1b2c3d4
- Reversible: Prefix_enc:encrypted-data-in-base64
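The hash-based format can be illustrated with a short sketch. The README does not state which hash function the tool uses or how the salt is combined with the value, so SHA-256 and the `salt:value` concatenation below are assumptions, and `mask_value` is a hypothetical helper rather than part of the tool's API.

```python
import hashlib


def mask_value(value: str, salt: str, prefix: str = "Item", hash_length: int = 8) -> str:
    """Return a consistent, irreversible token shaped like 'Item_a1b2c3d4'.

    Assumption: SHA-256 over 'salt:value', truncated to hash_length hex chars.
    """
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:hash_length]}"


token_a = mask_value("Spring Sale 2023", salt="my-salt-2023", prefix="Campaign")
token_b = mask_value("Spring Sale 2023", salt="my-salt-2023", prefix="Campaign")
token_c = mask_value("Spring Sale 2023", salt="other-salt", prefix="Campaign")

print(token_a)
print(token_a == token_b)  # same value and same salt: identical tokens
print(token_a == token_c)  # different salt: tokens no longer line up
```

Because the token depends only on the value and the salt, the same campaign name maps to the same token in every file masked with that salt. A short `hash_length` keeps tokens readable but raises the chance that two different values share a token in very large datasets.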
When using reversible masking:
- Keep your private key secure - anyone with the key and salt can unmask the data
- Use a strong private key - longer, more complex keys are more secure
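- Change the salt periodically for improved security, keeping in mind that values masked with different salts no longer match across files, and that reversibly masked data can only be unmasked with the salt used to mask it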
- Limit access to masked files that contain reversibly masked data
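The sketch below shows why the private key and the salt both matter, and what the `iterations` setting costs. It assumes PBKDF2-HMAC-SHA256 as the key derivation function, which the README does not confirm; `derive_key` is a hypothetical helper. The output format (32 bytes, URL-safe base64) is the key format that Fernet in the `cryptography` package expects.

```python
import base64
import hashlib


def derive_key(private_key: str, salt: str, iterations: int = 100_000) -> bytes:
    """Stretch a passphrase and salt into a URL-safe base64-encoded 32-byte key.

    Assumption: PBKDF2-HMAC-SHA256; the tool's actual KDF is not documented here.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256",
        private_key.encode("utf-8"),
        salt.encode("utf-8"),
        iterations,
        dklen=32,
    )
    return base64.urlsafe_b64encode(raw)


key = derive_key("secret-key-2023", "my-salt-2023")
same_key = derive_key("secret-key-2023", "my-salt-2023")
other_key = derive_key("secret-key-2023", "rotated-salt")

print(key == same_key)   # same inputs reproduce the key, so unmasking works
print(key == other_key)  # a rotated salt yields a different key
```

A higher iteration count makes each guess at the private key slower for an attacker, at the price of a slower start-up for every masking or unmasking run.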
The default configuration includes patterns for:
- Google Ads - campaigns, ad groups, keywords, etc.
- Facebook Ads - campaigns, ad sets, custom audiences, etc.
- Other platforms - generic patterns that work across platforms
You can also use the DataMasker class in your Python code:
import pandas as pd

from masker import DataMasker
# Initialize with default configuration
masker = DataMasker()
# Or with custom configuration
masker = DataMasker("config.yaml")
# Analyze a file without masking
analysis = masker.mask_file("data.csv", analyze_only=True)
print(f"Will mask {len(analysis['to_mask_columns'])} columns")
# Mask a DataFrame with reversible encryption
df = pd.read_csv("data.csv")
masked_df = masker.mask_dataframe(df, salt="my-salt", private_key="my-secret-key")
# Unmask a previously masked DataFrame
unmasked_df = masker.unmask_dataframe(masked_df, salt="my-salt", private_key="my-secret-key")
# Or mask a file directly
result = masker.mask_file("data.csv", "masked_data.csv",
                          salt="my-salt", private_key="my-secret-key")
print(f"Masked {len(result['masked_columns'])} columns")
# Unmask a file
result = masker.unmask_file("masked_data.csv", "unmasked_data.csv",
                            salt="my-salt", private_key="my-secret-key")
print(f"Unmasked {len(result['unmasked_columns'])} columns")
- Column names are normalized and analyzed against configuration patterns
- The script identifies columns containing numeric data, dates, and sensitive information
- For columns that need masking:
  - With hash-based masking: values are hashed with a salt for consistency
  - With reversible masking: values are encrypted with a private key and salt
- Pattern matching determines appropriate type prefixes for masked values
- The result maintains the structure of the original data but with sensitive info masked
- For unmasking, the process is reversed using the same private key and salt
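The steps above can be sketched end to end for the hash-based path. This is an illustration under stated assumptions, not the tool's implementation: it uses the standard `csv` module instead of pandas to stay dependency-free, assumes SHA-256, and assumes that masking patterns are matched against the normalized column name. `mask_csv`, `prefix_for`, and the sample data are hypothetical.

```python
import csv
import hashlib
import io
import re

SENSITIVE = ["campaign", "ad set"]  # hypothetical sensitive_indicators
PATTERNS = [(r"^campaign", "Campaign"), (r"^ad set|^adset", "AdSet")]
DEFAULT_PREFIX = "Item"


def normalize(name):
    """Lowercase the header and collapse underscores, hyphens, and whitespace."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())


def prefix_for(column):
    """Pick a type prefix from the first pattern matching the column name."""
    normalized = normalize(column)
    for pattern, prefix in PATTERNS:
        if re.search(pattern, normalized):
            return prefix
    return DEFAULT_PREFIX


def mask_csv(text, salt, hash_length=8):
    """Mask sensitive columns of CSV text and return the rows as dicts."""
    reader = csv.DictReader(io.StringIO(text))
    to_mask = [c for c in reader.fieldnames
               if any(term in normalize(c) for term in SENSITIVE)]
    rows = []
    for row in reader:
        for column in to_mask:
            digest = hashlib.sha256(f"{salt}:{row[column]}".encode("utf-8")).hexdigest()
            row[column] = f"{prefix_for(column)}_{digest[:hash_length]}"
        rows.append(row)
    return rows


# Two exports with differently spelled headers but a shared campaign.
spend = "Campaign Name,Cost\nSpring Sale,120.50\nBrand,80.00\n"
clicks = "campaign_name,Clicks\nSpring Sale,340\n"

masked_spend = mask_csv(spend, salt="my-salt-2023")
masked_clicks = mask_csv(clicks, salt="my-salt-2023")

# The same campaign receives the same token in both files, so they still join.
print(masked_spend[0]["Campaign Name"] == masked_clicks[0]["campaign_name"])
```

Numeric columns such as `Cost` and `Clicks` pass through untouched, which is what keeps the masked exports usable for analytics.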
- pandas - for data handling
- pyyaml - for configuration file parsing
- cryptography - for secure encryption/decryption (required for reversible masking)