Efficient Randomized Experiments Using Foundation Models



Overview

[Figure: HAIPW diagram]

This repository contains the Python implementation of the Hybrid Augmented Inverse Probability Weighting (HAIPW) estimator of the Average Treatment Effect (ATE), introduced in the paper "Efficient Randomized Experiments Using Foundation Models".

Key Features of HAIPW:

  • Integrates predictions from multiple foundation models (e.g. LLMs) into the standard AIPW estimator to improve statistical precision, leading to tighter confidence intervals.
  • Ensures that the asymptotic variance is no larger than that of the most precise estimator in the combination: if the foundation model predictions are uninformative, it performs as well as the standard AIPW estimator.
  • Maintains valid statistical inference even if the foundation model predictions are arbitrarily biased.
  • Does not require additional assumptions beyond those typically needed for standard estimation in randomized experiments.
  • Supports both open-source (e.g. LLaMA) and proprietary (e.g. GPT-4, Claude) models, making it adaptable across different research settings.

For more details, see our research paper.
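
To make the combination step concrete, here is a minimal sketch of the idea (an illustration only, not the repository's estimators.py; it assumes a known assignment probability pi and at least two outcome models):

import numpy as np

def aipw_influence(y, t, pi, m1, m0):
    # Per-unit AIPW influence values for the ATE with known propensity pi
    return m1 - m0 + t * (y - m1) / pi - (1 - t) * (y - m0) / (1 - pi)

def haipw(y, t, pi, outcome_models):
    # outcome_models: list of (m1, m0) arrays of predicted treated/control
    # outcomes, e.g. a cross-fitted ridge model plus one entry per LLM
    n = len(y)
    phi = np.stack([aipw_influence(y, t, pi, m1, m0) for m1, m0 in outcome_models])
    estimates = phi.mean(axis=1)  # one AIPW point estimate per outcome model
    cov = np.cov(phi) / n         # estimated covariance of those estimates
    ones = np.ones(len(estimates))
    w = np.linalg.solve(cov, ones)
    w /= ones @ w                 # variance-minimizing weights with sum(w) = 1
    return w @ estimates, float(np.sqrt(w @ cov @ w))  # estimate, standard error

Each component is a valid AIPW estimate under randomization regardless of how biased the model predictions are, so the weighted combination stays valid; the weights only determine how much precision the predictions contribute.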

Contents

The HAIPW folder contains the core package:

HAIPW/
├── faheyS78/                            # Data and preprocessing scripts for the study by Fahey et al.
│   ├── generate_outcomes_opensource.py  # LLM-based outcome generation (open-source models)
│   ├── generate_outcomes_propietary.py  # LLM-based outcome generation (proprietary models)
│   └── utils_data.py                    # Data utilities
├── estimators.py                        # Implementation of the AIPW, HAIPW, PPI, and DiM estimators
├── utils.py                             # Logging, coverage computation, helper functions
├── create_data.py                       # Dataset generation script
└── run_experiment.py                    # Main script to run experiments

Getting Started

Dependencies

This package requires Python 3.12.3 or later and the following libraries:

numpy==2.2.2
pandas==2.2.3
torch==2.6.0
openai==1.61.0
anthropic==0.45.2
scikit-learn==1.6.1
scipy==1.15.1
tqdm==4.67.1
datasets==3.2.0
transformers==4.48.2
bitsandbytes==0.45.1
accelerate==1.3.0

Installation

Step 1: Create and Activate a Conda Environment

conda create -n haipw_env python=3.12 -y
conda activate haipw_env

Step 2: Install the Package (2 Options)

  1. Local Installation: Start by cloning the repository from GitHub. Then, upgrade pip to its latest version and use the local setup files to install the package.
    git clone https://github.com/jaabmar/HAIPW.git
    cd HAIPW
    pip install --upgrade pip
    pip install -e .
  2. Direct Installation from GitHub:
    pip install git+https://github.com/jaabmar/HAIPW.git
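
After either option, a quick sanity check that the install worked (this assumes the package is importable as HAIPW, matching the folder layout above; adjust if the actual module name differs):

python -c "import HAIPW.estimators"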

Usage

Generating outcomes with LLMs

To generate synthetic data using an LLM (an illustrative sketch of the underlying API call follows the argument list):

python create_data.py --model_name llama_small --study_name faheyS78

Arguments

  • --model_name: Choose from (gpt4o, claude_haiku, deepseek, llama, llama_small)
  • --study_name: The dataset/study to process
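
For intuition, the proprietary-model path boils down to an API call like the sketch below (the prompt wording, profile fields, and response scale here are hypothetical illustrations; see faheyS78/generate_outcomes_propietary.py for the prompts actually used):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def predict_outcome(profile, treatment_text):
    # Ask the model to role-play a participant and report an outcome.
    # The wording below is illustrative, not the paper's actual prompt.
    prompt = (
        f"You are a survey respondent with the following profile: {profile}. "
        f"You read this message: {treatment_text}. "
        "On a scale from 1 to 7, how much do you agree with it? "
        "Reply with a single number."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # repeated sampling yields a distribution over outcomes
    )
    return response.choices[0].message.content.strip()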

Running an experiment

The main script run_experiment.py computes HAIPW and the other estimators for the specified study; a conceptual sketch of its evaluation loop follows the argument list below.

Example command:

python run_experiment.py \
    --n_rct 100 \
    --n_features 5 \
    --n_folds 30 \
    --alpha_ridge 1.0 \
    --study faheyS78 \
    --model llama gpt4o claude_haiku \
    --n_seeds 1000

Arguments

  • --n_rct: Number of (subsampled) randomized controlled trial (RCT) samples
  • --n_features: Number of features selected
  • --n_folds: Number of cross-fitting folds for AIPW
  • --alpha_ridge: Ridge regression regularization parameter
  • --study: Dataset/study name
  • --model: Model(s) to use (gpt4o, claude_haiku, deepseek, llama, llama_small)
  • --n_seeds: Number of seeds for randomization
  • --n_prompts: Number of prompts (inference-time compute)
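
Conceptually, the evaluation loop behaves like this sketch (estimator stands in for any of the implemented methods, e.g. the illustrative haipw above; ate_true would be taken from the full study):

import numpy as np

def run_coverage(y, t, estimator, ate_true, n_rct=100, n_seeds=1000):
    # Repeatedly subsample the trial, estimate the ATE, and record how
    # often a 95% normal confidence interval covers the ground truth.
    rng = np.random.default_rng(0)
    hits = 0
    for _ in range(n_seeds):
        idx = rng.choice(len(y), size=n_rct, replace=False)
        estimate, se = estimator(y[idx], t[idx])
        hits += abs(estimate - ate_true) <= 1.96 * se
    return hits / n_seeds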

Contributing

We welcome contributions to improve this project. Here's how you can contribute:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Make your changes and commit (git commit -m "Description of change")
  4. Push to your branch (git push origin feature-branch)
  5. Open a Pull Request

Contact

For questions or collaborations, feel free to reach out to the paper's authors.

Citation

If you find this code useful, please consider citing our paper:

@article{debartolomeis2025efficient,
  title   = {Efficient Randomized Experiments Using Foundation Models},
  author  = {Piersilvio De Bartolomeis and Javier Abad and Guanbo Wang and Konstantin Donhauser and Raymond M. Duch and Fanny Yang and Issa J. Dahabreh},
  year    = {2025},
  journal = {arXiv preprint arXiv:2502.04262},
}
