The purpose of this repository is to provide a fully open source playground for tabular foundation models. It contains a much smaller and simpler implementation of the TabPFNv2 architecture, together with a training loop and code for loading data that was pre-generated by a prior. We plan to rapidly extend the repository with more features (e.g. regression, missing values, categorical features), prior interfaces, and architectures. It is intended as a good starting point for students and researchers who are interested in learning how TabPFN works under the hood.
Clone the repository, then install the dependencies via:
pip install -e .
We offer the same interface as TabPFN:
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from nanotabpfn import NanoTabPFNClassifier
# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
# Initialize a classifier
clf = NanoTabPFNClassifier()
clf.fit(X_train, y_train)
# Predict probabilities
prediction_probabilities = clf.predict_proba(X_test)
print("ROC AUC:", roc_auc_score(y_test, prediction_probabilities[:, 1]))
# Predict labels
predictions = clf.predict(X_test)
print("Accuracy", accuracy_score(y_test, predictions))
nanotabpfn/model.py contains the implementation of the architecture in less than 250 lines of code, nanotabpfn/train.py implements a simple training loop in under 100 lines, and nanotabpfn/priors.py implements a dataloader that lets you load a dump pre-generated from a prior.
We will release multiple dumps of different scales soon. We also offer an interface where you can provide your own get_batch function, as sketched below.
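As a minimal sketch of what such a function could look like, here is a hypothetical get_batch that samples synthetic classification datasets from a random linear prior. The signature and return format are assumptions for illustration; check nanotabpfn/priors.py for the exact interface expected:

import torch

def get_batch(batch_size, num_datapoints, num_features, device):
    # Hypothetical toy prior: random features, labels derived from a
    # random linear decision boundary. Signature and return format
    # are assumed, not taken from nanotabpfn.
    x = torch.randn(batch_size, num_datapoints, num_features, device=device)
    w = torch.randn(batch_size, num_features, 1, device=device)
    # one binary label per datapoint, based on which side of the boundary it falls
    y = ((x @ w).squeeze(-1) > 0).long()
    return x, y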
First we download 100k pre-generated datasets with 50 datapoints, 3 features and up to 3 classes each from here.
Then you can run:
python pretrain_classification.py -epochs 80 -steps 25 -batchsize 50 -priordump 50x3_3_100k_classification.h5
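Here -epochs is the number of training epochs, -steps the number of gradient steps per epoch, and -batchsize the number of datasets per batch; these match the epochs=80, num_steps=25 and batch_size=50 arguments used in the training walkthrough below.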
This should take less than 5 minutes on a modern NVIDIA GPU (around 10 minutes on a MacBook M4 Pro GPU and around 40 minutes on an M4 Pro CPU).
We also offer a pre-generated regression dataset containing 1.28M tables with 50 datapoints and 3 features each here. You can pretrain on it using python pretrain_regressor.py.
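Assuming pretrain_regressor.py accepts the same flags as the classification script (an assumption; check the script's argument parser), the invocation would look like:

python pretrain_regressor.py -epochs 80 -steps 25 -batchsize 50 -priordump <regression dump file>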
First we import our architecture, prior interface, training loop, and related utilities:
from nanotabpfn.model import NanoTabPFNModel
from nanotabpfn.priors import PriorDumpDataLoader
from nanotabpfn.train import train
from nanotabpfn.utils import get_default_device
from nanotabpfn.interface import NanoTabPFNClassifier
from torch.nn import CrossEntropyLoss
Then we instantiate our model and loss criterion:
model = NanoTabPFNModel(
    num_attention_heads=6,
    embedding_size=192,
    mlp_hidden_size=768,
    num_layers=6,
    num_outputs=10,
)
criterion = CrossEntropyLoss()
Then we instantiate our prior:
device = get_default_device()
prior = PriorDumpDataLoader(filename='50x3_3_100k_classification.h5', num_steps=25, batch_size=50, device=device)
Finally, we train our model:
def epoch_callback(epoch, epoch_time, mean_loss, model):
    classifier = NanoTabPFNClassifier(model, device)
    # you can add your own eval code here that runs after every epoch
    print(f'epoch {epoch:5d} | time {epoch_time:5.2f}s | mean loss {mean_loss:5.2f}', flush=True)
trained_model, loss = train(
    model=model,
    prior=prior,
    criterion=criterion,
    epochs=80,
    device=device,
    epoch_callback=epoch_callback,
)
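After training, the model can be wrapped in the scikit-learn style interface from above, as the epoch_callback already does, and evaluated on data of your choice. The synthetic dataset below is purely illustrative and mirrors the scale of the prior dump (50 datapoints, 3 features, up to 3 classes):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# illustrative toy dataset matching the prior dump's scale
X, y = make_classification(n_samples=50, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

clf = NanoTabPFNClassifier(trained_model, device)
clf.fit(X_train, y_train)
print('Accuracy:', accuracy_score(y_test, clf.predict(X_test)))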