Skip to content

Files

Latest commit

 

History

History
39 lines (31 loc) · 791 Bytes

README.md

File metadata and controls

39 lines (31 loc) · 791 Bytes

Pyper

An (experimental) ETL toolkit based on DuckDB and PRQL.

Usage

Pyper relies on YAML to describe workflows. We use pydantic to model how a workflow file should look. Here's an example that showcases a simple workflow:

# myworkflow.yaml
extract:
  provider: local
  uri: file:///mnt/ssd/projects/pyper/invoices.csv
  register: my_data_source

transform:
  lang: prql
  backend: duckdb
  query: |
    from my_data_source
    filter billing_country == "USA"
    group [customer_id] (
      aggregate [
        sum total,
        count,
      ]
    )

load:
  provider: local
  uri: file:///mnt/ssd/projects/pyper/invoices_usa.csv

Then, using Python:

import pyper
pyper.workflow('myworkflow.yaml').exec()