Documentation is undergoing a significant revamp. The new documentation will be finalized as part of the v0.3 release in late spring or early summer 2025.
`dft` is a batteries-included suite of DataFusion applications that provides:
- Data Source Integration: Query files from S3, local filesystems, or HuggingFace datasets
- Table Format Support: Native support for Delta Lake
- Extensibility: UDFs defined in WASM (and soon Python)
- Helper Functions: Built-in functions for JSON and Parquet data processing
The project offers four complementary interfaces:
- Text User Interface (TUI): An interactive SQL IDE with real-time query analysis, benchmarking, and catalog exploration
- Command Line Interface (CLI): A scriptable engine for executing queries from files or the command line
- FlightSQL Server: A standards-compliant SQL interface for programmatic access
- HTTP Server: A REST API for SQL queries and catalog exploration
All interfaces share the same execution engine, allowing you to develop locally with the TUI and then seamlessly deploy with the server implementations.
`dft` builds upon `datafusion-cli` with enhanced interactivity, additional integrations, and ready-to-use server implementations.
# If you have Rust installed
cargo install datafusion-dft
# For full functionality with all features
cargo install datafusion-dft --all-features
If you don't have Rust installed, follow the installation instructions.
Common feature combinations:
# Core with S3 support
cargo install datafusion-dft --features=s3
# Data lake formats
cargo install datafusion-dft --features=deltalake
# With JSON and Parquet functions
cargo install datafusion-dft --features=functions-json,functions-parquet
See the Features documentation for all available features.
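Several features can be enabled in one install by passing Cargo a comma-separated list. The sketch below assembles such a command from feature names mentioned in this README (`s3`, `deltalake`, `flightsql`, `http`). The `features_flag` helper is hypothetical and exists only for illustration; the script prints the command rather than running it, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: join feature names into a single --features flag.
features_flag() {
  joined=""
  for f in "$@"; do
    # Prefix a comma for every item except the first.
    joined="${joined:+$joined,}$f"
  done
  printf '%s' "--features=$joined"
}

# Feature names taken from this README; adjust the list to what you need.
install_cmd="cargo install datafusion-dft $(features_flag s3 deltalake flightsql http)"

# Print the assembled command for review instead of executing it.
echo "$install_cmd"
```

Enabling only the features you need should keep the build smaller than `--all-features`, at the cost of reinstalling if you later want another integration.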
# Interactive TUI (default)
dft
# CLI with direct query execution
dft -c "SELECT 1 + 2"
# CLI with file-based query
dft -f query.sql
# Benchmark a query (with stats)
dft -c "SELECT * FROM my_table" --bench
# Start FlightSQL Server (requires `flightsql` feature)
dft serve-flightsql
# Start HTTP Server (requires `http` feature)
dft serve-http
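For scripted use, the file-based form is convenient because the query can live in version control. A minimal sketch, assuming `dft` may or may not be installed on the machine running it: the script writes a query file and only invokes `dft -f` when the binary is found on `PATH`. The file location and the `query_file` variable are hypothetical choices for this example.

```shell
#!/bin/sh
# Write the query to a file so it can be rerun or checked into a repository.
query_file="${TMPDIR:-/tmp}/dft_example_query.sql"   # hypothetical location
cat > "$query_file" <<'SQL'
SELECT 1 + 2 AS three;
SQL

# Run the file through the CLI only if dft is actually installed.
if command -v dft >/dev/null 2>&1; then
  dft -f "$query_file"
else
  echo "dft not found on PATH; install it with: cargo install datafusion-dft"
fi
```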
`dft` can automatically load table definitions at startup, giving you a persistent "database-like" experience.
- Create a DDL file (default: `~/.config/dft/ddl.sql`)
- Add your table and view definitions:
-- S3 data source (requires s3 feature)
CREATE EXTERNAL TABLE users
STORED AS NDJSON
LOCATION 's3://bucket/users';
-- Parquet files
CREATE EXTERNAL TABLE transactions
STORED AS PARQUET
LOCATION 's3://bucket/transactions';
-- Local files
CREATE EXTERNAL TABLE listings
STORED AS PARQUET
LOCATION 'file:///folder/listings';
-- Create views from tables
CREATE VIEW users_listings AS
SELECT * FROM users
LEFT JOIN listings USING (user_id);
-- Delta Lake table (requires deltalake feature)
CREATE EXTERNAL TABLE delta_table
STORED AS DELTATABLE
LOCATION 's3://bucket/delta_table';
- TUI: DDL is automatically loaded at startup
- CLI: Add the `--run-ddl` flag to execute DDL before your query
- Custom Path: Configure a custom DDL path in your config file
[execution]
ddl_path = "/path/to/my/ddl.sql"
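If you manage the DDL file from a setup script, it helps to make the step idempotent so that rerunning the script does not duplicate definitions. The sketch below shows one way to do that; `append_ddl_once` is a hypothetical helper written for this example, not part of `dft`.

```shell
#!/bin/sh
# Hypothetical helper: append a statement to a DDL file unless a line
# containing the marker text is already present, so reruns are harmless.
append_ddl_once() {
  ddl_file=$1
  marker=$2
  statement=$3
  # Create the parent directory (for example ~/.config/dft) if it is missing.
  mkdir -p "$(dirname "$ddl_file")"
  # -F treats the marker as a fixed string rather than a regular expression.
  if ! grep -qF -- "$marker" "$ddl_file" 2>/dev/null; then
    printf '%s\n' "$statement" >> "$ddl_file"
  fi
}
```

A call such as `append_ddl_once "$HOME/.config/dft/ddl.sql" "CREATE EXTERNAL TABLE transactions" "CREATE EXTERNAL TABLE transactions STORED AS PARQUET LOCATION 's3://bucket/transactions';"` would register the table once, after which `dft -c "SELECT COUNT(*) FROM transactions" --run-ddl` can query it from the CLI.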
Feature | Documentation |
---|---|
Core Features | Features Guide |
TUI Interface | TUI Guide |
CLI Usage | CLI Guide |
FlightSQL Server | FlightSQL Guide |
HTTP Server | HTTP Guide |
Configuration Options | Config Reference |