🦀 RedoxQL is an L-Store database written in Rust and Python 🚀
🥇 Fastest database speed in the class for milestone 2 (Winter 2025) ⚡
RedoxQL is our implementation of an L-Store database.
> **Important:** Read the Structure section. We use both Rust and Python, and they go in different places.
Create a virtual environment:

```shell
python3 -m venv venv
```

Source the virtual environment:

```shell
source venv/bin/activate
```

Install maturin:

```shell
pip install -r requirements.txt
```

Build the Rust code:

```shell
maturin build --release
```

Install the module (note: the version will change, so check the exact filename in `target/wheels/`):

```shell
pip install target/wheels/lstore* --force-reinstall
```

Run the database benchmark:

```shell
python3 __main__.py
```
You should see this for milestone 1:

```
(venv) ecs-165a-database (main) λ p __main__.py
Inserting 10k records took: 0.0077650810000000035
Updating 10k records took: 0.020893269
Selecting 10k records took: 0.016048745000000003
Aggregate 10k of 100 record batch took: 0.0039221569999999956
Deleting 10k records took: 0.002314741000000009
(venv) ecs-165a-database (main) λ
```
- Keanu - Secondary indexes, page.rs and all the page stuff, index.rs and all of the index stuff
- Lucas & Andrew - update
- Lucas - Merge, select_version, sum_version, matching
- Abdulrasol - Merge, BaseContainer, TailContainer, PageDirectory, insert into containers, RecordAddress and Record
- Jake - Persistence, RTable, RDatabase, RQuery, new RIndex, python bindings, inserts and reads for all props
```
.
├── benches
│   └── db_benchmark.rs
├── Cargo.lock
├── Cargo.toml
├── docs
│   └── old-system-diagrams.md
├── LICENSE
├── main_checking.py
├── __main__.py
├── Makefile
├── pyproject.toml
├── python
│   ├── benchmarks
│   │   ├── graph_scale.py
│   │   ├── scaling_tester.py
│   │   ├── simple_example.py
│   │   ├── simple_tester.py
│   │   └── speedtests.py
│   ├── lstore
│   │   ├── db.py
│   │   ├── __init__.py
│   │   ├── query.py
│   │   ├── transaction.py
│   │   └── transaction_worker.py
│   └── tests
│       ├── __init__.py
│       └── test_main.py
├── README.md
├── requirements.txt
└── src
    ├── container.rs
    ├── database.rs
    ├── index.rs
    ├── lib.rs
    ├── pagerange.rs
    ├── page.rs
    ├── query.rs
    ├── record.rs
    ├── system.rs
    └── table.rs
```
The lstore (`./lstore`) directory is where the Python code goes. This is what gets called by the tests in `__main__.py`. The lstore directory can be referred to as a Python module.
The src (`./src`) directory is where the Rust code goes. This gets called by the code in the lstore module.
- `system.rs` - all the functions and structs related to getting information from the system machine
- `database.rs` - the database struct and the related functions
In the demo project, we made an API that uses our database to store grades. Here we are sending a PUT and then doing a GET. After a simple fix made after we submitted, we were able to get everything working.
- `exam_tester_m3_part1.py` and `exam_tester_m3_part2.py`
- `m3_tester_part_1.py` and `m3_tester_part_2.py`
```shell
cargo test
```

Here is what the correct output should look like: you should see multiple tests passing. Rust unit tests are located in each Rust file and can be found in `./src`. The integration tests are located at `./tests` and are also run with `cargo test`.
```shell
pytest
```

Python tests are located in a separate directory called `tests`, located in `./python`.
Rust has a way of making docs from the source code. Run `cargo doc` and view the produced HTML page in your browser. Comments starting with `///` will be included in these docs.
You may need to install flamegraph with `cargo install flamegraph`.

```shell
cargo flamegraph --test update_tests
# Open the svg (it's nice to view in a browser)
firefox flamegraph.svg
```
Preview:
This will take a long time, but you can run benchmarks separately.

```shell
cargo bench
```

You can use `cargo bench` to see if your changes significantly affect performance. Small fluctuations can show up randomly even when the code hasn't changed, so try to run the bench as the only thing running on the system.
Maturin is what builds our Python package. Maturin has a `--release` flag that significantly improves speed: it runs Cargo in release mode, which applies all of the optimization tricks to speed up the executable as much as possible.
Data for both graphs can be found here
With compiler options:

```
(venv) redoxql (main) λ p __main__.py
Inserting 10k records took: 0.006803145
Updating 10k records took: 0.018702902999999996
Selecting 10k records took: 0.016315803000000004
Aggregate 10k of 100 record batch took: 0.005981531999999998
Deleting 10k records took: 0.002332115999999995
(venv) redoxql (main) λ time p m1_tester.py
Insert finished

real	0m0.117s
user	0m0.106s
sys	0m0.010s

(venv) redoxql (main) λ time p exam_tester_m1.py
Insert finished

real	0m0.282s
user	0m0.272s
sys	0m0.010s
(venv) redoxql (main) λ
```
Without compiler options:

```
(venv) redoxql (main) λ p __main__.py
Inserting 10k records took: 0.007401888000000002
Updating 10k records took: 0.018957811999999997
Selecting 10k records took: 0.015054888999999995
Aggregate 10k of 100 record batch took: 0.003300163000000002
Deleting 10k records took: 0.002181812999999991
(venv) redoxql (main) λ time p m1_tester.py
Insert finished

real	0m0.112s
user	0m0.108s
sys	0m0.004s

(venv) redoxql (main) λ time p exam_tester_m1.py
Insert finished

real	0m0.268s
user	0m0.254s
sys	0m0.014s
```
To use the logger, import the `debug`, `error`, or `info` macro from the `log` crate. Then you can add the macros to code like `debug!("Start database!");`. When you go to run the code, set the env var `RUST_LOG=debug`.

Docs: https://docs.rs/env_logger/latest/env_logger/
We started experimenting with PyPy, an alternative Python runtime that is supposedly faster. Because of Amdahl's law, we actually can't get all that much performance out of it: only the Python fraction of the runtime can improve, and most of our time is spent in Rust. We also found that the newest version of PyPy cannot use the newest version of PyO3, so future work is needed to get them to run together.

Future questions:
- How much time does the Python part take up?
- How do we measure the improvement from Python to PyPy?
- How do we downgrade PyPy to work with PyO3?
We opted to go with CPython because it's much faster in this use case. We show the results of some tests here.
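The Amdahl's law argument above can be made concrete with a few lines of arithmetic. This is a sketch (the 20% Python fraction and 5x PyPy speedup are made-up illustrative numbers, not measurements from this project):

```python
def amdahl_speedup(python_fraction: float, python_speedup: float) -> float:
    """Overall speedup when only the Python fraction of the runtime gets faster."""
    return 1.0 / ((1.0 - python_fraction) + python_fraction / python_speedup)

# If only 20% of total runtime is Python and PyPy made that part 5x faster,
# the whole benchmark improves by far less than 5x:
print(round(amdahl_speedup(0.20, 5.0), 3))  # → 1.19
```

So even a large per-interpreter win barely moves the total when the Rust side dominates, which is why switching runtimes bought us so little.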
We use py-spy to profile the Python side, and we got expected results: it shows that the main Python cost is just the update function when testM2.py is run. Here is how we ran it:

```shell
py-spy record -o profile.svg -- python3 testM2.py
```
We used to make a ReturnRecord object for every single row! We would also turn the result of rquery into a list, wrap each row in a ReturnRecord, and then make that back into a new list. ReturnRecord would also do list truncation each time it was initialized (once per row). We moved all of this logic into Rust and now simply return what the Rust function returns. This change improved performance by nearly 30%: the speed of testM2.py went from 1.30 seconds to 0.92 seconds. These results were consistent across 3 runs each, done in an interwoven fashion (with the change, then without the change, then with the change again, etc.) for a total of 6 runs. The change can be seen in PR#162.
```diff
-class ReturnRecord:
-    def __init__(self, columns: List[int]):
-        self.columns = columns[4:]
-
-    def __str__(self):
-        return f"Record({self.columns})"

-        return [
-            ReturnRecord(
-                list(
-                    self.rquery.select_version(
-                        search_key,
-                        search_key_index,
-                        projected_columns_index,
-                        relative_version,
-                    )
-                )
-            )
-        ]
+        return self.rquery.select_version(
+            search_key,
+            search_key_index,
+            projected_columns_index,
+            relative_version,
+        )
```
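The cost of that per-row wrapping is easy to reproduce in isolation. This micro-benchmark is a simplified sketch (the class and row shapes are stand-ins, not the project's actual code), but it shows why dropping the wrapper layer helps:

```python
import timeit

class ReturnRecord:
    """Simplified stand-in for the wrapper removed in PR#162."""
    def __init__(self, columns):
        # list truncation performed once per row, like the old code
        self.columns = columns[4:]

rows = [list(range(10)) for _ in range(10_000)]

# Old shape: copy each row, wrap it in an object, rebuild the list
wrapped = timeit.timeit(lambda: [ReturnRecord(list(r)) for r in rows], number=20)
# New shape: hand back what the Rust call already returned
direct = timeit.timeit(lambda: rows, number=20)

print(f"wrapped: {wrapped:.4f}s  direct: {direct:.4f}s")
```

The absolute numbers depend on the machine, but the wrapped version always pays for one object allocation and one list copy per row.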
This includes `insert` and `update`.
The Rust default HashMap is a great general-purpose HashMap implementation and is very fast, but FxHashMap is a decent bit faster. After changing the page directory to use FxHashMap in PR#186, the speed of many reads and writes improved by over 28%, and the overall speed of all the benchmarks improved by over 26%.
```diff
 use crate::container::{ReservedColumns, NUM_RESERVED_COLUMNS};
 use crate::index::RIndexHandle;
 use pyo3::prelude::*;
+use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};
 use std::collections::HashMap;
 use std::sync::{Arc, Mutex, RwLock};

+type RedoxQLHasher<K, V> = FxHashMap<K, V>;

 #[derive(Default, Clone, Serialize, Deserialize)]
 pub struct PageDirectoryMetadata {
-    pub directory: HashMap<i64, RecordMetadata>,
+    pub directory: RedoxQLHasher<i64, RecordMetadata>,
 }

 #[derive(Default, Clone)]
 pub struct PageDirectory {
-    pub directory: HashMap<i64, Record>,
+    pub directory: RedoxQLHasher<i64, Record>,
 }

 // -- snip --
```
You can set specific build flags that improve the optimization.

```toml
[profile.release]
lto = "fat"
codegen-units = 1
opt-level = 3
```

- `lto = "fat"` is the max setting for link-time optimization, meaning that when it links the binary together, it does the maximum amount of optimizations.
- `codegen-units = 1` means that rustc will not split up the source code into different parts to compile separately on different threads. Because of this, it's able to apply optimization tricks like inlining between libraries and our project source code.
- `opt-level = 3` is just setting the optimization level. The reason you might want it lower is compilation speed.
After reading bits from O'Reilly's High Performance Python, 3rd Edition, I found out about profiling with snakeviz.

```shell
pip install snakeviz
python3 -m cProfile -o profile.prof testM2.py
snakeviz profile.prof
```
Here is the output from testM2.py
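The same cProfile data can also be collected from inside a script and inspected with the stdlib `pstats` module, without leaving the terminal. A minimal sketch (`busy_work` is a hypothetical stand-in workload, not part of this project):

```python
import cProfile
import io
import pstats

def busy_work():
    """Stand-in workload; in practice you would profile testM2.py instead."""
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

# Print the five most expensive entries, sorted by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```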
Python gets compiled into bytecode, and we can see it using the `dis` library. This can be useful for spotting bytecode that might be unneeded.
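For example, disassembling a tiny function with the stdlib `dis` module prints one line per bytecode instruction (`add_one` is just an illustrative function):

```python
import dis

def add_one(x):
    return x + 1

# Show the bytecode CPython will actually execute for this function
dis.dis(add_one)
```

The exact opcodes vary between CPython versions, which is itself a reminder that this view is an implementation detail of the interpreter.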
- Improve hashing speed for page directory by over 26%
- ~70% speed improvement with lazy select (Not used because the API expects an iterable)
- Move select return record logic all to Rust to achieve 30% speed improvement