- **benchmarking**: Contains a serial implementation using PyTorch Geometric (PyG) for validation and testing, along with utilities for benchmarking Sparse Matrix-Matrix Multiplication (SpMM), a key operation in GNN computations.
- **examples**: Demonstrates how to use Plexus to parallelize a GNN model. This directory includes example scripts for running the parallelized training, as well as utilities for parsing the resulting performance data.
- **performance**: Holds files for modeling the performance characteristics of parallel GNN training, including communication overhead, computation costs (specifically SpMM), and memory utilization.
- **plexus**: Contains the core logic of the Plexus framework: the parallel implementation of a Graph Convolutional Network (GCN) layer, along with utilities for dataset preprocessing, efficient data loading, and other essential components for distributed GNN training.
This directory contains files used for validating the parallel implementation and benchmarking key operations.
## Files
- **pyg_serial.py**: This Python script provides a serial implementation of a GNN model using PyTorch Geometric (PyG). It is primarily used for validation, allowing comparison against the parallelized version. By default, the script trains a model with 3 Graph Convolutional Network (GCN) layers and a hidden dimension of 128 on the ogbn-products dataset.
The script offers several command-line arguments to customize the training process:
- `--download_path`: Specifies the path to the directory where the dataset is stored.
- `--num_epochs` (optional): Sets the number of training epochs (default: 2).
- `--seed` (optional): Sets a random seed for reproducible experiments.

Other aspects, such as the number of GCN layers and the hidden dimension size, can be changed by adjusting the model definition within the script; the dataset can be changed by altering the dataset loading within the `get_dataset` function.
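As a rough sketch, the CLI described above could be defined with `argparse` roughly as follows. The argument names and defaults mirror this README; the help strings and parser structure are assumptions, not the actual contents of `pyg_serial.py`:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI described in this README; the exact
    # parser in pyg_serial.py may differ.
    parser = argparse.ArgumentParser(description="Serial PyG GCN baseline")
    parser.add_argument("--download_path", required=True,
                        help="directory where the dataset is stored")
    parser.add_argument("--num_epochs", type=int, default=2,
                        help="number of training epochs (default: 2)")
    parser.add_argument("--seed", type=int, default=None,
                        help="random seed for reproducible experiments")
    return parser

args = build_parser().parse_args(["--download_path", "/data/ogbn"])
print(args.num_epochs)  # → 2 (the default, since it was not overridden)
```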
- **spmm.py**: This script tests the performance of Sparse Matrix-Matrix Multiplication (SpMM), a fundamental operation in GNN computations. It provides flexibility in configuring the SpMM operation to analyze performance under various conditions.
It accepts the following command-line arguments:
- `--pt_file`: Specifies the path to a `.pt` file. This file is expected to be the output of preprocessing a dataset with Plexus, containing a tuple `(data, num_classes)` where `data` is a processed PyG `Data` object. The dimensions of the sparse matrix and the dense feature matrix used in the SpMM benchmark are derived from this data.
- `--shard_row` (optional): Specifies how to shard the row dimension (M) of the sparse matrix (A, sized M x K), allowing investigation of different row sharding strategies on SpMM performance (default: 1).
- `--shard_col` (optional): Specifies how to shard the column dimension (K) of the sparse matrix (A, sized M x K), which corresponds to the row dimension of the dense feature matrix (F, sized K x N). This allows investigation of sharding strategies along the shared dimension (default: 1).
- `--shard_col_x` (optional): Specifies how to shard the column dimension (N) of the dense feature matrix (F, sized K x N), allowing investigation of different column sharding strategies (default: 1).
- `--iterations` (optional): Sets the total number of SpMM iterations to run for the benchmark (default: 25).
- `--warmup` (optional): Specifies the number of initial iterations to perform as warmup. The timing results of these warmup iterations are discarded to obtain more stable measurements (default: 5).

Note that for the sharding arguments, the matrices are padded so their sizes are divisible by the specified shard counts.
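The padding mentioned above amounts to rounding each dimension up to the next multiple of its shard count. The helper below is a hypothetical illustration of that arithmetic (it is not part of `spmm.py`):

```python
def pad_dim(size: int, shards: int) -> int:
    """Return the smallest multiple of `shards` that is >= size,
    so the dimension splits evenly across shards."""
    return ((size + shards - 1) // shards) * shards

# Example: an M x K sparse matrix with --shard_row 4 and --shard_col 3.
M, K = 1002, 100
print(pad_dim(M, 4))  # → 1004, divisible by 4
print(pad_dim(K, 3))  # → 102, divisible by 3
```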