Parallel tc to ttgt #100

pthomadakis · 2025-04-10T04:14:02Z

This PR introduces parallel implementations of the TC-to-TTGT pass and of OptDenseTransposePass.

As an example, consider the following intensli.ta kernel:

def main() {
	#IndexLabel Declarations
	IndexLabel [a, b, d] = [1024];
	IndexLabel [c] = [512];

	#Tensor Declarations
	Tensor<double> v([d, c, a], {Dense});
	Tensor<double> t2([b, d], {Dense});
	Tensor<double> i0([a, b, c], {Dense});

	#Tensor Fill Operation
	v[d, c, a] = 2.3;
	t2[b, d] = 3.4;
	i0[a, b, c] = 0.0;


	var time0 = getTime();
	i0[a, b, c] = v[d, c, a] * t2[b, d];
	var time1 = getTime();
	printElapsedTime(time0, time1);
	var out = SUM(i0[a, b, c]);
	print(out);
}

Runtime improves ~6x on my 8-core CPU (from ~24s to ~4s).
However, we still need to investigate why we are behind numpy and why cometpy is far behind.
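For anyone reviewing who wants a numpy baseline to compare against, here is a minimal sketch of the TTGT idea (transpose, transpose, GEMM, transpose) applied to the same contraction, `i0[a, b, c] = v[d, c, a] * t2[b, d]`. It only illustrates the technique and is not the code generated by the COMET passes. The function name `ttgt_contract` and the small test sizes are assumptions made for this example.

```python
import numpy as np


def ttgt_contract(v, t2):
    """Compute i0[a, b, c] = sum_d v[d, c, a] * t2[b, d] via TTGT.

    Hypothetical helper for illustration; not part of COMET or cometpy.
    """
    d, c, a = v.shape
    b, d2 = t2.shape
    assert d == d2, "contracted dimension d must match"

    # Transpose 1: move the free indices (a, c) to the front and the
    # contracted index d to the back, then flatten to an (a*c, d) matrix.
    v_mat = np.ascontiguousarray(v.transpose(2, 1, 0)).reshape(a * c, d)

    # Transpose 2: put the contracted index d first, giving a (d, b) matrix.
    t2_mat = np.ascontiguousarray(t2.T)

    # GEMM: (a*c, d) @ (d, b) -> (a*c, b).
    out_mat = v_mat @ t2_mat

    # Transpose 3: unflatten to (a, c, b) and permute to the output
    # layout (a, b, c).
    return np.ascontiguousarray(out_mat.reshape(a, c, b).transpose(0, 2, 1))


# Small sizes so the check runs instantly; the kernel above uses
# a = b = d = 1024 and c = 512.
rng = np.random.default_rng(0)
v = rng.standard_normal((6, 3, 4))   # [d, c, a]
t2 = rng.standard_normal((5, 6))     # [b, d]

i0 = ttgt_contract(v, t2)
reference = np.einsum("dca,bd->abc", v, t2)
ttgt_matches_einsum = bool(np.allclose(i0, reference))
print(i0.shape, ttgt_matches_einsum)
```

The two input transposes and the output transpose are the data-movement steps that a dense-transpose optimization would target, while the matrix multiply in the middle is delegated to whatever BLAS numpy is linked against.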


pthomadakis added 3 commits April 9, 2025 13:38

Enabled and verified correctness for parallel TC-to-TTGT pass.

1279388

Currently falling behind numpy.

Added parallel transpose for OptDenseTranspose pass

0363b2f

[COMETPY] Fixed bug that would prevent opt-dense-transpose from being passed to the backend

f25c77e

pthomadakis merged commit f922015 into dev-new Apr 10, 2025
2 of 3 checks passed

pthomadakis deleted the parallel-tc-to-ttgt branch April 16, 2025 17:49
