\chapter*{Conclusion}
\addcontentsline{toc}{chapter}{Conclusion}
In this thesis, we developed a Python implementation of the Ataccama Expression Language and analyzed its performance. The objective was to create a programmatic bridge that allows data engineers and analysts to use Ataccama's data quality rules within Python.
Analyzing the problem space revealed the need for a flexible and user-friendly solution that could be easily integrated into existing Python-based data processing pipelines while remaining compatible with the existing Ataccama rule set. This led to the decision to implement the Ataccama Expression Language in Python. For the implementation to be practically relevant, we set a performance goal: its execution time must stay within an acceptable threshold, defined relative to the execution time of comparable solutions.
We designed a solution with a simple API that uses Python's code generation capabilities to translate Ataccama rules into executable Python code. To maintain compatibility with the existing Ataccama rule set, we defined the scope of the implementation and included an extensive test suite to verify the correctness of the translation.
The subsequent phase of the project was a performance analysis to determine whether the performance goal was met and, consequently, whether the Python implementation is practical in operational environments. The analysis centered on comparing execution times with the established data quality platforms Soda Core and Great Expectations across dataset sizes ranging from small to large. Although execution was slower in certain scenarios, the results were acceptable: the Python implementation stayed within a tolerable slowdown range, typically less than ten times slower than the baseline. This confirms its viability in scenarios where ease of integration and the flexibility of Python matter more than the highest possible performance.
Taken together, the development and the analysis confirm that Ataccama's rules can feasibly be evaluated in Python, and they open up possibilities for applying these rules in complex data environments.