Follow-up to Components announcement: Pythonic Components #28873
-
👏 👏 👏 👏 👏 👏 👏 👏 👏 🚀 🎸 |
-
We use a lot of factory patterns to support a lot of environment-specific logic (as we have a number of environments and a lot of assets and jobs should only be defined in a subset of them). I'd be extremely interested in how this use case might intersect with this delayed For example, if I have an environment variable with the environment name ( |
-
Pythonic Components
Context
Two weeks ago we announced Components and we heavily emphasized the YAML frontend piece of the system. While for a lot of users the YAML frontend is critical for non-technical stakeholders, a lot of you might have thought: I want nothing to do with YAML. I want to program in Python.
Components are not just a YAML interface. They are a new, first-class way to structure and manage the programmatic creation of all definitions (e.g. assets, asset checks, sensors, etc.) in a project.
In fact, anyone placing a Python file in the `defs` folder in the `dg` project layout will be utilizing the Components infrastructure under the hood, whether they know it or not, because Components drive the autoloading process.
In this post I’ll discuss how to use Pythonic components and their value proposition.
How do you use a Pythonic Component?
The `scaffold` command now has a `--format` option, which accepts `python` as a value. If you run the command with that format, a `component.py` is produced, rather than `component.yaml`.
For example:
Produces a `component.py`:
At which point you can implement the body of `load` and create a `DbtProjectComponent` directly, rather than use the YAML interface.
What is the value without the YAML interface?
A reasonable question to ask is “Why use this component thing at all if I’m not going to use YAML?”
Well, it’s actually a more core abstraction than that: going forward, it will be the underlying way that we organize Dagster projects and structure the process of loading their definitions.
Eliminate the import circus and improve modularity
If you have written a Dagster project of any scale, you’ll have a file at the root of your project like this:
And this is a simple project. In a complex project this file can be hundreds of lines of code. Worse, it is a central file, often edited by all but owned by no one, and it becomes a complete mess.
With components, this top-level file is effectively eliminated (we include a stub `definitions.py` file right now but will be able to completely eliminate it) and definition creation becomes modular. Your resources are defined next to the assets (and other definitions) that use them, and everything is autoloaded:
So instead of everything happening in one file, you can define the resources at the same level where your definitions will use them.
We will be adding more support to make this a little more ergonomic in the future, stay tuned.
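As a rough illustration of the layout idea above (not Dagster's actual autoloader), the sketch below walks a `defs`-style folder, imports every Python file in it, and merges whatever each file exposes. The `autoload` function, the `DEFS` dict convention, and the file and resource names are all hypothetical, chosen only to show resources living next to the assets that use them.

```python
import importlib.util
import tempfile
from pathlib import Path


def autoload(root: Path) -> dict:
    """Import every .py file under root and merge their DEFS dicts.

    Hypothetical stand-in for autoloading; DEFS is an invented convention.
    """
    merged = {"assets": [], "resources": {}}
    for path in sorted(root.rglob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        defs = getattr(module, "DEFS", {})
        merged["assets"].extend(defs.get("assets", []))
        merged["resources"].update(defs.get("resources", {}))
    return merged


def demo() -> dict:
    """Build a throwaway two-team layout on disk and autoload it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "defs"
        (root / "team_one").mkdir(parents=True)
        (root / "team_two").mkdir(parents=True)
        # Each file declares its assets and the resources they need, side by side.
        (root / "team_one" / "orders.py").write_text(
            'DEFS = {"assets": ["orders"], "resources": {"warehouse": "snowflake"}}\n'
        )
        (root / "team_two" / "events.py").write_text(
            'DEFS = {"assets": ["events"], "resources": {"queue": "kafka"}}\n'
        )
        return autoload(root)
```

No central file lists the two modules; adding a third file under the folder would be picked up by the same walk.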
Avoiding import-time side effects due to asset factories
Right now in guides like [these](https://docs.dagster.io/guides/build/assets/creating-asset-factories) we advocate the use of factory functions in cases where you want to programmatically create definitions from a higher-level API, a DSL, or external state (like a database or a file). Unfortunately, with the way that Dagster worked, these computations had to be done at import time, which is dangerous and undesirable. We worked around this with some convoluted abstractions, but it was suboptimal. There were a number of problems with this. For example, a transient error could prevent the Python process from loading at all. Another problem is that it was easy to invoke heavyweight operations during unit tests and other operations.
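To make the import-time problem concrete, here is a minimal sketch in plain Python (no Dagster imports). `fetch_table_names` stands in for a call to external state, and `TableAssetsFactory` with its `build_defs` method is a hypothetical class, not Dagster's API. Constructing the factory at module level is cheap; the expensive call only runs when `build_defs` is invoked.

```python
from dataclasses import dataclass
from typing import Callable, List

CALLS = {"fetch": 0}  # counts how often external state is touched


def fetch_table_names() -> List[str]:
    """Stand-in for a slow or flaky call to a database or file."""
    CALLS["fetch"] += 1
    return ["orders", "customers"]


@dataclass
class TableAssetsFactory:
    """Hypothetical factory: holds configuration, does no work when created."""

    fetch: Callable[[], List[str]]
    prefix: str = "raw"

    def build_defs(self) -> List[str]:
        # The external call happens here, when a loader asks for definitions,
        # not when the module is imported.
        return [f"{self.prefix}_{name}" for name in self.fetch()]


# Module level: safe to import, nothing has been fetched yet.
factory = TableAssetsFactory(fetch=fetch_table_names)
```

In a unit test, a fake `fetch` can be passed in (for example `TableAssetsFactory(fetch=lambda: ["t"])`) so the real source is never contacted.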
Factory patterns expressed as components mean that all calls to `build_defs` can be structured to happen after the Python process starts. In the short term this is very useful for unit test scenarios. Longer term it opens up space for much more granular control of definitions loading, allowing for faster, more targeted reloading of definitions.
Definitions organization
Components are also a powerful generalized tool for organizing definitions and applying metadata transformations in a coherent way. It is quite common to want to mass apply tags or group names to arbitrary sets of definitions based on business logic. Components and autoloading have built-in features to enable you to do that at any arbitrary level of your folder hierarchy, applying post processing to any collection of definitions.
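One way to picture folder-level post-processing, as a sketch only: each definition remembers the folder it was loaded from, and a rule attached to a folder adds tags to everything at or beneath it. `Definition`, `apply_tags`, and the paths and tag names are hypothetical illustrations, not Dagster's API.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List


@dataclass
class Definition:
    name: str
    folder: str  # folder the definition was loaded from
    tags: Dict[str, str] = field(default_factory=dict)


def apply_tags(
    defs: List[Definition], folder: str, tags: Dict[str, str]
) -> List[Definition]:
    """Return copies of defs, adding tags to those at or below folder."""
    target = PurePosixPath(folder)
    result = []
    for d in defs:
        path = PurePosixPath(d.folder)
        inside = path == target or target in path.parents
        new_tags = {**d.tags, **tags} if inside else dict(d.tags)
        result.append(Definition(d.name, d.folder, new_tags))
    return result


all_defs = [
    Definition("orders", "defs/team_one"),
    Definition("events", "defs/team_two"),
    Definition("sessions", "defs/team_two/web"),
]
# Only definitions under defs/team_two receive the tag; inputs are not mutated.
tagged = apply_tags(all_defs, "defs/team_two", {"owner": "team_two"})
```

Returning copies rather than mutating in place keeps the rule composable: several folder-level rules can be applied in sequence without one leaking into another team's definitions.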
For example if you had a large, multi-team project laid out like this:
And if one team wanted to standardize on some tag, they could do so by dropping a `component.yaml` at `defs/team_two/component.yaml`. We have a pattern of “post processing” that any component can implement, including the “folder” component.
Future work
We are building or planning a number of features that leverage this new structure
`dg list defs` could be much faster if targeted to a specific submodule.
Feel free to ask further questions in the discussion thread. Thanks!