Transformation
A node in the Workspace lineage graph represents a Transformation, which is essentially a Python function that accepts input data frames and outputs one data frame.
While the main purpose of a Transformation is to transform data, the system is flexible enough for a Transformation to also serve as an Ingest, Load, Artifact Storage, Plot, or Dashboard.
On a technical level, a Transform is a Docker container running user code and generating the following resources under the path:
/data/{TRANSFORM_ID}/
code/: Contains the full git repository.
datasets/: Populated automatically by converting the user-returned LazyFrame into a Parquet file.
logs/: Stores the log.txt file, which contains logging output from executing the user code.
meta/: Contains the columns.json file, which describes the column-level relationships to other Transforms.
artifacts/: Can be used to store intermediate resources or HTML files.
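The layout above can be sketched as a small path helper. This is only an illustration built from the directory and file names listed in this section; the helper `transform_paths`, its `root` parameter, and the ID `example-id` are hypothetical and not part of the platform.

```python
from pathlib import PurePosixPath

# Subdirectories listed in this section.
SUBDIRECTORIES = ("code", "datasets", "logs", "meta", "artifacts")


def transform_paths(transform_id: str, root: str = "/data") -> dict[str, PurePosixPath]:
    """Return the expected resource paths for one Transform (hypothetical helper)."""
    base = PurePosixPath(root) / transform_id
    paths = {name: base / name for name in SUBDIRECTORIES}
    # Files that this section names explicitly.
    paths["log_file"] = base / "logs" / "log.txt"
    paths["columns_file"] = base / "meta" / "columns.json"
    return paths


# "example-id" is a placeholder, not a real Transform ID.
paths = transform_paths("example-id")
print(paths["log_file"])  # /data/example-id/logs/log.txt
```

Because it uses `PurePosixPath`, the helper only builds path strings and never touches the filesystem.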