# Transform Function

### Parameters

When specifying a transform function, all of its parameters are automatically populated with the appropriate dataframes of type `polars.LazyFrame`.

```python
# input is of type LazyFrame
def transform(input):
    ...
```

### Return Value

A transform function must return a value. The return type can be one of the following:

* `polars.LazyFrame`
* `polars.DataFrame`
* `pandas.DataFrame`

{% hint style="info" %}
The recommended return type is `polars.LazyFrame`. A `LazyFrame` can be optimized by the query planner and can leverage the streaming engine to perform out-of-core computations. This results in significantly faster execution compared to the immediate mode of a `polars.DataFrame`, and is orders of magnitude faster than `pandas.DataFrame`.

Additionally, DataSpace uses the query plan of the `LazyFrame` to deduce column lineage, something that’s only possible when returning a `LazyFrame`.
{% endhint %}

### Metadata

Input parameters also expose a special attribute called `ds_meta`. This attribute contains metadata about the `DataSnapshot` being used. The following fields are available through this attribute:

| Attribute          | Type   | Description                                       |
| ------------------ | ------ | ------------------------------------------------- |
| `transform_id`     | `str`  | The Transform ID of the dataframe                 |
| `artifact_dir`     | `str`  | The directory path where the artifacts are stored |
| `data_snapshot_id` | `str`  | The DataSnapshot ID of the dataframe              |
| `build_id`         | `str`  | The Build ID of the dataframe                     |
| `row_count`        | `int`  | Number of rows                                    |
| `column_count`     | `int`  | Number of columns                                 |
| `file_size`        | `int`  | The size of the parquet file                      |
| `creation_date`    | `str`  | The date when the dataset was created             |
| `columns`          | `list` | The columns of the dataframe                      |
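
As a rough illustration of how these fields might be consumed, the sketch below builds a one-line summary from a `ds_meta` object. The helper `describe_snapshot` and the `SimpleNamespace` stand-in are hypothetical and exist only so the snippet runs outside DataSpace; inside a transform, the real object would be read from an input parameter, for example `input.ds_meta`. It also assumes that `file_size` is reported in bytes, which the table does not state.

```python
from types import SimpleNamespace


def describe_snapshot(meta) -> str:
    """Build a short summary from the ds_meta fields listed in the table above."""
    # Assumption: file_size is reported in bytes.
    size_mib = meta.file_size / (1024 * 1024)
    return (
        f"{meta.transform_id}: {meta.row_count} rows x "
        f"{meta.column_count} columns, {size_mib:.1f} MiB"
    )


# Hypothetical stand-in for the ds_meta object that DataSpace attaches to inputs.
fake_meta = SimpleNamespace(
    transform_id="excel_ingest",
    row_count=1200,
    column_count=3,
    file_size=2 * 1024 * 1024,
    columns=["id", "name", "amount"],
)

summary = describe_snapshot(fake_meta)
print(summary)  # excel_ingest: 1200 rows x 3 columns, 2.0 MiB
```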

This metadata is useful when pulling artifacts from upstream transforms. In this case, you can take the upstream transform as a parameter and read its `artifact_dir`:

```python
def transform(excel_ingest):
    # Directory where the upstream transform's artifacts are stored
    artifact_path = excel_ingest.ds_meta.artifact_dir
```

### Environment Variables

DataSpace injects the following environment variables to tell the runner where to store certain files.

| Name                       | Description                                                                                          |
| -------------------------- | ---------------------------------------------------------------------------------------------------- |
| TRANSFORM\_ID              | The Transform ID of the current transform                                                            |
| ARTIFACT\_FOLDER           | The artifact folder of the current build. Use it to persist artifacts after the build                |
| META\_FOLDER               | The meta folder of the current build. Will be populated with metadata about the dataset if generated |
| DATASET\_FOLDER            | The dataset folder of the current build. Will be populated with the generated parquet file           |
| PREVIOUS\_BUILD\_FOLDER    | The build folder of the current transform's previous build                                           |
| PREVIOUS\_DATASET\_PATH    | The file path of the dataset generated by the current transform's previous build                     |
| PREVIOUS\_ARTIFACT\_FOLDER | The artifact folder generated by the current transform's previous build                              |
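
A transform could read the environment variables listed above with Python's standard `os.environ`. The following is a minimal sketch, not DataSpace's own API: the helper name `artifact_path` and the file name `report.txt` are hypothetical, and the `setdefault` call only simulates the injected `ARTIFACT_FOLDER` so that the snippet also runs outside a DataSpace build.

```python
import os
import tempfile
from pathlib import Path

# Outside a DataSpace build the variable is not set, so simulate it with a
# temporary directory. Inside a build, setdefault keeps the injected value.
os.environ.setdefault("ARTIFACT_FOLDER", tempfile.mkdtemp())


def artifact_path(filename: str) -> Path:
    """Return a path for `filename` inside the current build's artifact folder."""
    folder = Path(os.environ["ARTIFACT_FOLDER"])
    # Create the folder if it does not exist yet (a no-op when it does).
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


# Persist a small text artifact next to the build output.
target = artifact_path("report.txt")
target.write_text("rows processed: 42\n")
print(target)
```

The same pattern would apply to the other variables, for example reading `PREVIOUS_DATASET_PATH` to locate the previous build's dataset; whether that variable is set on a transform's first build is not stated here, so `os.environ.get` with an explicit check may be the safer choice.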