To ingest a CSV or Excel file from OneDrive, you first need to generate a shareable link. Open the file in OneDrive, click Share → Copy link, and make sure the link is set to "Anyone with the link can view".
The generated share link will look something like this:
https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX
This link opens the OneDrive web interface. To fetch the raw file instead, append &download=1 to the URL and read the result with pl.read_csv.
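As a minimal sketch of this step, the snippet below adds download=1 to a share link and parses the response as CSV. The helper names to_direct_download_url and read_shared_csv are illustrative only and not part of any library. The sketch assumes polars is installed and that the link is viewable by anyone, and it downloads the bytes with the standard library before handing them to pl.read_csv.

```python
import io
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download_url(share_url: str) -> str:
    """Return share_url with download=1 added to its query string."""
    parts = urlsplit(share_url)
    # Drop any existing download parameter so it is not duplicated
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "download"
    ]
    query.append(("download", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_shared_csv(share_url: str):
    """Download the shared file and parse it as CSV (hypothetical helper)."""
    import polars as pl  # imported lazily so the URL helper works without polars

    with urllib.request.urlopen(to_direct_download_url(share_url), timeout=30) as resp:
        data = resp.read()
    return pl.read_csv(io.BytesIO(data))
```

Building the query with urllib.parse rather than string concatenation keeps the result valid whether or not the share link already carries query parameters.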
For Excel files, replace pl.read_csv with pl.read_excel.
Avoid hard-coding the file URL in the script. Store it in the secrets store instead and inject it during the build.
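One way to follow this advice is to read the URL from an environment variable that the secrets store populates. The variable name ONEDRIVE_FILE_URL and the helper get_file_url are assumptions made for this sketch. Substitute whatever name your workspace exposes the secret under.

```python
import os
from typing import Mapping


def get_file_url(env: Mapping[str, str] = os.environ, key: str = "ONEDRIVE_FILE_URL") -> str:
    """Fetch the share link from the environment instead of hard-coding it.

    ONEDRIVE_FILE_URL is a hypothetical secret name chosen for this example.
    """
    url = env.get(key, "").strip()
    if not url:
        raise RuntimeError(f"Secret {key} is not set; add it to the secrets store")
    return url
```

Failing early with a clear message makes a missing secret easier to diagnose than a download error later in the run.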
Multiple File Extraction (via Microsoft Graph API)
For private files or bulk extraction from a OneDrive folder, use the Microsoft Graph API with an Azure App Registration. This allows programmatic, authenticated access to OneDrive without manual sharing.
Open OneDrive in a browser and navigate to the folder you want to extract from.
Copy the URL. The folder ID is the value of the id query parameter, or the path component after /root:/.
Alternatively, you can list your root folder's children programmatically in the next step to discover folder and item IDs.
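The manual lookup described above can also be scripted. The sketch below pulls either the id query parameter or the path after /root:/ out of a copied URL. The function name extract_folder_ref is hypothetical, and real OneDrive URLs may vary in shape, so treat the result as a starting point rather than a guaranteed identifier.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def extract_folder_ref(url: str) -> Optional[str]:
    """Return the folder ID or folder path found in a OneDrive browser URL, if any."""
    parts = urlsplit(url)
    # Case 1: the folder ID is carried in the "id" query parameter
    ids = parse_qs(parts.query).get("id")
    if ids:
        return ids[0]
    # Case 2: the folder path follows "/root:/" in the URL path
    marker = "/root:/"
    if marker in parts.path:
        tail = parts.path.split(marker, 1)[1]
        return unquote(tail).strip("/") or None
    return None
```

parse_qs and unquote decode percent-escapes, so an encoded ID or a folder name containing spaces comes back in readable form.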
6. Prepare Your DataSpace Workspace
Declare the dependency in _config.json:
7. Write the Transformation
The following example authenticates via the Graph API, lists all files in a given folder, downloads them, and saves them to the artifacts folder.
All downloaded files are automatically stored in the artifacts folder, so they persist across runs and are available for further processing downstream.
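Because the files land in the artifacts folder, a later step can pick them up from disk. The sketch below assumes the ARTIFACT_FOLDER environment variable is available to that step, that polars is installed, and that the CSV files share one schema. The helper names list_artifacts and load_csv_artifacts are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def list_artifacts(folder: Optional[str] = None, suffix: str = ".csv") -> List[Path]:
    """Return the files in the artifacts folder that end with suffix, sorted by name."""
    root = Path(folder or os.environ["ARTIFACT_FOLDER"])
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == suffix)


def load_csv_artifacts(folder: Optional[str] = None):
    """Read every CSV artifact and stack the rows into one DataFrame."""
    import polars as pl  # imported lazily: listing files does not need polars

    frames = [pl.read_csv(path) for path in list_artifacts(folder)]
    return pl.concat(frames) if frames else pl.DataFrame()
```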
Summary
You've now successfully configured your DataSpace workspace to:
Authenticate securely via an Azure App Registration and client secret
Access a OneDrive folder using the Microsoft Graph API
Automatically download files into the artifacts folder
import os

import msal
import polars as pl
import requests

# Azure App Registration credentials, read from environment variables
TENANT_ID = os.environ["AZURE_TENANT_ID"]
CLIENT_ID = os.environ["AZURE_CLIENT_ID"]
CLIENT_SECRET = os.environ["AZURE_CLIENT_SECRET"]

# The user principal name (email) of the OneDrive owner
ONEDRIVE_USER = os.environ["ONEDRIVE_USER"]

# The path to the folder inside OneDrive, e.g. "Documents/Reports"
FOLDER_PATH = "Documents/Reports"

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
SCOPE = ["https://graph.microsoft.com/.default"]


def get_access_token():
    """Acquire an app-only access token using the client credentials flow."""
    app = msal.ConfidentialClientApplication(
        client_id=CLIENT_ID,
        client_credential=CLIENT_SECRET,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    )
    result = app.acquire_token_for_client(scopes=SCOPE)
    if "access_token" not in result:
        raise RuntimeError(f"Could not obtain token: {result.get('error_description')}")
    return result["access_token"]


def transform():
    token = get_access_token()
    headers = {"Authorization": f"Bearer {token}"}

    # List files in the target folder
    url = f"{GRAPH_BASE}/users/{ONEDRIVE_USER}/drive/root:/{FOLDER_PATH}:/children"
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    items = response.json().get("value", [])

    # Download each file into the artifacts folder
    for item in items:
        if item.get("file"):  # skip subfolders
            file_name = item["name"]
            # Pre-authenticated, short-lived URL; no Authorization header needed
            download_url = item["@microsoft.graph.downloadUrl"]
            print(f"Downloading {file_name}...")
            file_response = requests.get(download_url, timeout=30)
            file_response.raise_for_status()
            dest_path = os.path.join(os.environ["ARTIFACT_FOLDER"], file_name)
            with open(dest_path, "wb") as f:
                f.write(file_response.content)
            print(f"  ✅ Saved to {dest_path}")

    print("✅ All files downloaded successfully")

    # Return an empty DataFrame (files are available in the artifacts folder)
    return pl.DataFrame()
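Microsoft Graph returns large folder listings in pages and includes an @odata.nextLink URL while more results remain, so the single listing request in transform() may not return every file in a large folder. A minimal sketch of following those links is given here. The names iter_children and get_json are hypothetical, with get_json standing in for an authenticated HTTP call that returns the parsed JSON body.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_children(
    first_url: str, get_json: Callable[[str], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paged Graph collection, following @odata.nextLink."""
    url: Optional[str] = first_url
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        # Graph omits @odata.nextLink on the last page, which ends the loop
        url = page.get("@odata.nextLink")
```

Inside transform(), get_json could wrap requests.get with the bearer headers, call raise_for_status, and return response.json(). Passing the fetcher in as a parameter also lets the paging logic be exercised without network access.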