OneDrive

Single File Extraction

To ingest a CSV or Excel file from OneDrive, you first need to generate a shareable link. Open the file in OneDrive, click Share → Copy link, and make sure the link is set to "Anyone with the link can view".

The generated share link will look something like this: https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX

This link opens the OneDrive web interface. To get the raw file, append &download=1 to the URL:

https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX&download=1
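Appending `&download=1` by hand only works when the share link already carries a query string (such as `?e=XXXXXX`). As a minimal sketch, a small helper can set the parameter in either case. The name `with_download_flag` is hypothetical and not part of any library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_download_flag(share_url: str) -> str:
    """Return share_url with download=1 set in its query string."""
    parts = urlsplit(share_url)
    # Keep existing parameters (such as e=...) and drop any earlier download flag.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "download"
    ]
    query.append(("download", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


direct_url = with_download_flag(
    "https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX"
)
```

Rebuilding the query string this way avoids a malformed URL when the link has no `?` part at all.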

The full code is as follows:

import os
from io import BytesIO

import polars as pl
import requests

# Share link read from the environment; download=1 requests the raw file.
url = f"{os.environ['ONEDRIVE_FILE_URL']}&download=1"

def transform():
    response = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; DataSpace-User/1.0;"},
        allow_redirects=True,
        timeout=30,
    )
    # Fail early on HTTP errors such as an expired or restricted link.
    response.raise_for_status()

    # Parse the downloaded bytes as CSV.
    df = pl.read_csv(BytesIO(response.content))
    return df

For Excel files, replace pl.read_csv with pl.read_excel.

Avoid hard-coding the file URL in the script; use the secrets store to inject it during the build instead.


Multiple File Extraction (via Microsoft Graph API)

For private files or bulk extraction from a OneDrive folder, use the Microsoft Graph API with an Azure App Registration. This allows programmatic, authenticated access to OneDrive without manual sharing.

1. Create an Azure App Registration

  • In the Azure portal search bar, find and open Azure Active Directory (now named Microsoft Entra ID) → App registrations.

  • Click New registration.

  • Give it a name like DataSpace OneDrive Access.

  • Under Supported account types, select "Accounts in this organizational directory only" (or "any directory" if accessing a personal OneDrive).

  • Leave the Redirect URI blank and click Register.

2. Grant API Permissions

  • Open your new app registration.

  • Go to API permissions → Add a permission → Microsoft Graph.

  • Choose Application permissions (for server-to-server access without a logged-in user).

  • Add the following permission:

    • Files.Read.All

  • Click Add permissions, then click Grant admin consent and confirm.

3. Create a Client Secret

  • In the app registration, go to Certificates & secrets.

  • Click New client secret.

  • Give it a description and choose an expiry period.

  • Click Add, then immediately copy the Value — it will not be shown again.

4. Collect Your Credentials

You will need three values from the app registration:

  • Tenant ID — shown on the app overview page

  • Client ID — shown on the app overview page as "Application (client) ID"

  • Client Secret — the value you copied in the previous step

Store all three in the DataSpace secrets store.
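With these three values, an access token can be requested through the OAuth 2.0 client credentials flow of the Microsoft identity platform. The following is a minimal sketch rather than a definitive implementation: `fetch_token` is a hypothetical helper, and `session` is assumed to be a `requests.Session` (or any object with a compatible `post` method).

```python
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def fetch_token(session, tenant_id, client_id, client_secret):
    """Request an app-only Microsoft Graph access token."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # .default requests the application permissions granted to the app.
        "scope": GRAPH_SCOPE,
    }
    response = session.post(url, data=form, timeout=30)
    response.raise_for_status()
    return response.json()["access_token"]
```

A call might look like `fetch_token(requests.Session(), os.environ["AZURE_TENANT_ID"], os.environ["AZURE_CLIENT_ID"], os.environ["AZURE_CLIENT_SECRET"])`. The environment variable names here are assumptions; use whatever names you chose in the secrets store.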

5. Find Your OneDrive Folder ID

  • Open OneDrive in a browser and navigate to the folder you want to extract from.

  • Copy the URL. The folder ID is the value of the id query parameter, or the path component after /root:/.

Alternatively, you can list your root folder's children programmatically in the next step to discover folder and item IDs.
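As a rough sketch of that programmatic route, the helper below lists the folders at the root of one user's OneDrive. It assumes an app-only token, for which the `/me` endpoint is not available, so the drive is addressed through `/users/{user}/drive`. `list_root_folders` and its parameters are hypothetical names; `session` is assumed to behave like a `requests.Session`, and `user` is a user ID or user principal name.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def list_root_folders(session, token, user):
    """Return {folder name: item ID} for folders at the root of the user's drive."""
    url = f"{GRAPH}/users/{user}/drive/root/children"
    headers = {"Authorization": f"Bearer {token}"}
    folders = {}
    while url:
        response = session.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        page = response.json()
        for item in page.get("value", []):
            # Folders carry a "folder" facet; files carry a "file" facet instead.
            if "folder" in item:
                folders[item["name"]] = item["id"]
        # Graph returns @odata.nextLink when more pages are available.
        url = page.get("@odata.nextLink")
    return folders
```

The returned dictionary maps each folder name to the item ID that can be used as the folder ID.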

6. Prepare Your DataSpace Workspace

Declare the dependency in _config.json:

7. Write the Transformation

The following example authenticates via the Graph API, lists all files in a given folder, downloads them, and saves them to the artifacts folder.
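A minimal sketch of such a transformation is shown here; treat it as an illustration under stated assumptions, not the reference implementation. The environment variable names, the `artifacts` path, the return value of `transform`, and all helper names are assumptions. The helpers take a `session` argument (a `requests.Session`) so they can be exercised without network access.

```python
import os
from pathlib import Path

GRAPH = "https://graph.microsoft.com/v1.0"
ARTIFACTS_DIR = Path("artifacts")  # assumption: location of the artifacts folder


def get_token(session, tenant_id, client_id, client_secret):
    """Obtain an app-only token via the client credentials flow."""
    response = session.post(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": "https://graph.microsoft.com/.default",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]


def list_files(session, token, user, folder_id):
    """Return the file items (not subfolders) directly inside folder_id."""
    url = f"{GRAPH}/users/{user}/drive/items/{folder_id}/children"
    headers = {"Authorization": f"Bearer {token}"}
    files = []
    while url:
        response = session.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        page = response.json()
        files += [item for item in page.get("value", []) if "file" in item]
        url = page.get("@odata.nextLink")  # follow paging, if any
    return files


def download_file(session, token, user, item, target_dir):
    """Download one drive item into target_dir and return the local path."""
    url = f"{GRAPH}/users/{user}/drive/items/{item['id']}/content"
    response = session.get(
        url, headers={"Authorization": f"Bearer {token}"}, timeout=60
    )
    response.raise_for_status()
    target_dir.mkdir(parents=True, exist_ok=True)
    # Use only the final name component to stay inside target_dir.
    path = target_dir / Path(item["name"]).name
    path.write_bytes(response.content)
    return path


def transform():
    import requests  # imported here so the helpers above stay testable offline

    session = requests.Session()
    # Hypothetical secret names; match them to your secrets store.
    token = get_token(
        session,
        os.environ["AZURE_TENANT_ID"],
        os.environ["AZURE_CLIENT_ID"],
        os.environ["AZURE_CLIENT_SECRET"],
    )
    user = os.environ["ONEDRIVE_USER"]
    folder_id = os.environ["ONEDRIVE_FOLDER_ID"]
    files = list_files(session, token, user, folder_id)
    return [download_file(session, token, user, f, ARTIFACTS_DIR) for f in files]
```

The sketch downloads only the files directly inside the folder and holds each file in memory while writing it; large files or nested folders would need streaming and recursion.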

All downloaded files are automatically stored in the artifacts folder, so they persist across runs and are available for further processing downstream.

Summary

You've now successfully configured your DataSpace workspace to:

  • Authenticate securely via an Azure App Registration and client secret

  • Access a OneDrive folder using the Microsoft Graph API

  • Automatically download files into the artifacts folder
