# OneDrive

## Single File Extraction

To ingest a CSV or Excel file from OneDrive, you first need to generate a shareable link. Open the file in OneDrive, click **Share → Copy link**, and make sure the link is set to **"Anyone with the link can view"**.

The generated share link will look something like this:\
`https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX`

This link opens the OneDrive web interface. To get the raw file, append `&download=1` to the URL:

`https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX&download=1`
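Appending `&download=1` by hand assumes the share link already has a query string. As a minimal sketch, the hypothetical helper below (`to_download_url` is not part of DataSpace or OneDrive) sets the parameter with the standard library, whether or not a `?` is present:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_download_url(share_url: str) -> str:
    """Return share_url with download=1 set exactly once in its query string."""
    parts = urlsplit(share_url)
    # Keep existing parameters (such as "e"), dropping any previous "download" value
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "download"
    ]
    query.append(("download", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(to_download_url("https://onedrive.live.com/:x:/g/personal/user_onedrive_live_com/EXXXXXXXXXXXXXXXXXX?e=XXXXXX"))
```

Note that `urlencode` re-encodes the existing parameters, so a link with unusual characters in its query string may come out encoded slightly differently than the original.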

The complete transformation looks like this:

```python
import os
from io import BytesIO

import polars as pl
import requests

# The share link is read from the environment; "&download=1" requests the raw file
url = f"{os.environ['ONEDRIVE_FILE_URL']}&download=1"

def transform():
    response = requests.get(
        url,
        headers={"User-Agent": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; DataSpace-User/1.0;"},
        allow_redirects=True,
    )
    response.raise_for_status()

    # Parse the downloaded bytes in memory
    df = pl.read_csv(BytesIO(response.content))
    return df
```

For Excel files, replace `pl.read_csv` with `pl.read_excel`.

{% hint style="info" %}
Avoid hard-coding the file URL in the script. Use the [secrets store](https://docs.dataspace.ch/platform/secrets-store) to inject it during the build instead.
{% endhint %}

***

## Multiple File Extraction (via Microsoft Graph API)

For private files or bulk extraction from a OneDrive folder, use the **Microsoft Graph API** with an Azure App Registration. This allows programmatic, authenticated access to OneDrive without manual sharing.

{% stepper %}
{% step %}

#### Create an Azure App Registration

* Go to <https://portal.azure.com/>.
* In the search bar, find and open **Microsoft Entra ID** (formerly Azure Active Directory), then **App registrations**.
* Click **New registration**.
* Give it a name like `DataSpace OneDrive Access`.
* Under **Supported account types**, select **"Accounts in this organizational directory only"** (or "any directory" if accessing a personal OneDrive).
* Leave the Redirect URI blank and click **Register**.
  {% endstep %}

{% step %}

#### Grant API Permissions

* Open your new app registration.
* Go to **API permissions → Add a permission → Microsoft Graph**.
* Choose **Application permissions** (for server-to-server access without a logged-in user).
* Add the following permission:
  * `Files.Read.All`
* Click **Add permissions**, then click **Grant admin consent** and confirm.
  {% endstep %}

{% step %}

#### Create a Client Secret

* In the app registration, go to **Certificates & secrets**.
* Click **New client secret**.
* Give it a description and choose an expiry period.
* Click **Add**, then immediately copy the **Value** — it will not be shown again.
  {% endstep %}

{% step %}

#### Collect Your Credentials

You will need three values:

* **Tenant ID** — shown on the app overview page
* **Client ID** — shown on the app overview page as "Application (client) ID"
* **Client Secret** — the value you copied in the previous step

Store all three in the DataSpace [secrets store](https://docs.dataspace.ch/platform/secrets-store).
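A missing secret otherwise surfaces as a bare `KeyError` in the middle of a run. The sketch below checks all three values up front. It assumes the secrets are exposed as the environment variables `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET`, the names used by the transformation in this guide; `load_credentials` is a hypothetical helper, not a DataSpace function.

```python
import os
from typing import Mapping

# Assumed variable names, matching the transformation in this guide
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def load_credentials(env: Mapping[str, str] = os.environ) -> dict:
    """Return the three credentials, or raise one error naming every missing value."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing secrets: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_credentials()` at the top of `transform()` reports every missing secret in a single message.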
{% endstep %}

{% step %}

#### Find Your OneDrive Folder ID

* Open OneDrive in a browser and navigate to the folder you want to extract from.
* Copy the URL. The folder ID is the value of the `id` query parameter, or the path component after `/root:/`.

Alternatively, you can list your root folder's children programmatically through the Graph API to discover folder and item IDs.
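The programmatic route might look like the sketch below. It assumes you already hold a valid access token and uses only the standard library, though `requests` works equally well. `extract_folders` and `list_root_folders` are hypothetical helper names. The endpoint is the Graph `children` listing for the user's drive root.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def extract_folders(payload: dict) -> dict:
    """Map folder name to item ID from a Graph 'children' response body."""
    # Folder items carry a "folder" facet; file items carry a "file" facet instead
    return {
        item["name"]: item["id"]
        for item in payload.get("value", [])
        if "folder" in item
    }


def list_root_folders(token: str, user: str) -> dict:
    """Fetch the children of the user's drive root and return their folders."""
    request = urllib.request.Request(
        f"{GRAPH_BASE}/users/{user}/drive/root/children",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_folders(json.load(response))
```

Only the first page of results is read here, which is usually enough for a drive root with a handful of folders.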
{% endstep %}

{% step %}

#### Prepare Your DataSpace Workspace

Declare the dependency in `_config.json`:

{% code title="_config.json" %}

```json
{
  "packages": [
    "msal"
  ]
}
```

{% endcode %}
{% endstep %}

{% step %}

#### Write the Transformation

The following example obtains an access token with MSAL (client credentials flow), lists all files in a given folder through the Graph API, downloads them, and saves them to the artifacts folder.

```python
import polars as pl
import msal
import requests
import os

TENANT_ID     = os.environ["AZURE_TENANT_ID"]
CLIENT_ID     = os.environ["AZURE_CLIENT_ID"]
CLIENT_SECRET = os.environ["AZURE_CLIENT_SECRET"]

# The user principal name (email) of the OneDrive owner
ONEDRIVE_USER = os.environ["ONEDRIVE_USER"]

# The path to the folder inside OneDrive, e.g. "Documents/Reports"
FOLDER_PATH   = "Documents/Reports"

GRAPH_BASE    = "https://graph.microsoft.com/v1.0"
SCOPE         = ["https://graph.microsoft.com/.default"]

def get_access_token():
    app = msal.ConfidentialClientApplication(
        client_id=CLIENT_ID,
        client_credential=CLIENT_SECRET,
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    )
    result = app.acquire_token_for_client(scopes=SCOPE)
    if "access_token" not in result:
        raise RuntimeError(f"Could not obtain token: {result.get('error_description')}")
    return result["access_token"]

def transform():
    token = get_access_token()
    headers = {"Authorization": f"Bearer {token}"}

    # List files in the target folder
    url = f"{GRAPH_BASE}/users/{ONEDRIVE_USER}/drive/root:/{FOLDER_PATH}:/children"
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    items = response.json().get("value", [])

    # Download each file into the artifacts folder
    for item in items:
        if item.get("file"):  # skip subfolders
            file_name = item["name"]
            download_url = item["@microsoft.graph.downloadUrl"]
            print(f"Downloading {file_name}...")
            file_response = requests.get(download_url)
            file_response.raise_for_status()
            dest_path = os.path.join(os.environ["ARTIFACT_FOLDER"], file_name)
            with open(dest_path, "wb") as f:
                f.write(file_response.content)
            print(f"  ✅ Saved to {dest_path}")

    print("✅ All files downloaded successfully")

    # Return an empty DataFrame (files are available in the artifacts folder)
    return pl.DataFrame()
```

All downloaded files are automatically stored in the **artifacts folder**, so they persist across runs and are available for further processing downstream.
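The listing request above reads a single page of results. For large folders, the Graph API splits the listing across pages and includes an `@odata.nextLink` URL in each response that has a successor. A minimal sketch of following those links is shown below. `iter_items` is a hypothetical helper, and `get_json` is any callable that fetches a URL and returns the decoded JSON body.

```python
from typing import Callable, Iterator


def iter_items(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item across all pages of a Graph collection response."""
    url = first_url
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        # Absent on the last page, which ends the loop
        url = page.get("@odata.nextLink")
```

Inside `transform()`, this could replace the single listing call, for example by passing a small function that calls `requests.get(u, headers=headers)`, checks `raise_for_status()`, and returns `.json()`.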
{% endstep %}
{% endstepper %}

### Summary

You've now successfully configured your DataSpace workspace to:

* Authenticate securely via an Azure App Registration and client secret
* Access a OneDrive folder using the Microsoft Graph API
* Automatically download files into the artifacts folder
