# Google Sheets

## Single Sheet Extraction

To ingest data from a public Google Sheet, you first need to enable sharing on the file. Open the sheet in Google Sheets, click **Share → Change to anyone with the link**, and set the role to **Viewer**.

The share link will look like this:\
`https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms/edit?usp=sharing`

Google Sheets supports a direct CSV export URL. Extract the **spreadsheet ID** from the link and use the following export format:

`https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms/export?format=csv`

If your sheet has multiple tabs, you can target a specific one by appending the `gid` parameter (found in the URL when the tab is selected):

`https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms/export?format=csv&gid=0`

Following is the full code:

```python
import polars as pl
import os

spreadsheet_id = os.environ['SPREADSHEET_ID']
url = f'https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format=csv'

def transform():
    df = pl.read_csv(url)
    return df
```

{% hint style="info" %}
Typically, a spreadsheet URL will contain an access token. It is advised not to hard-code the ID into the string but rather use the [secrets store](https://docs.dataspace.ch/platform/secrets-store) to inject the ID during the build.
{% endhint %}

***

## Single Sheet Extraction (Private Sheets via Service Account)

If your Google Sheet is private, you will need to authenticate using a Google Service Account.

{% stepper %}
{% step %}

#### Create a Google Cloud Project

* Go to <https://console.cloud.google.com/>.
* Click the project selector (top left) and choose **"New Project."**
* Give it a descriptive name, for example `DataSpace Sheets Access`.
* Click **Create**.
  {% endstep %}

{% step %}

#### Enable the Google Sheets API

* In the left sidebar, go to **APIs & Services → Library**.
* Search for **Google Sheets API**.
* Click **Enable**.
* Optionally, also enable the **Google Drive API** if you need to look up spreadsheets by name.
  {% endstep %}

{% step %}

#### Create a Service Account

* Go to **APIs & Services → Credentials**.
* Click **Create Credentials → Service Account**.
* Enter a name like `dataspace-sheets-access`.
* Leave **Permissions** and **Principals with access** **empty** — no roles or users are needed.
* Click **Done**.
  {% endstep %}

{% step %}

#### Create a Key File

* In the service account list, click your new account.
* Open the **Keys** tab.
* Click **Add Key → Create New Key → JSON**.
* Save the downloaded `.json` file (for example, `mcp.json`).

You'll need to upload this file to your **DataSpace workspace**.
{% endstep %}

{% step %}

#### Share Your Google Sheet with the Service Account

* Open the Google Sheet.
* Click **Share**.
* Copy the **client email** from the service account JSON file (it looks like `dataspace-sheets-access@your-project-id.iam.gserviceaccount.com`).
* Add that email as a **Viewer**.
* Copy the **spreadsheet ID** from the URL — it's the long string between `/spreadsheets/d/` and the next `/`.

Example:

```
https://docs.google.com/spreadsheets/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms/edit
→ Spreadsheet ID: 1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms
```

{% endstep %}

{% step %}

#### Prepare Your DataSpace Workspace

Declare the dependencies in `_config.json`:

{% code title=":config.json" %}

```json
{
  "packages": [
    "google-api-python-client",
    "google-auth"
  ]
}
```

{% endcode %}

Make sure your service account key (`mcp.json`) is uploaded to the workspace root.
{% endstep %}

{% step %}

#### Write the Transformation

```python
import polars as pl
from google.oauth2 import service_account
from googleapiclient.discovery import build
import io

# Spreadsheet ID and target sheet/tab name
SPREADSHEET_ID = "<REPLACE_WITH_SPREADSHEET_ID>"
SHEET_NAME = "Sheet1"

# Path to your service account credentials
SERVICE_ACCOUNT_FILE = "./mcp.json"

# Sheets API scope
SCOPES = ["https://www.googleapis.com/auth/spreadsheets.readonly"]

def transform():
    # Authenticate using the service account
    creds = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES
    )
    service = build("sheets", "v4", credentials=creds)

    # Read all values from the target sheet
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=SPREADSHEET_ID, range=SHEET_NAME)
        .execute()
    )
    rows = result.get("values", [])

    # First row is treated as the header
    headers = rows[0]
    data = rows[1:]

    df = pl.DataFrame(data, schema=headers, orient="row")
    return df
```

{% hint style="info" %}
All cell values returned by the Sheets API are strings by default. Use Polars [cast()](https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.cast.html) to convert columns to the appropriate data types after loading.
{% endhint %}
{% endstep %}
{% endstepper %}

### Summary

You've now successfully configured your DataSpace workspace to:

* Authenticate securely via a Google service account
* Access a private Google Sheet using the Sheets API
* Load sheet data directly into a Polars DataFrame
