Google Drive
Single File Extraction
To ingest a CSV file from Google Drive, you first have to enable sharing on the file by following the instructions on the Google Drive Help Page.
The generated share link will look something like this:
https://drive.google.com/file/d/1Se7_LKZykBWweXpBths1oCmgGTGK4yyD/view?usp=sharing
This link is meant to open the Google Drive web interface. However, since we want the file itself, we have to modify the link. The file ID needs to be extracted from the original URL and combined with the direct file access link:
https://drive.google.com/uc?id=1Se7_LKZykBWweXpBths1oCmgGTGK4yyD
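The ID extraction described above can also be done in code rather than by hand. The following is a minimal sketch, assuming the share link has the /file/d/<ID>/ shape shown earlier; the helper name to_direct_url is hypothetical and is not part of any library.

```python
import re

# Matches the ID segment of a share link such as .../file/d/<FILE_ID>/view?usp=sharing
_FILE_ID_PATTERN = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def to_direct_url(share_link: str) -> str:
    """Convert a Google Drive share link into a direct file access link."""
    match = _FILE_ID_PATTERN.search(share_link)
    if match is None:
        raise ValueError(f"No file ID found in link: {share_link!r}")
    return f"https://drive.google.com/uc?id={match.group(1)}"


share_link = "https://drive.google.com/file/d/1Se7_LKZykBWweXpBths1oCmgGTGK4yyD/view?usp=sharing"
direct_url = to_direct_url(share_link)
print(direct_url)
```

Raising an error on an unrecognized link keeps a malformed URL from silently reaching the CSV reader.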
The full code for reading the file with Polars is:
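import os
import polars as pl
# Build the direct access link from the file ID stored in an environment variable.
url = f"https://drive.google.com/uc?id={os.environ['CSV_FILE_ID']}"
def transform():
    # Read the shared CSV directly from Google Drive into a DataFrame.
    df = pl.read_csv(url)
    return df
Multiple File Extraction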
If you have a folder with multiple files you would like to extract, sharing every single file manually is not practical. In this case, we can use Google's API to programmatically access the shared folder, index its files, and download them all.
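A rough sketch of that approach is shown here, assuming the google-api-python-client and google-auth packages are installed and that the folder has been shared with a service account as described in the steps that follow. The names folder_query and download_folder, and the key_path and target_dir parameters, are hypothetical choices made for illustration. Google-native documents (such as Sheets) cannot be fetched this way and would need an export step, which this sketch does not cover.

```python
import io
from pathlib import Path

# Read-only access is enough for indexing and downloading.
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def folder_query(folder_id: str) -> str:
    """Build a Drive search query for non-trashed, non-folder items directly inside a folder."""
    return (
        f"'{folder_id}' in parents and trashed = false "
        "and mimeType != 'application/vnd.google-apps.folder'"
    )


def download_folder(folder_id: str, key_path: str, target_dir: str) -> list[str]:
    """Download every file in the folder and return the downloaded file names."""
    # Imported here so the helper above loads without the Google client libraries.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    creds = service_account.Credentials.from_service_account_file(
        key_path, scopes=[DRIVE_READONLY_SCOPE]
    )
    service = build("drive", "v3", credentials=creds)
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    downloaded, page_token = [], None
    while True:
        # Index the folder one page of results at a time.
        page = service.files().list(
            q=folder_query(folder_id),
            fields="nextPageToken, files(id, name)",
            pageToken=page_token,
        ).execute()
        for item in page.get("files", []):
            request = service.files().get_media(fileId=item["id"])
            buffer = io.BytesIO()
            downloader = MediaIoBaseDownload(buffer, request)
            done = False
            while not done:
                _, done = downloader.next_chunk()
            (out_dir / item["name"]).write_bytes(buffer.getvalue())
            downloaded.append(item["name"])
        page_token = page.get("nextPageToken")
        if page_token is None:
            break
    return downloaded
```

A call would look something like download_folder(folder_id, "service_account.json", "artifacts"), where both the key file name and the output directory are placeholders.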
Share Your Google Drive Folder with the Service Account
Go to Google Drive.
Right-click your folder and choose Share.
Copy the client email from the service account JSON file (it looks like [email protected]).
Add that email as a Viewer.
Copy the folder ID from the URL: it is the long string between /folders/ and the next /.
Example:
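The folder ID can also be pulled out of the URL in code. This is a minimal sketch, assuming a folder URL that contains a /folders/<ID> segment; the URL below is a made-up placeholder rather than a real folder, and extract_folder_id is a hypothetical helper name.

```python
import re

# Captures the ID that follows /folders/ up to the next "/" or "?".
_FOLDER_ID_PATTERN = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(folder_url: str) -> str:
    """Return the folder ID embedded in a Google Drive folder URL."""
    match = _FOLDER_ID_PATTERN.search(folder_url)
    if match is None:
        raise ValueError(f"No folder ID found in URL: {folder_url!r}")
    return match.group(1)


# Placeholder URL for illustration only; FOLDER_ID_PLACEHOLDER is not a real folder ID.
example_url = "https://drive.google.com/drive/folders/FOLDER_ID_PLACEHOLDER?usp=sharing"
folder_id = extract_folder_id(example_url)
print(folder_id)
```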
Summary
You’ve now successfully configured your DataSpace workspace to:
Authenticate securely via a Google service account
Access a shared Google Drive folder
Automatically download Excel files into the artifacts folder