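Reading multiple CSVs from a shared Google Drive folder with Python

I want to create a function that reads the files in a shared Google Drive folder and concatenates them into one df. If possible, I would prefer not to use any authentication.

I used this code, which I found here: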

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)

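I would like to use glob to read every file in the folder and concatenate them into one df, but I get the error `HTTPError: HTTP Error 404: Not Found`. Any help would be appreciated.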

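You cannot download a folder directly. In the Drive API a folder is treated as a file; the only difference is its MIME type, application/vnd.google-apps.folder.

As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.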
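
This also accounts for the 404 in the question: for a folder link, `url.split('/')[-2]` returns the literal path segment `folders` rather than an ID, and even the real folder ID does not point at a downloadable file. Below is a minimal sketch, using only the standard library, of pulling the ID out of a folder link and building the `'<id>' in parents` search query that the Drive API accepts when listing files. The helper names `folder_id_from_url` and `children_query` are made up for illustration.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the ID that follows '/folders/' in a Drive folder link (hypothetical helper)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" not in parts or parts.index("folders") + 1 >= len(parts):
        raise ValueError("not a Drive folder URL: {}".format(url))
    return parts[parts.index("folders") + 1]


def children_query(folder_id):
    """Build the search query that selects the direct children of a folder."""
    return "'{}' in parents".format(folder_id)


url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"
# The expression from the question picks the segment *before* the ID
print(url.split("/")[-2])
folder_id = folder_id_from_url(url)
print(children_query(folder_id))
```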

As a workaround, you can list all the files contained in the folder and then download them one by one. I based the following example on this:

`do.py`

import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID are defined elsewhere in the script
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV and move on to the next file
        if item["mimeType"] != "text/csv":
            continue
        # Download files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())
        # Do whatever you want with the csv
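
One caveat with the listing step: `files.list` returns results a page at a time, so a single call may not cover a large folder. Here is a minimal sketch of following `nextPageToken` until the listing is exhausted. It assumes `service` is an authenticated Drive v3 client such as the one `drive_service()` returns; `iter_folder_files` is a hypothetical helper name, and the `fields` selection and the `trashed = false` filter are just one reasonable choice.

```python
def iter_folder_files(service, folder_id, page_size=100):
    """Yield every file entry in a folder, following nextPageToken across pages.

    `service` is assumed to be a Drive v3 client built elsewhere
    (for example by the drive_service() helper used in do.py).
    """
    page_token = None
    while True:
        response = service.files().list(
            # Direct children of the folder, skipping items in the trash
            q="'{}' in parents and trashed = false".format(folder_id),
            fields="nextPageToken, files(id, name, mimeType)",
            pageSize=page_size,
            pageToken=page_token,
            includeItemsFromAllDrives=True,
            supportsAllDrives=True,
        ).execute()
        for item in response.get("files", []):
            yield item
        # No token in the response means this was the last page
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

The download loop could then iterate over `iter_folder_files(service, FOLDER_ID)` instead of `results.get("files", [])`.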

Documentation

MediaIoBaseDownload
Implement shared drive support

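You should use the Google API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list

Example usage of the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the list of children from the JSON response, you can read the files and concatenate the dataframes: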


import pandas as pd

# Response returned by the children.list call for the shared folder
response = {
    "kind": "drive#childList",
    "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
    "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
    "items": [
        {
            "kind": "drive#childReference",
            "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
        },
        {
            "kind": "drive#childReference",
            "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
        }
    ]
}

# Read each child file by its ID and collect the dataframes
item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))

# Stack all files row-wise into a single dataframe
df = pd.concat(item_arr, axis=0)
print(df.head())
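
One detail of the concatenation step: `pd.concat(..., axis=0)` keeps each file's own row index, so the combined frame ends up with repeated labels. Here is a small sketch of the same step with `ignore_index=True`. It uses in-memory CSV text as a stand-in for the downloaded files so that it runs without network access; `concat_csv_sources` and the sample data are illustrative only.

```python
import io

import pandas as pd


def concat_csv_sources(sources):
    """Read each CSV source (path, URL or file-like object) and stack the rows."""
    frames = [pd.read_csv(src) for src in sources]
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, axis=0, ignore_index=True)


# Stand-ins for two downloaded files with the same columns (made-up data)
first = io.StringIO("name,value\na,1\nb,2\n")
second = io.StringIO("name,value\nc,3\n")

df = concat_csv_sources([first, second])
print(df)
```

With real files, the list passed in could hold the `download_url` strings built from each item's ID.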
