Reading multiple CSVs from a shared Google Drive folder using Python



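I want to create a function that reads the files in a shared Google Drive folder and concatenates them into a single df. If possible, I would prefer not to use any authentication procedure.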

I used this code, which I found here:

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)

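I want to use glob to read all the files in the folder and concatenate them into one df, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.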

You cannot download a folder directly. In the Drive API a folder is treated as a file; the only difference is its MIME type, application/vnd.google-apps.folder.

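As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.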
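A side note on the snippet in the question: for that URL, url.split('/')[-2] picks up the path segment "folders" rather than the folder ID, so the request is sent with id=folders. The sketch below shows one way to take the last path segment instead. folder_id_from_url is a hypothetical helper name, not part of any library, and it assumes the ID is the final segment of the URL path. Since a folder cannot be downloaded directly, the extracted ID is only useful as input to a listing call.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the last non-empty path segment of a Drive folder URL."""
    path = urlparse(url).path  # e.g. "/drive/folders/<id>"
    segments = [s for s in path.split("/") if s]
    return segments[-1]


url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"

print(url.split("/")[-2])       # folders
print(folder_id_from_url(url))  # 1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd
```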

As a workaround, you can list all the files contained in the folder and then download them one by one. I based the following example on this:

do.py
import io

from googleapiclient.http import MediaIoBaseDownload

# drive_service() and FOLDER_ID are not defined in this snippet. They are
# expected to come from the surrounding project: an authenticated Drive v3
# client and the ID of the shared folder.


def list_and_download():
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV and move on to the next file
        if item["mimeType"] != "text/csv":
            continue
        # Download the files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        with open(item["name"], "wb") as f:
            # getvalue() returns the whole buffer regardless of stream position
            f.write(fh.getvalue())
        # Do whatever you want with the csv
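The list call in this example reads a single page of results, and a larger folder may be split across several pages. Here is a minimal sketch of following nextPageToken, assuming service is an authenticated Drive v3 client such as the one drive_service() is expected to return. list_all_files is a hypothetical helper name, and the trashed = false filter and the fields selection are additions that are not part of the original example.

```python
def list_all_files(service, folder_id):
    """Collect every file entry in a folder, following nextPageToken."""
    query = "'{}' in parents and trashed = false".format(folder_id)
    items = []
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            includeItemsFromAllDrives=True,
            supportsAllDrives=True,
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token,
        ).execute()
        items.extend(response.get("files", []))
        # The token is absent on the last page
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return items
```

The returned list has the same shape as items in the example, so the download loop can iterate over it unchanged.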
Documentation
  • MediaIoBaseDownload
  • Implement shared drive support

You should use the Google API to list the files in the shared folder. https://developers.google.com/drive/api/v2/reference/children/list

Example usage of the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the list of children from the JSON response, you can read the files and concatenate the DataFrames:


import pandas as pd

# Response returned by the children.list call for the shared folder
response = {
    "kind": "drive#childList",
    "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
    "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
    "items": [
        {
            "kind": "drive#childReference",
            "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
        },
        {
            "kind": "drive#childReference",
            "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
        }
    ]
}

# Read each child file as a CSV, then stack the frames row-wise
item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
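The URL-building step can be separated from the pandas calls so it can be checked without any network access. This is only a sketch: child_download_urls is a hypothetical helper name, and sample is a trimmed-down copy of the response shown earlier, keeping just the fields the helper reads.

```python
def child_download_urls(response):
    """Build one download URL per entry in a children.list response."""
    base = "https://drive.google.com/uc?id="
    return [base + item["id"] for item in response.get("items", [])]


# Trimmed-down response with the two child IDs from the example
sample = {
    "kind": "drive#childList",
    "items": [
        {"kind": "drive#childReference", "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"},
        {"kind": "drive#childReference", "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"},
    ],
}

for url in child_download_urls(sample):
    print(url)
```

Each URL can then be passed to pd.read_csv. Using pd.concat(frames, ignore_index=True) gives the combined frame a fresh running index instead of repeating each file's own row numbers. Whether the unauthenticated download succeeds is likely to depend on how each file is shared.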
