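ValueError when reading an Excel file from SharePoint into Python

I am trying to read an Excel file from SharePoint into Python.

Q1: The file has two URLs. If I copy the link to the file directly, I get: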

https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers

If I click through the folders on the web page one by one until I reach and open the Excel file, the URL is:

https://company.sharepoint.com/:x:/r/sites/project/_layouts/15/Doc.aspx?sourcedoc=letters-numbers&file=Table.xlsx&action=default&mobileredirect=true

Which one should I use?

Q2: My code is as follows:

import io

import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

URL = "https://company.sharepoint.com/:x:/s/project/letters-numbers?e=lettersnumbers"
USERNAME = "abc@a.com"
PASSWORD = "abcd"

# Authenticate against SharePoint with user credentials
ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

# Download the file into memory and pass it to pandas
response = File.open_binary(ctx, URL)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name="Sheet2")

It works up to pd.read_excel(), where I get a ValueError:

ValueError: Excel file format cannot be determined, you must specify an engine manually.

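I don't know what went wrong, or whether there will be further problems once the file loads. I would really appreciate it if someone could point out the problem or leave an example.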

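If you look at the pandas documentation for `read_excel` (https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html), you will see that there is an `engine` parameter.

Try the different options and see which one works, since your error says that an engine must be specified manually.

If that turns out to be the fix, then in future take the error message literally and check the documentation.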
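As a rough sketch of what that suggestion could look like: the engine names below are the ones listed in the pandas documentation for `read_excel`, but the extension-to-engine mapping and the helper names `engine_for` and `read_excel_bytes` are my own illustrative assumptions, not part of pandas.

```python
import os

# Engine names documented for pandas.read_excel. Choosing one by file
# extension is an illustrative assumption, not something pandas exposes.
ENGINE_BY_EXTENSION = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def engine_for(filename):
    """Return a read_excel engine name for a file name, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return ENGINE_BY_EXTENSION.get(extension)


def read_excel_bytes(buffer, filename, **kwargs):
    """Read an in-memory workbook, naming the engine explicitly."""
    import pandas as pd  # imported lazily so engine_for works without pandas

    return pd.read_excel(buffer, engine=engine_for(filename), **kwargs)
```

For example, `read_excel_bytes(bytes_file_obj, "Table.xlsx", sheet_name="Sheet2")` would pass `engine="openpyxl"`. If the downloaded bytes are not really a workbook, naming an engine will most likely just replace the ValueError with a different error from that engine.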

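I tried different URLs (and different ways of obtaining them) and received different binary content each time. It was either a one-line status code (such as 403), a warning, or something that looked like a header. So I think the problem is the format of the URL.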
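One way to check what actually came back is to look at the first bytes of the response before handing it to pandas. This is only a minimal sketch: `describe_payload` is a hypothetical helper, and the signatures it checks are the standard ZIP and OLE2 file headers rather than anything specific to SharePoint.

```python
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx, .xlsm and .xlsb files are ZIP containers
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls files


def describe_payload(content):
    """Classify downloaded bytes by their leading signature."""
    if content.startswith(ZIP_MAGIC):
        return "zip"
    if content.startswith(OLE2_MAGIC):
        return "ole2"
    head = content[:200].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    if head.startswith(b"{"):
        return "json"
    return "unknown"
```

Calling `describe_payload(response.content)` on the download should give "zip" for a real .xlsx file. A result of "html", "json" or "unknown" suggests the server returned an error or login page instead of the workbook, which would explain why pandas cannot determine the file format.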

I found the answer (via github.com/vgrem).

For ClientContext you need an absolute URL:

URL = "https://company.sharepoint.com/:x:/r/sites/project"

For File, you need a relative path, one that overlaps with the URL:

RELATIVE_PATH = "/sites/project/Shared%20Documents/Folder/Table.xlsx"

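RELATIVE_PATH can be found as follows:

Go to the folder that contains the file in Teams (or on the web page).
Select Open in app (Excel).
In Excel, go to File->Property, copy the path, and adapt it to the format above.
Replace each space with "%20".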
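The last two steps can also be done in code. The sketch below assumes the path copied from Excel is a full URL such as "https://company.sharepoint.com/sites/project/Shared Documents/Folder/Table.xlsx" (that exact shape is my assumption), and `server_relative_path` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote, urlsplit


def server_relative_path(file_url):
    """Strip scheme and host from a file URL and percent-encode the path."""
    path = urlsplit(file_url).path
    # Decode first so an already-encoded "%20" is not encoded a second time.
    return quote(unquote(path))


copied = "https://company.sharepoint.com/sites/project/Shared Documents/Folder/Table.xlsx"
RELATIVE_PATH = server_relative_path(copied)
print(RELATIVE_PATH)
```

`quote` leaves "/" untouched by default, so only the spaces and other unsafe characters in folder or file names are encoded.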

# Authenticate against the site URL
ctx_auth = AuthenticationContext(URL)
if ctx_auth.acquire_token_for_user(USERNAME, PASSWORD):
    ctx = ClientContext(URL, ctx_auth)
    web = ctx.web
    ctx.load(web)
    ctx.execute_query()
    print("Authentication successful")
else:
    print(ctx_auth.get_last_error())

# Download the file by its relative path, then read it from memory
response = File.open_binary(ctx, RELATIVE_PATH)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet2')

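If sheet_name is not specified and the original .xlsx has multiple sheets, I found that pd.read_excel() generates a warning and df here is actually a dict. (According to the pandas documentation, a dict of DataFrames is what you get with sheet_name=None or a list of sheets; the documented default, sheet_name=0, returns only the first sheet as a DataFrame.)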
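If the code has to cope with both cases, a small guard like the one below may help. It is only a sketch, and `select_sheet` is a hypothetical helper, not a pandas function.

```python
def select_sheet(result, sheet_name):
    """Return one sheet whether read_excel gave a dict or a single DataFrame."""
    if isinstance(result, dict):
        if sheet_name not in result:
            available = ", ".join(str(name) for name in result)
            raise KeyError(f"Sheet {sheet_name!r} not found; available: {available}")
        return result[sheet_name]
    # Anything else is assumed to already be the single sheet that was read.
    return result
```

With this, `df = select_sheet(pd.read_excel(bytes_file_obj, sheet_name=None), "Sheet2")` would always leave `df` holding a single sheet.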
