Python: How to read a specific worksheet of a SharePoint Excel workbook



In Python, I am using the Office 365 REST Python client library to access and read an Excel workbook that contains many worksheets.

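Although authentication succeeds, I cannot work out how to append the worksheet name to the file URL so that I can access the first or second worksheet by name. As a result, the response is not JSON but raw IO bytes that my code cannot handle.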

My end goal is simply to access a specific worksheet by its name, 'employee_list', and convert it to JSON or a pandas DataFrame for further use.

Code snippet below:

import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO

username = 'abc@a.com'
password = 'abcd'
site_url = 'https://sample.sharepoint.com/sites/SAMPLE/_layouts/15/Doc.aspx?OR=teams&action=edit&sourcedoc={739271873}'      
# HOW TO ACCESS WORKSHEET BY ITS NAME IN ABOVE LINE
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions("{0}/_api/web/".format(site_url))
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content) # ERROR ENCOUNTERED JSON DECODE ERROR SINCE DATA IS IN BYTES

You can access the sheet by its index; see the following code:

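import xlrd  # note: xlrd 2.0 and later read only .xls files, not .xlsx

loc = ("File location")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)  # first worksheet (index is zero-based)
# Value at row 1, column 0 (both indices are zero-based)
print(sheet.cell_value(1, 0))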

You can try adding the sheet name component to the URL like this:

https://site/lib/workbook.xlsx#'Sheet1'!A1

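It seems the URL constructed to access the data is incorrect. You should test the full URL in a browser first and then modify the code to get it working. You can try the changes below; I have verified that a URL formed with this logic returns JSON data.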

import io
import json
import pandas as pd
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.auth.user_credential import UserCredential
from office365.runtime.http.request_options import RequestOptions
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
from io import BytesIO

username = 'abc@a.com'
password = 'abcd'
site_url = "https://sample.sharepoint.com/_vti_bin/ExcelRest.aspx/RootFolder/ExcelFileName.xlsx/Model/Ranges('employee_list!A1%7CA10')?$format=json"
# Replace RootFolder/ExcelFileName.xlsx with actual path of excel file from the root.
# Replace A1 and A10 with actual start and end of cell range.
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
request = RequestOptions(site_url)
response = ctx.execute_request_direct(request)
json_data = json.loads(response.content) 

Source: https://learn.microsoft.com/en-us/sharepoint/dev/general-development/sample-uri-for-excel-services-rest-api
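The range URL used above can also be assembled programmatically, which avoids quoting mistakes when the sheet name or cell range changes. This is only a minimal sketch, assuming the Excel Services REST layout shown in this answer; the function name `build_range_url` and its parameters are hypothetical and not part of any library.

```python
from urllib.parse import quote


def build_range_url(host, file_path, sheet, start, end):
    """Build an Excel Services REST URL for a cell range on one sheet.

    Hypothetical helper; the URL layout follows the example in this answer.
    """
    # Percent-encode the workbook path but keep the folder separators.
    encoded_path = quote(file_path.strip("/"), safe="/")
    # "|" separates the start and end cells and is sent as %7C.
    cell_range = quote(f"{sheet}!{start}|{end}", safe="!")
    return (
        f"https://{host}/_vti_bin/ExcelRest.aspx/{encoded_path}"
        f"/Model/Ranges('{cell_range}')?$format=json"
    )


url = build_range_url(
    "sample.sharepoint.com",
    "RootFolder/ExcelFileName.xlsx",
    "employee_list",
    "A1",
    "A10",
)
print(url)
```

The resulting string can be passed to RequestOptions in place of the hand-written URL.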

An update: the version I use (Office365-REST-Python-Client==2.3.11) allows simpler access to Excel files in a SharePoint repository.

# from original_question import pd,
#                               username,
#                               password,
#                               UserCredential,
#                               File,
#                               BytesIO
user_credentials = UserCredential(user_name=username,
                                  password=password)
file_url = ('https://sample.sharepoint.com'
            '/sites/SAMPLE/{*recursive_folders}'
            '/sample_worksheet.xlsx')
## absolute path of excel file on SharePoint
excel_file = BytesIO()
## initiating binary object
excel_file_online = File.from_url(abs_url=file_url)
## requesting file from SharePoint
excel_file_online = excel_file_online.with_credentials(
    credentials=user_credentials)
## validating file with accessible credentials
excel_file_online.download(file_object=excel_file).execute_query()
## writing binary response of the
## file request into bytes object

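We now have a binary copy of the Excel file in a BytesIO object named excel_file. From here, reading it into a pd.DataFrame is straightforward, just as with an ordinary Excel file stored on a local drive. For example: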

pd.read_excel(excel_file) # -> pd.DataFrame

So if you are interested in a specific sheet such as 'employee_list', you can read it by name:

employee_list = pd.read_excel(excel_file,
                              sheet_name='employee_list')
# -> pd.DataFrame

# or read every sheet into a dict keyed by sheet name and pick one
data = pd.read_excel(excel_file,
                     sheet_name=None) # -> dict
employee_list = data.get('employee_list')
# -> [pd.DataFrame, None]
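Note that `dict.get` returns None silently when the sheet name is misspelled. As a small sketch (the helper `pick_sheet` is hypothetical, and a plain dict stands in here for the one returned by `pd.read_excel(..., sheet_name=None)`), a lookup that fails loudly and lists the sheets that do exist can make such mistakes easier to spot:

```python
def pick_sheet(sheets, name):
    """Return the sheet called `name`, or raise listing the sheets that exist."""
    try:
        return sheets[name]
    except KeyError:
        available = ", ".join(repr(s) for s in sheets)
        raise KeyError(
            f"sheet {name!r} not found; available: {available}"
        ) from None


# Stand-in for the dict returned by pd.read_excel(excel_file, sheet_name=None);
# in real use the values would be DataFrames.
data = {"employee_list": [["id", "name"]], "Sheet2": []}
employee_list = pick_sheet(data, "employee_list")
```

The same function should work unchanged on the real dict of DataFrames, since it only relies on ordinary mapping behaviour.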

I know you said you cannot use a BytesIO object, but for those who, like me, arrive here reading the file in as a BytesIO object, you can use the sheet_name argument of pd.read_excel:

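url = "https://sharepoint.site.com/sites/MySite/MySheet.xlsx"
relative_url = "/sites/MySite/MySheet.xlsx"  # server-relative part of url
sheet_name = 'Sheet X'
# ctx is assumed to be an authenticated ClientContext for the site
response = File.open_binary(ctx, relative_url)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # rewind so the file is read from the start
df = pd.read_excel(bytes_file_obj, sheet_name=sheet_name)  # select the sheet by name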
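As far as I understand it, File.open_binary takes a server-relative path rather than the full URL. A minimal sketch of deriving that path with the standard library follows; the helper name `server_relative_path` is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def server_relative_path(absolute_url):
    """Return the path part of a file URL, e.g. '/sites/MySite/MySheet.xlsx'."""
    parts = urlsplit(absolute_url)
    # Drop the scheme, host, query string and fragment; decode %20 and similar.
    return unquote(parts.path)


relative_url = server_relative_path(
    "https://sharepoint.site.com/sites/MySite/MySheet.xlsx"
)
print(relative_url)
```

This only splits the string; whether the resulting path is accepted still depends on the site the ClientContext was created for.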
