我正试图从dropbox链接解析一个表(https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx)。这是一个.xlsx
表,到目前为止我已经尝试了两种方法
方法1
codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
tables = pd.read_html(codeID_url)
df_codeID = tables[0]
给出
ValueError: No tables found
这是有道理的,因为最后,我不是从html页面解析表。上面的命令适用于本页中的表(https://www.ecdc.europa.eu/en/covid-19/variants-concern)
方法2
codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
data = pd.read_excel(codeID_url,'TestResultCodelistVoC')
给出:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<!DOCTYP'
我确实在这里找到了一个关于这个错误的主题,尽管所有的答案都涉及一个本地.xls
文件,在我的情况下,我试图解析一个网页/链接,它最终是一个.xls
文件。
我还遇到了一个使用dropbox令牌的解决方案,不过如果可能的话,我首先想尝试在没有dropbox帐户的情况下下载上述表格。
将?dl=1
添加到URL的末尾。
>>> import pandas as pd
>>>
>>> url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx?dl=1'
>>> df = pd.read_excel(url)
>>> print(df)
Codelistname Codesystem name ... Short label DE 1st Release
0 TestResultCodelistVoC NaN ... Confirmed 501Y.V1 NaN
1 TestResultCodelistVoC NaN ... Confirmed 501Y.V2 NaN
2 TestResultCodelistVoC NaN ... Confirmed 501Y.V3 NaN
3 TestResultCodelistVoC NaN ... Confirmed 501Y.V3.P1 NaN
4 TestResultCodelistVoC NaN ... Confirmed 501Y.V3.P2 NaN
5 TestResultCodelistVoC NaN ... Confirmed not one of the listed VOC NaN
6 TestResultCodelistVoC NaN ... Compatible with 501Y.V1 NaN
7 TestResultCodelistVoC NaN ... Compatible with 501Y.V2 NaN
8 TestResultCodelistVoC NaN ... Compatible with 501Y.V3 NaN
9 TestResultCodelistVoC NaN ... Compatible with 501Y.V3.P1 NaN
10 TestResultCodelistVoC NaN ... Compatible with 501Y.V3.P2 NaN
11 TestResultCodelistVoC NaN ... Compatible with 501Y.V2-3 NaN
12 TestResultCodelistVoC NaN ... Compatible with a VOC NaN
13 TestResultCodelistVoC NaN ... Confirmed MinkCluster5 NaN
14 TestResultCodelistVoC NaN ... Compatible with MinkCluster5 NaN
15 TestResultCodelistVoC NaN ... Not compatible with 501Y.V1 NaN
16 TestResultCodelistVoC NaN ... Not compatible with 501Y.V2-3 NaN
17 TestResultCodelistVoC NaN ... No compatibility with VOC detected (VOC not fu... NaN
18 TestResultCodelistVoC NaN ... Other variant of concern NaN
[19 rows x 12 columns]
>>>