从dropbox链接解析.xls文件



我正试图从dropbox链接解析一个表(https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx)。这是一个.xlsx表,到目前为止我已经尝试了两种方法

方法1

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
tables = pd.read_html(codeID_url)
df_codeID = tables[0]

给出

ValueError: No tables found

这是有道理的,因为最后,我不是从html页面解析表。上面的命令适用于本页中的表(https://www.ecdc.europa.eu/en/covid-19/variants-concern)

方法2

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
data = pd.read_excel(codeID_url,'TestResultCodelistVoC')

给出:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<!DOCTYP'

我确实在这里找到了一个关于这个错误的主题,尽管所有的答案都涉及一个本地.xls文件,在我的情况下,我试图解析一个网页/链接,它最终是一个.xls文件。

我还遇到了一个使用dropbox令牌的解决方案,不过如果可能的话,我首先想尝试在没有dropbox帐户的情况下下载上述表格。

?dl=1添加到URL的末尾。

>>> import pandas as pd
>>>
>>> url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx?dl=1'
>>> df = pd.read_excel(url)
>>> print(df)
Codelistname  Codesystem name  ...                                     Short label DE 1st Release
0   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V1         NaN
1   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V2         NaN
2   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V3         NaN
3   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P1         NaN
4   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P2         NaN
5   TestResultCodelistVoC              NaN  ...                Confirmed not one of the listed VOC         NaN
6   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V1         NaN
7   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V2         NaN
8   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V3         NaN
9   TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P1         NaN
10  TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P2         NaN
11  TestResultCodelistVoC              NaN  ...                          Compatible with 501Y.V2-3         NaN
12  TestResultCodelistVoC              NaN  ...                              Compatible with a VOC         NaN
13  TestResultCodelistVoC              NaN  ...                             Confirmed MinkCluster5         NaN
14  TestResultCodelistVoC              NaN  ...                       Compatible with MinkCluster5         NaN
15  TestResultCodelistVoC              NaN  ...                        Not compatible with 501Y.V1         NaN
16  TestResultCodelistVoC              NaN  ...                      Not compatible with 501Y.V2-3         NaN
17  TestResultCodelistVoC              NaN  ...  No compatibility with VOC detected (VOC not fu...         NaN
18  TestResultCodelistVoC              NaN  ...                           Other variant of concern         NaN
[19 rows x 12 columns]
>>>

最新更新