Reading a CSV file from a website in Python 3



I am trying to read a CSV file directly from a website. Here is my Python 3 code:

import pandas as pd
url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
data = pd.read_csv(url)

But I get the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Input In [6], in <cell line: 3>()
1 import pandas as pd
2 url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
----> 3 data = pd.read_csv(url)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306     warnings.warn(
307         msg.format(arguments=arguments),
308         FutureWarning,
309         stacklevel=stacklevel,
310     )
--> 311 return func(*args, **kwargs)

Any clues? Thanks a lot.

You should pass the storage_options parameter so that the request carries a User-Agent header:

import pandas as pd
url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
storage_options = {'User-Agent': 'Mozilla/5.0'}
df = pd.read_csv(url, storage_options=storage_options)

Taken from: https://stackoverflow.com/a/68816828/5304366
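
For what it's worth, for HTTP(S) URLs pandas forwards the storage_options key-value pairs to urllib.request.Request as headers, so the fix above amounts to sending a browser-like User-Agent. Below is a minimal standard-library sketch of the same idea. The helper names build_request and read_csv_with_user_agent are made up for illustration, and it assumes the server was rejecting Python's default user agent (the truncated traceback does not show the status code).

```python
import io
import urllib.request

import pandas as pd


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_csv_with_user_agent(url: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download the CSV bytes with the custom header, then let pandas parse them."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return pd.read_csv(io.BytesIO(response.read()))


# Usage (needs network access):
# url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
# df = read_csv_with_user_agent(url)
```

In practice the storage_options one-liner is shorter; this version is mainly useful if you want to inspect the response or add more headers.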

I like to use requests together with pandas.

from io import StringIO

import pandas as pd
import requests


def get_data() -> pd.DataFrame:
    url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
    # The session is closed automatically when the with block exits.
    with requests.Session() as session:
        response = session.get(url)
    # Raise requests.HTTPError on a 4xx/5xx response instead of parsing an error page.
    response.raise_for_status()
    return pd.read_csv(StringIO(response.text), sep=",")


print(get_data())
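
If the server still refuses the request, a variation on the same approach is to send the User-Agent header explicitly and set a timeout. This is only a sketch: fetch_csv_text and parse_csv_text are hypothetical helper names, and the "Mozilla/5.0" value is simply borrowed from the storage_options answer. Splitting download and parsing also lets you check the parsing step without touching the network.

```python
from io import StringIO

import pandas as pd
import requests


def fetch_csv_text(url: str, timeout: float = 30.0) -> str:
    """Download the CSV as text, failing loudly on HTTP errors."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed value, same as the other answer
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def parse_csv_text(text: str) -> pd.DataFrame:
    """Parse CSV text that is already in memory."""
    return pd.read_csv(StringIO(text))


# Usage (needs network access):
# url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
# df = parse_csv_text(fetch_csv_text(url))
```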
