I'm trying to read a CSV file directly from a website. Here is my Python 3 code:
import pandas as pd
url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
data = pd.read_csv(url)
But I get the following error:
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
Input In [6], in <cell line: 3>()
1 import pandas as pd
2 url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
----> 3 data = pd.read_csv(url)
File ~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
Any clues? Thanks a lot.
You should specify the storage_options parameter:
import pandas as pd
url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
storage_options = {'User-Agent': 'Mozilla/5.0'}
df = pd.read_csv(url, storage_options=storage_options)
Taken from: https://stackoverflow.com/a/68816828/5304366
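For context on why this helps: for HTTP(S) URLs, pandas forwards the key-value pairs in storage_options to urllib.request.Request as headers, so the server sees a browser-like User-Agent. If your pandas version does not accept storage_options for HTTP URLs, roughly the same thing can be done by hand. This is only a sketch: the helper names build_request and read_csv_with_user_agent and the 30-second timeout are my own choices, not part of pandas.

```python
import io
import urllib.request

import pandas as pd


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    # Hypothetical helper: attach a browser-like User-Agent header to the request.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_csv_with_user_agent(url: str, timeout: float = 30.0) -> pd.DataFrame:
    # Download the raw bytes first, then let pandas parse them from memory.
    # The timeout value is an arbitrary assumption.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return pd.read_csv(io.BytesIO(response.read()))


# Usage (needs network access):
# df = read_csv_with_user_agent(
#     "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
# )
```

Whether the server accepts this particular User-Agent string is something you would have to try; the traceback in the question is cut off before the status code.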
I prefer to use requests together with pandas.
from io import StringIO
import pandas as pd
import requests
def get_data() -> pd.DataFrame:
    url = "https://www.w3resource.com/python-exercises/pandas/plotting/alphabet_stock_data.csv"
    with requests.Session() as session:
        response = session.get(url)
        # Raise requests.HTTPError on a 4xx/5xx response instead of printing None.
        response.raise_for_status()
    return pd.read_csv(StringIO(response.text), sep=",")
print(get_data())
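If the server also turns away the default requests User-Agent (I have not checked this for the site in question), the same header from the other answer can be set on the session. A minimal sketch, where make_session, parse_csv, fetch_csv and the 30-second timeout are names and values I made up for illustration:

```python
from io import StringIO

import pandas as pd
import requests


def make_session(user_agent: str = "Mozilla/5.0") -> requests.Session:
    # Every request sent through this session carries the given User-Agent.
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def parse_csv(text: str) -> pd.DataFrame:
    # Parse CSV text that is already in memory.
    return pd.read_csv(StringIO(text))


def fetch_csv(url: str, timeout: float = 30.0) -> pd.DataFrame:
    # The timeout is an arbitrary assumption; without one, requests can wait indefinitely.
    with make_session() as session:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
        return parse_csv(response.text)
```

Splitting the download from the parsing also lets you try parse_csv on a small string without touching the network.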