I am trying to download a CSV file from the URL
https://qubeshub.org/publications/1220/supportingdocs/1#supportingdocs.
The file is Elephant Morphometrics and Tusk Size-originaldata-3861.csv.
I have tried pd.read_csv()
and
import pandas as pd
import io
import requests
url="https://qubeshub.org/publications/1220/supportingdocs/1#supportingdocs/Elephant Morphometrics and Tusk Size-originaldata-3861.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
Try:
import requests
url = "https://qubeshub.org/publications/1220/serve/1/3861?el=1&download=1"
r = requests.get(url)
filename = r.headers["Content-Disposition"].split('"')[1]
with open(filename, "wb") as f_out:
    print(f"Downloading {filename}")
    f_out.write(r.content)
This prints:
Downloading Elephant Morphometrics and Tusk Size-originaldata-3861.csv
and saves the file.
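The snippet above assumes the server always sends a Content-Disposition header with a quoted filename. A slightly more defensive sketch is shown below, using only the standard library; the helper name `filename_from_content_disposition` and the fallback name `download.csv` are hypothetical and chosen only for illustration.

```python
from email.message import Message


def filename_from_content_disposition(header_value, default="download.csv"):
    """Return the filename parameter of a Content-Disposition header.

    Falls back to `default` (a hypothetical name) when the header is
    missing or carries no filename parameter.
    """
    if not header_value:
        return default
    # Let the standard library parse the header parameters and quoting.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename() or default


example = 'attachment; filename="Elephant Morphometrics and Tusk Size-originaldata-3861.csv"'
print(filename_from_content_disposition(example))
```

With the response `r` from the code above, the call would be `filename_from_content_disposition(r.headers.get("Content-Disposition"))`, which avoids a KeyError if the header is absent.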
This should download the file and parse its rows and columns into a CSV file:
import requests
import csv
url = "https://qubeshub.org/publications/1220/serve/1/3861?el=1&download=1"
req=requests.get(url)
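# Split on Windows-style line endings, then drop the trailing empty element.
rows = req.content.decode('utf-8').split("\r\n")
rows.pop()
csv_local_filename = "test.csv"
# newline='' lets csv.writer control the line endings itself.
with open(csv_local_filename, 'w', newline='') as fs:
    writer = csv.writer(fs, delimiter=',')
    for row in rows:
        entries = row.split(',')
        writer.writerow(entries)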
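Before you start working with the columns, you may need to convert them to the types you need. The example code above leaves everything as strings.
After running the code above, I see: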
>tail test.csv
2005-13,88,m,32.5,290,162.3,40
2005-13,51,m,37.5,270,113.2,40
2005-13,86,m,37.5,310,175.3,38
and
>head test.csv
Years of sample collection,Elephant ID,Sex,Estimated Age (years),shoulder Height in cm,Tusk Length in cm,Tusk Circumference in cm
1966-68,12,f,0.08,102,,
1966-68,34,f,0.08,89,,
1966-68,162,f,0.083,89,,
1966-68,292,f,0.083,92,,
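As noted above, the saved values are all strings. One possible way to convert them is sketched below with the standard csv module. The sample rows are copied from the output shown above; the helper names `to_number` and `parse_rows`, and the choice to map empty fields to None, are assumptions made for illustration.

```python
import csv
import io

# Sample input: the header and two rows taken from the output above.
SAMPLE = (
    "Years of sample collection,Elephant ID,Sex,Estimated Age (years),"
    "shoulder Height in cm,Tusk Length in cm,Tusk Circumference in cm\n"
    "1966-68,12,f,0.08,102,,\n"
    "2005-13,88,m,32.5,290,162.3,40\n"
)

# Columns treated as floating-point measurements in this sketch.
FLOAT_COLUMNS = [
    "Estimated Age (years)",
    "shoulder Height in cm",
    "Tusk Length in cm",
    "Tusk Circumference in cm",
]


def to_number(text):
    """Convert a CSV field to float; empty fields become None."""
    return float(text) if text.strip() else None


def parse_rows(fileobj):
    """Read CSV rows as dicts and convert the numeric columns."""
    records = []
    for row in csv.DictReader(fileobj):
        record = dict(row)
        record["Elephant ID"] = int(row["Elephant ID"])
        for column in FLOAT_COLUMNS:
            record[column] = to_number(row[column])
        records.append(record)
    return records


records = parse_rows(io.StringIO(SAMPLE))
print(records[1]["Tusk Length in cm"])
```

To apply the same conversion to the saved file, pass `open("test.csv", newline="")` to `parse_rows` instead of the in-memory sample.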
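In Firefox, after downloading the file in the browser, you can inspect the link of the downloaded file, which shows
https://qubeshub.org/publications/1220/serve/1/3861?el=1&download=1
This is the link you should use in your code: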
import pandas as pd
df = pd.read_csv('https://qubeshub.org/publications/1220/serve/1/3861?el=1&download=1')
print(df)
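The reason the URL from the question does not lead to the file can be illustrated by splitting both URLs into their parts. Everything after `#` is a fragment, which HTTP clients do not send to the server, so the request in the question presumably only fetches the supporting-documents page rather than the CSV. The function name `describe` below is hypothetical; this is only a sketch using the standard library.

```python
from urllib.parse import urlsplit

# URL used in the question: the file name sits after '#', in the fragment.
page_url = (
    "https://qubeshub.org/publications/1220/supportingdocs/1#supportingdocs/"
    "Elephant Morphometrics and Tusk Size-originaldata-3861.csv"
)
# Direct download link reported by the browser.
file_url = "https://qubeshub.org/publications/1220/serve/1/3861?el=1&download=1"


def describe(url):
    """Return the path, query and fragment of a URL as a dict."""
    parts = urlsplit(url)
    return {"path": parts.path, "query": parts.query, "fragment": parts.fragment}


for url in (page_url, file_url):
    print(describe(url))
```

Only the path and query reach the server, so the two requests ask for different resources even though both URLs mention the same publication.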