Trouble reading a CSV with pandas



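I'm new to data mining. I'm trying to compute the correlations between 16 variables in a dataset of about 500 rows, and I have to do it with pandas. However, I'm also having trouble reading the CSV file (I'm on a Mac; I don't know whether that is the problem). This is the code I'm using: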

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
corr = data.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(corr,cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,len(data.columns),1)
ax.set_xticks(ticks)
plt.xticks(rotation=90)
ax.set_yticks(ticks)
ax.set_xticklabels(data.columns)
ax.set_yticklabels(data.columns)
plt.show()

The error is:

Traceback (most recent call last):
File "/Users/myname/eclipse2-workspace/Prova/ciao.py", line 4, in <module>
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

I've tried many things, but I can't get it to work!

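What you are trying to download is not a CSV file but an HTML page that displays a table built from the contents of the CSV file. You have to use the link that is created when you click the download button in the top-right corner, and pass that link to .read_csv(). It should look something like this: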

url = 'https://UGLYUGLYTHINGS.dl.dropboxusercontent.com/cd/0/get/MOREUGLYTHINGSHERE/file?_download_id=ENCODED_ID_OF_THE_FILE&_notify_domain=www.dropbox.com&dl=1'

The uppercase parts of the string above stand in for whatever the Dropbox back end generates.
Also, don't forget to pass the character ';' as the sep argument to .read_csv(), like this:
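One quick way to confirm that a link returns an HTML page rather than the raw CSV is to look at the start of the downloaded text. The following is only a minimal sketch: the helper name looks_like_html and the sample strings are made up for illustration and are not part of pandas or Dropbox.

```python
def looks_like_html(text: str) -> bool:
    """Return True if the text starts like an HTML document (hypothetical helper)."""
    # Ignore leading whitespace and compare case-insensitively.
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Made-up samples: the start of a web page and the start of a ';'-separated CSV.
page_start = "<!DOCTYPE html>\n<html>\n<head><title>Dropbox</title></head>"
csv_start = "id;x;y\n1;0.5;2.0\n"

print(looks_like_html(page_start))
print(looks_like_html(csv_start))
```

If the check is true for the text behind your link, read_csv is being handed a web page, which would explain a tokenizing error like the one in the question.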

data = pd.read_csv(url,sep=';')
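To see what the sep argument changes, here is a small sketch that reads a semicolon-delimited sample from memory instead of from Dropbox. The column names and values are invented for illustration; the real file's columns are not shown in the question.

```python
import io

import pandas as pd

# Invented ';'-separated sample standing in for the real file.
sample = "id;x;y\n1;0.5;2.0\n2;1.5;4.1\n3;2.5;5.9\n"

# With the default sep=',' each line ends up in a single column.
wrong = pd.read_csv(io.StringIO(sample))

# With sep=';' the fields are split; index_col=0 uses 'id' as the index,
# as in the question's code.
right = pd.read_csv(io.StringIO(sample), sep=';', index_col=0)

print(wrong.shape)   # one column only
print(right.shape)   # two data columns, 'x' and 'y'
print(right.corr())  # correlation matrix of the numeric columns
```

Without sep=';' the read may not fail at all; it can silently give a one-column frame, and corr() on that frame would not be meaningful.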

With the correct URL, the rest of your code works.

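Also, as mentioned in the comments above, please change the title of the question, because it may mislead someone. The problem is about reading a remote file, not about computing correlations.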
