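I downloaded a data file from https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/. As you can see, it is in .data format. How can I read it into a pandas DataFrame in Python?
I tried this, but it does not work: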
with open("arrhythmia.data", "r") as f:
    arryth_df = pd.DataFrame(f.read())
It says ValueError: DataFrame constructor not properly called!
You can pass the file's URL straight to read_csv, because the .data file here is in CSV format. It has no header row, so add header=None:
import pandas as pd

# optional: show all columns when printing
pd.options.display.max_columns = None
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.data'
df = pd.read_csv(url, header=None)
print(df.head())
    0  1    2   3    4    5    6    7    8    9  10   11  12  13  14  15
0  75  0  190  80   91  193  371  174  121  -16  13   64  -2   ?  63   0
1  56  1  165  64   81  174  401  149   39   25  37  -17  31   ?  53   0
2  54  0  172  95  138  163  386  185  102   96  34   70  66  23  75   0
3  55  0  175  94  100  202  380  179  143   28  11   -5  20   ?  71   0
4  75  0  190  80   88  181  360  177  103  -16  13   61   3   ?   ?   0

   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
0  52  44   0   0  32   0   0   0   0   0   0   0  44  20  36
1  48   0   0   0  24   0   0   0   0   0   0   0  64   0   0
2  40  80   0   0  24   0   0   0   0   0   0  20  56  52   0
3  72  20   0   0  48   0   0   0   0   0   0   0  64  36   0
4  48  40   0   0  28   0   0   0   0   0   0   0  40  24   0
...
...
...
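To see why header=None matters, here is a minimal sketch on a small in-memory sample. The three rows are a hypothetical stand-in for the real file (same comma-separated, headerless layout), and StringIO is used only so the example runs without a download:

```python
from io import StringIO

import pandas as pd

# Hypothetical three-row sample in the same headerless, comma-separated layout.
sample = "75,0,190,80\n56,1,165,64\n54,0,172,95\n"

# Default behaviour: the first data row is consumed as column names.
with_header = pd.read_csv(StringIO(sample))
# header=None: every row is kept as data and columns are numbered 0, 1, 2, ...
no_header = pd.read_csv(StringIO(sample), header=None)

print(with_header.shape)         # (2, 4)
print(no_header.shape)           # (3, 4)
print(list(no_header.columns))   # [0, 1, 2, 3]
```

Without header=None you silently lose the first record, which is easy to miss on a file this wide.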
If you also want to convert ? to missing values (NaN), add the na_values='?' parameter:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/arrhythmia/arrhythmia.data'
df = pd.read_csv(url, header=None, na_values='?')
print(df.head())
    0  1    2   3    4    5    6    7    8    9    10     11    12    13
0  75  0  190  80   91  193  371  174  121  -16  13.0   64.0  -2.0   NaN
1  56  1  165  64   81  174  401  149   39   25  37.0  -17.0  31.0   NaN
2  54  0  172  95  138  163  386  185  102   96  34.0   70.0  66.0  23.0
3  55  0  175  94  100  202  380  179  143   28  11.0   -5.0  20.0   NaN
4  75  0  190  80   88  181  360  177  103  -16  13.0   61.0   3.0   NaN

     14  15  16  17  18  19  20  21  22  23  24  25  26  27  28
0  63.0   0  52  44   0   0  32   0   0   0   0   0   0   0  44
1  53.0   0  48   0   0   0  24   0   0   0   0   0   0   0  64
2  75.0   0  40  80   0   0  24   0   0   0   0   0   0  20  56
3  71.0   0  72  20   0   0  48   0   0   0   0   0   0   0  64
4   NaN   0  48  40   0   0  28   0   0   0   0   0   0   0  40
...
...
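A quick way to check what na_values='?' changes is to compare column types and count the missing entries. This is only a sketch on a hypothetical three-row sample (not the real file), again read through StringIO so it runs offline:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample; "?" marks a missing value, as in arrhythmia.data.
sample = "13,64,-2,?\n37,-17,31,?\n34,70,66,23\n"

raw = pd.read_csv(StringIO(sample), header=None)
clean = pd.read_csv(StringIO(sample), header=None, na_values="?")

# Without na_values the "?" forces the whole column to be read as text.
print(pd.api.types.is_numeric_dtype(raw[3]))   # False
# With na_values="?" the column is numeric and "?" becomes NaN.
print(clean[3].dtype)                          # float64
# Number of missing values in each column.
print(clean.isna().sum().tolist())             # [0, 0, 0, 2]
```

The same isna().sum() call on the full DataFrame should show which of its columns contain ? entries.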
Alternatively, use StringIO:
from io import StringIO
import pandas as pd

with open("arrhythmia.data", "r") as f:
    data = StringIO(f.read())
# the file has no header row, so pass header=None
arryth_df = pd.read_csv(data, header=None)
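As far as I know the StringIO step is optional for a file that is already on disk, since read_csv also accepts a file path. A minimal sketch, where the temporary file and its two rows are hypothetical stand-ins for the downloaded arrhythmia.data:

```python
import os
import tempfile

import pandas as pd

# Stand-in for a downloaded arrhythmia.data; the contents are a hypothetical sample.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "arrhythmia.data")
with open(path, "w") as f:
    f.write("75,0,190,?\n56,1,165,64\n")

# read_csv takes the path directly, so no open()/StringIO step is needed.
arryth_df = pd.read_csv(path, header=None, na_values="?")
print(arryth_df.shape)  # (2, 4)
```

With the real file you would just pass "arrhythmia.data" (or its full path) as the first argument.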