I'm using np.genfromtxt to read a CSV file, and I'm not sure why it raises ValueError(errmsg) on this data. When I open the file in Excel, it shows 23 columns for every one of the 33 rows.
Here is the code:
import numpy as np

csv = np.genfromtxt(fname, delimiter=",", names=True)
Here is a snippet of the CSV records:
,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_NN__alpha,param_NN__hidden_layer_sizes,params,rank_test_score,split0_test_score,split0_train_score,split1_test_score,split1_train_score,split2_test_score,split2_train_score,split3_test_score,split3_train_score,split4_test_score,split4_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
0,0.34166226387023924,0.0010362625122070312,0.842927342927343,0.8468980402379758,0.1,"(7,)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7,)}",25,0.8420706295240185,0.8475292052871167,0.8398771660451854,0.8463774474853288,0.845360824742268,0.846158065046893,0.8385256691531373,0.8486892618185806,0.8488040377441299,0.8457362215519605,0.05093153997183547,0.00018195987247183776,0.0037378988316037944,0.0010747322296072162
1,0.5543142318725586,0.0018250465393066407,0.8465250965250966,0.8527554135893668,0.1,"(25, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 7)}",5,0.846018863785918,0.8530137662480118,0.846018863785918,0.8589919376953875,0.8479929809168677,0.8496681840618658,0.8400614304519526,0.851486234506965,0.8525345622119815,0.8506169454346038,0.10835399357094619,0.00018853748087819175,0.004013613789285713,0.003306836154659678
2,0.5266880512237548,0.0013680458068847656,0.8437609687609687,0.8478413817137904,0.1,"(11, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (11, 7)}",17,0.842509322219785,0.8479679701639884,0.8354902390875192,0.8431964021280096,0.8455801710901514,0.8520265452750507,0.8433523475208424,0.851595919710431,0.8518762343647136,0.8444200712914725,0.1041624682160838,0.0003233587082439388,0.005278162504355272,0.0036030369022985215
3,0.49459095001220704,0.0011162281036376954,0.8406458406458407,0.845428443186931,0.1,"(7, 5)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7, 5)}",32,0.8383417416100022,0.848461580650469,0.8429480149155516,0.8501617945483464,0.8468962491774512,0.8514780891789612,0.8312856516015796,0.8381046396841066,0.8437568575817423,0.8389361118727722,0.10397613499936685,0.00018889068500539376,0.005421511394261151,0.005726975087304059
4,0.6175418376922608,0.0024899959564208983,0.8449017199017199,0.8508140227747922,0.1,"(25, 11, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 11, 7)}",11,0.8414125904803685,0.8493939560138211,0.8427286685676684,0.8546591345362804,0.8501864443957008,0.8519716996654417,0.8459850811759544,0.8564769112646704,0.8441957428132544,0.8415684123937482,0.1940231074769015,0.00047604030307216253,0.003049662553913791,0.005209439647677219
The error I get:
ValueError: Some errors were detected !
Line #2 (got 26 columns instead of 22)
Line #3 (got 26 columns instead of 22)
Line #4 (got 26 columns instead of 22)
Line #5 (got 26 columns instead of 22)
Line #6 (got 28 columns instead of 22)
Line #7 (got 26 columns instead of 22)
Line #8 (got 28 columns instead of 22)
Line #9 (got 26 columns instead of 22)
Line #10 (got 26 columns instead of 22)
Line #11 (got 26 columns instead of 22)
Line #12 (got 26 columns instead of 22)
Line #13 (got 26 columns instead of 22)
Line #14 (got 28 columns instead of 22)
Line #15 (got 26 columns instead of 22)
Line #16 (got 28 columns instead of 22)
Line #17 (got 26 columns instead of 22)
Line #18 (got 26 columns instead of 22)
Line #19 (got 26 columns instead of 22)
Line #20 (got 26 columns instead of 22)
Line #21 (got 26 columns instead of 22)
Line #22 (got 28 columns instead of 22)
Line #23 (got 26 columns instead of 22)
Line #24 (got 28 columns instead of 22)
Line #25 (got 26 columns instead of 22)
Line #26 (got 26 columns instead of 22)
Line #27 (got 26 columns instead of 22)
Line #28 (got 26 columns instead of 22)
Line #29 (got 26 columns instead of 22)
Line #30 (got 28 columns instead of 22)
Line #31 (got 26 columns instead of 22)
Line #32 (got 28 columns instead of 22)
Line #33 (got 26 columns instead of 22)
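You are passing "," as the delimiter, but many of the column values themselves contain commas (the quoted tuples and dicts such as "(25, 7)"). np.genfromtxt does not honor the double quotes, so it splits those values into extra columns. You need a parser that understands quoting for this to work.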
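To see where the extra columns come from, here is a small sketch comparing a quote-unaware split with Python's standard csv module on the first data row from the question, trimmed to its first nine fields purely for brevity:

```python
import csv

# First data row from the question, trimmed to its first nine fields for brevity
line = '''0,0.34166226387023924,0.0010362625122070312,0.842927342927343,0.8468980402379758,0.1,"(7,)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7,)}",25'''

naive = line.split(",")            # ignores quotes: every comma starts a new field
quoted = next(csv.reader([line]))  # csv.reader treats "..." as a single field

print(len(naive), len(quoted))     # 12 9
print(quoted[6])                   # the tuple string survives intact
```

Each quoted tuple or dict gains one extra field per embedded comma, which is why rows containing "(25, 11, 7)" report more columns in the error than rows containing "(7,)".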
Fortunately, pandas handles this well without much help. You can load the data with read_csv and then convert the resulting DataFrame to an array:
import pandas as pd

# index_col=[0] uses the unnamed first column as the row index
array = pd.read_csv(fname, index_col=[0]).values
The loaded DataFrame (what you get before calling .values) looks like this:
df = pd.read_csv(fname, index_col=[0])
print(df)
mean_fit_time mean_score_time mean_test_score mean_train_score
0 0.341662 0.001036 0.842927 0.846898
1 0.554314 0.001825 0.846525 0.852755
2 0.526688 0.001368 0.843761 0.847841
3 0.494591 0.001116 0.840646 0.845428
4 0.617542 0.002490 0.844902 0.850814
param_NN__alpha param_NN__hidden_layer_sizes
0 0.1 (7,)
1 0.1 (25, 7)
2 0.1 (11, 7)
3 0.1 (7, 5)
4 0.1 (25, 11, 7)
params rank_test_score
0 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 25
1 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 5
2 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 17
3 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 32
4 {'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (... 11
split0_test_score split0_train_score ... split2_test_score
0 0.842071 0.847529 ... 0.845361
1 0.846019 0.853014 ... 0.847993
2 0.842509 0.847968 ... 0.845580
3 0.838342 0.848462 ... 0.846896
4 0.841413 0.849394 ... 0.850186
split2_train_score split3_test_score split3_train_score
0 0.846158 0.838526 0.848689
1 0.849668 0.840061 0.851486
2 0.852027 0.843352 0.851596
3 0.851478 0.831286 0.838105
4 0.851972 0.845985 0.856477
split4_test_score split4_train_score std_fit_time std_score_time
0 0.848804 0.845736 0.050932 0.000182
1 0.852535 0.850617 0.108354 0.000189
2 0.851876 0.844420 0.104162 0.000323
3 0.843757 0.838936 0.103976 0.000189
4 0.844196 0.841568 0.194023 0.000476
std_test_score std_train_score
0 0.003738 0.001075
1 0.004014 0.003307
2 0.005278 0.003603
3 0.005422 0.005727
4 0.003050 0.005209
[5 rows x 22 columns]
And yes, the columns are automatically converted to the appropriate dtypes:
print(df.dtypes)
mean_fit_time float64
mean_score_time float64
mean_test_score float64
mean_train_score float64
param_NN__alpha float64
param_NN__hidden_layer_sizes object
params object
rank_test_score int64
split0_test_score float64
split0_train_score float64
split1_test_score float64
split1_train_score float64
split2_test_score float64
split2_train_score float64
split3_test_score float64
split3_train_score float64
split4_test_score float64
split4_train_score float64
std_fit_time float64
std_score_time float64
std_test_score float64
std_train_score float64
dtype: object
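For a self-contained check, the sketch below feeds the header and the first two data rows from the question to read_csv through io.StringIO, which stands in for the real file (an assumption for illustration only). It shows that .values on this mixed-type frame produces an object array, while keeping only the numeric columns via select_dtypes yields a plain float64 array:

```python
import io

import pandas as pd

# Header plus the first two data rows from the question, standing in for the real file
sample = """\
,mean_fit_time,mean_score_time,mean_test_score,mean_train_score,param_NN__alpha,param_NN__hidden_layer_sizes,params,rank_test_score,split0_test_score,split0_train_score,split1_test_score,split1_train_score,split2_test_score,split2_train_score,split3_test_score,split3_train_score,split4_test_score,split4_train_score,std_fit_time,std_score_time,std_test_score,std_train_score
0,0.34166226387023924,0.0010362625122070312,0.842927342927343,0.8468980402379758,0.1,"(7,)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7,)}",25,0.8420706295240185,0.8475292052871167,0.8398771660451854,0.8463774474853288,0.845360824742268,0.846158065046893,0.8385256691531373,0.8486892618185806,0.8488040377441299,0.8457362215519605,0.05093153997183547,0.00018195987247183776,0.0037378988316037944,0.0010747322296072162
1,0.5543142318725586,0.0018250465393066407,0.8465250965250966,0.8527554135893668,0.1,"(25, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 7)}",5,0.846018863785918,0.8530137662480118,0.846018863785918,0.8589919376953875,0.8479929809168677,0.8496681840618658,0.8400614304519526,0.851486234506965,0.8525345622119815,0.8506169454346038,0.10835399357094619,0.00018853748087819175,0.004013613789285713,0.003306836154659678
"""

df = pd.read_csv(io.StringIO(sample), index_col=[0])

# Mixed float, int and string columns, so .values falls back to dtype=object
arr = df.values

# Keep only int/float columns; int64 and float64 are upcast to float64
num = df.select_dtypes(include="number").to_numpy()

print(df.shape, arr.dtype, num.shape, num.dtype)
```

The two string columns (param_NN__hidden_layer_sizes and params) are what force the object dtype; dropping them leaves 20 numeric columns.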
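Fair warning: given what this data contains (tuples and dicts stored as strings), it may be more useful to you as Python lists than as a NumPy array, which is optimized for homogeneous numeric (scalar) data. With mixed column types like these, .values returns an array of dtype object.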
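One way to follow that suggestion without pandas is to read the rows with csv.DictReader and turn the tuple and dict strings back into Python objects with ast.literal_eval. This is only a sketch: the column subset and the two sample rows are taken from the snippet above, and the output keys (such as "hidden_layer_sizes") are hypothetical names chosen for illustration.

```python
import ast
import csv
import io

# A few columns from the question's header and first two rows (trimmed for brevity)
sample = '''\
,mean_test_score,param_NN__hidden_layer_sizes,params,rank_test_score
0,0.842927342927343,"(7,)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (7,)}",25
1,0.8465250965250966,"(25, 7)","{'NN__alpha': 0.1, 'NN__hidden_layer_sizes': (25, 7)}",5
'''

rows = []
for rec in csv.DictReader(io.StringIO(sample)):
    rows.append({
        "mean_test_score": float(rec["mean_test_score"]),
        # literal_eval safely parses Python literals such as "(25, 7)" or "{...}"
        "hidden_layer_sizes": ast.literal_eval(rec["param_NN__hidden_layer_sizes"]),
        "params": ast.literal_eval(rec["params"]),
        "rank_test_score": int(rec["rank_test_score"]),
    })

# Lowest rank_test_score is the best-ranked configuration
best = min(rows, key=lambda r: r["rank_test_score"])
print(best["hidden_layer_sizes"], best["params"]["NN__alpha"])  # (25, 7) 0.1
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for strings read from a file.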