ValueError: could not convert string to float: 'A1' when using np.loadtxt



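I have a program that needs to process a CSV file and turn it into a dataset. The example I am working from is the popular Python tutorial that uses the iris dataset. I am trying to replace datasets.load_iris() with code that reads the CSV file 'A1-dm.csv'.

Expected:

The program processes the CSV file and loads the data.

Actual: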

Traceback (most recent call last):
  File ".\example.py", line 38, in <module>
    main()
  File ".\example.py", line 11, in main
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1134, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 768, in floatconv
    return float(x)
ValueError: could not convert string to float: 'A1'
The code for this implementation is:

    import numpy as np
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from MDLP import MDLP_Discretizer

    def main():
        ######### USE-CASE EXAMPLE #############
        # read dataset (this is the call that raises the ValueError)
        dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
        X, y = dataset['A1'], dataset['Class']
        # feature_names, class_names = dataset['feature_names'], dataset['target_names']
        # numeric_features = np.arange(X.shape[1])  # all features in this dataset are numeric. These will be discretized
        # #Split between training and test
        # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
        # #Initialize discretizer object and fit to training data
        # discretizer = MDLP_Discretizer(features=numeric_features)
        # discretizer.fit(X_train, y_train)
        # X_train_discretized = discretizer.transform(X_train)
        # #apply same discretization to test set
        # X_test_discretized = discretizer.transform(X_test)
        # #Print a slice of original and discretized data
        # print('Original dataset:\n%s' % str(X_train[0:5]))
        # print('Discretized dataset:\n%s' % str(X_train_discretized[0:5]))
        # #see how feature 0 was discretized
        # print('Feature: %s' % feature_names[0])
        # print('Interval cut-points: %s' % str(discretizer._cuts[0]))
        # print('Bin descriptions: %s' % str(discretizer._bin_descriptions[0]))

    if __name__ == '__main__':
        main()

A sample of the CSV file looks like this:

A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2

How can I fix this?

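The first line of the CSV file is a header row that contains text, and np.loadtxt tries to convert every field, including 'A1', to a float. Skip that line (for example with skiprows=1) so that only the numeric rows are parsed.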
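As a minimal sketch of skipping the header, the snippet below passes skiprows=1 to np.loadtxt. It uses io.StringIO with a few rows copied from the question in place of the real 'A1-dm.csv' file, and the names csv_text, X and y are only illustrative. It also assumes the last column (Class) is the target.

```python
import io

import numpy as np

# A few rows copied from the question; io.StringIO stands in for 'A1-dm.csv'.
csv_text = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
"""

# skiprows=1 makes loadtxt ignore the header line, so every remaining
# field can be converted to float.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# The result is a plain 2-D float array, so columns are selected by position.
X = dataset[:, :-1]  # feature columns A1, A2, A3
y = dataset[:, -1]   # last column, assumed here to be the Class label

print(dataset.shape)
print(X[0], y[0])
```

With the real file, the same call would be np.loadtxt('A1-dm.csv', delimiter=',', skiprows=1).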

See: "numpy loadtxt skip first row"
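Note that np.loadtxt returns a plain array, so the name-based lookups in the question (dataset['A1'], dataset['Class']) would still fail once the header is skipped. One possible alternative, sketched here under the assumption that keeping the column names is desirable, is np.genfromtxt with names=True, which reads the header line as field names. The sample rows are copied from the question and the variable names are illustrative.

```python
import io

import numpy as np

# A few rows copied from the question; io.StringIO stands in for 'A1-dm.csv'.
csv_text = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
"""

# names=True tells genfromtxt to use the first line as field names.
# The result is a 1-D structured array with one record per data row.
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Fields can now be selected by column name.
a1 = table["A1"]
y = table["Class"]

# Stack the feature fields into an ordinary 2-D array for tools that
# expect a (n_samples, n_features) matrix.
feature_names = ("A1", "A2", "A3")
X = np.column_stack([table[name] for name in feature_names])

print(table.dtype.names)
print(X.shape, y.shape)
```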
