ValueError: Found input variables with inconsistent numbers of samples: [2935848, 2935849]



When I run this code:

feature_names = ["date","shop_id", "item_id", "item_price", "item_cnt_day"]
feature_names
X_train = train[feature_names]
print(X_train.shape)
X_train.head()
X_sales = sales[feature_names]
print(X_sales.shape)
X_sales.head()
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X_train, X_sales, y_train, y_sales = train_test_split(X_train, X_sales, test_size=0.3)

(2935848, 5)
(2935849, 5)
I get this ValueError:

ValueError                                Traceback (most recent call last)
     13 from sklearn.metrics import mean_squared_error
     14 
---> 15 X_train, X_sales, y_train, y_sales = train_test_split(X_train, X_sales, test_size=0.3)
     16 

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126 
-> 2127     arrays = indexable(*arrays)
   2128 
   2129     n_samples = _num_samples(arrays[0])

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295 

~/anaconda3/envs/aiffel/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    255     if len(uniques) > 1:
    256         raise ValueError("Found input variables with inconsistent numbers of"
--> 257                          " samples: %r" % [int(l) for l in lengths])
    258 
    259 

ValueError: Found input variables with inconsistent numbers of samples: [2935848, 2935849]

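Your problem is caused by your two DataFrames (train and sales) having different lengths. Your train dataset has 2935848 samples and your sales dataset has 2935849. train_test_split requires every array passed to it to have the same number of rows. Find out why the lengths differ, then add or remove a row so that they match.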
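One way to track down the extra row is sketched below. The real train and sales frames are not shown in the question, so the two tiny frames here are hypothetical stand-ins that only mimic the one-row gap. The checks for duplicates and unmatched rows are just common first guesses, not necessarily the cause in your data.

```python
import pandas as pd

# Hypothetical stand-ins for the `train` and `sales` frames in the question.
# The values are made up; only the one-row difference in length is mimicked.
train = pd.DataFrame({"shop_id": [0, 1, 2], "item_id": [10, 11, 12]})
sales = pd.DataFrame({"shop_id": [0, 1, 2, 2], "item_id": [10, 11, 12, 12]})

# Step 1: confirm the mismatch that train_test_split complains about.
length_gap = len(sales) - len(train)

# Step 2: look for exact duplicate rows, a common source of a stray row.
duplicate_rows = sales[sales.duplicated()]

# Step 3: look for rows present in one frame but not the other.
# With no `on` argument, merge joins on all columns the frames share.
merged = sales.drop_duplicates().merge(
    train.drop_duplicates(), how="outer", indicator=True
)
only_in_one = merged[merged["_merge"] != "both"]

print("length gap:", length_gap)
print("duplicate rows in sales:")
print(duplicate_rows)
print("rows found in only one frame:")
print(only_in_one)
```

If the gap turns out to be a duplicate, sales.drop_duplicates() would remove it. If it is a genuinely different row, decide which frame is correct before dropping anything.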

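Second, and just as important, you should understand what train_test_split does and what your goal is. The function takes X and y as input and returns X_train, X_test, y_train, y_test. Reading your code, you are passing in two X's with the same 5 features (X_train and X_sales). I hope you are doing this for a reason, but be aware of it.

X is all of the samples with their features, and y is the corresponding output value you want to predict. Check whether train_test_split is really the function you are looking for.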
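For comparison, a minimal sketch of the usual X/y call looks like this. The toy frame reuses the column names from the question with made-up values, and treating item_cnt_day as the target is an assumption on my part, since the question does not say what is being predicted.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame reusing the question's column names; values are made up.
df = pd.DataFrame({
    "shop_id": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "item_id": [10, 11, 10, 12, 11, 13, 10, 14, 12, 15],
    "item_price": [99.0, 149.0, 99.0, 199.0, 149.0, 249.0, 99.0, 299.0, 199.0, 349.0],
    "item_cnt_day": [1, 2, 1, 3, 1, 1, 2, 1, 4, 1],
})

# Assumption: item_cnt_day is the value to predict; the other columns are features.
X = df[["shop_id", "item_id", "item_price"]]
y = df["item_cnt_day"]

# X and y are taken from the same frame, so their row counts always match
# and the "inconsistent numbers of samples" check cannot fail here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```

Because both inputs are sliced from one DataFrame, each feature row stays paired with its own target value after the shuffle, which is what the split is meant to guarantee.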

When I try to build my confusion matrix, I get this error: Found input variables with inconsistent numbers of samples: [1527, 1]

Here is my code:

x = df[['gender', 'age', 'hypertension', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'work_type_cat', 'gender_cat', 'Residence_type_cat']]
y = df['stroke']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=20)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
scaler = StandardScaler()
x_train_scale = scaler.fit_transform(x_train)
x_test_scale = scaler.fit_transform(x_test)
KNN = KNeighborsClassifier()
x = df[['gender', 'age', 'hypertension', 'heart_disease', 'ever_married', 'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status', 'work_type_cat', 'gender_cat', 'Residence_type_cat']]
y = df['stroke']
print(x.head())
print(y.head())
KNN = KNN.fit(x, y)
test = pd.DataFrame()
test['gender'] = [2]
test['age'] = [3]
test['hypertension'] = [0]
test['heart_disease'] = [0]
test['ever_married'] = [2]
test['work_type'] = [4]
test['Residence_type'] = [2]
test['avg_glucose_level'] = [95.12]
test['bmi'] = [18]
test['smoking_status'] = [2]
test['work_type_cat'] = [4]
test['gender_cat'] = [1]
test['Residence_type_cat'] = [1]
y_predict = KNN.predict(test)
print(y_predict)
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_predict))
