I'm having a problem making my model explainable. Here is what happens:
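I define my encoded labels.
The last column, common name, is a categorical value. LabelEncoder encodes it as numeric values.
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
x = merged_data_df.iloc[:, 1:14]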
y = label_encoder.fit_transform(merged_data_df['common name'])
print(x.shape, y.shape)
Train/test split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x.values, y, test_size=0.2, random_state=0)
print(f"Train Data: {x_train.shape}, {y_train.shape}")
print(f"Test Data: {x_test.shape}, {y_test.shape}")
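Random forest classifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
rf_pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=1502))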
rf_pipeline.fit(x_train,y_train)
So far so good. Then I wanted to make the model explainable, but I ran into an error.
import shap
explainer = shap.KernelExplainer(rf_pipeline.predict, x_test)
instance = x_test.loc[[95]]
shap_instance = explainer.shap_values(instance)
shap.initjs()
shap.force_plot(explainer.expected_value,shap_instance, instance)
AttributeError Traceback (most recent call last)
/tmp/ipykernel_1058/159078831.py in <module>
----> 1 instance = x_test.loc[[95]]
2 shap_instance = explainer.shap_values(instance)
3 shap.initjs()
4 shap.force_plot(explainer.expected_value,shap_instance, instance)
AttributeError: 'numpy.ndarray' object has no attribute 'loc'
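Your x_test variable is not a pandas DataFrame, so you cannot index it with .loc. You passed x.values (a NumPy array) to train_test_split, so the splits it returns are NumPy arrays as well. Either pass x itself so that x_test stays a DataFrame, or index the array by position (x_test[[i]]) instead.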
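A minimal sketch of both options, using a small synthetic DataFrame in place of merged_data_df. The column names f1, f2, f3 and the random data are assumptions made only for illustration; the SHAP calls are left out so the example depends only on NumPy, pandas, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (hypothetical columns).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y = rng.integers(0, 2, size=100)

# As in the question: x.values is a NumPy array, so the splits are arrays
# and have no .loc attribute.
x_train_np, x_test_np, _, _ = train_test_split(
    x.values, y, test_size=0.2, random_state=0
)

# Option 1: pass the DataFrame itself; the splits stay DataFrames and
# keep their original row labels.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# .loc is label-based, so choose a label that is actually in x_test.
label = x_test.index[0]
instance = x_test.loc[[label]]  # one-row DataFrame

# Option 2: keep the NumPy arrays and select a row by position.
instance_np = x_test_np[[0]]  # 2-D array with a single row
```

Note that after the split, x_test only contains the row labels that ended up in the test set, so x_test.loc[[95]] will raise a KeyError unless label 95 happens to be among them. Checking x_test.index first, or using x_test.iloc[[i]] for a positional lookup, avoids that.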