How do I update values in a pandas DataFrame inside a for loop?

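I am trying to build a DataFrame that stores the coefficient value of each variable after every iteration. Plotting a graph after each iteration works fine, but when I try to insert the values into the DataFrame after each iteration,

I get this error.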

None of [Int64Index([ 3169, 3170, 3171, 3172, 3173, 3174, 3175, 3176, 3177, 3178, ... 31671, 31672, 31673, 31674, 31675, 31676, 31677, 31678, 31679, 31680],
dtype='int64', length=28512)] are in the [columns]

This is the code I am using:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# X (feature DataFrame) and Y (target) are loaded earlier
kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)
count = 1
var = pd.DataFrame()

for train, _ in kf.split(X, Y):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    importances_index_desc = cvlasso.coef_
    feature_labels = list(X.columns.values)
    importance = pd.Series(importances_index_desc, feature_labels)
    plt.figure()
    plt.bar(feature_labels, importances_index_desc)
    plt.xticks(feature_labels, rotation='vertical')
    plt.ylabel('Importance')
    plt.xlabel('Features')
    plt.title('Fold {}'.format(count))
    count = count + 1
    var[train] = importances_index_desc  # this line raises the error above
    plt.show()

One more thing: my dataset has 33000 observations in total, but at the end of the loop train only has 28512 entries. Does anyone know why train does not have 33000?

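train is the list of training-data indices returned by KFold. You are using train as a column selector in var[train], which raises the error because none of those index values are columns of the DataFrame.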
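To see what train actually contains, and why it is shorter than the full dataset, here is a minimal sketch. The row count of 31681 is my assumption: it is inferred from the index values in the error message and is not stated in the question (the 33000 figure may be the size before some rows were dropped). X_demo is a hypothetical placeholder array, not the asker's data.

```python
import numpy as np
from sklearn.model_selection import KFold

# Assumption: 31681 rows, inferred from the error message, not given in the question.
n_rows = 31681
X_demo = np.zeros((n_rows, 1))  # placeholder data; only the row count matters here

kf = KFold(n_splits=10)

# Each split yields two arrays of integer row positions: train and test.
first_train, first_test = next(iter(kf.split(X_demo)))

# With 10 folds, roughly 9/10 of the rows are used for training in each split.
print(len(first_train), len(first_test))
print(first_train[0], first_train[-1])
```

Under that assumption the first training fold has 28512 positions running from 3169 to 31680, which matches the error message. In other words, train never holds every row: each fold leaves out about one tenth of the data for testing.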

IMO, using such a complicated value as the key is not a good idea. Just use a simple value as the index, for example

var.loc[count] = importances_index_desc
count += 1
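For completeness, a small runnable sketch of this loc approach. The synthetic X and Y and the feature names f1 to f4 are made up for illustration. Note that I define the columns of var up front; as far as I know, assigning a row with loc to a DataFrame that has no columns at all fails, so treat the empty pd.DataFrame() from the question as something to change too.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Hypothetical synthetic data standing in for the asker's X and Y.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=["f1", "f2", "f3", "f4"])
Y = 2.0 * X["f1"] - X["f3"] + rng.normal(scale=0.1, size=100)

kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)

# One column per feature, so every fold's coefficients fit into one row.
var = pd.DataFrame(columns=X.columns, dtype=float)

for count, (train, _) in enumerate(kf.split(X, Y), start=1):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    # Row label is the fold number; values are that fold's coefficients.
    var.loc[count] = cvlasso.coef_

print(var)
```

The result has one row per fold and one column per feature, so something like var.mean() or var.std() gives a quick view of how stable each coefficient is across folds.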

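Another solution might be to use pandas.DataFrame.append with a pandas.DataFrame (note that DataFrame.append was removed in pandas 2.0, where pd.concat is the replacement):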

importances_index_desc = pd.DataFrame(importances_index_desc)
var = var.append(importances_index_desc)
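On recent pandas versions, where append is no longer available, the same idea can be sketched with pd.concat. The feature labels and the fold_coefs list here are hypothetical stand-ins for X.columns and the cvlasso.coef_ values from each fold; this is one possible way to do it, not necessarily what the answer had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: feature names and the coef_ array from each fold.
feature_labels = ["f1", "f2", "f3"]
fold_coefs = [np.array([0.5, 0.0, -1.2]), np.array([0.4, 0.1, -1.0])]

rows = []
for count, coef in enumerate(fold_coefs, start=1):
    # One labelled Series per fold; the Series name becomes the row label.
    rows.append(pd.Series(coef, index=feature_labels, name=count))

# Concatenate once after the loop, then transpose so folds are rows.
var = pd.concat(rows, axis=1).T

print(var)
```

Collecting the pieces in a list and concatenating once is generally cheaper than growing the DataFrame inside the loop, since each append or concat copies the data.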

Let me know if this helps!

Try the following.

Instead of

var = pd.DataFrame()

create the DataFrame with a column header:

var = pd.DataFrame(columns=['impt_idx_desc'])

Then use the "loc" indexer inside the loop, like this:

var.loc[count] = [importances_index_desc]

where count is incremented by 1 on each pass through the loop.
