Merging list and dictionary data into a single ndarray

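I have some data structures returned from RandomForestClassifier() and from encoding string data read from a CSV. Based on some weather data, I am predicting how likely certain crimes are to occur. The model part works fine, but I'm fairly new to Python and can't work out how to merge this data together.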

Here is a simplified version of what I have:

#this line is pseudo code
data = from_csv_file
label_dict = { 'Assault': 0, 'Robbery': 1 }
# index 0 of each cell in predictions is Assault, index 1 is Robbery
encoded_labels = [0, 1]
# Probabilities of crime being assault or robbery
predictions = [
[0.4, 0.6], 
[0.1, 0.9], 
[0.8, 0.2], 
...
]

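I want to add a new column to data for each crime label, with the probability as the cell content, e.g. new columns called prob_Assault and prob_Robbery. Finally, I want to add a boolean column (True/False) that shows whether the prediction was correct.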

How can I do this? I'm using Python 3.10, pandas, numpy and scikit-learn.

Edit: it might be easier for some people if you see the important parts of my actual code

# Training data X, Y
tr_y = tr['offence']
tr_x = tr.drop('offence', axis=1)
# Test X (what to predict)
test_x = test_data.drop('offence', axis=1)
clf = RandomForestClassifier(n_estimators=40)
fitted = clf.fit(tr_x, tr_y)
pred = clf.predict_proba(test_x)
encoded_labels = fitted.classes_
# I also have the encodings dictionary that shows the encodings for crime types

You're on the right track. What you need is to convert predictions from a list into a numpy array and then access its columns:

import numpy as np
predictions = np.array(predictions)
data["prob_Assault"] = predictions[:,0]
data["prob_Robbery"] = predictions[:,1]
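If there are more than two crime types, hard-coding each column gets tedious. The sketch below derives the column names from label_dict instead. It assumes the columns of predictions follow the order of encoded_labels (as the question states), and the temperature column is a made-up stand-in for the real CSV data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; "temperature" is an invented column
data = pd.DataFrame({"temperature": [12.5, 18.0, 7.3]})

label_dict = {"Assault": 0, "Robbery": 1}
encoded_labels = [0, 1]  # column order of predictions
predictions = np.array([[0.4, 0.6], [0.1, 0.9], [0.8, 0.2]])

# Invert the dictionary so each encoded label maps back to its crime name
code_to_name = {code: name for name, code in label_dict.items()}

# Add one probability column per class, named after the crime type
for col, code in enumerate(encoded_labels):
    data[f"prob_{code_to_name[code]}"] = predictions[:, col]
```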

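I'm assuming data is a pandas DataFrame. I'm not sure how you want to evaluate these probabilities, but you can also use logical expressions in pandas: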

data["prob_Assault"] == 0.8 # For example, 0.8 is the correct probability

The code above returns a Series of booleans, for example:

0     True
1    False
2    False
...

You can assign these values to the DataFrame as a new column:

data["check"] = data["prob_Assault"] == 0.8

Or even select only the rows of the DataFrame where the condition is True:

data[data["prob_Assault"] == 0.8]
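If "correct" means that the most probable class matches the true label, one possible approach is to take the argmax of each row and compare it with the known label. This is only a sketch: it assumes data has an offence column holding the true encoded labels (as test_data does in the question's real code), and the column name correct and the label values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the question's objects (values are illustrative only)
data = pd.DataFrame({"offence": [1, 1, 1]})  # true encoded labels (assumed column)
encoded_labels = np.array([0, 1])            # plays the role of fitted.classes_
predictions = np.array([[0.4, 0.6], [0.1, 0.9], [0.8, 0.2]])

# Index of the largest probability in each row, mapped back to a class label
predicted_label = encoded_labels[predictions.argmax(axis=1)]

# True where the most probable class equals the true label
data["correct"] = predicted_label == data["offence"].to_numpy()
```

With the real classifier, clf.predict(test_x) should give the same predicted labels as the argmax step, so either could be used for the comparison.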

Maybe I've misunderstood your question, but if not, this could be a solution:

Create a DataFrame with two columns, prob_Assault and prob_Robbery:

predictions_df = pd.DataFrame(predictions, columns=['prob_Assault', 'prob_Robbery'])

Then join predictions_df to your data.
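The join step could look roughly like the following. It is a sketch under the assumption that the rows of predictions are in the same order as the rows of data; the temperature column is a hypothetical placeholder for the real weather data.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; "temperature" is an invented column
data = pd.DataFrame({"temperature": [12.5, 18.0, 7.3]})
predictions = [[0.4, 0.6], [0.1, 0.9], [0.8, 0.2]]

# Reuse data's index so the join lines rows up by position, not by accident
predictions_df = pd.DataFrame(
    predictions,
    columns=["prob_Assault", "prob_Robbery"],
    index=data.index,
)

# DataFrame.join aligns on the index and appends the new columns
data = data.join(predictions_df)
```

Passing index=data.index matters when data does not have a default 0..n-1 index (for example after filtering), because join matches rows by index label.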
