Fill a subset of rows in a DataFrame with values from another column / collapse multiple columns

First time posting here. I'm hoping there is a better way to do what I'm doing. I've been going in circles for a few days and would appreciate any help.

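I'm working with survey data on prisoners and their sentences. Each prisoner has a type used for the survey, stored in the 'prisoner_type' column. For each prisoner type there is a group of 5 columns in which their offenses can be recorded (not all of the columns have to be used). I'd like to collapse these groups of columns into a single set of 5 columns and add them to the dataset, so that every row has one set of 5 columns where I can find the offenses.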

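I've created a dictionary to look up the names of the columns that store the offense codes and offense types for each prisoner type. The keys of the outer dictionary are the prisoner types. Here is an abridged version: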

offense_variables = {
    3: {'codes': {1: 'V0114', 2: 'V0115', 3: 'V0116', 4: 'V0117', 5: 'V0118'},
        'off_types': {1: 'V0124', 2: 'V0125', 3: 'V0126', 4: 'V0127', 5: 'V0128'}},
    8: {'codes': {1: 'V0270', 2: 'V0271', 3: 'V0272', 4: 'V0273', 5: 'V0274'},
        'off_types': {1: 'V0280', 2: 'V0281', 3: 'V0282', 4: 'V0283', 5: 'V0285'}},
}

I start by creating 10 new columns: offense_1 … offense_5 and type_1 … type_5.

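I'm trying to:

Use pandas .loc to find all rows for a given prisoner type
Set the values of the new columns by looking up, in the dictionary, the variable for each offense number under that prisoner type, and assigning that column's values to the new column.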

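Problems:

The code does not terminate. I don't know why it keeps running.
I get the message "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"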

pris_types = [3, 8]
for pt in pris_types:
    # five offenses are listed in the survey, so we need five columns to hold offense codes
    # and five to hold offense types
    # '1' and '2' are just placeholder values
    for item in [i+1 for i in range(5)]:
        dataset[f'off_{item}_code'] = '1'
        dataset[f'off_{item}_type'] = '2'

    # then use .loc to get the rows for this prisoner type
    # look up the name of the column that we need to take the values from
    # using the dictionary shown above
    for item in [i+1 for i in range(5)]:
        dataset.loc[dataset['prisoner_type'] == pt,
                    dataset[f'off_{item}_code']] = \
            dataset[offense_variables[pt]['codes'][item]]

        dataset.loc[dataset['prisoner_type'] == pt,
                    dataset[f'off_{item}_type']] = \
            dataset[offense_variables[pt]['off_types'][item]]

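The problem is that in the .loc[] part you only need the column label (a string) to identify the column whose values you want to set, not the whole Series/column object as you are passing now. With your current code, you are creating new columns named after the values stored in the dataset[f'off_{item}_type'] column. So, instead of: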

for item in [i+1 for i in range(5)]:
    dataset.loc[dataset['prisoner_type'] == pt,
                dataset[f'off_{item}_code']] = \
        dataset[offense_variables[pt]['codes'][item]]

    dataset.loc[dataset['prisoner_type'] == pt,
                dataset[f'off_{item}_type']] = \
        dataset[offense_variables[pt]['off_types'][item]]

Use:

for item in range(1, 6):
    # the column indexer is now a plain string label, not a Series
    dataset.loc[dataset['prisoner_type'] == pt,
                f'off_{item}_code'] = \
        dataset[offense_variables[pt]['codes'][item]]

    dataset.loc[dataset['prisoner_type'] == pt,
                f'off_{item}_type'] = \
        dataset[offense_variables[pt]['off_types'][item]]

(I also simplified your range loop line.)
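For reference, here is a minimal self-contained sketch of the corrected loop running on a made-up DataFrame. The toy values and the dictionary trimmed to two offense slots are assumptions for illustration only; the column names follow the question. Restricting the right-hand side to the same mask is optional here, since pandas aligns on the index either way.

```python
import pandas as pd

# Toy data (hypothetical values), two offense slots per prisoner type instead of five.
dataset = pd.DataFrame({
    'prisoner_type': [3, 8, 3],
    'V0114': [10, 0, 30], 'V0115': [11, 0, 31],   # offense codes for type 3
    'V0124': [1, 0, 3],   'V0125': [2, 0, 4],     # offense types for type 3
    'V0270': [0, 20, 0],  'V0271': [0, 21, 0],    # offense codes for type 8
    'V0280': [0, 5, 0],   'V0281': [0, 6, 0],     # offense types for type 8
})

offense_variables = {
    3: {'codes': {1: 'V0114', 2: 'V0115'}, 'off_types': {1: 'V0124', 2: 'V0125'}},
    8: {'codes': {1: 'V0270', 2: 'V0271'}, 'off_types': {1: 'V0280', 2: 'V0281'}},
}

for pt, cols in offense_variables.items():
    mask = dataset['prisoner_type'] == pt
    for item in cols['codes']:
        # Column labels are plain strings; only the rows selected by mask are written.
        dataset.loc[mask, f'off_{item}_code'] = dataset.loc[mask, cols['codes'][item]]
        dataset.loc[mask, f'off_{item}_type'] = dataset.loc[mask, cols['off_types'][item]]

print(dataset[['prisoner_type', 'off_1_code', 'off_1_type',
               'off_2_code', 'off_2_type']])
```

Each row ends up with the values from the source columns that belong to its own prisoner type.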

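Also, the statements that create the 10 new columns don't need to be inside the loop over prisoner types; you can move them out of that loop. In fact, you don't need to create them manually like that at all. The .loc[] code will create them for you.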
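A small sketch of that last point, using a hypothetical two-row frame: assigning through .loc with a string label that does not exist yet creates the column, and rows not selected by the mask are left as NaN.

```python
import pandas as pd

# Hypothetical two-row frame; 'off_1_code' does not exist yet.
df = pd.DataFrame({'prisoner_type': [3, 8], 'V0114': [101, 102]})

# Assigning with a new string label creates the column.
# Only the rows where the mask is True receive a value.
df.loc[df['prisoner_type'] == 3, 'off_1_code'] = df['V0114']

print(df.columns.tolist())
print(df['off_1_code'].isna().tolist())
```

Because the unmatched rows start out as NaN, no placeholder values such as '1' and '2' are needed before the loop.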
