How do I apply a lambda function to two values in a row (<lambda>() missing 1 required positional argument)?



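I have a dataset with 2 columns: age_group and target (0, 1). I want to create a third column, "count", holding the value count of each age_group. It has to check whether the target is good or bad and fill in the matching count.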

Six age bins:

df['age_group'] = pd.cut(df['age'], [17,22,26,32,45,50,60])
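A minimal sketch of what that pd.cut call produces, using made-up ages since the question does not show df['age']: seven edges give six intervals, each open on the left and closed on the right, so an age equal to the first edge (17) falls outside every bin.

```python
import pandas as pd

# Hypothetical ages for illustration; the question's df['age'] values are not shown
ages = pd.Series([17, 18, 23, 30, 40, 47, 55, 60])
edges = [17, 22, 26, 32, 45, 50, 60]

# 7 edges give 6 intervals; each is open on the left and closed on the right, e.g. (17, 22]
age_group = pd.cut(ages, edges)

print(age_group.cat.categories)
# Age 17 sits on the open left edge, so it falls outside every bin and becomes NaN
print(age_group.isna().tolist())
```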

40 rows (first 11 shown):

age_group       target
0   (45, 50]    bad
1   (45, 50]    bad
2   (32, 45]    good
3   (32, 45]    good
4   (50, 60]    bad
5   (32, 45]    bad
6   (26, 32]    good
7   (50, 60]    good
8   (32, 45]    bad
9   (17, 22]    good
10  (32, 45]    good

I can group by target:

df.groupby('target').age_group.value_counts().to_frame()
                  age_group
target age_group
bad    (32, 45]           7
       (26, 32]           3
       (45, 50]           3
       (50, 60]           3
       (17, 22]           2
good   (32, 45]           8
       (17, 22]           4
       (50, 60]           4
       (45, 50]           3
       (26, 32]           2
       (22, 26]           1

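But in this DataFrame, age_group is the only regular column I can access; I can't get at the "target" column or at the specific values for the good and bad targets.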
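The reason is that after groupby(...).value_counts(), target and age_group are index levels of the result rather than columns. A small sketch, with a made-up six-row frame and age_group kept as plain strings for simplicity, showing that the counts are still reachable through the MultiIndex labels:

```python
import pandas as pd

# Small stand-in for the question's data (age_group kept as plain strings here)
df = pd.DataFrame({
    "age_group": ["(45, 50]", "(45, 50]", "(32, 45]", "(32, 45]", "(50, 60]", "(32, 45]"],
    "target":    ["bad",      "bad",      "good",     "good",     "bad",      "bad"],
})

counts = df.groupby("target").age_group.value_counts()

# 'target' and 'age_group' are index levels of the result, not columns
print(counts.index.names)

# Select by index labels instead: one level, or both levels as a tuple
print(counts.loc["bad"])
print(counts.loc[("good", "(32, 45]")])
```

Calling reset_index on counts turns both levels back into ordinary columns.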

I want to look up each age_group together with its target and put the matching value in the "count" column.

So I wrote this ugly workaround function:

def get_value_count_for_age_group_category(age_group, target):
    # Count how often each age_group occurs among bad rows and among good rows
    bad_vals = df[df['target'] == 'bad']['age_group'].value_counts()
    good_vals = df[df['target'] == 'good']['age_group'].value_counts()
    # Pick the counts for the requested target
    vals = bad_vals if target == 'bad' else good_vals
    # Look the group up by its interval label, not by position
    # (positions change with sort order); a missing group counts as 0
    return vals.get(age_group, 0)

This part doesn't work: passing the two values, age_group and its target, to a lambda function:

n['count'] = n[['age_group', 'target']].apply(lambda num:get_value_count_for_age_group_category(num, target) )

<lambda>() missing 1 required positional argument:
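A sketch of the usual fix, assuming the goal is to call the function once per row. DataFrame.apply defaults to axis=0, which calls the function once per column with that column as a single Series, so a lambda written with two parameters receives only one and raises this TypeError. With axis=1 each call receives one row instead, and both fields can be read from it. The frame is made up, and the function body below is a hypothetical simplified stand-in, not the question's version.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "age_group": ["(45, 50]", "(45, 50]", "(32, 45]", "(32, 45]", "(50, 60]", "(32, 45]"],
    "target":    ["bad",      "bad",      "good",     "good",     "bad",      "bad"],
})

def get_value_count_for_age_group_category(age_group, target):
    # Hypothetical simplified body: count rows matching both values
    mask = (df["age_group"] == age_group) & (df["target"] == target)
    return int(mask.sum())

# axis=1 passes one row (a Series) per call; pull both fields out of it
df["count"] = df.apply(
    lambda row: get_value_count_for_age_group_category(row["age_group"], row["target"]),
    axis=1,
)
print(df)
```

Each call rescans the whole frame, so this is quadratic in the number of rows; harmless for 40 rows, but the groupby approach scales better.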

Does this work for you?

df.groupby('target').age_group.value_counts().reset_index(name='count')

Input

age_group   target
0   (45, 50)    bad
1   (45, 50)    bad
2   (32, 45)    good
3   (32, 45)    good
4   (50, 60)    bad
5   (32, 45)    bad
6   (26, 32)    good
7   (50, 60)    good
8   (32, 45)    bad
9   (17, 22)    good
10  (32, 45)    good

Output

target    age_group   count
0   bad     (32, 45)    2
1   bad     (45, 50)    2
2   bad     (50, 60)    1
3   good    (32, 45)    3
4   good    (17, 22)    1
5   good    (26, 32)    1
6   good    (50, 60)    1

如果需要"零"谷,请在下面使用

df1=df.groupby('target').age_group.value_counts().reset_index(name='count')
df1.set_index(['target','age_group']).unstack(fill_value=0).stack().reset_index()

Output

target    age_group   count
0   bad     (17, 22)    0
1   bad     (26, 32)    0
2   bad     (32, 45)    2
3   bad     (45, 50)    2
4   bad     (50, 60)    1
5   good    (17, 22)    1
6   good    (26, 32)    1
7   good    (32, 45)    3
8   good    (45, 50)    0
9   good    (50, 60)    1
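An alternative sketch for the zero-filled table, swapping in pd.crosstab instead of the set_index/unstack/stack chain: crosstab counts every target/age_group pair and fills 0 for pairs with no rows, and stack turns the wide table back into one row per pair. It uses the answer's sample input with age_group as plain strings; with that data the result should match the output above.

```python
import pandas as pd

# The answer's sample input (age_group kept as plain strings)
df = pd.DataFrame({
    "age_group": ["(45, 50)", "(45, 50)", "(32, 45)", "(32, 45)", "(50, 60)", "(32, 45)",
                  "(26, 32)", "(50, 60)", "(32, 45)", "(17, 22)", "(32, 45)"],
    "target": ["bad", "bad", "good", "good", "bad", "bad",
               "good", "good", "bad", "good", "good"],
})

# Rows = target, columns = age_group, cells = counts (0 where a pair never occurs)
wide = pd.crosstab(df["target"], df["age_group"])

# Back to long form: one row per (target, age_group) pair
long = wide.stack().reset_index(name="count")
print(long)
```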
