Applying a lambda: keeping NaN values in the new column



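I am creating a new column in a DataFrame based on a conditional function. The column I am mapping from contains several NaN values. Wherever the original column has NaN, I want the new column to have NaN as well. As an example, my starting point is: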

Original
0   1
1   2
2   3
3   4
4   5
5   6
6   NaN
7   8
8   9
9   10

Here is a sample of the code I ran initially, which (clearly) gives the following result:

def get_value(range):
    if range < 2:
        return 'Below 2'
    elif range < 8:
        return 'Between 2 and 8'
    else:
        return 'Above 8'

df_sample['new_col'] = df_sample.apply(lambda x: get_value(x['Original']), axis=1)

Original    new_col
0   1.0 Below 2
1   2.0 Between 2 and 8
2   3.0 Between 2 and 8
3   4.0 Between 2 and 8
4   5.0 Between 2 and 8
5   6.0 Between 2 and 8
6   NaN Above 8
7   8.0 Above 8
8   9.0 Above 8
9   10.0    Above 8

Here, index 6 should show NaN.

I tried adding elif range == np.Nan: to my function, but that did not work.

Then, following a suggestion on Stack Overflow, I tried the following:

df_sample['new_col'] = df_sample.apply(lambda x: get_value(x) if(np.all(pd.notnull(x['Original']))) else x, axis = 1)

But this raised an error at the first NaN index in my DataFrame.

Déjà vu here, but building on my previous solution, just add a default for the rows where no condition is met:

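import numpy as np

# Conditions are checked in order; the first one that is True wins.
condlist = [
    df['Original'].lt(2),
    df['Original'].lt(8),
    df['Original'].ge(8)]
choicelist = ['Below 2', 'Between 2 and 8', 'Above 8']
# Rows matching no condition (here, the NaN row) receive the default.
df['new_col'] = np.select(condlist, choicelist, default=np.nan)
print(df)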

[out]

Original          new_col
0       1.0          Below 2
1       2.0  Between 2 and 8
2       3.0  Between 2 and 8
3       4.0  Between 2 and 8
4       5.0  Between 2 and 8
5       6.0  Between 2 and 8
6       NaN              nan
7       8.0          Above 8
8       9.0          Above 8
9      10.0          Above 8
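
Row 6 above prints as lowercase nan, which suggests the default was coerced to the string 'nan' to match the string choices. Newer NumPy releases may also refuse to mix string choices with a float default. A minimal sketch of one way around both, assuming a string placeholder as the default and a follow-up mask with Series.where (this variant is not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Original': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})

condlist = [
    df['Original'].lt(2),
    df['Original'].lt(8),
    df['Original'].ge(8),
]
choicelist = ['Below 2', 'Between 2 and 8', 'Above 8']

# Use a string placeholder so the choices and the default share one dtype.
labels = np.select(condlist, choicelist, default='')

# Keep labels only where Original is present; where() fills the rest with NaN.
df['new_col'] = pd.Series(labels, index=df.index).where(df['Original'].notna())
print(df)
```

With this variant, df['new_col'].isna() is True for row 6, so the missing value behaves like a real NaN rather than a string.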

A comment on your code:

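Because of the bare else, everything that does not test as below 8 is labelled 'Above 8'. That includes NaN, since every comparison with NaN evaluates to False. (A string such as "helloworld" would not get that far in Python 3: comparing it with a number raises a TypeError.)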
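
To make the fall-through concrete, here is a small sketch that reuses the question's function (with the parameter renamed to value, a cosmetic change made here) and feeds it NaN directly. It also shows why an equality test against np.nan never matches:

```python
import numpy as np

def get_value(value):
    # Same logic as the question's function, with a bare else.
    if value < 2:
        return 'Below 2'
    elif value < 8:
        return 'Between 2 and 8'
    else:
        return 'Above 8'

nan = np.nan
# All three comparisons are False for NaN ...
print(nan < 2, nan < 8, nan >= 8)
# ... and so is an equality test against NaN itself.
print(nan == np.nan)
# With nothing matching, the bare else decides the label.
print(get_value(nan))
```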

To keep your code simple, you could do the following:

def get_value(range):
    if range < 2:
        return 'Below 2'
    elif range < 8:
        return 'Between 2 and 8'
    elif range >= 8:
        return 'Above 8'
    else:
        # NaN fails every comparison above, so it ends up here.
        return np.nan
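
A short usage sketch for this version, assuming the sample data from the question. It calls Series.apply on the single column instead of DataFrame.apply with axis=1, which is a choice made here rather than something the original post prescribes, and the parameter is renamed to value to avoid shadowing the built-in range:

```python
import numpy as np
import pandas as pd

def get_value(value):
    if value < 2:
        return 'Below 2'
    elif value < 8:
        return 'Between 2 and 8'
    elif value >= 8:
        return 'Above 8'
    else:
        # Reached only when every comparison above is False, e.g. for NaN.
        return np.nan

df_sample = pd.DataFrame({'Original': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})
df_sample['new_col'] = df_sample['Original'].apply(get_value)
print(df_sample)
```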

In general, avoid apply. In this case, pd.cut is a better choice:

pd.cut(df.Original, [-np.inf, 2, 8, np.inf],
       labels=['below 2', 'between 2 and 8', 'above 8'],
       right=False)

Output:

0            below 2
1    between 2 and 8
2    between 2 and 8
3    between 2 and 8
4    between 2 and 8
5    between 2 and 8
6                NaN
7            above 8
8            above 8
9            above 8
Name: Original, dtype: category
Categories (3, object): [below 2 < between 2 and 8 < above 8]
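
For completeness, a self-contained sketch that assigns the pd.cut result to a new column, assuming the sample data from the question. With right=False each bin is closed on the left, so the value 2 falls in 'between 2 and 8' and the value 8 falls in 'above 8', matching the lt/ge logic used earlier:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Original': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})

# right=False gives the bins [-inf, 2), [2, 8), [8, inf).
df['new_col'] = pd.cut(
    df['Original'],
    bins=[-np.inf, 2, 8, np.inf],
    labels=['below 2', 'between 2 and 8', 'above 8'],
    right=False,
)

print(df)
# NaN inputs stay missing in the categorical result.
print(df['new_col'].isna().sum())
```

The result is a categorical column. If plain strings are needed downstream, it can be converted afterwards, for example with astype(object).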
