How to replace values greater than a specific value in a DataFrame column

I have a dataset with some outliers in the age field. These are the sorted unique values in my data:

unique = df_csv['AGE'].unique()
print(sorted(unique))

[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 79, 126, 140, 149, 152, 228, 235, 267]

How can I replace values greater than 80 with the mean or median of the AGE column?

Since you want to modify a column of the DataFrame, use .loc with a boolean row mask and the column label:

# replace `median` with `mean` if you want
df_csv.loc[df_csv['AGE']>80,'AGE'] = df_csv['AGE'].median()
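A self-contained sketch of the .loc approach above, assuming a stand-in df_csv built with one row per unique age from the question (the real frame presumably has more rows and other columns). THRESHOLD and median_age are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-in for df_csv: one row per unique age listed in the question.
ages = list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267]
df_csv = pd.DataFrame({'AGE': ages})

THRESHOLD = 80  # ages above this are treated as outliers (illustrative name)

# Compute the replacement value first, over the whole column (outliers included).
median_age = df_csv['AGE'].median()

# Select rows with a boolean mask and the column by label, then assign in one step.
df_csv.loc[df_csv['AGE'] > THRESHOLD, 'AGE'] = median_age

print(df_csv['AGE'].max())
```

The median here is computed before the assignment, so the outliers are part of it. To exclude them, compute it from df_csv.loc[df_csv['AGE'] <= THRESHOLD, 'AGE'] instead.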

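Alternatively, on a plain Series you can use boolean indexing directly (here the Series holds the sorted unique ages from the question):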

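import pandas as pd
series = pd.Series(sorted(df_csv['AGE'].unique()))  # sorted unique ages, as in the question
series[series > 80] = series.median()
print(series)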

Output:

0     21
1     22
2     23
3     24
4     25
..
58    52
59    52
60    52
61    52
62    52
Length: 63, dtype: int64
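If you would rather not modify the Series in place, Series.mask(cond, other) returns a new Series with other substituted wherever cond is True. This is a swapped-in technique, not one of the original answers; the sketch assumes the same sample data (the sorted unique ages), and cleaned is an illustrative name.

```python
import pandas as pd

# Sample data: the sorted unique ages from the question.
series = pd.Series(list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267])

# mask() replaces values where the condition is True and returns a new Series;
# the original `series` is left unchanged.
cleaned = series.mask(series > 80, series.median())

print(cleaned.tail(8).tolist())
```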

median = df_csv['AGE'].median()
# using apply; apply returns a new Series, so assign it back to the column
df_csv['AGE'] = df_csv['AGE'].apply(lambda x: median if x > 80 else x)
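apply calls the Python lambda once per element. A vectorized alternative (swapped in here, not from the original answers) is numpy.where, which evaluates the condition over the whole column at once. The sketch assumes the same hypothetical df_csv with one row per unique age.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_csv: one row per unique age listed in the question.
df_csv = pd.DataFrame({'AGE': list(range(21, 76)) + [79, 126, 140, 149, 152, 228, 235, 267]})
median = df_csv['AGE'].median()

# np.where picks `median` where the condition is True and the original value otherwise.
df_csv['AGE'] = np.where(df_csv['AGE'] > 80, median, df_csv['AGE'])

print(df_csv['AGE'].dtype)
```

The column ends up as a float dtype because np.where promotes the integer ages and the float median to a common type; cast back with .astype(int) if you need integers and the median is a whole number.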

Other approaches: here
