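I have a DataFrame with a column containing patient diagnoses, and I want to use pandas to binarize the diagnoses into two groups: ISM and non ISM. I tried this: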
df["initial_diagnosis"] = df["initial_diagnosis"].apply(lambda x: x if x=="ISM" else "non ISM")
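But it assigns "non ISM" to the missing values as well. Is there a way to do this and keep the missing values unchanged?
The column I am trying to recode looks like this: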
initial_diagnosis
ISM
ISM
WDSM
NaN
ISM
SSM
CM
ASM
ISM
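I would have expected that to work. Perhaps the missing values are empty strings or plain None; I can only guess. You can list the values you want to keep explicitly: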
missing_values = {...} # Set of values you want to keep
df["initial_diagnosis"] = df["initial_diagnosis"].apply(lambda x: x if x=="ISM" or x in missing_values else "non ISM")
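One caveat with the set lookup: NaN never compares equal to itself, so `x in missing_values` only matches a NaN when the set holds the very same object. A minimal sketch that avoids this, assuming the missing entries are real NaN/None values rather than empty strings, checks `pd.isna` instead. The helper name `binarize` and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd


def binarize(value):
    # Leave missing values (NaN or None) exactly as they are
    if pd.isna(value):
        return value
    # Everything else collapses to one of the two labels
    return "ISM" if value == "ISM" else "non ISM"


# Hypothetical sample data, not the asker's actual file
df = pd.DataFrame(
    {"initial_diagnosis": ["ISM", "WDSM", np.nan, "SSM", None, "ISM"]}
)
df["initial_diagnosis"] = df["initial_diagnosis"].apply(binarize)
print(df["initial_diagnosis"])
```

If the missing entries turn out to be empty strings, they would need an extra check such as `value == ""`, since `pd.isna("")` is False.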
Edit:
import pandas as pd
from numpy import nan
data = pd.read_csv("test.csv")
print(data['initial_diagnosis'])
#0 ISM
#1 ISM
#2 WDSM
#3 NaN
#4 ISM
#5 SSM
#6 CM
#7 ASM
#8 ISM
#Name: initial_diagnosis, dtype: object
missing_values = {nan}
data["initial_diagnosis"] = data["initial_diagnosis"].apply(lambda x: x if x =="ISM" or x in missing_values else "non ISM")
print(data['initial_diagnosis'])
#0 ISM
#1 ISM
#2 non ISM
#3 NaN
#4 ISM
#5 non ISM
#6 non ISM
#7 non ISM
#8 ISM
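For what it's worth, the same recoding can probably be done without `apply` at all. This is only a sketch, assuming the missing entries are real NaN values; it rebuilds the nine values from the question by hand instead of reading test.csv, and uses `Series.where`, which keeps entries where the condition is True and replaces the rest:

```python
import numpy as np
import pandas as pd

# The column from the question, typed in by hand for illustration
data = pd.DataFrame(
    {
        "initial_diagnosis": [
            "ISM", "ISM", "WDSM", np.nan, "ISM", "SSM", "CM", "ASM", "ISM",
        ]
    }
)

col = data["initial_diagnosis"]
# Keep entries that are "ISM" or missing; replace everything else
keep = col.eq("ISM") | col.isna()
data["initial_diagnosis"] = col.where(keep, "non ISM")
print(data["initial_diagnosis"])
```

Here the NaN row survives because `col.isna()` marks it as something to keep, so no equality test against NaN is ever needed.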