I have an index column whose leading 2 or 3 characters indicate the category I want to create labels for. Consider the following data:
Index |
---|
NDP2207342 |
OC22178098 |
SD88730948 |
OC39002847 |
PTP9983930 |
NDP9110876 |
Try:
import re

# Strip every run of digits so only the letter prefix remains
pat = re.compile(r"\d+")


def labelApply(x):
    """
    Applying labels for categories
    """
    x = pat.sub("", x)
    if x == "OC":
        return "OCAR"
    elif x == "NDP":
        return "NCAR"
    elif x == "PTP":
        return "SWAR"
    elif x == "SD":
        return "SDAR"
    else:
        return "Out of Area"


df["labels"] = df["Index"].apply(labelApply)
print(df)
Prints:
        Index  labels
0  NDP2207342    NCAR
1  OC22178098    OCAR
2  SD88730948    SDAR
3  OC39002847    OCAR
4  PTP9983930    SWAR
5  NDP9110876    NCAR
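The if/elif chain can also be written as a dictionary lookup, which keeps the prefix-to-label pairs in one place. The following is a minimal sketch, not the answerer's code: the names `LABELS`, `PREFIX`, and `label_for` are hypothetical, and the sample value `XX0000001` is made up to exercise the fallback. It also differs slightly in that it reads only the leading run of capital letters instead of stripping digits from the whole string, which gives the same result for the sample data.

```python
import re

# Hypothetical lookup table mirroring the if/elif branches
LABELS = {"OC": "OCAR", "NDP": "NCAR", "PTP": "SWAR", "SD": "SDAR"}

# Match the leading run of uppercase letters (assumes prefixes are A-Z only)
PREFIX = re.compile(r"^[A-Z]+")


def label_for(index_value, default="Out of Area"):
    """Return the label for an index string, or `default` for an unknown prefix."""
    match = PREFIX.match(index_value)
    prefix = match.group(0) if match else ""
    return LABELS.get(prefix, default)


# "XX0000001" is an invented value used only to show the fallback
samples = ["NDP2207342", "OC22178098", "SD88730948", "PTP9983930", "XX0000001"]
labels = [label_for(s) for s in samples]
print(labels)  # ['NCAR', 'OCAR', 'SDAR', 'SWAR', 'Out of Area']
```

It can be applied to the column in the same way, for example `df["Index"].apply(label_for)`.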
# Identify the pairings
replace_dict = {
    "OC": "OCAR",
    "NDP": "NCAR",
    "PTP": "SWAR",
    "SD": "SDAR",
}

# Make the new column from the leading run of capital letters
df['labels'] = df['Index'].str.extract("([A-Z]+)", expand=False)

# Replace Unknown
df.loc[~df['labels'].isin(replace_dict), 'labels'] = 'Out of Area'

# Replace Known
df['labels'] = df['labels'].replace(replace_dict)
print(df)

Output:
        Index  labels
0  NDP2207342    NCAR
1  OC22178098    OCAR
2  SD88730948    SDAR
3  OC39002847    OCAR
4  PTP9983930    SWAR
5  NDP9110876    NCAR
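The two replacement steps above (unknown prefixes first, then known ones) can likely be collapsed into one pass with `Series.map` followed by `fillna`, since `map` yields NaN for any key missing from the dictionary. This is only a sketch under assumptions: the column is taken to be named `Index` as in the sample data, the variable `prefix` is introduced for illustration, and the row `ZZ00000001` is invented to show the fallback.

```python
import pandas as pd

# Sample rows from the question, plus one invented row ("ZZ00000001")
# whose prefix is deliberately absent from the dictionary
df = pd.DataFrame(
    {"Index": ["NDP2207342", "OC22178098", "SD88730948", "PTP9983930", "ZZ00000001"]}
)

replace_dict = {
    "OC": "OCAR",
    "NDP": "NCAR",
    "PTP": "SWAR",
    "SD": "SDAR",
}

# expand=False returns a Series (not a one-column DataFrame) for a single group
prefix = df["Index"].str.extract(r"^([A-Z]+)", expand=False)

# map() leaves NaN where the prefix is not a key; fillna() supplies the default
df["labels"] = prefix.map(replace_dict).fillna("Out of Area")
print(df)
```

Compared with `replace`, `map` never leaves an unmatched prefix in the column, so the separate `isin` check is not needed.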