Creating labels in Python based on partial string matching



I have an index column whose leading 2 or 3 characters indicate the category I want to create labels for. Consider the following data:

Index
NDP2207342
OC22178098
SD88730948
OC39002847
PTP9983930
NDP9110876

Try:

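import re

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Index": ["NDP2207342", "OC22178098", "SD88730948",
                             "OC39002847", "PTP9983930", "NDP9110876"]})

# Matches runs of digits so they can be stripped from the index value
pat = re.compile(r"\d+")


def labelApply(x):
    """
    Applying labels for categories
    """
    # Drop the digits, leaving only the letter prefix
    x = pat.sub("", x)
    if x == "OC":
        return "OCAR"
    elif x == "NDP":
        return "NCAR"
    elif x == "PTP":
        return "SWAR"
    elif x == "SD":
        return "SDAR"
    else:
        return "Out of Area"


df["labels"] = df["Index"].apply(labelApply)
print(df)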

Prints:

        Index labels
0  NDP2207342   NCAR
1  OC22178098   OCAR
2  SD88730948   SDAR
3  OC39002847   OCAR
4  PTP9983930   SWAR
5  NDP9110876   NCAR
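
The if/elif chain above can also be written as a dictionary lookup. The following is a minimal sketch, not part of the original answer: the names `PREFIX_TO_LABEL`, `LEADING_LETTERS`, and `label_for` are hypothetical, and it assumes the category is always the run of capital letters at the start of the value.

```python
import re

# Hypothetical mapping from letter prefix to label, taken from the branches above
PREFIX_TO_LABEL = {
    "OC": "OCAR",
    "NDP": "NCAR",
    "PTP": "SWAR",
    "SD": "SDAR",
}

# Assumption: the category is the run of capital letters at the start of the value
LEADING_LETTERS = re.compile(r"^[A-Z]+")


def label_for(value, default="Out of Area"):
    """Return the label for an index value, or `default` if the prefix is unknown."""
    match = LEADING_LETTERS.match(value)
    prefix = match.group(0) if match else ""
    return PREFIX_TO_LABEL.get(prefix, default)


print(label_for("NDP2207342"))  # NCAR
print(label_for("XX12345678"))  # Out of Area
```

It could then be applied with `df["labels"] = df["Index"].apply(label_for)`. Unlike stripping every digit, this only looks at the leading letters, so a value with letters after the digits would still be matched by its prefix.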

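# Identify the pairings
replace_dict = {
    "OC": "OCAR",
    "NDP": "NCAR",
    "PTP": "SWAR",
    "SD": "SDAR",
}
# Make the new column from the first run of capital letters
# (the column is named 'index' here; use 'Index' if that is how yours is spelled)
df['labels'] = df['index'].str.extract("([A-Z]+)", expand=False)
# Replace Unknown: any prefix that is not a key of replace_dict
df.loc[~df['labels'].isin(replace_dict), 'labels'] = 'Out of Area'
# Replace Known
df['labels'] = df['labels'].replace(replace_dict)
print(df)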

Output:

        index labels
0  NDP2207342   NCAR
1  OC22178098   OCAR
2  SD88730948   SDAR
3  OC39002847   OCAR
4  PTP9983930   SWAR
5  NDP9110876   NCAR
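
The two replacement steps above can possibly be folded into one by using `map`, which turns unmatched prefixes into NaN, followed by `fillna`. This is a sketch under assumptions rather than the answer's own method: the column is assumed to be named `Index`, the name `prefix_to_label` is hypothetical, and the row `XX00000000` is made up purely to show the fallback.

```python
import pandas as pd

# Two rows from the question plus one made-up row (XX00000000) with an unknown prefix
df = pd.DataFrame({"Index": ["NDP2207342", "OC22178098", "XX00000000"]})

# Hypothetical name for the prefix-to-label pairings
prefix_to_label = {
    "OC": "OCAR",
    "NDP": "NCAR",
    "PTP": "SWAR",
    "SD": "SDAR",
}

# Pull out the leading capital letters as a Series (expand=False)
prefixes = df["Index"].str.extract(r"^([A-Z]+)", expand=False)

# map() yields NaN for prefixes missing from the dict; fillna() supplies the default
df["labels"] = prefixes.map(prefix_to_label).fillna("Out of Area")
print(df)
```

The difference from `replace` is that `replace` leaves unknown values untouched, which is why a separate `isin` step is needed there.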
