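I have been looking for a way to extract a 10-character word from a string, if one exists.
The first 5 characters must come from a given list, and the last 3 characters must be digits.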
Input Data (Data.xlsx):
Description Number
CHQ -AQBCN2Q546 from India Federation Pvt Ltd
CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003
CASH-NJRQC5J987 from US Fertilizers LLP
CHQ - from India Bulls Pvt Ltd
CHQ -AQBCN2Q989 from India Bulls Pvt Ltd
CHQ -AQCCN2Q546 from India Federation Pvt Ltd
list_Character = ['AQBCN','PUCNQ','DJBNK','ADJBC','NJRQC']
Expected output:
Description Number
CHQ -AQBCN2Q546 from India Federation Pvt Ltd AQBCN2Q546
CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003 DJBNK0Q329
CASH-NJRQC5J987 from US Fertilizers LLP NJRQC5J987
CHQ - from India Bulls Pvt Ltd
CHQ -AQBCN2Q989 from India Bulls Pvt Ltd AQBCN2Q989
CHQ -AQCCN2Q546 from India Federation Pvt Ltd
Code:
import pandas as pd
import re
df = pd.read_excel(r'D:/Users/Data.xlsx')
list_Character = ['AQBCN','PUCNQ','DJBNK','ADJBC','NJRQC']
for i in df['Description']:
    # Attempt so far: this only finds words starting with 'a' or 'e', not the required codes
    matches = re.findall(r"[ae]\w+", i)
I don't know how to arrive at a solution; please suggest one.
I think you want:
list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']
regex = r'[#-]((?:' + r'|'.join(list_Character) + r')\w{5})\b'
df["Number"] = df["Description"].str.extract(regex)
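The pattern above accepts any five word characters after the prefix, so it does not by itself enforce the requirement that the last 3 characters be digits. A minimal sketch of a stricter variant follows, assuming the two characters between the prefix and the digits may be any word characters. The name `strict_regex` and the in-memory sample frame (rows copied from the question's expected output, standing in for Data.xlsx) are illustrative assumptions, not part of the original answer.

```python
import re
import pandas as pd

list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']

# Sample rows copied from the question, standing in for Data.xlsx.
df = pd.DataFrame({"Description": [
    "CHQ -AQBCN2Q546 from India Federation Pvt Ltd",
    "CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003",
    "CASH-NJRQC5J987 from US Fertilizers LLP",
    "CHQ - from India Bulls Pvt Ltd",
    "CHQ -AQBCN2Q989 from India Bulls Pvt Ltd",
    "CHQ -AQCCN2Q546 from India Federation Pvt Ltd",
]})

# Escape each prefix so any regex metacharacters in the list are taken literally.
prefixes = "|".join(map(re.escape, list_Character))

# '#' or '-', then one listed 5-character prefix, 2 word characters, and 3 digits.
strict_regex = r"[#-]((?:" + prefixes + r")\w{2}\d{3})\b"

# expand=False returns a Series; rows without a match become NaN.
df["Number"] = df["Description"].str.extract(strict_regex, expand=False)
print(df)
```

Rows with no qualifying code get NaN in Number; if empty strings are preferred, `df["Number"].fillna("")` can be applied afterwards.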