Python: regex pattern to DataFrame



My regex pattern isn't working. What am I missing here?

pattern = r'^(?P<filename>Cycle Narrative(?P<nlookup>[A-Z0-9-]+).docx?$)' 
dfCycleNarratives = dftemp[dftemp.columns[0]].str.extract(pattern, expand=False, flags=re.IGNORECASE)

dftemp looks like this:

                                      0
0  Cycle Narrative - Louis Stevens.docx
1  Cycle Narrative - Steve Stevens.docx

I am trying to make my dfCycleNarratives look like:

filename                nlookup
0      Cycle Narrative     Louis Stevens
1      Cycle Narrative     Steve Stevens

My dfCycleNarratives currently looks like:

  filename nlookup
0      NaN     NaN
1      NaN     NaN

Any help would be appreciated. Thanks.

Edit:

What would the pattern look like if the name came first in dftemp, and how could I make nlookup still be the name?

                                      0
0  Louis Stevens - Cycle Narrative.docx
1   Steve Stevens- Cycle Narrative.docx

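Your pattern fails because the character class [A-Z0-9-]+ cannot match the spaces in " - Louis Stevens", so no row matches and every value comes back as NaN. Try str.extractall with a looser pattern: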

pattern = r'(?P<filename>.*)\s+-\s+(?P<nlookup>.*)\.docx'
dfCycleNarratives = dftemp[dftemp.columns[0]].str.extractall(pattern).reset_index(drop=True)
print(dfCycleNarratives)
# Output
          filename        nlookup
0  Cycle Narrative  Louis Stevens
1  Cycle Narrative  Steve Stevens
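
For the case in your edit, where the name comes first, one option is to swap the order of the named groups and loosen the whitespace around the hyphen. This is only a minimal sketch: it rebuilds the sample frame from the question, and it assumes the separator is always a hyphen and that the names themselves contain no hyphen (a hyphenated name would be split at its first hyphen).

```python
import pandas as pd

# Sample data rebuilt from the question's edit: name first,
# with inconsistent spacing around the hyphen.
dftemp = pd.DataFrame({0: ["Louis Stevens - Cycle Narrative.docx",
                           "Steve Stevens- Cycle Narrative.docx"]})

# nlookup is captured first and filename second.
# \s*-\s* tolerates a missing space on either side of the hyphen,
# and \.docx?$ matches a literal ".doc" or ".docx" extension.
pattern = r'^(?P<nlookup>.*?)\s*-\s*(?P<filename>.*?)\.docx?$'
dfCycleNarratives = dftemp[dftemp.columns[0]].str.extract(pattern)

# Reorder the columns to match the layout asked for in the question.
dfCycleNarratives = dfCycleNarratives[["filename", "nlookup"]]
print(dfCycleNarratives)
```

Since each row has at most one match here, str.extract is enough; str.extractall would only be needed if a single string could contain several matches.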
