My regex pattern isn't working. What am I missing here?
pattern = r'^(?P<filename>Cycle Narrative(?P<nlookup>[A-Z0-9-]+).docx?$)'
dfCycleNarratives = dftemp[dftemp.columns[0]].str.extract(pattern, expand=False, flags=re.IGNORECASE)
dftemp looks like this:
0
0 Cycle Narrative - Louis Stevens.docx
1 Cycle Narrative - Steve Stevens.docx
I am trying to make my dfCycleNarratives look like:
filename nlookup
0 Cycle Narrative Louis Stevens
1 Cycle Narrative Steve Stevens
My dfCycleNarratives currently looks like:
filename nlookup
0 NaN NaN
1 NaN NaN
Any help would be appreciated. Thanks.
Edit:
What would the pattern look like if the name came first in dftemp, and how could I still get nlookup to be the name?
0
0 Louis Stevens - Cycle Narrative.docx
1 Steve Stevens- Cycle Narrative.docx
Try str.extractall:
pattern = r'(?P<filename>.*)\s+-\s+(?P<nlookup>.*)\.docx'
dfCycleNarratives = dftemp[dftemp.columns[0]].str.extractall(pattern).reset_index(drop=True)
print(dfCycleNarratives)
# Output
filename nlookup
0 Cycle Narrative Louis Stevens
1 Cycle Narrative Steve Stevens
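
As for why the original pattern returned NaN: the character class [A-Z0-9-]+ does not allow spaces, but the text after "Cycle Narrative" is " - Louis Stevens", so nothing matches. For the edit (name first), one option is to swap the order of the named groups and reorder the columns afterwards. The following is a minimal, self-contained sketch, not the only way to do it. The series names `title_first` and `name_first` and both pattern variables are made up for illustration, and the sample data is copied from the question.

```python
import re

import pandas as pd

# Sample data copied from the question (both layouts).
title_first = pd.Series([
    "Cycle Narrative - Louis Stevens.docx",
    "Cycle Narrative - Steve Stevens.docx",
])
name_first = pd.Series([
    "Louis Stevens - Cycle Narrative.docx",
    "Steve Stevens- Cycle Narrative.docx",
])

# Title first: fixed text, optional spaces around the hyphen, then the name.
# `.+?` is lazy so it stops right before the ".doc"/".docx" extension.
title_first_pattern = r'^(?P<filename>Cycle Narrative)\s*-\s*(?P<nlookup>.+?)\.docx?$'

# Name first: same named groups, but nlookup now comes before filename.
# `\s*` (rather than `\s+`) tolerates the missing space in "Stevens- Cycle".
name_first_pattern = r'^(?P<nlookup>.+?)\s*-\s*(?P<filename>Cycle Narrative)\.docx?$'

df_title_first = title_first.str.extract(title_first_pattern, flags=re.IGNORECASE)

# extract() orders columns by group position, so reorder to filename, nlookup.
df_name_first = name_first.str.extract(
    name_first_pattern, flags=re.IGNORECASE
)[["filename", "nlookup"]]

print(df_title_first)
print(df_name_first)
```

Both frames should end up with "Cycle Narrative" in filename and the person's name in nlookup. Using str.extract here (instead of str.extractall) keeps the original index, since each row is expected to match at most once.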