Looking to strip a predictable block of text from a DataFrame



I have a DataFrame of inspection results, and it looks like this:

Results                 Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                    36. THERMOMETERS PROVIDED & ACCURATE Comment...

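What I need to do is loop through this pandas DataFrame in Python, specifically the Violations column, and identify every case that starts with a number and ends with "Comments:".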

I was able to strip the number with this line of code:

df_new['Violations'] = df_new['Violations'].map(lambda x: 
    x.lstrip('0123456789.- ').rstrip('[^a-zA-Z]Comments[^a-zA-Z]'))

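As you can see, I tried to handle the trailing Comments part by passing a regex to rstrip, but that seems to do nothing. The output then looks like this: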

Results Violations
0   Pass w/ Conditions  MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL...
1   Pass    THERMOMETERS PROVIDED & ACCURATE - Comments: 4...
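
For context on why the rstrip call appears to do nothing: str.lstrip and str.rstrip treat their argument as a set of individual characters, not as a regex pattern. A minimal sketch of that behavior follows; the sample string is a hypothetical value modeled on the truncated output above, not real data.

```python
# Hypothetical sample value, modeled on the truncated output above.
text = "THERMOMETERS PROVIDED & ACCURATE - Comments: 4"

# rstrip removes trailing characters only while each one belongs to the
# given set. The string ends in '4', which is not in the set, so nothing
# is removed and the "regex" is never interpreted as a pattern.
unchanged = text.rstrip('[^a-zA-Z]Comments[^a-zA-Z]')
print(unchanged == text)

# The character-set behavior is easier to see here: every trailing letter
# found in "Comments" is stripped, in any order, until 'a' is reached.
partial = "statement".rstrip("Comments")
print(partial)

# lstrip works the same way, which is why the digit set happens to work.
stripped = "36. THERMOMETERS".lstrip('0123456789.- ')
print(stripped)
```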

Basically, what is the regex command that says: look for a number, and remove the number and everything from "Comments:" onward?

Is there a simple way to do this?

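Regarding "what is the regex command that says: look for a number and remove everything between the number and Comments:", using the sample data as a string: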

foo = '''
Results                 Violations
Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass                    36. THERMOMETERS PROVIDED & ACCURATE Comment...'''


>>> print(foo)
    Results                 Violations
    Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
    Pass                    36. THERMOMETERS PROVIDED & ACCURATE Comment...
>>>


import re
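# Raw strings keep the backslashes intact: \d+ matches the digits,
# \. matches the literal dot, and \1 refers back to the first group.
bar = re.sub(r'(\d+\.).*(Comment.*)', r'\1', foo)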


>>> print(bar)
    Results                 Violations
    Pass w/ Conditions  3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
    Pass                    36.
>>>
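
If the goal is instead to keep only the violation title (dropping both the leading number and the trailing Comments part), one possible sketch is a small helper built on re that can be passed to map. The name clean_violation, the two patterns, and the sample strings are assumptions for illustration; the real column may need a different separator or casing.

```python
import re

# Leading violation number such as "3. " or "36. " at the start of the value.
LEADING_NUMBER = re.compile(r'^\s*\d+\.\s*')
# Optional " - " separator, then "Comment" and everything after it.
TRAILING_COMMENTS = re.compile(r'\s*(?:-\s*)?Comment.*$', re.DOTALL)

def clean_violation(text):
    """Hypothetical helper: return only the violation title."""
    text = LEADING_NUMBER.sub('', text)
    return TRAILING_COMMENTS.sub('', text)

# Hypothetical sample values modeled on the rows shown above.
with_comments = clean_violation('36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4')
without_comments = clean_violation('3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E')
print(with_comments)
print(without_comments)
```

Assuming every value in the column is a string, this could be applied as df_new['Violations'].map(clean_violation); missing values would need to be handled first.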

References:

  • https://regex101.com/
  • re.sub() - regex to replace the last occurrence in a string
