I have a DataFrame of inspection results that looks like this:
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
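What I need to do is loop through this pandas DataFrame, specifically the Violations column, and identify every entry that starts with a number and ends with 'Comments:'.
I was able to strip the number using regex with this line of code: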
df_new['Violations'] = df_new['Violations'].map(lambda x:
x.lstrip('0123456789.- ').rstrip('[^a-zA-Z]Comments[^a-zA-Z]'))
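As you can see, I tried to strip the trailing Comments part by passing a regex to rstrip, but that seems to do nothing. The output then looks like this: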
Results Violations
0 Pass w/ Conditions MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL EMPL...
1 Pass THERMOMETERS PROVIDED & ACCURATE - Comments: 4...
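Basically, what is the regex that says: look for a number and remove everything between the number and the comments?
Is there a simple way to do this?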
Basically, what you are asking for is: look for a number and remove everything between the number and the comments:
foo = '''
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...'''
>>> print(foo)
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36. THERMOMETERS PROVIDED & ACCURATE Comment...
>>>
import re
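# Raw strings keep the backslashes: \d+ matches the digits, \. the literal dot,
# and \1 puts the captured number back in place of the whole match.
bar = re.sub(r'(\d+\.).*(Comment.*)', r'\1', foo)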
>>> print(bar)
Results Violations
Pass w/ Conditions 3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E
Pass 36.
>>>
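To apply the same idea to the DataFrame column rather than to one big string, something like the sketch below might work. It assumes the column is named Violations and the frame is called df_new, as in the question, and that the goal is one of two things: keep only the number (what the re.sub above does) or keep only the violation name. The second reading is a guess at what the question wants. The sample rows are copied from the question, truncation included. Note that str.lstrip and str.rstrip take a set of characters to remove, not a pattern, which is likely why the rstrip attempt had no visible effect.

```python
import pandas as pd

# Sample rows copied from the question (truncated text left as-is).
df_new = pd.DataFrame({
    "Results": ["Pass w/ Conditions", "Pass"],
    "Violations": [
        "3. MANAGEMENT, FOOD EMPLOYEE AND CONDITIONAL E",
        "36. THERMOMETERS PROVIDED & ACCURATE - Comments: 4...",
    ],
})

# Option A: same pattern idea as the re.sub call above. Keep the number,
# drop everything after it through the end of the comment. Rows without
# a comment are left untouched.
numbers_only = df_new["Violations"].str.replace(
    r"(\d+\.).*Comments?.*", r"\1", regex=True
)

# Option B (assumed goal): drop the leading number and the trailing
# "- Comments: ..." part, keeping only the violation name.
names_only = (
    df_new["Violations"]
    .str.replace(r"^\d+\.\s*", "", regex=True)
    .str.replace(r"\s*-?\s*Comments?.*$", "", regex=True)
)

print(numbers_only.tolist())
print(names_only.tolist())
```

Series.str.replace with regex=True works on the whole column at once, so no explicit Python loop or map with a lambda is needed.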
References:
- https://regex101.com/
- re.sub() - regex to replace the last occurrence in a string