Finding question sentences with a Python regex



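I'm trying to find every question sentence using a Python regular expression. Basically, I need to find an opening punctuation mark and capture everything after it up to the question mark, without crossing any other punctuation in between.

So I came up with this code: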

import re
questionRegex = re.compile(r'[?.!][A-Za-z\s]*\?')

Then I use this regex to find the questions in this text:

text = '''
Maybe the barista’s looking at me because she thinks I’m attractive. I am in my blue shirt. So she has stringy hair? Who am I to complain about stringy hair? Who do I think I am? Cary Grant?
And now John was doing temp work at the law firm of Fleurstein and Kaplowitz to get himself righted again. He had a strong six-month plan: he would save some money to pay Rebecca’s parents back for the house and be able to take some time off to focus on his writing—on his painting. In a few months, he would be back on his feet, probably even engaged to someone new. Maybe even that barista. Yes, almost paradoxically, temp work provided John with the stability he craved.
This is shit. It is utter shit. What are you talking about? Are you serious about this?
'''

like this:

process = questionRegex.findall(text)

But the result I get is:

  • . So she has stringy hair?

  • ? Who do I think I am?

  • . What are you talking about?

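The problem is that there are 5 questions in this text, which means the regex fails to catch these questions:

  • Who am I to complain about stringy hair?
  • Are you serious about this?

What is wrong with my code, and why doesn't it catch these two questions the way it catches the others?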

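I figured out why your regex pattern doesn't return all the results.

The following strings:

  • Who am I to complain about stringy hair?
  • Are you serious about this?

each come right after another question. The '?' that would have to start them has already been consumed as the end of the previous match, and findall only returns non-overlapping matches. What every question does have in front of it is a whitespace character.

So instead of specifying the set [?.!], you can simply use \s.

The pattern becomes: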
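To make the cause concrete, here is a minimal sketch that runs the question's original pattern on a shortened sample. The `sample` string is made up for illustration and is not the full text from the question.

```python
import re

# Shortened, made-up sample (an assumption for illustration only).
sample = ("I am in my blue shirt. So she has stringy hair? "
          "Who am I to complain? Who do I think I am?")

# The pattern from the question, with the backslashes in place.
question_regex = re.compile(r'[?.!][A-Za-z\s]*\?')

# findall returns non-overlapping matches, so the '?' that closes one
# match cannot also act as the leading delimiter of the next match.
matches = question_regex.findall(sample)
print(matches)
# ['. So she has stringy hair?', '? Who do I think I am?']
```

"Who am I to complain?" is skipped because the only punctuation mark in front of it belongs to the previous match.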

In [20]: pattern = re.compile(r'\s[A-Za-z\s]*\?')
In [21]: pattern.findall(text)
Out[21]:
[' So she has stringy hair?',
' Who am I to complain about stringy hair?',
' Who do I think I am?',
' Cary Grant?',
' What are you talking about?',
' Are you serious about this?']

You can try this:

(?<=[?.!]\s)[^?\n.]+?\?

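Matches:

So she has stringy hair?

Who am I to complain about stringy hair?

Who do I think I am?

Cary Grant?

What are you talking about?

Are you serious about this?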
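For reference, a minimal sketch of how the lookbehind pattern might be used from Python. The `sample` string is a shortened, made-up stand-in for the text in the question.

```python
import re

# Shortened, made-up sample (an assumption for illustration only).
sample = ("I am in my blue shirt. So she has stringy hair? "
          "Who do I think I am? Cary Grant?\n"
          "This is bad. Are you serious about this?")

# (?<=[?.!]\s) requires a sentence-ending mark plus one whitespace character
# immediately before the match, without consuming either of them.
# [^?\n.]+? then takes the sentence body, and \? takes the closing mark.
pattern = re.compile(r'(?<=[?.!]\s)[^?\n.]+?\?')

questions = pattern.findall(sample)
print(questions)
# ['So she has stringy hair?', 'Who do I think I am?',
#  'Cary Grant?', 'Are you serious about this?']
```

Because the lookbehind does not consume the preceding punctuation, consecutive questions no longer compete for the same '?', and the results carry no leading space.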

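If the text starts with a question, the regexes mentioned above will not capture the first question correctly, because nothing precedes it. To fix this, add a question mark after \s, making it optional.

Regex:

/\s?[A-Za-z\s]*\?/

In the latter, put the question mark after the lookbehind group:

/(?<=[?.!]\s)?[^?\n.]+?\?/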
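A small sketch of the difference for the first variant only, assuming a made-up `sample` that begins with a question. The optional-lookbehind variant is not exercised here.

```python
import re

# Made-up sample that starts with a question (an assumption for illustration).
sample = "Who are you? I am fine. Are you ok?"

required = re.compile(r'\s[A-Za-z\s]*\?')   # leading whitespace is mandatory
optional = re.compile(r'\s?[A-Za-z\s]*\?')  # leading whitespace is optional

# With mandatory whitespace, the first match can only begin at the first
# space, so the opening word of the first question is lost.
required_matches = required.findall(sample)
print(required_matches)   # [' are you?', ' Are you ok?']

# With optional whitespace, the match can begin at the very first character.
# strip() removes the leading space that later matches still carry.
optional_matches = [m.strip() for m in optional.findall(sample)]
print(optional_matches)   # ['Who are you?', 'Are you ok?']
```

Both variants still allow only letters and whitespace inside a question, so a question containing a comma or an apostrophe would be matched only from the first whitespace after that character.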
