Check whether a text contains 2 or more characters or digits surrounded by parentheses, with at least the first character uppercase



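The contains_acronym function checks the text for the presence of 2 or more characters or digits surrounded by parentheses, with at least the first character in uppercase (if it is a letter), returning True if the condition is met, or False otherwise. For example, "Instant messaging (IM) is a set of communication technologies used for text-based communication" should return True since (IM) satisfies the match conditions. Fill in the regular expression in this function: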

import re
def contains_acronym(text):
    pattern = ___
    result = re.search(pattern, text)
    return result is not None
print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True

I tried this pattern, but it does not work for all of the given input cases:

pattern = r"\(([A-Z0-9_]+)\)"

Finally I tried the pattern below, which covers all of the scenarios above with the following code:

import re
def contains_acronym(text):
    # \( and \) match literal parentheses; {2,} requires at least two characters
    pattern = r"\([A-Za-z0-9]{2,}\)"
    result = re.search(pattern, text)
    return result is not None
print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True

Use this as the pattern:

pattern = r"\(\w.*\w\)"

"\w" means letters and digits (it also matches the underscore).

import re
def contains_acronym(text):
    # \( and \) match literal parentheses; {2,} requires at least two characters
    pattern = r'\([A-Za-z0-9]{2,}\)'
    result = re.search(pattern, text)
    return result is not None
print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True

import re
def contains_acronym(text):
    # First character: uppercase letter or digit; then one or more letters or digits
    pattern = r"\([A-Z0-9][A-Z0-9a-z]+\)"
    result = re.search(pattern, text)
    return result is not None

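Your regular expression is almost correct. You just forgot that there must be at least 2 characters, so keep the first range as the fixed first character of the match, then repeat a similar range that also allows lowercase letters, followed by + (one or more):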

pattern = r"\(([A-Z0-9_][A-Za-z0-9_]+)\)"
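
As a rough check of the reasoning above, the sketch below runs that pattern against a few short strings. It assumes the parentheses are escaped so that they match literally; the sample strings and the `results` dictionary are made up for illustration.

```python
import re

# Pattern from the answer above, with the outer parentheses escaped
# so they match literal "(" and ")" characters.
PATTERN = re.compile(r"\(([A-Z0-9_][A-Za-z0-9_]+)\)")

def contains_acronym(text):
    return PATTERN.search(text) is not None

# Illustrative inputs (not from the original exercise).
samples = ["(IM)", "(4GL)", "(Scuba)", "(I)", "(im)", "NOT"]
results = {s: contains_acronym(s) for s in samples}
print(results)
```

The fixed first class plus `+` is what enforces the two-character minimum: "(I)" is rejected because nothing is left for the second class to match, and "(im)" is rejected because the first character is lowercase.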

This should work:

pattern = r"\([A-Z0-9].*\)"

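Notes

  1. "+" is a metacharacter that matches one or more times
  2. "*" should be used
  3. There is also no need for double parentheses
  4. \w also includes lowercase characters, in addition to uppercase characters and digits

pattern = r"\(\w{2,}\)"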

This pattern works and looks better.
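
One caveat that may be worth checking: since `\w` also matches lowercase letters and the underscore, a `\w`-based pattern does not by itself enforce the "first character uppercase" rule. The sketch below assumes the pattern is written as `r"\(\w{2,}\)"` with escaped parentheses; `loose_match` and the sample strings are hypothetical names used only for illustration.

```python
import re

# \w matches letters of either case, digits, and the underscore.
word_chars = re.findall(r"\w", "aZ9_ -(")

def loose_match(text):
    # Hypothetical helper: the \w-based pattern with escaped parentheses.
    return re.search(r"\(\w{2,}\)", text) is not None

# Illustrative inputs (not from the original exercise).
loose_results = {s: loose_match(s) for s in ["(IM)", "(4GL)", "(scuba)", "(a_b)", "(I)"]}
print(word_chars)
print(loose_results)
```

This pattern passes the five test sentences in the question, but it would also accept "(scuba)" or "(a_b)", which the stated rule excludes.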

pattern = r"([A-Z0-9])*"

This one works for me. It is very simple.

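Escape the opening and closing parentheses, define the first character as a digit or an uppercase character, and then define the following characters as uppercase, lowercase, or digits. I put a * after that to indicate that this part can repeat many times or not at all. That is, you will get the correct output for both (IM) and (i).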

pattern = r"\([A-Z0-9][A-Za-z0-9]*\)"
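
To make the effect of `*` concrete, here is a minimal sketch comparing it with the `+` variant from the other answers. It assumes both patterns use escaped parentheses; the names `star`, `plus`, `check`, and `comparison` are made up for this example.

```python
import re

# Zero or more trailing characters after the first one.
star = re.compile(r"\([A-Z0-9][A-Za-z0-9]*\)")
# One or more trailing characters after the first one.
plus = re.compile(r"\([A-Z0-9][A-Za-z0-9]+\)")

def check(regex, text):
    return regex.search(text) is not None

# Each value is (result with *, result with +).
comparison = {s: (check(star, s), check(plus, s)) for s in ["(IM)", "(I)", "(i)"]}
print(comparison)
```

Both variants accept "(IM)" and reject "(i)". They differ only on a single character such as "(I)", which `*` accepts and `+` rejects, so `+` appears closer to the "2 or more characters" wording of the exercise.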
