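I want to search some text for the index of the first occurrence of any one of a set of strings (for example " ->", " -x", or " -xx"). Once one is found, I need to know the position where the matched string starts, and which specific string was found (more precisely, the length of the matched string).
This is what I have so far, but it is not enough. Please help.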
arrowlist = {"->x","->","->>","-","\-","//--","->o","o\--","<->","<->o"}
def cxn(line,arrowlist):
    if any(x in line for x in arrowlist):
        print("found an arrow {} at position {}".format(line.find(arrowlist),2))
    else:
        return 0
Maybe a regex would be easier, but I am really struggling with it, because the arrow list may be dynamic and the arrow strings can vary in length.
Thanks!
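Following the logic of your example, this is the most expedient way to find the "first" matching arrow and print its position. However, a set is not ordered (it is not FIFO), so if you want to preserve order, I suggest making arrowlist a list instead of a set so that the order is kept.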
# backslashes are escaped so that every literal parses as intended
arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}
def cxn(line, arrowlist):
    try:
        result = tuple((x, line.find(x)) for x in arrowlist if x in line)[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))
    # Remember in general it's not a great idea to use an exception as
    # broad as Exception, this is just for example purposes.
    except Exception:
        return 0
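To illustrate the point about order, here is a minimal sketch that assumes arrowlist is a list whose order expresses priority. The function name `first_listed_arrow` and the shortened arrow list are hypothetical, chosen only for this example; it returns the match instead of printing it.

```python
def first_listed_arrow(line, arrows):
    """Return (arrow, start, length) for the first entry of `arrows`
    that occurs in `line`, or None when nothing matches."""
    for arrow in arrows:              # a list is scanned in the order it was written
        start = line.find(arrow)      # -1 means this arrow is not in the line
        if start != -1:
            return arrow, start, len(arrow)
    return None

# Hypothetical priority order: longer arrows are listed before their prefixes.
arrows = ["->x", "->>", "->", "<->"]
print(first_listed_arrow("a ->> b", arrows))  # ('->>', 2, 3)
```

With a set in place of the list, which of several matching arrows comes back first is not something you control.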
If you are looking for the first match within the provided string (line), you can do it like this:
# backslashes are escaped so that every literal parses as intended
arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}
def cxn(line, arrowlist):
    try:
        # key first sorts on the position in string then shortest length
        # to account for multiple arrow matches (i.e. -> and ->x)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], len(r[0])))[0]
        # if you would like to match the "most complete" (i.e. longest-length) word first use:
        # result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], -len(r[0])))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))
    except Exception:
        return 0
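The same idea can be sketched without sorting or a try/except: collect the hits, then take the minimum by (position, length). This is only an illustration; the name `earliest_arrow` and the `prefer_longest` flag are assumptions made for this example, not part of the code above.

```python
def earliest_arrow(line, arrows, prefer_longest=True):
    """Return (arrow, start, length) for the arrow that starts earliest in
    `line`, or None. Ties on position are broken by length."""
    hits = [(line.find(a), a) for a in arrows if a in line]
    if not hits:
        return None
    sign = -1 if prefer_longest else 1   # a negated length makes longer arrows sort first
    start, arrow = min(hits, key=lambda h: (h[0], sign * len(h[1])))
    return arrow, start, len(arrow)

sample = {"->", "->>", "<->"}
print(earliest_arrow("n1 ->> n2 <-> n3", sample))                        # ('->>', 3, 3)
print(earliest_arrow("n1 ->> n2 <-> n3", sample, prefer_longest=False))  # ('->', 3, 2)
```

Returning None for "no match" keeps the result distinguishable from a real match at position 0.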
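Or, if you have access to the standard library, you can use operator.itemgetter to almost the same effect, and gain some efficiency from fewer function calls: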
from operator import itemgetter
# backslashes are escaped so that every literal parses as intended
arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}
def cxn(line, arrowlist):
    try:
        # key first sorts on the position in string then alphanumerically
        # on the arrow match (i.e. -> and ->x matched in same position
        # will return -> because when sorted alphanumerically it is first)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=itemgetter(1, 0))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))
    except Exception:
        return 0
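***Note: I am using a slightly different arrow list from the one in your example, because the one you provided seemed to break the default code formatting (probably a quote-closing problem). Remember that you can prefix a string with r, like this: r"Text that can use special symbols like the escape and be read in as a 'raw' string literal".
For more on raw string literals, see this question.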
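A short sketch of how the backslash arrows could be written, for illustration only. The variable names are hypothetical. Note one limitation: a raw literal cannot end in a single backslash, so an arrow such as a dash followed by a backslash still needs the escaped form.

```python
# Two spellings of the same four-character arrow: o, backslash, dash, dash.
escaped = "o\\--"     # ordinary literal, backslash escaped
raw = r"o\--"         # raw literal, backslash taken as-is
print(escaped == raw, len(raw))   # True 4

# An arrow that ends in a backslash must use the escaped form.
trailing = "-\\"      # dash followed by one backslash
print(len(trailing))  # 2
```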
You can do something like this:
count = 0
for item in arrowlist:
    count += 1
    if item in line:
        # count is the 1-based position of item within arrowlist
        print("found an arrow {} at position {}".format(item, count))
I like this solution, inspired by this post:
How to use re match objects in a list
import re
arrowlist = ["xxx->x", "->", "->>", "-\"", "\\-", " // --", "x->o", "-> ->"]
lines = ["xxx->x->->", "-> ->", "xxx->x", "xxxx->o"]
def filterPick(items, search):
    # one (matched text, list index, start offset) tuple per item that matches
    return [(m.group(), item_number, m.start()) for item_number, l in enumerate(items) for m in (search(l),) if m]
if __name__ == '__main__':
    # re.escape keeps characters such as the backslash literal in the pattern
    searchRegex = re.compile('|'.join(re.escape(a) for a in arrowlist)).search
    x = filterPick(lines, searchRegex)
    print(x)
The output shows:
[('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]
The first number is the list index and the second is the start index within the string.
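One thing to watch with an alternation is that the regex engine takes the first alternative that matches at a given position, so "->" listed before "->>" hides the longer arrow. A possible sketch for a dynamic arrow list, with the hypothetical helpers `compile_arrows` and `find_arrow`, escapes each arrow and puts longer arrows first:

```python
import re

def compile_arrows(arrows):
    """Build one pattern from a dynamic arrow list, longest alternatives first."""
    ordered = sorted(set(arrows), key=len, reverse=True)
    return re.compile("|".join(re.escape(a) for a in ordered))

def find_arrow(line, pattern):
    """Return (arrow, start, length) for the leftmost match, or None."""
    m = pattern.search(line)
    if m is None:
        return None
    return m.group(), m.start(), len(m.group())

pattern = compile_arrows(["->x", "->", "->>", "-\\", "\\-", "//--",
                          "->o", "o\\--", "<->", "<->o"])
print(find_arrow("n1 ->> n2", pattern))   # ('->>', 3, 3)
print(find_arrow("a <->o b", pattern))    # ('<->o', 2, 4)
```

The pattern is compiled once and can be reused for every line; if the arrow list changes, it only needs to be rebuilt.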
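I wanted to post the answer I came up with (from a combination of the feedback). As you can see, this solution, although admittedly verbose and very inefficient, returns the correct arrow string at the correct position index.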
arrowlist = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
# the runs of spaces before each arrow are significant: they set the reported column
doc = """ @startuml
n1 xxx->xx n2 : should not find
n1 ->> n2 : must get the third arrow
n2  xxx-> n3 : last item
n3   -> n4 : second item
n4    ->> n1 : third item"""
def checkForArrow(arrows, line):
    words = line.split(' ')
    for a in arrows:
        for word in words:
            if word == a:
                return (arrows.index(a), word, line.index(word))
for line in doc.splitlines():
    line = line.strip()
    if line != "":
        print(checkForArrow(arrowlist, line))
Output:
None
None
(2, '->>', 3)
(5, 'xxx->', 4)
(1, '->', 5)
(2, '->>', 6)
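A more compact way to get whole-word matching might be a regex with whitespace lookarounds. This is a sketch under assumptions: `compile_token_arrows` and `check_for_arrow` are hypothetical names, and unlike the loop above it returns the leftmost arrow in the line rather than the first one in list order.

```python
import re

def compile_token_arrows(arrows):
    """Pattern that matches an arrow only when it stands alone between whitespace."""
    alternatives = "|".join(
        re.escape(a) for a in sorted(set(arrows), key=len, reverse=True))
    # (?<!\S) and (?!\S): no non-space character directly before or after the match
    return re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")

def check_for_arrow(arrows, line, pattern):
    """Return (index in arrows, arrow, start in line), or None."""
    m = pattern.search(line)
    if m is None:
        return None
    return arrows.index(m.group()), m.group(), m.start()

arrows = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
pattern = compile_token_arrows(arrows)
print(check_for_arrow(arrows, "n1 xxx->xx n2 : should not find", pattern))  # None
print(check_for_arrow(arrows, "n1 ->> n2 : must get the third arrow", pattern))  # (2, '->>', 3)
```

Because the position comes from the match itself, it is the offset of the whole-word arrow even when the same characters also appear earlier inside a longer word.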