Extracting all strings between delimiters



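I wrote a function that extracts the string between two delimiters. In some files, however, the delimiters occur several times, and I want to extract every occurrence. My current function extracts only the first one it encounters and then exits.

How can I fix it?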

def extraction_error_CF(file):
    f=open(file,'r')
    file=f.read()
    f.close()
    start = file.find('Error validating') #1st delimiter
    end = file.find('</SPAN><BR>', start) # 2nd delimiter
    if start!=-1 and end!=-1:             #If these two delimiters are present...
        return(file[start:end])
    else:
        return""

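For HTML/XML you should really use a robust module such as BeautifulSoup. But if you truly only want the content between two delimiters, you can keep the same function and append each result to a list (for example), which you can then print.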

def extraction_error_CF(file):
    f=open(file,'r')
    file=f.read()
    f.close()
    # Patterns (str.find is case-sensitive, so the case must match the file)
    first = "Error validating"
    second = "</SPAN><BR>"
    # For all the matches
    results = []
    # Iterate the whole file
    start = file.find(first)
    # Look for the closing pattern only after the opening one
    end = file.find(second, start)
    while start != -1 and end != -1:
        # Add everything between the patterns
        # but not including the patterns
        results.append(file[start+len(first):end])
        # Removing the text that already passed
        file = file[end+len(second):]
        start = file.find(first)
        end = file.find(second, start)
    # Return the content of the list as a string
    if len(results) != 0:
        return "".join(results)
    else:
        return None

print(extraction_error_CF("test"))
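
As a variation on the loop above, here is a minimal sketch that keeps the text intact and advances a search offset instead of slicing the string on every pass. The helper name `extract_all_between` and the sample markup are made up for illustration; it takes the text directly rather than a file name so it can be tried without a file on disk.

```python
def extract_all_between(text, first, second):
    """Return every substring found between `first` and `second`, in order."""
    results = []
    pos = 0  # where the next search starts
    while True:
        start = text.find(first, pos)
        if start == -1:
            break  # no further opening delimiter
        content_start = start + len(first)
        end = text.find(second, content_start)
        if end == -1:
            break  # opening delimiter without a closing one
        results.append(text[content_start:end])
        pos = end + len(second)  # continue after the closing delimiter
    return results


# Hypothetical sample input, not taken from a real file
sample = (
    "<SPAN>Error validating field A</SPAN><BR>"
    "<SPAN>ok</SPAN><BR>"
    "<SPAN>Error validating field B</SPAN><BR>"
)
print(extract_all_between(sample, "Error validating", "</SPAN><BR>"))
```

Returning a list leaves it to the caller to decide how to print or join the matches.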
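
import re

def extraction_error_CF(file): # Get error from CF upload
    f=open(file,'r')
    file=f.read()
    f.close()
    # Non-greedy (.*?) so each error is captured separately
    # when several appear on the same line
    matches = re.findall('Error validating(.*?)</SPAN><BR>',file)
    # findall returns a list, which is empty when nothing matches
    return matches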

This is what I ended up doing, and it works well. Thanks, everyone!
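
One thing to watch with a regex like this: a greedy `(.*)` runs to the last closing delimiter on a line, so two errors on the same line come back as one match. The sketch below contrasts that with a non-greedy `(.*?)`. The helper name `find_errors`, the sample string, and the use of `re.DOTALL` and `re.IGNORECASE` are assumptions for illustration, not part of the original function.

```python
import re

# Hypothetical input with two errors on a single line
sample = (
    "Error validating A</SPAN><BR>text"
    "Error validating B</SPAN><BR>"
)


def find_errors(text):
    """Return the text of every 'Error validating ... </SPAN><BR>' span."""
    # (.*?) stops at the first closing delimiter it can reach.
    # re.DOTALL lets a match span line breaks; re.IGNORECASE also
    # accepts lowercase markup such as </span><br>.
    pattern = r"Error validating(.*?)</SPAN><BR>"
    return re.findall(pattern, text, flags=re.DOTALL | re.IGNORECASE)


greedy = re.findall(r"Error validating(.*)</SPAN><BR>", sample)
print(len(greedy))          # 1: both errors merged into one match
print(find_errors(sample))  # [' A', ' B']
```

`re.findall` always returns a list, so an empty result is `[]` rather than `-1`.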
