Removing if statements that only need to match once



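I am parsing a file and using if statements to "find" sections. The logic inside each if statement grabs the data from that section, saves it, and parses it. At the end of each if statement, the current line is set to the end of that section. These files are roughly 200,000 lines long and the script processes x files at a time, so I am trying to keep efficiency in mind.

Here is a code sample: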

section = ["Section 1", "Section 2", "Section 3"]
with open(file, 'r') as f:
    for line in f:
        if section[0]:
            parseFileSection()
            line = current_line
        if section[1]:
            parseFileSection()
            line = current_line
        if section[2]:
            parseFileSection()
            line = current_line

Here is a sample file:

Section 1
data
.
.
Section 2
.
.
Section 2
.
.
Section 2
.
.
Section 3
.
.
Section 2
.
.

Is there a way to stop checking the if statements for Section 1 and Section 3, since they only appear in the data once?

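One idea is to loop through the section array somehow and pop each value once it has been "used". But that would also involve a lot of repeated checking. On top of that, some sections are parsed differently, so it is not as simple as checking the section, calling a parse function, and then popping the section.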
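For illustration, the "pop once used" idea could be sketched roughly as below. This is only a sketch under assumptions: `find_sections`, `one_time`, and `repeating` are hypothetical names, headers are assumed to occupy a whole line, and the real parsing is replaced by recording where each header was found.

```python
def find_sections(lines, one_time, repeating):
    """Locate section headers, retiring each one-time header after its first match.

    Hypothetical helper: returns the (index, header) pairs in file order plus
    the set of one-time headers that never appeared.
    """
    pending = set(one_time)          # one-time headers not seen yet
    found = []
    for i, raw in enumerate(lines):
        header = raw.strip()
        if header in pending:        # set lookup; the set shrinks as headers are used
            pending.discard(header)  # "pop" the header so it no longer matches
            found.append((i, header))
        elif header in repeating:    # headers that may appear many times
            found.append((i, header))
    return found, pending
```

Once `pending` is empty, the first membership test is a lookup in an empty set, so the one-time sections add very little work per line.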

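I suspect the cost of the extra if statements is negligible unless you have a lot of them and/or several orders of magnitude more data (I have processed 100 GB files in Python, so I have some experience with this).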
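If you want to check this on your own machine rather than take my word for it, a rough timing sketch like the one below may help. The synthetic `lines` list is an assumption standing in for a 200,000-line file, and `three_checks` and `one_check` are hypothetical helpers that only count header matches; actual numbers will vary by machine and Python version.

```python
import timeit

# Synthetic stand-in for a 200,000-line file (assumption: mostly data lines).
lines = (["Section 1"] + ["data"] * 100_000 + ["Section 3"]
         + ["Section 2"] * 10 + ["data"] * 99_988)

def three_checks(lines):
    """Compare every line against all three headers."""
    hits = 0
    for line in lines:
        if line == "Section 1":
            hits += 1
        if line == "Section 2":
            hits += 1
        if line == "Section 3":
            hits += 1
    return hits

def one_check(lines):
    """Compare every line against the repeating header only."""
    hits = 0
    for line in lines:
        if line == "Section 2":
            hits += 1
    return hits

if __name__ == "__main__":
    for fn in (three_checks, one_check):
        seconds = timeit.timeit(lambda: fn(lines), number=5)
        print(f"{fn.__name__}: {seconds / 5:.4f} s per pass")
```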

That said, there is no harm in removing them. I assume your pseudocode has some typos; I expect you meant something like this:

sections = ["section1", "section2", "section3"]
current_section = ''

def parseFileSectionType(line):
    # Remember the most recent section header between calls
    global current_section
    for s in sections:
        if s in line:
            current_section = s
            return s
    if current_section:
        return current_section
    return "unknown"

with open(file, 'r') as f:
    for line in f:
        section = parseFileSectionType(line)  # <-- assuming you have some function that determines which section you're in
        if section == sections[0]:
            parsed_line = parseFileSection1(line)  # <-- parseFileSection1 is the function responsible for parsing section1
        elif section == sections[1]:
            parsed_line = parseFileSection2(line)  # <-- parseFileSection2 is the function responsible for parsing section2
        elif section == sections[2]:
            parsed_line = parseFileSection3(line)  # <-- parseFileSection3 is the function responsible for parsing section3

In that case, I would do it like this:

section_mapper = {
    "section1": parseFileSection1,  # <-- parseFileSection1 is the function responsible for parsing section1
    "section2": parseFileSection2,  # <-- parseFileSection2 is the function responsible for parsing section2
    "section3": parseFileSection3,  # <-- parseFileSection3 is the function responsible for parsing section3
}
current_section = ''

def parseFileSectionType(line):
    # Remember the most recent section header between calls
    global current_section
    for s in section_mapper:
        if s in line:
            current_section = s
            return s
    if current_section:
        return current_section
    return "unknown"

with open(file, 'r') as f:
    for line in f:
        # parsed_line = section_mapper[parseFileSectionType(line)](line)  # Split into two lines for readability
        section = parseFileSectionType(line)  # <-- assuming you have some function that determines which section you're in
        parser = section_mapper.get(section)  # None for lines before the first section header ("unknown")
        if parser is not None:
            parsed_line = parser(line)  # processes the line with the appropriate function from the section_mapper
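To make the dispatch idea concrete, here is a self-contained sketch that runs on in-memory lines instead of a real file. The parser functions, `SECTION_PARSERS`, and `parse_lines` are all hypothetical placeholders, and the sketch assumes a header occupies a whole line (as in the sample file); adapt the matching if your headers are embedded in longer lines.

```python
def parse_section1(line):
    # Hypothetical parser: tags the line with its section
    return ("s1", line)

def parse_section2(line):
    return ("s2", line)

def parse_section3(line):
    return ("s3", line)

# Maps each header line to the function that parses the lines beneath it.
SECTION_PARSERS = {
    "Section 1": parse_section1,
    "Section 2": parse_section2,
    "Section 3": parse_section3,
}

def parse_lines(lines):
    """Dispatch every data line to the parser of the most recent header."""
    results = []
    current_parser = None                 # no section header seen yet
    for raw in lines:
        line = raw.strip()
        if line in SECTION_PARSERS:       # header line: switch parsers
            current_parser = SECTION_PARSERS[line]
            continue
        if current_parser is not None:    # skip anything before the first header
            results.append(current_parser(line))
    return results
```

With this shape, each line costs one dictionary lookup no matter how many section types exist, and adding a new section only means adding one entry to the mapping. An open file object can be passed directly as `lines`, since it yields one line at a time.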
