Splitting a string by indentation blocks in Python



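I am trying to split a string into indentation blocks for a save editor I am making. For example, I would input:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""

and it would output [["def foo():"], [" bar()"], [" oof()"], ["def lol():", "foo()"]].

Here is my code: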

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False

        prevLine = line
    return output

if __name__ == '__main__':
    print(splitByIndentation("""
def foo():
    bar()
    oof()
def lol():
    foo()"""))

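When I run it, it only outputs [['def foo():', '    bar()', '    oof()']].

Your code looks mostly correct, and you are populating currentBlock correctly. However, your current condition never appends the last currentBlock to output.

You can test this yourself with a small change to the input:

print(splitByIndentation("""
def foo():
    bar()
    oof()
def lol():
    foo()
def foo():
    bar()
    oof()"""))

As you can see, I added one more block, and that gives the following output: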

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

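To fix this, append currentBlock to the output list after the loop ends, but only if a block is still open; otherwise an input that ends with a newline would get its last block appended twice.

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
        prevLine = line
    if indented:
        # the input ended inside a block, so flush it
        output.append(currentBlock)
    return output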

This appends the remaining currentBlock.
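One way to convince yourself that a flush-after-loop fix behaves the same with and without a trailing newline is to run it on both variants. The sketch below is only an illustration: the name split_blocks and the two sample strings are made up for this check, and the final append is guarded so that a block already closed by an unindented line is not added a second time.

```python
def split_blocks(data: str) -> list:
    """Group each unindented header line with the indented lines that follow it."""
    output, current, indented, prev = [], [], False, ''
    for line in data.split('\n'):
        if line.startswith(('\t', '    ')):
            if not indented:
                current = [prev]      # the header is the line just before the block
                indented = True
            current.append(line)
        else:
            if indented:
                output.append(current)
                indented = False
            prev = line
    if indented:                      # flush a block that runs to the end of the input
        output.append(current)
    return output


# Hypothetical sample inputs: identical except for the trailing newline.
no_newline = "def foo():\n    bar()\ndef lol():\n    foo()"
with_newline = no_newline + "\n"

print(split_blocks(no_newline))
print(split_blocks(with_newline) == split_blocks(no_newline))
```

Both calls should produce the same two blocks, which is the property the question was missing.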

I am not sure I understood your question correctly, but I think you want output like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

I have shortened and corrected your code.

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    for line in data.split('\n'):
        if line.startswith('    ') or line.startswith('\t'):
            if not indented:
                output.append([prevLine])
                indented = True
            output[-1].append(line)
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""
print(splitByIndentation(s))

The output looks like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

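The problem with this solution shows up when your input contains an empty indented line, that is, a line made up only of indentation, for example:

"""
def foo():
    bar()
    oof()
    
def lol():
    foo()
"""

The output will then look like this: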

[['def foo():', '    bar()', '    oof()', '    '], ['def lol():', '    foo()']]

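To fix this, remove all blank lines before iterating.

    ...
    lines = [line for line in data.split('\n') if line.strip() != '']
    for line in lines:
        ...

You can also easily remove leading whitespace from the output by calling strip() on each line before appending it to the block.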

output[-1].append(line.strip())

The complete function looks like this:

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    # remove empty lines
    lines = [line for line in data.split('\n') if line.strip() != '']
    for line in lines:
        if line.startswith('    ') or line.startswith('\t'):
            if not indented:
                output.append([prevLine])
                indented = True
            output[-1].append(line.strip())  # append to last block and remove whitespace
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
    
def lol():
    foo()
"""
print(splitByIndentation(s))

This gives you the output:

[['def foo():', 'bar()', 'oof()'], ['def lol():', 'foo()']]
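
I think this is different from what your use case needs, but I am posting it because it may be helpful.

I am not sure about your use case, but if I needed to access lines by their indentation, I would use a dictionary for easier access.

The keys are indentation levels, and each value is the list of lines at that level.

The code would look like this:

from collections import defaultdict

def splitByIndentation(data: str):
    output = defaultdict(list)
    for line in data.split('\n'):
        # Assuming the indentation is always 4 spaces; lstrip() so that
        # trailing whitespace is not counted as indentation
        indentationCount = (len(line) - len(line.lstrip())) // 4
        output[indentationCount].append(line.strip())
    return dict(output)

if __name__ == '__main__':
    print(splitByIndentation(
"""def foo():
    bar()
    oof()
def lol():
    foo()
        #fofo()"""))

The output will look like this: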

{0: ['def foo():', 'def lol():'], 1: ['bar()', 'oof()', 'foo()'], 2: ['#fofo()']}
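The dictionary approach above assumes four-space indentation. If the input may also use tabs, as the startswith('\t') checks elsewhere on this page suggest, one option is to expand tabs before counting. The following is a minimal sketch under that assumption; indent_level, group_by_level and the width parameter are names invented for this example, and treating one tab as one level of `width` columns is a convention chosen here, not something the question specifies.

```python
from collections import defaultdict


def indent_level(line: str, width: int = 4) -> int:
    """Return the indentation level, counting one leading tab as `width` columns."""
    expanded = line.expandtabs(width)
    return (len(expanded) - len(expanded.lstrip())) // width


def group_by_level(data: str, width: int = 4) -> dict:
    """Map each indentation level to the stripped lines found at that level."""
    levels = defaultdict(list)
    for line in data.split('\n'):
        if line.strip():  # skip blank and whitespace-only lines
            levels[indent_level(line, width)].append(line.strip())
    return dict(levels)


print(group_by_level("def foo():\n\tbar()\n    oof()\n\t\tdeep()"))
# {0: ['def foo():'], 1: ['bar()', 'oof()'], 2: ['deep()']}
```

Note that this loses the association between a header and its body, just like the original dictionary version, so it only fits cases where grouping by depth is what you want.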

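The problem with the code is that it only appends currentBlock to output once indented has become True and an unindented line then follows. For that to happen for your last block, you need one more element at the end for which line.startswith('\t') or line.startswith('    ') is not True.

Your original string was actually fine:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""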

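That is because this string ends with \n ('\ndef foo():\n    bar()\n    oof()\ndef lol():\n    foo()\n'), so data.split('\n')[-1] is ''. That would give you the answer you are looking for. Unfortunately, it is not the string you pass to the function at the end of your script.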
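The effect of the trailing newline is easy to see in isolation. The short sketch below uses a made-up two-line string (text is not from the question) and compares str.split('\n'), which keeps the empty string after the final newline, with str.splitlines(), which does not.

```python
text = "def foo():\n    bar()\n"

print(text.split('\n'))    # ['def foo():', '    bar()', '']
print(text.splitlines())   # ['def foo():', '    bar()']
```

With split('\n'), that trailing '' is the unindented line that happens to close the last block in the original function; with splitlines() it would be missing regardless of how the string ends.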

The string actually passed is:

"""
def foo():
    bar()
    oof()
def lol():
    foo()"""

Now the string ends with '    foo()'. You can check:

"""
def foo():
    bar()
    oof()
def lol():
    foo()""".split('\n')[-1]

With this knowledge, we can change your function slightly so that it works either way:

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    lines = data.split('\n')
    if lines[-1].strip() != '':
        # add an unindented last line so the final block gets appended
        lines.append('')
    else:
        # reduce a whitespace-only last line to ''
        lines[-1] = lines[-1].strip()
    for line in lines:
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False

        prevLine = line
    return output

Let's give it a spin:

data1 = """
def foo():
    bar()
    oof()
def lol():
    foo()"""
data2 = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""

data3 = """
def foo():
    bar()
    oof()
def lol():
    foo()
    """
print(splitByIndentation(data1))
# [['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]
print(splitByIndentation(data1) == splitByIndentation(data2) == splitByIndentation(data3))
# True
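If you would rather not depend on how the string ends at all, another option is to open a new block at every unindented line and afterwards keep only the blocks that received an indented body. This is a sketch of that alternative, not the approach used in the answers above; the name split_by_indentation is made up, and unlike the original function it skips blank lines instead of letting them end a block.

```python
def split_by_indentation(data: str) -> list:
    """Group unindented header lines with their indented bodies."""
    blocks = []
    for line in data.split('\n'):
        if not line.strip():
            continue                   # ignore blank and whitespace-only lines
        if line.startswith(('\t', '    ')):
            if blocks:                 # an indented line before any header is dropped
                blocks[-1].append(line)
        else:
            blocks.append([line])      # every unindented line opens a new block
    # keep only headers that actually have an indented body
    return [block for block in blocks if len(block) > 1]


sample = "\ndef foo():\n    bar()\n    oof()\ndef lol():\n    foo()"
print(split_by_indentation(sample))
print(split_by_indentation(sample) == split_by_indentation(sample + "\n"))
```

Because no block is ever held in a temporary variable waiting to be appended, there is nothing left to flush when the loop ends.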
