Splitting a string by indentation blocks in Python

I'm trying to split a string by indentation blocks for a save editor I'm making. For example, I would input:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""

and it would output [["def foo():"], [" bar()"], [" oof()"], ["def lol():", "foo()"]].

Here is my code:

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False

        prevLine = line
    return output

if __name__ == '__main__':
    print(splitByIndentation("""
def foo():
    bar()
    oof()
def lol():
    foo()"""))

When I run it, it only outputs [['def foo():', '    bar()', '    oof()']].

Your code looks mostly correct, and you are filling currentBlock correctly. However, your current condition never appends the final currentBlock to output.

You can test this yourself with a small change to the input:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
def foo():
    bar()
    oof()"""))

As you can see, I added one more block, and this gives the following output:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

To fix this, you just need to append any remaining currentBlock to the output list after the loop ends:

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
        prevLine = line
    # Flush the last block, but only if the loop has not already appended it
    # (otherwise input ending in a newline would duplicate the final block).
    if indented:
        output.append(currentBlock)
    return output

This appends the remaining currentBlock.
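As a quick check of the idea above, here is a minimal sketch that runs the function on the same input with and without a trailing newline. It guards the final append with `if indented:`, which is an assumption on my part so that a block already flushed inside the loop is not appended twice. The names `without_newline`, `with_newline`, and `expected` are only illustrative.

```python
def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
        prevLine = line
    if indented:  # assumption: flush only a block that runs to the end of the input
        output.append(currentBlock)
    return output

# Illustrative inputs: identical except for the final newline.
without_newline = "def foo():\n    bar()\n    oof()\ndef lol():\n    foo()"
with_newline = without_newline + "\n"

expected = [['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]
print(splitByIndentation(without_newline) == expected)  # True
print(splitByIndentation(with_newline) == expected)     # True
```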

I'm not sure I understood your question correctly, but I think you want output like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

I have shortened and corrected your code.

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    for line in data.split('\n'):
        if line.startswith('    ') or line.startswith('\t'):
            if not indented:
                # start a new block headed by the last non-indented line
                output.append([prevLine])
                indented = True
            output[-1].append(line)
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""
print(splitByIndentation(s))

The output looks like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

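The problem with this solution is that if your input contains an indented line that is otherwise empty (whitespace only), for example

"""
def foo():
    bar()
    oof()
    
def lol():
    foo()
"""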

the output will look like this:

[['def foo():', '    bar()', '    oof()', '    '], ['def lol():', '    foo()']]

To fix this, just remove all blank lines before iterating.

...
lines = [line for line in data.split('\n') if line.strip() != '']
for line in lines:
    ...
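To see what the filter does on its own, here is a small sketch. The sample string `text` is made up for illustration and contains both a whitespace-only line and a completely empty one.

```python
# Illustrative input: one whitespace-only line, one empty line, and a trailing newline.
text = "def foo():\n    bar()\n    \n\ndef lol():\n    foo()\n"

# Keep only lines that contain something other than whitespace.
lines = [line for line in text.split('\n') if line.strip() != '']
print(lines)
# ['def foo():', '    bar()', 'def lol():', '    foo()']
```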

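Of course, you can also easily remove the leading whitespace from the output by calling strip() on each line before appending it to the block.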

output[-1].append(line.strip())

The complete function looks like this:

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    # remove empty lines
    lines = [line for line in data.split('\n') if line.strip() != '']
    for line in lines:
        if line.startswith('    ') or line.startswith('\t'):
            if not indented:
                output.append([prevLine])
                indented = True
            output[-1].append(line.strip())  # append to last block and remove whitespace
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
    
def lol():
    foo()
"""
print(splitByIndentation(s))

This gives you the output:

[['def foo():', 'bar()', 'oof()'], ['def lol():', 'foo()']]

I think this is different from what you need for your use case, but I'm posting it because I think it might help.

I'm not sure about your use case, but if I needed to access lines by their indentation, I would mostly use a dictionary for easier access.

The keys are indentation levels, and the values are lists of the lines at that indentation level.

The code would look like this:

from collections import defaultdict

def splitByIndentation(data: str):
    output = defaultdict(list)
    for line in data.split('\n'):
        # Count leading whitespace only. Assuming the indentation is always 4 spaces
        indentationCount = (len(line) - len(line.lstrip())) // 4
        output[indentationCount].append(line.strip())
    return dict(output)

if __name__ == '__main__':
    print(splitByIndentation(
"""def foo():
    bar()
    oof()
def lol():
    foo()
        #fofo()"""))

The output would look like this:

{0: ['def foo():', 'def lol():'], 1: ['bar()', 'oof()', 'foo()'], 2: ['#fofo()']}
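The `// 4` above assumes four-space indentation. If the input might also contain tabs, one possible variation is to expand tabs with `str.expandtabs` before counting. This is only a sketch: the helper name `indent_level` and its `width` parameter are hypothetical and not part of the answer above.

```python
def indent_level(line: str, width: int = 4) -> int:
    # Hypothetical helper: expand each tab to the next multiple of `width` columns.
    expanded = line.expandtabs(width)
    # Count leading whitespace only, so trailing spaces do not inflate the level.
    return (len(expanded) - len(expanded.lstrip())) // width

print(indent_level("def foo():"))     # 0
print(indent_level("    bar()"))      # 1
print(indent_level("\tbar()"))        # 1
print(indent_level("\t    #fofo()"))  # 2
```

The result could then be used as the dictionary key in place of the inline expression.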

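The problem with the code is that it only appends currentBlock to output once indented has become True and a non-indented line then follows. For that to happen for your last block, you actually need one final element for which line.startswith('\t') or line.startswith('    ') does not evaluate to True.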

Your original string is actually fine:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""

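This is because that string ends with \n ('\ndef foo():\n    bar()\n    oof()\ndef lol():\n    foo()\n'), so data.split('\n')[-1] becomes ''. That gives you the answer you are looking for. Unfortunately, that is not the string you actually pass to the function at the end.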
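The effect of the trailing newline is easy to see on a shortened string. The variable names below are illustrative. As a side note, `str.splitlines()` does not produce the trailing empty element, so the empty last item appears only with `split('\n')`.

```python
with_newline = "def lol():\n    foo()\n"
without_newline = "def lol():\n    foo()"

print(with_newline.split('\n'))     # ['def lol():', '    foo()', '']
print(without_newline.split('\n'))  # ['def lol():', '    foo()']
print(with_newline.splitlines())    # ['def lol():', '    foo()']
```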

The string actually passed is:

"""
def foo():
    bar()
    oof()
def lol():
    foo()"""

This string ends with '    foo()'. You can check:

"""
def foo():
    bar()
    oof()
def lol():
    foo()""".split('\n')[-1]

So, with that knowledge, we can change your function slightly to make sure it works either way:

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    lines = data.split('\n')
    if lines[-1].strip() != '':
        # last line has content: add an empty sentinel line to flush the final block
        lines.append('')
    else:
        # last line is blank or whitespace only: normalize it to ''
        lines[-1] = lines[-1].strip()
    for line in lines:

        if line.startswith('\t') or line.startswith('    '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False

        prevLine = line
    return output

Let's give it a spin:

data1 = """
def foo():
    bar()
    oof()
def lol():
    foo()"""
data2 = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""

data3 = """
def foo():
    bar()
    oof()
def lol():
    foo()
    """
print(splitByIndentation(data1))
# [['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]
print(splitByIndentation(data1) == splitByIndentation(data2) == splitByIndentation(data3))
# True
