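I'm trying to split a string into blocks by indentation for a save editor I'm making. For example, given this input:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""

it would output [["def foo():"], [" bar()"], [" oof()"], ["def lol():", "foo()"]].

Here is my code:
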
def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith(' '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
            prevLine = line
    return output

if __name__ == '__main__':
    print(splitByIndentation("""
def foo():
    bar()
    oof()
def lol():
    foo()"""))

When I run it, it only outputs [['def foo():', '    bar()', '    oof()']].

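Your code looks mostly correct, and you fill currentBlock correctly. However, your current conditions never append the final currentBlock to output, because a block is only appended when an unindented line follows it.

You can test this yourself by changing the input slightly:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
def foo():
    bar()
    oof()"""))

As you can see, I added one more block, and that gives the following output:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

To fix this, append currentBlock to the output list once more after the loop ends:
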
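def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    for line in data.split('\n'):
        if line.startswith('\t') or line.startswith(' '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
            prevLine = line
    # Append the block that was still open when the input ended.
    # The check avoids appending the last block twice when the input
    # ends with a newline (the loop has already appended it then).
    if indented:
        output.append(currentBlock)
    return output
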
This appends the remaining currentBlock.

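The "flush after the loop" idea can be checked in isolation against both kinds of input. The following is only a minimal sketch, not the asker's exact function: the name `split_blocks` and the two sample strings are made up for illustration, and the `if indented` guard before the final append is an assumption added so that a block already closed inside the loop is not appended a second time.

```python
def split_blocks(data: str) -> list:
    """Group each unindented header line with the indented lines after it."""
    output = []
    current_block = []
    prev_line = ''
    indented = False
    for line in data.split('\n'):
        if line.startswith(('\t', ' ')):
            if not indented:
                # First indented line: start a new block with its header.
                current_block = [prev_line]
                indented = True
            current_block.append(line)
        else:
            if indented:
                # An unindented line closes the open block.
                output.append(current_block)
                indented = False
            prev_line = line
    if indented:
        # Flush a block that was still open when the input ended.
        output.append(current_block)
    return output


# Hypothetical sample inputs: same content, with and without a final newline.
no_trailing_newline = "def foo():\n    bar()\ndef lol():\n    foo()"
with_trailing_newline = no_trailing_newline + "\n"

print(split_blocks(no_trailing_newline))
print(split_blocks(with_trailing_newline))
```

Both calls should print the same two blocks, which is the property the question was missing.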
I'm not sure I understood your question correctly, but I think you want output like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

I have shortened and corrected your code:

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    for line in data.split('\n'):
        if line.startswith(' ') or line.startswith('\t'):
            if not indented:
                output.append([prevLine])
                indented = True
            output[-1].append(line)
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""
print(splitByIndentation(s))

The output looks like this:

[['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]

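The problem with this solution is that if your input contains an indented line that is otherwise empty (here, a line after oof() that holds only spaces), for example

"""
def foo():
    bar()
    oof()
    
def lol():
    foo()
"""

the output will look like this:

[['def foo():', '    bar()', '    oof()', '    '], ['def lol():', '    foo()']]

To fix this, remove all blank lines before iterating.
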
...
    lines = [line for line in data.split('\n') if line.strip() != '']
    for line in lines:
        ...

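To see what that filter does on its own, here is a small sketch. The sample string `data` is made up for illustration; it contains one whitespace-only line and one completely empty line.

```python
# Hypothetical input with a whitespace-only line and an empty line.
data = "def foo():\n    bar()\n    \n    oof()\n\ndef lol():\n    foo()\n"

# Keep only lines that still contain something after stripping whitespace.
lines = [line for line in data.split('\n') if line.strip() != '']

print(lines)
# ['def foo():', '    bar()', '    oof()', 'def lol():', '    foo()']
```

Note that the filter only decides which lines to keep; the indentation of the lines that survive is left untouched.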
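Of course, you can also easily remove the leading whitespace from the output by calling strip() on each line before appending it to the block:

output[-1].append(line.strip())
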
The complete function looks like this:

def splitByIndentation(data: str):
    indented = False
    output = []
    prevLine = ''
    # remove empty lines
    lines = [line for line in data.split('\n') if line.strip() != '']
    for line in lines:
        if line.startswith(' ') or line.startswith('\t'):
            if not indented:
                output.append([prevLine])
                indented = True
            output[-1].append(line.strip())  # append to last block and remove whitespace
        else:
            prevLine = line
            indented = False
    return output

s = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""
print(splitByIndentation(s))

This gives you the output:

[['def foo():', 'bar()', 'oof()'], ['def lol():', 'foo()']]
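
This is different from what your use case needs, but I'm posting it because I think it may help.

I'm not sure about your use case, but if I needed to access lines by their indentation, I would use a dictionary for easier access. The keys are the indentation levels, and the values are the lists of lines at each level.

The code looks like this:
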
from collections import defaultdict

def splitByIndentation(data: str):
    output = defaultdict(list)
    for line in data.split('\n'):
        # Count leading whitespace only (lstrip, so trailing spaces are ignored).
        # Assuming the indentation is always 4 spaces
        indentationCount = (len(line) - len(line.lstrip())) // 4
        output[indentationCount].append(line.strip())
    return dict(output)

if __name__ == '__main__':
    print(splitByIndentation(
"""def foo():
    bar()
    oof()
def lol():
    foo()
        #fofo()"""))

The output looks like this:

{0: ['def foo():', 'def lol():'], 1: ['bar()', 'oof()', 'foo()'], 2: ['#fofo()']}
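
The dictionary approach above assumes four-space indentation. If the data may also be indented with tabs, one possible variation is to expand tabs before counting. This is only a sketch under that assumption: `indent_level`, `group_by_level`, the `width` parameter and the sample string are hypothetical names introduced here, and treating one tab as one level is a choice made for the example.

```python
from collections import defaultdict


def indent_level(line: str, width: int = 4) -> int:
    """Return the indentation level, counting one tab as `width` columns."""
    expanded = line.expandtabs(width)
    return (len(expanded) - len(expanded.lstrip())) // width


def group_by_level(data: str, width: int = 4) -> dict:
    """Map each indentation level to the stripped lines found at that level."""
    levels = defaultdict(list)
    for line in data.split('\n'):
        if line.strip():  # skip blank and whitespace-only lines
            levels[indent_level(line, width)].append(line.strip())
    return dict(levels)


# Hypothetical input mixing a tab-indented and a space-indented line.
sample = "def foo():\n\tbar()\n    oof()\n\t\t# nested"
print(group_by_level(sample))
# {0: ['def foo():'], 1: ['bar()', 'oof()'], 2: ['# nested']}
```

Like the original, this loses the information about which header each line belongs to, so it suits lookups by depth rather than reconstructing blocks.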
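
The problem with your code is that it only appends currentBlock to output after indented has become True, and then only when it reaches a line that is not indented. So for this to happen for your last block, you need one more final element for which line.startswith('\t') or line.startswith(' ') is not True.
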
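Your original string is actually fine:

"""
def foo():
    bar()
    oof()
def lol():
    foo()
"""

That is because this string ends with \n ('\ndef foo():\n    bar()\n    oof()\ndef lol():\n    foo()\n'), so data.split('\n')[-1] is ''. That gets you the answer you are looking for. Unfortunately, it is not the string you pass to the function at the end.
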
The string you actually pass is:

"""
def foo():
    bar()
    oof()
def lol():
    foo()"""

This string ends with '    foo()'. Check:

"""
def foo():
    bar()
    oof()
def lol():
    foo()""".split('\n')[-1]

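The difference between the two strings can be seen directly by looking at the last element that split('\n') returns. This is a small illustrative sketch; the variable names and the shortened sample strings are made up for the example.

```python
# Hypothetical shortened inputs: identical except for the final newline.
with_newline = "def lol():\n    foo()\n"
without_newline = "def lol():\n    foo()"

last_with = with_newline.split('\n')[-1]
last_without = without_newline.split('\n')[-1]

print(repr(last_with))     # ''
print(repr(last_without))  # '    foo()'

# str.splitlines() does not produce that trailing empty string,
# so both inputs yield the same list of lines.
print(with_newline.splitlines() == without_newline.splitlines())  # True
```

The empty string at the end of the first list is what lets the question's loop close the final block; without it the block is never appended.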

So, with that knowledge, we can change your function slightly to make sure it works either way:

def splitByIndentation(data: str):
    indented = False
    prevLine = ''
    currentBlock = []
    output = []
    lines = data.split('\n')  # use the parameter, not a global variable
    if lines[-1].strip() != '':
        # The input ends with content: add an empty line to close the last block.
        lines.append('')
    else:
        # The last line is blank or whitespace-only: normalize it to ''.
        lines[-1] = lines[-1].strip()
    for line in lines:
        if line.startswith('\t') or line.startswith(' '):
            if not indented:
                currentBlock = [prevLine]
                indented = True
            currentBlock.append(line)
        else:
            if indented:
                output.append(currentBlock)
                indented = False
            prevLine = line
    return output

Let's give it a spin:

data1 = """
def foo():
    bar()
    oof()
def lol():
    foo()"""

data2 = """
def foo():
    bar()
    oof()
def lol():
    foo()
"""

# Ends with a whitespace-only last line (assumed; the whitespace was lost in the post).
data3 = """
def foo():
    bar()
    oof()
def lol():
    foo()
    """

print(splitByIndentation(data1))
# [['def foo():', '    bar()', '    oof()'], ['def lol():', '    foo()']]
print(splitByIndentation(data1) == splitByIndentation(data2) == splitByIndentation(data3))
# True