I am learning Python and ran the following experiment.
text = "this is line one . this is line two . this is line three ."
tokens = text.split(" ")  # split text into tokens on the separator "space"
lioftokens = tokens.split(".")  # split tokens into a list of token lists on the separator "dot"
print(tokens) # output = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
print(lioftokens) # expected output = [['this', 'is', 'line', 'one', '.'],
# ['this', 'is', 'line', 'two', '.'],
# ['this', 'is', 'line', 'three', '.']]
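This raises an error (AttributeError: 'list' object has no attribute 'split') instead of producing the expected output.
split() is a string method, not a list method. How can I solve this?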
# IamNewToPython
Try using a list comprehension:
text = "this is line one . this is line two . this is line three ."
print([line.rstrip().split() for line in text.split('.') if line])
Output:
[['this', 'is', 'line', 'one'], ['this', 'is', 'line', 'two'], ['this', 'is', 'line', 'three']]
If you want to keep the separator, try:
import re
text = "this is line one . this is line two . this is line three ."
# escape the dot so it matches a literal period rather than any character
print([line.rstrip().split() for line in re.split(r'([^.]*\.)', text) if line])
Output:
[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.']]
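As an alternative way to keep the separator, here is a minimal sketch that uses re.findall to pull out each run of non-period characters together with the period that ends it. It assumes every sentence ends with a period; any trailing text without one would be dropped. The variable names are only illustrative.

```python
import re

text = "this is line one . this is line two . this is line three ."

# each match is a run of non-period characters followed by one literal period
sentences = re.findall(r'[^.]+\.', text)

# split each matched sentence on whitespace; the '.' stays as its own token
result = [sentence.split() for sentence in sentences]
print(result)
```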
Edit:
If you want to split the list instead, try:
l = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
newl = [[]]
for i in l:
    newl[-1].append(i)
    if i == '.':
        newl.append([])
print(newl)
Output:
[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.'], []]
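The trailing empty list in that output appears because a new group is opened after every '.', including the last one. Below is a minimal sketch of a variant that only stores a group once it has tokens in it. The helper name split_after and its sep parameter are hypothetical, chosen just for this illustration.

```python
def split_after(tokens, sep="."):
    """Group tokens into sublists, ending each sublist right after sep."""
    groups = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok == sep:
            # the separator closes the current group
            groups.append(current)
            current = []
    if current:
        # keep leftover tokens that were not followed by a separator
        groups.append(current)
    return groups


tokens = "this is line one . this is line two . this is line three .".split(" ")
print(split_after(tokens))
```

This works directly on the tokens list from the question, so the text does not have to be split a second time.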
This works:
>>> text = "this is line one . this is line two . this is line three ."
>>> list(filter(None, map(str.split, text.split("."))))
[['this', 'is', 'line', 'one'],
['this', 'is', 'line', 'two'],
['this', 'is', 'line', 'three']]
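You can simply split the string on . first, and then use map with str.split to split each individual string in the resulting list. filter(None, ...) drops the empty list left over after the final period.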
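To make the one-liner easier to follow, here is a small sketch that runs the same three steps one at a time. The intermediate variable names are only illustrative.

```python
text = "this is line one . this is line two . this is line three ."

# step 1: split on periods; the trailing period leaves an empty string at the end
pieces = text.split(".")

# step 2: split every piece on whitespace; the empty string becomes an empty list
word_lists = list(map(str.split, pieces))

# step 3: filter(None, ...) keeps only truthy items, so the empty list is dropped
non_empty = list(filter(None, word_lists))

print(pieces)
print(word_lists)
print(non_empty)
```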
text = "this is line one . this is line two . this is line three ."
# first split on the periods
sentences = text.split('.')
for s in sentences:
    # chop off trailing whitespace and then split on spaces
    print(s.rstrip().split())
Using the str.split() method:
text = "this is line one . this is line two . this is line three ."
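tokens = text.split()
# take chunks of 5 tokens; this only works because every sentence here has exactly 5 tokens
print([tokens[i:i+5] for i in range(0, len(tokens), 5)])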
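Fixed-size chunks of five only line up when every sentence has the same number of tokens. For sentences of different lengths, one possible approach is itertools.groupby, which groups consecutive tokens by whether they are the separator. This is a sketch with a made-up sample string, and unlike the chunked version it drops the '.' tokens.

```python
from itertools import groupby

# hypothetical sample with sentences of different lengths
tokens = "short one . a somewhat longer second sentence . end .".split()

# consecutive tokens share a group while the key (is this token a period?) stays the same
sentences = [
    list(group)
    for is_sep, group in groupby(tokens, key=lambda tok: tok == ".")
    if not is_sep  # skip the groups that contain only the separator
]
print(sentences)
# [['short', 'one'], ['a', 'somewhat', 'longer', 'second', 'sentence'], ['end']]
```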