Split multiple lines of a text file into a single list

>I need some help figuring out how to split the words of a text file into a list. I can use something like this:

words = []
for line in open('text.txt'):
    line.split()
    words.append(line)

However, if the file contains multiple lines of text, each line is split into its own sublist, e.g.

this is the first line
this is the second line

becomes:

[['this', 'is', 'the', 'first', 'line'], ['this', 'is', 'the', 'second', 'line']]

How do I get them into the same list? i.e.

[['this', 'is', 'the', 'first', 'line', 'this', 'is', 'the', 'second', 'line']]

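Thanks!

Edit: The program will open multiple text files, so the words from each file need to be added to a sublist. If a file has several lines, all the words from those lines should be stored together in one sublist, i.e. each new file starts a new sublist.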

You can flatten the words with a nested list comprehension, like this:

[word for line in open('text.txt') for word in line.split()]

This is equivalent to writing:

result = []
for line in open('text.txt'):
    for word in line.split():
        result.append(word)

Or you can use itertools.chain.from_iterable, like this:

from itertools import chain
with open("Input.txt") as input_file:
    print(list(chain.from_iterable(line.split() for line in input_file)))
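To try the chain.from_iterable approach without a file on disk, here is a small sketch. It assumes io.StringIO as a stand-in for the open file, and the two lines are the sample text from the question.

```python
import io
from itertools import chain

# io.StringIO stands in for the open file (an assumption for illustration);
# iterating over it yields one line at a time, just like a real file object.
input_file = io.StringIO("this is the first line\nthis is the second line\n")

# Each line.split() yields a list of words; chain.from_iterable joins those
# lists end to end into one flat sequence.
words = list(chain.from_iterable(line.split() for line in input_file))
print(words)
```

The result is one flat list of ten words, with no sublist per line.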

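Your code doesn't actually do what you say it does. line.split() just returns a list of the words in the line, and you do nothing with that list; it doesn't change line in any way, so when you call words.append(line) you're just appending the original line, a string.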
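A quick check makes this visible. The sample string below is made up for illustration:

```python
line = "this is the first line\n"

parts = line.split()  # split() builds and returns a new list of words

print(parts)       # ['this', 'is', 'the', 'first', 'line']
print(repr(line))  # 'this is the first line\n'  (the string itself is unchanged)
```

Strings are immutable in Python, so split() can only return a new object; the result has to be stored or used directly.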

So, first, you have to fix that:

words = []
for line in open('text.txt'):
    words.append(line.split())

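Now what you're doing is repeatedly appending a new list of words to a list that started out empty. So of course you end up with a list of lists of words. That's because you're mixing up the list methods append and extend. append takes any object and adds that object as a single new element of the list; extend takes any iterable and adds each element of that iterable as a separate new element of the list.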
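The difference is easy to see side by side. This is a minimal sketch using the two sample lines from the question; the variable names are chosen only for illustration:

```python
first = "this is the first line".split()
second = "this is the second line".split()

# append: each list of words becomes ONE element, so the result is nested.
appended = []
appended.append(first)
appended.append(second)
print(len(appended))  # 2

# extend: each word becomes its own element, so the result is flat.
extended = []
extended.extend(first)
extended.extend(second)
print(len(extended))  # 10
```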

If you fix that too:

words = []
for line in open('text.txt'):
    words.extend(line.split())

Now you get what you want.

Not sure why you want to keep the [[]], but:

words = [open('text.txt').read().split()]
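The outer brackets make sense given the edit to the question: with several files, each file contributes one sublist. A rough sketch of that, assuming a hypothetical helper words_per_file and a list of paths supplied by the caller (neither appears in the original post):

```python
def words_per_file(paths):
    """Return a list with one flat sublist of words per file.

    Hypothetical helper for illustration; `paths` is any iterable of file paths.
    """
    result = []
    for path in paths:
        # The with block closes each file as soon as it has been read.
        with open(path) as f:
            # read().split() splits on any whitespace, newlines included,
            # so all lines of one file end up in a single flat list.
            # append is intended here: one sublist per file.
            result.append(f.read().split())
    return result
```

Called as words_per_file(['text.txt']), this should give the same nested shape as the one-liner above, with the difference that the file is closed explicitly.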
