I'm trying to write a program that counts the tweets read from a text file. The only catch is that I need to exclude any line containing "DM" or "RT".
file = open('stream.txt', 'r')
fileread = file.readlines()
tweets = [string.split() for string in fileread]
How can I change my code so that it skips lines containing "DM" or "RT"?
Thanks for any help :D
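Always close a file once you have opened it. The best way to do that is a with open(...) statement, which closes the file automatically.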
As for your actual question, the solution is to put a condition inside the list comprehension:
with open('stream.txt', 'r') as file:
    fileread = file.readlines()
    tweets = [string.split() for string in fileread
              if "DM" not in string and "RT" not in string]
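The in checks above are substring tests, so they also drop lines where DM or RT appears inside another word (for example "ADMIN" or "SHIRT"). If only standalone words should count, one option is to split first and test the tokens. This is a minimal sketch, assuming words are separated by whitespace (so "RT:" with punctuation attached would not match); keep_tweet and the sample lines are made up for illustration:

```python
def keep_tweet(tokens, exclude=("DM", "RT")):
    # True when no whitespace-separated token is exactly one of the excluded words
    return not any(token in exclude for token in tokens)

lines = ["RT @bob hello\n", "ADMIN panel is down\n", "DM me later\n", "just a tweet\n"]
# split each line once, then keep only the token lists without an excluded word
tweets = [tokens for tokens in (line.split() for line in lines) if keep_tweet(tokens)]
print(tweets)  # [['ADMIN', 'panel', 'is', 'down'], ['just', 'a', 'tweet']]
```

Whether "ADMIN" should be kept depends on your data; the substring version is simpler if such words never occur.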
If you want to exclude several strings, you can use any() to keep the condition short:
with open('stream.txt', 'r') as file:
    fileread = file.readlines()
    exclude = ["DM", "RT"]
    tweets = [string.split() for string in fileread
              if not any(word in string for word in exclude)]
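The same any() check can be wrapped in a small helper so the exclusion list becomes a parameter. A sketch, where filter_tweets is a hypothetical name and io.StringIO stands in for stream.txt so it runs without the real file:

```python
import io

def filter_tweets(lines, exclude):
    # keep the split words of every line that contains none of the excluded substrings
    return [line.split() for line in lines
            if not any(word in line for word in exclude)]

# io.StringIO behaves like an opened text file (iterating it yields lines)
fake_file = io.StringIO("hello world\nRT @alice nice\nsend a DM\nbye\n")
tweets = filter_tweets(fake_file, ["DM", "RT"])
print(tweets)       # [['hello', 'world'], ['bye']]
print(len(tweets))  # 2
```

With a real file you would pass the object from with open('stream.txt', 'r') as file: instead of fake_file.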
Filter out the lines containing 'DM' or 'RT' as you build fileread:
fileread = [l for l in file.readlines() if not 'DM' in l and not 'RT' in l]
You can also simply loop over each line of the file:
tweets = []
with open('stream.txt', 'r') as f:
    for line in f:
        if "DM" not in line and "RT" not in line:
            tweets.append(line.split())
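Since the end goal is a count, the lines do not have to be stored at all: sum() over a generator walks the file one line at a time. A sketch, where count_tweets is a hypothetical helper and io.StringIO stands in for the opened stream.txt:

```python
import io

def count_tweets(lines, exclude=("DM", "RT")):
    # sum() consumes a generator, so no list of lines is built in memory
    return sum(1 for line in lines if not any(word in line for word in exclude))

# with a real file: with open('stream.txt', 'r') as f: print(count_tweets(f))
sample = io.StringIO("first tweet\nRT something\nDM someone\nsecond tweet\n")
print(count_tweets(sample))  # 2
```

If you already have the tweets list from the loop, len(tweets) gives the same number.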
Here is a concise solution (since you seem to like list comprehensions ;-)):
with open('stream.txt', 'r') as file:
    fileread = file.readlines()
goodlines = [line for line in fileread if line[:2] != "DM" and line[:2] != "RT"]
tweets = [string.split() for string in goodlines]
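goodlines acts as a filter: a line from fileread is kept only if its first two characters are neither "DM" nor "RT". Note that this checks only the start of each line, not the whole line. (That is, if I understood your question correctly.)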
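The two slice comparisons on [:2] can also be written with str.startswith, which accepts a tuple of prefixes. Unlike the in tests in the other answers, this only looks at the beginning of the line, so a tweet such as "great DM" is kept. A small sketch comparing the two checks on made-up sample lines:

```python
lines = ["DM me\n", "RT @x hi\n", "great DM\n", "hello\n"]

# prefix test: equivalent to line[:2] != "DM" and line[:2] != "RT"
by_prefix = [line for line in lines if not line.startswith(("DM", "RT"))]
# substring test, as used in the other answers
by_substring = [line for line in lines if "DM" not in line and "RT" not in line]

print(by_prefix)     # ['great DM\n', 'hello\n']
print(by_substring)  # ['hello\n']
```

Which one is right depends on whether retweets and DMs in your file are always marked at the start of the line.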