How do I remove all instances of certain characters from a list of strings?

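I have a list of tweets, and I have to count the occurrences of each word and store the counts in a dictionary. I also have to remove certain characters, ignore the newline ('\n') character, and make all characters uppercase.

This is my code, but somehow a few of the characters I want to remove still show up in the output. I'm not sure whether I missed something.

Note: "tweet_texts" is the name of the list of tweets.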

words_dict = {}  # where I store the words
remove_chars = "&$@[].,'#()-\"!?’_"  # characters to be removed
tweet_texts = [t.upper() for t in tweet_texts]
tweet_texts = [t.replace('\n', '') for t in tweet_texts]
for chars in remove_chars:
    tweet_texts = [t.replace(chars, '') for t in tweet_texts]

for texts in tweet_texts:
    words = texts.split()
    for word in words:
        if word in words_dict:
            words_dict[word] += 1
        else:
            words_dict[word] = 1

print(words_dict)
>>> {'RT': 53, '1969ENIGMA:': 1, 'SONA': 60, '“WALANG': 1, 'SUSTANSYA”:': 1}  # this isn't the whole output, the actual output is really long so I cut it

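Looking at the sample output, I can see the character “, which looks a lot like " but is a different character and is not in your list of characters to remove.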

print('"' == '“')  # False
print(ord('"'))  # 34
print(ord('“'))  # 8220
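
To catch other look-alikes of this kind, one option is to list every character in the data that is neither alphanumeric nor whitespace and that remove_chars misses. A minimal sketch, assuming made-up sample tweets; sample_tweets and leftover_chars are hypothetical names, not part of the question:

```python
import unicodedata

# Hypothetical sample data standing in for tweet_texts
sample_tweets = ["RT @1969enigma: SONA “walang\nsustansya”", "Hello, world!"]

# Same removal list as in the question
remove_chars = "&$@[].,'#()-\"!?’_"


def leftover_chars(texts, remove_chars):
    """Return the non-alphanumeric, non-space characters that remove_chars misses."""
    found = set()
    for text in texts:
        for ch in text:
            if not ch.isalnum() and not ch.isspace() and ch not in remove_chars:
                found.add(ch)
    return found


# Print each leftover character with its code point and Unicode name
for ch in sorted(leftover_chars(sample_tweets, remove_chars)):
    print(repr(ch), ord(ch), unicodedata.name(ch, "UNKNOWN"))
```

With this sample the leftovers are the colon and the two curly double quotes, which matches the keys such as '1969ENIGMA:' and '“WALANG' in the output shown in the question.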

You could try using a regular expression that keeps only word characters and whitespace. Something like this:

import re
from collections import Counter

# [^\w\s] matches every character that is neither a word character nor whitespace
clean_tweets = [re.sub(r"[^\w\s]", "", tweet) for tweet in tweet_texts]
words_dict = Counter()
for tweet in clean_tweets:
    words_dict.update(tweet.split())
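
One caveat with the regex approach: \w also matches the underscore, which the question wants removed. The sketch below is one possible variant, not the only way to do it; it adds "_" to the pattern, uppercases the text, and relies on split() to treat newlines as ordinary whitespace. The sample data and the count_words name are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data standing in for tweet_texts
sample_tweets = ["RT @1969enigma: SONA “walang\nsustansya”", "sona_2019 sona!"]


def count_words(texts):
    """Uppercase each text, strip punctuation and underscores, and count the words."""
    counts = Counter()
    for text in texts:
        # Remove anything that is not a word character or whitespace,
        # plus "_" explicitly, because \w would otherwise keep it
        cleaned = re.sub(r"[^\w\s]|_", "", text.upper())
        # split() with no arguments splits on any whitespace, including "\n"
        counts.update(cleaned.split())
    return counts


print(count_words(sample_tweets))
```

Because the newline is left in place until split(), words on either side of a line break stay separate instead of being glued together.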
