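OK, all you genius programmers and developers... I could really use some help with this one.
I'm currently taking "Python for Everybody", offered on Coursera (https://www.coursera.org/specializations/python), and I'm stuck on an assignment.
I can't figure out how to create a list that contains only the first instance of each word in a string:
Example string: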
my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
Desired list:
words_list = ['How', 'much', 'wood', 'would',
'a', 'woodchuck', 'chuck', 'if']
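Thank you all for your time, consideration and contributions!
You can build a list of the words you have already seen, filtering out non-alphabetic characters as you go: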
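my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
new_l = []    # words seen so far
final_l = []  # unique words, in order of first appearance
for word in my_string.split():
    # keep only alphabetic characters, dropping punctuation such as ',' and '?'
    word = ''.join(i for i in word if i.isalpha())
    # skip tokens that were pure punctuation, and words already seen
    if word and word not in new_l:
        final_l.append(word)
        new_l.append(word)
print(final_l)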
Output:
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
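The loop above checks membership in a list, which rescans the list for every word. A minimal sketch of the same idea with a set for the "already seen" check follows. The function name first_occurrences is made up for illustration, and it strips punctuation only from the ends of each token with str.strip, which is a slightly different choice from the isalpha filter above.

```python
import string

def first_occurrences(text):
    """Return each distinct word of `text` once, in order of first appearance."""
    seen = set()   # words already emitted; set membership is O(1) on average
    result = []
    for raw in text.split():
        # drop leading/trailing ASCII punctuation (assumption: inner characters are kept)
        word = raw.strip(string.punctuation)
        # skip tokens that were only punctuation, and words already seen
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result

print(first_occurrences("How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"))
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
```

The list keeps the order and the set only answers "have I seen this word?", so the result matches the desired list from the question.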
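This can be done in two steps: first remove the punctuation, then add the words to a set, which removes the duplicates. Note that a set is unordered, so the order of first appearance is not preserved.
Python 3: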
from string import punctuation # This is a string of all ascii punctuation characters
trans = str.maketrans('', '', punctuation)
text = 'How much wood would a woodchuck chuck, if a woodchuck would chuck wood?'.translate(trans)
words = set(text.split())
Python 2:
from string import punctuation # This is a string of all ascii punctuation characters
text = 'How much wood would a woodchuck chuck, if a woodchuck would chuck wood?'.translate(None, punctuation)
words = set(text.split())
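Since all instances of a word are identical, I'm going to take the question to mean that you want a list of unique words. Probably the simplest way is:
import re
non_unique_words = re.findall(r'\w+', my_string)
unique_words = list(set(non_unique_words))
The 're.findall' call returns every word; converting to a set and back to a list then makes the result unique (though not in the original order).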
Try:
my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
def replace(word, block):
    # remove every character listed in block from word
    for i in block:
        word = word.replace(i, '')
    return word
my_string = replace(my_string, ',?')
result = list(set(my_string.split()))
You can use the re module and cast the result to a set in order to remove duplicates:
>>> import re
>>> my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
>>> words_list = re.findall(r'\w+', my_string)  # Find all words in your string (without punctuation)
>>> words_list_unique = sorted(set(words_list), key=words_list.index)  # Cast to a set to remove duplicates, then sort by first occurrence to get an ordered list
>>> print(words_list_unique)
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
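Explanation:
- \w matches a word character (a letter, digit, or underscore), and \w+ matches a whole word.
- So you use re.findall(r'\w+', my_string) to find all the words in my_string.
- A set is a collection of unique elements, so you cast the list returned by re.findall() to a set.
- Then you cast the result back to a list (via sorted) to get a list of unique words.
- Edit: if you want to preserve the order of the words, pass key=words_list.index to sorted(), because sets are unordered collections.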
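Sorting by words_list.index works, but it searches the list again for every unique word. As an alternative sketch (not part of the answer above), and assuming Python 3.7 or later where plain dicts are guaranteed to keep insertion order, dict.fromkeys removes duplicates and preserves first-occurrence order in a single pass:

```python
import re

my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"

# Find all words (runs of word characters), ignoring punctuation
words_list = re.findall(r'\w+', my_string)

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each word determines its position
words_list_unique = list(dict.fromkeys(words_list))

print(words_list_unique)
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
```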
If you need to preserve the order in which the words appear:
import string
from collections import OrderedDict
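def unique_words(text):
    # map every ASCII punctuation character to None so translate() deletes it
    without_punctuation = text.translate({ord(c): None for c in string.punctuation})
    # OrderedDict keys are unique and keep insertion order
    words_dict = OrderedDict((k, None) for k in without_punctuation.split())
    return list(words_dict.keys())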
unique_words("How much wood would a woodchuck chuck, if a woodchuck would chuck wood?")
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
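I use OrderedDict because there doesn't seem to be an ordered set in the Python standard library.
Edit:
To make the word list case-insensitive, you can lowercase the dictionary keys: (k.lower(), None) for k in ...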
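Spelled out in full, the case-insensitive variant might look like the sketch below. The function name unique_words_ci is made up for illustration. Because the keys are lowercased, the returned words are lowercase too, rather than keeping the spelling of the first occurrence.

```python
import string
from collections import OrderedDict

def unique_words_ci(text):
    """Unique words of `text`, compared case-insensitively, in order of first appearance."""
    # delete ASCII punctuation, as in unique_words above
    without_punctuation = text.translate({ord(c): None for c in string.punctuation})
    # lowercase each word before using it as a key, so 'Wood' and 'wood' collapse
    words_dict = OrderedDict((k.lower(), None) for k in without_punctuation.split())
    return list(words_dict.keys())

print(unique_words_ci("Wood wood WOOD would Would"))
# ['wood', 'would']
```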
It should be enough to find all the words and then filter out the duplicates.
import re
words = re.findall('[a-zA-Z]+', my_string)
words_list = [w for idx, w in enumerate(words) if w not in words[:idx]]