How to create a list containing only the first instance of each word in a string (excluding punctuation, newlines, etc.)

Okay, all you brilliant programmers and developers... I could really use some help with this.

I'm currently taking "Python for Everybody" on Coursera (https://www.coursera.org/specializations/python), and I'm stuck on an assignment.

I can't figure out how to create a list containing only the first instance of each word in a string:

Example string:

my_string = ("How much wood would a woodchuck chuck, "
             "if a woodchuck would chuck wood?")

Desired list:

words_list = ['How', 'much', 'wood', 'would',
              'a', 'woodchuck', 'chuck', 'if']

Thank you all for your time, consideration, and contributions!

You can keep track of the words you have already seen, and filter out non-alphabetic characters:

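my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
seen = set()   # words already encountered
final_l = []   # first occurrences, in order
for word in my_string.split():
    # keep only alphabetic characters, e.g. "chuck," -> "chuck"
    word = ''.join(i for i in word if i.isalpha())
    if word and word not in seen:
        final_l.append(word)
        seen.add(word)
print(final_l)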

Output:

['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
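The question also mentions newlines. `str.split()` with no argument splits on any run of whitespace, including `\n`, so the same filter works unchanged on a multi-line string. A minimal sketch that wraps it in a function; the name `first_words` is just an illustrative choice:

```python
def first_words(text):
    """Return each word's first occurrence, in order, ignoring non-letters."""
    seen = set()
    result = []
    # split() with no argument treats spaces, tabs and newlines alike
    for token in text.split():
        word = ''.join(ch for ch in token if ch.isalpha())
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result


my_string = """How much wood would a woodchuck chuck,
if a woodchuck would chuck wood?"""
print(first_words(my_string))
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
```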

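This can be done in two steps: first remove the punctuation, then add the words to a set, which removes duplicates. Note that a set does not preserve the original word order.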

Python 3:

from string import punctuation #  This is a string of all ascii punctuation characters
trans = str.maketrans('', '', punctuation)
text = 'How much wood would a woodchuck chuck, if a woodchuck would chuck wood?'.translate(trans)
words = set(text.split())

Python 2:

from string import punctuation #  This is a string of all ascii punctuation characters
text = 'How much wood would a woodchuck chuck, if a woodchuck would chuck wood?'.translate(None, punctuation)
words = set(text.split())

Since all instances of a word are identical, I take the question to mean that you want a list of unique words. Probably the simplest way is:

import re
non_unique_words = re.findall(r'\w+', my_string)  # every word, duplicates included
unique_words = list(set(non_unique_words))        # duplicates removed, order lost

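The re.findall call returns every word in the string; converting the result to a set and back to a list makes the words unique, although the original order is lost.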
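A set has no defined order, so `list(set(...))` may list the words in any order. If you just need a reproducible result, sorting is one option. This gives alphabetical order, not first-appearance order, and uppercase letters sort before lowercase ones in Python's default string comparison:

```python
import re

my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
# \w+ matches runs of letters, digits and underscores; sorted() fixes the order
unique_words = sorted(set(re.findall(r'\w+', my_string)))
print(unique_words)
# ['How', 'a', 'chuck', 'if', 'much', 'wood', 'woodchuck', 'would']
```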

Try:

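my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
def replace(word, block):
    # remove every character listed in `block` from `word`
    for i in block:
        word = word.replace(i, '')
    return word
my_string = replace(my_string, ',?')    # strip the comma and question mark
result = list(set(my_string.split()))  # unique words, in arbitrary order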
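The `replace` helper only removes the characters you pass in (here `,` and `?`). A sketch that instead strips any ASCII punctuation from the start and end of each word, using `str.strip` with `string.punctuation`, keeps apostrophes inside words such as `don't` and also preserves first-appearance order. The function name `strip_and_dedupe` is hypothetical:

```python
import string


def strip_and_dedupe(text):
    seen = set()
    result = []
    for token in text.split():
        # strip() removes the given characters only from both ends
        word = token.strip(string.punctuation)
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result


print(strip_and_dedupe("Don't chuck wood, don't!"))
# ["Don't", 'chuck', 'wood', "don't"]
```

Note that the comparison is still case-sensitive, so `Don't` and `don't` count as different words.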

You can use the re module and convert the result to a set to remove duplicates:

>>> import re
>>> my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
>>> words_list = re.findall(r'\w+', my_string)  # Find all words in your string (without punctuation)
>>> words_list_unique = sorted(set(words_list), key=words_list.index)  # Remove duplicates with a set, then sort by each word's first index to restore the original order
>>> print(words_list_unique)
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']

Explanation:

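\w matches a single word character (a letter, digit, or underscore), and \w+ matches one or more of them in a row, i.e. a word.
So you use re.findall(r'\w+', my_string) to find all the words in my_string.
A set is a collection of unique elements, so you convert the list returned by re.findall() into a set.
Then you convert it back into a list (with sorted) to get a list of unique words.
Edit: since sets are unordered collections, if you want to preserve the order of the words you can pass key=words_list.index to sorted, which orders each word by the position of its first occurrence.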
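To see what `\w+` actually matches: `\w` covers letters, digits and the underscore (for `str` patterns in Python 3 this includes Unicode letters), so an apostrophe splits a word while underscores and digits do not:

```python
import re

# The apostrophe is not a word character, so "it's" becomes two matches
print(re.findall(r'\w+', "it's wood_chuck 42"))
# ['it', 's', 'wood_chuck', '42']
```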

If you need to preserve the order in which the words appear:

import string
from collections import OrderedDict
def unique_words(text):
    without_punctuation = text.translate({ord(c): None for c in string.punctuation})
    words_dict = OrderedDict((k, None) for k in without_punctuation.split())
    return list(words_dict.keys())
unique_words("How much wood would a woodchuck chuck, if a woodchuck would chuck wood?")
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']

I used OrderedDict because there does not seem to be an ordered set in the Python standard library.
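Since Python 3.7, the built-in `dict` is guaranteed to preserve insertion order, so `dict.fromkeys` can stand in for `OrderedDict` here. A sketch under that assumption (Python 3.7 or later):

```python
import string


def unique_words(text):
    # Map every ASCII punctuation character to None, i.e. delete it
    table = str.maketrans('', '', string.punctuation)
    words = text.translate(table).split()
    # dict keys are unique and keep first-insertion order (Python 3.7+)
    return list(dict.fromkeys(words))


print(unique_words("How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"))
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
```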

Edit:

To make the word list case-insensitive, you can lowercase the dictionary keys: (k.lower(), None) for k in ...
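Lowercasing the keys means the returned words are lowercased too. If you would rather compare case-insensitively but keep the spelling of each word's first occurrence, one option is to key a plain dict on the case-folded form and use `setdefault`, so only the first spelling is stored. This sketch assumes Python 3.7+ dict ordering; `unique_words_ci` is a hypothetical name:

```python
import string


def unique_words_ci(text):
    table = str.maketrans('', '', string.punctuation)
    first_seen = {}
    for word in text.translate(table).split():
        # casefold() is a more aggressive lower() meant for caseless matching;
        # setdefault() keeps the first spelling stored under each key
        first_seen.setdefault(word.casefold(), word)
    return list(first_seen.values())


print(unique_words_ci("Wood would wood WOULD a Wood"))
# ['Wood', 'would', 'a']
```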

It should be enough to find all the words and then filter out the duplicates:

import re
words = re.findall('[a-zA-Z]+', my_string)  # ASCII letters only
# keep a word only if it does not appear earlier in the list
words_list = [w for idx, w in enumerate(words) if w not in words[:idx]]
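The `w not in words[:idx]` check rescans all preceding words for each word, so its cost grows quadratically with the number of words. For long texts, a generator that remembers what it has already yielded in a set runs in linear time. A sketch; `unique_in_order` is a hypothetical helper, similar in spirit to the `unique_everseen` recipe in the `itertools` documentation:

```python
import re


def unique_in_order(items):
    """Yield each item the first time it appears."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


my_string = "How much wood would a woodchuck chuck, if a woodchuck would chuck wood?"
words = re.findall('[a-zA-Z]+', my_string)
words_list = list(unique_in_order(words))
print(words_list)
# ['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if']
```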
