Dictionary with words as keys and the sentences that contain them as values

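I have a text that I split into a list of unique words using set. I also split the text into a list of sentences. I then split that list of sentences into a list of lists (the words in each sentence; maybe I don't need that last part).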

text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am', 'not','hungry','i','am','full']]

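I'd like to write a loop or list comprehension that builds a dictionary in which each word in words is a key and every sentence containing that word is collected into a list as the value, so that I can get some statistics, such as the number of sentences and the average sentence length for each word. So far I have the following, but it isn't right.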

word_freq = {}
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word] += sent
        else:
            word_freq[word] = sent

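It returns a dictionary of word keys with empty values. Ideally I'd like to do this without collections/Counter, although any solution is appreciated. I'm sure this has been asked before, but I couldn't find the right solution, so if you know of one, feel free to link to it and close this question.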

Here is one way to do it, using a list comprehension inside a dictionary comprehension.

Code:

text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

word_freq = {w:[s for s in sents if w in s.split()] for w in words }
print(word_freq)

Output:

{
'i': ['i was hungry', 'i got food', 'now i am', 'not hungry i am full'], 
'was': ['i was hungry'], 
'hungry': ['i was hungry', 'not hungry i am full'], 
'got': ['i got food'], 
'food': ['i got food'], 
'now': ['now i am'], 
'not': ['not hungry i am full'], 
'am': ['now i am', 'not hungry i am full'], 
'full': ['not hungry i am full']
}

Or, if you want each sentence returned as a list of words:

word_freq = {w:[s.split() for s in sents if w in s.split()] for w in words }

Output:

{
'i': [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'was': [['i', 'was', 'hungry']], 
'hungry': [['i', 'was', 'hungry'], ['not', 'hungry', 'i', 'am', 'full']], 
'got': [['i', 'got', 'food']], 
'food': [['i', 'got', 'food']], 
'now': [['now', 'i', 'am']], 
'not': [['not', 'hungry', 'i', 'am', 'full']], 
'am': [['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'full': [['not', 'hungry', 'i', 'am', 'full']]}
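
The comprehensions above test membership against s.split() rather than against the string s itself. A small sketch of why that matters: on a string, `in` is a substring test, so a short word can match inside a longer one. The word 'a' used here is a hypothetical example and is not in the original words list.

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']

# Substring test: 'a' is found inside 'was' and 'am'.
substring_hits = [s for s in sents if 'a' in s]

# Token test: 'a' has to be a whole word of the sentence.
token_hits = [s for s in sents if 'a' in s.split()]

print(substring_hits)  # ['i was hungry', 'now i am', 'not hungry i am full']
print(token_hits)      # []
```

Since 'a' never occurs as a standalone word in these sentences, only the token test gives the intended empty result.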

word_freq = {}
for word in set(words):
    word_freq[word] = list()
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word].append(sent)
print(word_freq)
=================================================================
{'am': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'food': [['i', 'got', 'food']],
 'full': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'got': [['i', 'got', 'food']],
 'hungry': [['i', 'was', 'hungry'],
            ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'i': [['i', 'was', 'hungry'],
       ['i', 'got', 'food'],
       ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'not': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'now': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
 'was': [['i', 'was', 'hungry']]}
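
The question also asks for per-word statistics such as the number of sentences and the average sentence length. A minimal sketch of one way to get them, assuming word_freq maps each word to a list of tokenized sentences as in the output above. The helper name sentence_stats is hypothetical, and sentence length is assumed to mean the number of words.

```python
def sentence_stats(word_freq):
    """Return {word: (sentence_count, average_sentence_length_in_words)}."""
    stats = {}
    for word, sentences in word_freq.items():
        count = len(sentences)
        # Guard against words that appear in no sentence at all.
        avg_len = sum(len(s) for s in sentences) / count if count else 0.0
        stats[word] = (count, avg_len)
    return stats


split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'],
               ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']]
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Same mapping as the loop above, written as a comprehension.
word_freq = {w: [s for s in split_sents if w in s] for w in words}

stats = sentence_stats(word_freq)
print(stats['hungry'])  # (2, 5.5)
print(stats['was'])     # (1, 3.0)
```

Here 'hungry' appears in two sentences of 3 and 8 words, so its average sentence length is 5.5.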
