Dictionary with words as keys and the sentences containing them as values



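I have a text that I split into a list of unique words using set. I also split the text into a list of sentences. I then split that list of sentences into a list of lists (the words in each sentence), though maybe I don't need that last step.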

text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am', 'not','hungry','i','am','full']]

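I want to write a loop or a list comprehension that builds a dictionary in which every word in words is a key and, whenever the word appears in a sentence, that sentence is captured in a list as the value. That way I can get some statistics, such as the number of sentences containing each word and the average length of those sentences. So far I have the following, but it is not right.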

word_freq = {}
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word] += sent
        else:
            word_freq[word] = sent

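It returns a dictionary of word keys with empty values. Ideally I would like to do this without collections/Counter, although any solution is appreciated. I'm sure this question has been asked before, but I couldn't find the right solution, so if you know of one, feel free to link to it and close this.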

Here is one approach using list and dictionary comprehensions.

Code:

text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

word_freq = {w:[s for s in sents if w in s.split()] for w in words }
print(word_freq)

Output:

{
'i': ['i was hungry', 'i got food', 'now i am', 'not hungry i am full'], 
'was': ['i was hungry'],
'hungry': ['i was hungry', 'not hungry i am full'],
'got': ['i got food'], 
'food': ['i got food'], 
'now': ['now i am'], 
'not': ['not hungry i am full'], 
'am': ['now i am', 'not hungry i am full'], 
'full': ['not hungry i am full']
}

Alternatively, if you want the sentences output as lists of words:

word_freq = {w:[s.split() for s in sents if w in s.split()] for w in words }

Output:

{
'i': [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'was': [['i', 'was', 'hungry']], 
'hungry': [['i', 'was', 'hungry'], ['not', 'hungry', 'i', 'am', 'full']], 
'got': [['i', 'got', 'food']], 
'food': [['i', 'got', 'food']], 
'now': [['now', 'i', 'am']], 
'not': [['not', 'hungry', 'i', 'am', 'full']], 
'am': [['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'full': [['not', 'hungry', 'i', 'am', 'full']]}
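
The question mentions wanting statistics such as the number of sentences per word and the average sentence length. A minimal sketch of how those could be derived from the dictionary above follows; the name word_stats and its 'sentences' / 'avg_length' keys are illustrative assumptions, not part of the original answer.

```python
# Sample data copied from the question.
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Map each word to the sentences (as word lists) that contain it.
word_freq = {w: [s.split() for s in sents if w in s.split()] for w in words}

# word_stats is a hypothetical name introduced for this sketch.
word_stats = {}
for word, matched in word_freq.items():
    count = len(matched)
    # Average number of words per matching sentence; 0.0 if none matched.
    avg_len = sum(len(s) for s in matched) / count if count else 0.0
    word_stats[word] = {'sentences': count, 'avg_length': avg_len}

for word, stats in word_stats.items():
    print(word, stats)
```

Here sentence length is measured in words; measuring it in characters would only require storing the sentences as strings instead.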
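# Loop-based alternative: start every word with an empty list,
# then append each sentence that contains the word.
word_freq = {}
for word in set(words):
    word_freq[word] = list()
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word].append(sent)
print(word_freq)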
=================================================================
{'am': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'food': [['i', 'got', 'food']],
'full': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'got': [['i', 'got', 'food']],
'hungry': [['i', 'was', 'hungry'],
['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'i': [['i', 'was', 'hungry'],
['i', 'got', 'food'],
['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'not': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'now': [['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']],
'was': [['i', 'was', 'hungry']]}
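
For context on why the loop in the question fails: `word_freq[word] += sent` raises a KeyError when the key does not exist yet, and the else branch assigns a sentence to words that are not in it. A possible single-pass variant that avoids both problems, without importing collections, is sketched below; it uses dict.setdefault and takes the keys from the sentences themselves, which is an assumption about what is wanted rather than something stated in the question.

```python
# split_sents copied from the question.
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'],
               ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']]

word_freq = {}
for sent in split_sents:
    # set(sent) visits each distinct word once, so a sentence is not
    # appended twice when a word repeats inside it (e.g. 'i', 'am').
    for word in set(sent):
        # setdefault creates the empty list the first time a word is seen.
        word_freq.setdefault(word, []).append(sent)

print(word_freq)
```

Because this iterates over each sentence only once, it does not need the separate words list, though the order of keys in the result may vary between runs.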
