Python: combining two dictionaries into a nested dictionary (text similarity)



I have the following documents:

documents = ["Human machine interface for lab abc computer applications",
"A survey of user opinion of computer system response time",
"The EPS user interface management system",
"System and human system engineering testing of EPS",
"Relation of user perceived response time to error measurement",
"The generation of random binary unordered trees",
"The intersection graph of paths in trees",
"Graph minors IV Widths of trees and well quasi ordering",
"Graph minors A survey"]

From these I built a word matrix:

wordmatrix = []
wordmatrix = [sentences.split(" ") for sentences in documents]

Output:

[['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'], ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'], ['The', 'EPS', 'user', 'interface', 'management', 'system'], ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'], ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'], ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'], ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'], ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'], ['Graph', 'minors', 'A', 'survey']]

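Next, I want to create a dictionary that has one key per document, with the words as keys of an inner dictionary and, as values, the frequency with which each word appears in the document.

But this is as far as I got:

Initializing the dictionaries: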

dic1 = {}
dic2 = {}
d = {}

The first dictionary has a key for each document:

dic1 = dict(enumerate(sentence for sentence in wordmatrix))

Output:

{0: ['Human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications'], 1: ['A', 'survey', 'of', 'user', 'opinion', 'of', 'computer', 'system', 'response', 'time'], 2: ['The', 'EPS', 'user', 'interface', 'management', 'system'], 3: ['System', 'and', 'human', 'system', 'engineering', 'testing', 'of', 'EPS'], 4: ['Relation', 'of', 'user', 'perceived', 'response', 'time', 'to', 'error', 'measurement'], 5: ['The', 'generation', 'of', 'random', 'binary', 'unordered', 'trees'], 6: ['The', 'intersection', 'graph', 'of', 'paths', 'in', 'trees'], 7: ['Graph', 'minors', 'IV', 'Widths', 'of', 'trees', 'and', 'well', 'quasi', 'ordering'], 8: ['Graph', 'minors', 'A', 'survey']}
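As a side note, the generator expression inside enumerate is not strictly needed. A minimal sketch, assuming a shortened two-document stand-in for the wordmatrix above (the name dic1_original is only for comparison and is not part of the question):

```python
# Shortened stand-in for the wordmatrix built above (two documents only).
wordmatrix = [
    "Human machine interface for lab abc computer applications".split(" "),
    "Graph minors A survey".split(" "),
]

# enumerate() already yields (index, item) pairs, so dict() can consume it directly.
dic1 = dict(enumerate(wordmatrix))

# The form used in the question, kept here only to show that both agree.
dic1_original = dict(enumerate(sentence for sentence in wordmatrix))
```

Both forms map each document index to its list of words; the shorter one just skips the extra generator.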

And a second dictionary that has every word as a key:

for sentence in wordmatrix:
    for word in sentence:
        dic2[word] = dic2.get(word, 0) + 1

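Output:

{'Human': 1, 'machine': 1, 'interface': 2, 'for': 1, 'lab': 1, 'abc': 1, 'computer': 2, 'applications': 1, 'A': 2, 'survey': 2, 'of': 7, 'user': 3, 'opinion': 1, 'system': 3, 'response': 2, 'time': 2, 'The': 3, 'EPS': 2, 'management': 1, 'System': 1, 'and': 2, 'human': 1, 'engineering': 1, 'testing': 1, 'Relation': 1, 'perceived': 1, 'to': 1, 'error': 1, 'measurement': 1, 'generation': 1, 'random': 1, 'binary': 1, 'unordered': 1, 'trees': 3, 'intersection': 1, 'graph': 1, 'paths': 1, 'in': 1, 'Graph': 2, 'minors': 2, 'IV': 1, 'Widths': 1, 'well': 1, 'quasi': 1, 'ordering': 1}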
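The same corpus-wide count can also be obtained with collections.Counter from the standard library. This is a sketch on a two-document subset of the data, not the code from the question:

```python
from collections import Counter

# Two of the documents from above, enough to show the behaviour.
documents = ["The EPS user interface management system",
             "System and human system engineering testing of EPS"]
wordmatrix = [sentence.split(" ") for sentence in documents]

# Count every word across all documents in a single pass.
dic2 = Counter(word for sentence in wordmatrix for word in sentence)
```

Counting is case-sensitive, so "System" and "system" end up as separate keys, just as in the loop version. A Counter also returns 0 for words it has never seen instead of raising KeyError.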

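However, I want to combine the two dictionaries into a single one, which should look like this: {0: {"Human": 1, "machine": 1, "interface": 2, ...}, 1: (and so on)}

Thanks!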

You don't have to combine the two dicts. Once you have dic2, you can build a new dict from it directly.

for line_num, sentence in enumerate(wordmatrix):
    dic1[line_num] = {}
    for word in sentence:
        dic1[line_num][word] = dic2[word]
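The loop above fills each inner dict with the corpus-wide counts from dic2, which matches the example in the question ("interface": 2 for document 0). The question's wording could also be read as asking for the frequency within each document, so the sketch below shows both variants side by side on a three-document subset. The names nested and per_doc are illustrative and do not come from the original post.

```python
from collections import Counter

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system"]
wordmatrix = [sentence.split(" ") for sentence in documents]

# Corpus-wide word counts, equivalent to dic2 in the question.
dic2 = Counter(word for sentence in wordmatrix for word in sentence)

# Variant 1: same result as the loop above, written as a nested comprehension.
# Each inner value is the count of the word across ALL documents.
nested = {line_num: {word: dic2[word] for word in sentence}
          for line_num, sentence in enumerate(wordmatrix)}

# Variant 2 (assumption about the intent): count each word only within
# its own document.
per_doc = {line_num: dict(Counter(sentence))
           for line_num, sentence in enumerate(wordmatrix)}
```

With this subset, nested[0]["interface"] is 2 because the word also occurs in the third document, while per_doc[0]["interface"] is 1. Which variant is appropriate depends on whether corpus frequency or per-document frequency is wanted.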
