Finding the most similar list in a collection of lists, online

I have a set of lists, each of which represents a path in a graph:

list1 = [1,2,3,6] # directed edge from 1->2, 2->3, 3->6
list2 = [8,3,5,6]
list3 = [9,1,3,4]
list4 = [7,8,1,4]

I also have the adjacency matrix of the graph.

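At each step I receive an edge, e.g. time step 0: [1,2], time step 1: [3,6]. At each time step I have to find the most similar list, taking the previous time steps into account, that is, the list that is the most complete.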

What is an efficient way to do this?

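I tried a naive approach that compares each incoming edge with every edge in every list, but since I have a large number of lists, each with a large number of edges, it is too slow.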

Update: here is an example of the input and output at each time step.

Time step 0: input [1,2], output list1

Time step 1: input [8,3], output list1, list2 # equally complete

Time step 2: input [3,6], output list1

Update 2: thanks to @Nuclearman, I coded up a (perhaps naive?) solution:

list1 = [1, 2, 3, 6]  # directed edges 1->2, 2->3, 3->6
list2 = [8, 3, 5, 6]
list3 = [9, 1, 3, 4]
list4 = [7, 8, 1, 4]
lists_dict = {
    'list1': list1,
    'list2': list2,
    'list3': list3,
    'list4': list4,
}

# Total number of edges in each path (a path with n nodes has n - 1 edges)
edges = {name: len(path) - 1 for name, path in lists_dict.items()}
# Number of edges of each path seen so far
covered_edges = {name: 0 for name in lists_dict}

# Nested dict: graph[u][v] -> names of every list that contains the edge u->v.
# Storing a list of names means an edge shared by several paths is not overwritten.
graph = {}
for list_name, path in lists_dict.items():
    for node1, node2 in zip(path, path[1:]):
        graph.setdefault(node1, {}).setdefault(node2, []).append(list_name)

times = [[1, 2], [3, 5], [5, 6], [8, 1], [7, 8]]
for u, v in times:
    # An edge that belongs to no list is ignored instead of raising KeyError
    for edge_in_list in graph.get(u, {}).get(v, []):
        covered_edges[edge_in_list] += 1
    print(covered_edges)

    completeness = {name: covered_edges[name] / edges[name] for name in lists_dict}

    # Report every list that ties for the highest completeness
    mx = max(completeness.values())
    max_list = [k for k, v in completeness.items() if v == mx]

    print(max_list)
    print('')
print('')

Try representing the graph as an adjacency list set up as nested hashes.

I.e., your whole example could be set up like this (I don't remember whether this is valid Python):

graph = {
1: {2: [1], 3: [3], 4: [4] },
2: {3: [1] },
3: {6: [1], 5: [2], 4: [3] },
5: {6: [2] },
7: {8: [4] },
8: {3: [2], 1: [4] },
9: {1: [3] },
}
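Rather than writing the nested hash by hand, it can be generated from the paths. A minimal Python sketch, assuming the lists are numbered 1 to 4 in the order given in the question; `build_graph` is a hypothetical helper name, not something from the answer:

```python
def build_graph(paths):
    """Build graph[u][v] -> list of path ids that contain the directed edge u->v.

    `paths` maps a path id to its node sequence; a path with n nodes
    contributes n - 1 edges.
    """
    graph = {}
    for path_id, nodes in paths.items():
        for u, v in zip(nodes, nodes[1:]):
            graph.setdefault(u, {}).setdefault(v, []).append(path_id)
    return graph


paths = {
    1: [1, 2, 3, 6],
    2: [8, 3, 5, 6],
    3: [9, 1, 3, 4],
    4: [7, 8, 1, 4],
}
graph = build_graph(paths)
print(graph[1])  # {2: [1], 3: [3], 4: [4]}
print(graph[8])  # {3: [2], 1: [4]}
```

Because each value is a list of ids, an edge shared by several paths maps to all of them, which is what makes the per-edge update output-sensitive.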

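Then you just need to keep track of how many edges remain in each list, and store those counts in a data structure that supports find-min (or find-max, just adjust the key), lookup, insertion, and deletion in O(log N) or better. You may need to do some math, depending on how you define completeness. I.e., you might need to store the total and covered edges and then use [(total - covered) / total, list #] or something similar as the key in the data structure. That way you can find the list quickly even when several lists have the same completeness. For the result you want, return all lists with the highest completeness.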
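One way to get find-min plus "return all tied lists" from the Python standard library is a min-heap of keys combined with buckets of lists per key. This is a sketch under assumptions, not the answer's exact structure: `CompletenessIndex` is a hypothetical name, the key is (total - covered) / total as suggested above, every list is assumed to have at least one edge, and keys whose bucket has emptied are discarded lazily instead of being deleted from the heap.

```python
import heapq
from collections import defaultdict
from fractions import Fraction


class CompletenessIndex:
    """Hypothetical helper: keyed on (total - covered) / total, so the
    smallest key belongs to the most complete list(s)."""

    def __init__(self, totals):
        self.total = dict(totals)                    # list id -> total edges (> 0)
        self.covered = {name: 0 for name in totals}  # list id -> covered edges
        self.buckets = defaultdict(set)              # key -> ids with that key
        for name in self.total:
            self.buckets[self._key(name)].add(name)
        self.heap = list(self.buckets)               # min-heap of keys
        heapq.heapify(self.heap)

    def _key(self, name):
        # Exact rational, so lists with equal completeness share a bucket.
        return Fraction(self.total[name] - self.covered[name], self.total[name])

    def cover(self, name):
        """Count one more covered edge for `name`: at most one heap push."""
        self.buckets[self._key(name)].discard(name)
        self.covered[name] += 1
        new_key = self._key(name)
        if not self.buckets[new_key]:
            # Bucket was empty: make sure its key is reachable from the heap.
            heapq.heappush(self.heap, new_key)
        self.buckets[new_key].add(name)

    def most_complete(self):
        """Return all list ids that currently have the highest completeness."""
        # Lazily discard keys whose bucket has emptied since they were pushed.
        while not self.buckets[self.heap[0]]:
            heapq.heappop(self.heap)
        return sorted(self.buckets[self.heap[0]])


# Replays the question's example: every list has 3 edges.
index = CompletenessIndex({"list1": 3, "list2": 3, "list3": 3, "list4": 3})
index.cover("list1")           # time step 0: edge [1,2]
print(index.most_complete())   # ['list1']
index.cover("list2")           # time step 1: edge [8,3]
print(index.most_complete())   # ['list1', 'list2']
index.cover("list1")           # time step 2: edge [3,6]
print(index.most_complete())   # ['list1']
```

Ties come back together because lists with the same key share one bucket, and the heap only holds keys. A balanced search tree or a sorted container would serve equally well; the heap is simply what the standard library offers.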

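With the graph above, you can quickly determine which list(s) each edge belongs to, then look those lists up in the remaining counts and decrement each count by one. I.e., you can see that graph[1][2] is list 1, graph[8][3] is list 2, and graph[3][6] is also list 1.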
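A minimal sketch of that per-edge update using the nested hash from the answer. The `remaining` dict (list number to edges still missing) and `process_edge` are hypothetical names, and a plain dict stands in for the O(log N) structure to keep the lookup step visible:

```python
graph = {
    1: {2: [1], 3: [3], 4: [4]},
    2: {3: [1]},
    3: {6: [1], 5: [2], 4: [3]},
    5: {6: [2]},
    7: {8: [4]},
    8: {3: [2], 1: [4]},
    9: {1: [3]},
}
remaining = {1: 3, 2: 3, 3: 3, 4: 3}  # every example list has 3 edges


def process_edge(u, v):
    """Decrement the remaining count of every list that contains edge u->v.

    Returns the ids that changed; an edge that is in no list returns [].
    """
    touched = graph.get(u, {}).get(v, [])
    for list_id in touched:
        remaining[list_id] -= 1
    return touched


# The question's three time steps: [1,2], [8,3], [3,6]
for step, (u, v) in enumerate([(1, 2), (8, 3), (3, 6)]):
    changed = process_edge(u, v)
    print(step, changed, remaining)
```

The work per incoming edge is proportional to the number of ids returned by the lookup, not to the total number of lists.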

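To improve performance, you may also want to keep a set of edges you have already seen and skip any edge that has already been seen, although this may or may not be needed, and may not be how you want to handle it.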
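If the same edge can arrive more than once, a seen set keeps it from being counted twice (which would otherwise push a list's completeness past 100%). A minimal sketch, assuming edges are directed `(u, v)` tuples; the two-edge graph excerpt and the names `seen` and `process_edge_once` are illustrative only:

```python
graph = {1: {2: [1]}, 8: {3: [2]}}  # excerpt: edge 1->2 is in list 1, 8->3 in list 2
remaining = {1: 3, 2: 3}
seen = set()  # directed edges already processed


def process_edge_once(u, v):
    """Lookup-and-decrement that ignores repeats of an edge."""
    if (u, v) in seen:
        return []
    seen.add((u, v))
    touched = graph.get(u, {}).get(v, [])
    for list_id in touched:
        remaining[list_id] -= 1
    return touched


for u, v in [(1, 2), (1, 2), (8, 3)]:
    process_edge_once(u, v)
print(remaining)  # {1: 2, 2: 2}
```

The second (1, 2) is skipped, so list 1 is decremented only once.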

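Performance is proportional to the number of lists that need to change, so it is output-sensitive. If the example you gave is any indication, the number of list counts you need to update for each incoming edge is probably very small compared with the number of lists. Suppose the L lists contain E edges in total, you need to process K edges online, and those K edges cause A lists to be processed in total. (A is an output-sensitive variable that depends on how much the lists overlap. Your example has zero overlap, since every edge has exactly one list associated with it, but it is not clear whether that will still hold with more lists and edges.) Then I believe the performance is O(E + K + A log L), because each of those A processed lists needs an O(log L) lookup to find and update its count, and E is the preprocessing needed to build the graph. This looks essentially optimal, unless there is something else to it. It is likely much better than the O(K*E) you have now, unless you have extremely high overlap (A).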
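The bound can be written out as follows, using the answer's definitions plus one added assumption: duplicate incoming edges are skipped with the seen-edge set, so each occurrence of an edge in a list is processed at most once. Here lists(u_k → v_k) denotes the list occurrences containing the k-th incoming edge. Under that assumption A cannot exceed E, so even with heavy overlap the worst case stays at O(K + E log L) rather than O(K·E). In the question's example every edge belongs to exactly one list, so A = K.

```latex
\begin{aligned}
T_{\text{naive}} &= O(K \cdot E) \\
T_{\text{graph}} &= \underbrace{O(E)}_{\text{build graph}}
                  + \underbrace{O(K)}_{\text{edge lookups}}
                  + \underbrace{O(A \log L)}_{\text{count updates}} \\
A &= \sum_{k=1}^{K} \bigl|\text{lists}(u_k \to v_k)\bigr| \;\le\; E
\quad\Longrightarrow\quad T_{\text{graph}} = O(K + E \log L)
\end{aligned}
```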
