Check whether a HUGE dictionary contains a string and get all matching elements

I have two huge dictionaries: one named DictHashesSource with 2,256,001 rows, and another named DictHashesTarget with 2,061,735 rows.

Dictionary<int, string> DictHashesSource = new Dictionary<int, string>();
Dictionary<int, string> DictHashesTarget = new Dictionary<int, string>();

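What I want to do is, for each element of DictHashesSource, retrieve all matching elements in DictHashesTarget, and then do exactly the same thing the other way around. To do this, I used LINQ as follows: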

IEnumerable<string> interceptedRowsSource = DictHashesSource.Values.Where(x => DictHashesTarget.Values.Contains(x)).ToList();
IEnumerable<string> interceptedRowsTarget = DictHashesTarget.Values.Where(x => DictHashesSource.Values.Contains(x)).ToList();

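The problem is that, because both dictionaries are really large, each operation takes more than an hour. Is there a way to reduce the complexity of this algorithm?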

Note: I really do have to use two dictionaries, because I need the keys in later operations.

Another note: the same value does not have the same key in the two dictionaries.

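One approach could be to build a reverse dictionary, in which the values become the keys and vice versa. Lookups by value then take roughly constant time.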

Dictionary<int, string> source = new Dictionary<int, string>();
Dictionary<int, string> target = new Dictionary<int, string>();
source.Add(1, "a");
source.Add(2, "b");
source.Add(3, "c");
target.Add(4, "c");
target.Add(5, "d");
target.Add(6, "e");
// Reverse index:
var reverseSource = source.Reverse();
var reverseTarget = target.Reverse();
foreach (var sourceItem in reverseSource)
{
    if (reverseTarget.ContainsKey(sourceItem.Key))
    {
        Console.WriteLine($"Source and Target contains {sourceItem.Key}");
    }
}

This uses the following reverse-dictionary extension method.

// Extension methods must be declared inside a static, non-generic, top-level class.
public static Dictionary<TValue, TKey> Reverse<TKey, TValue>(this IDictionary<TKey, TValue> source)
{
    var dictionary = new Dictionary<TValue, TKey>();
    foreach (var entry in source)
    {
        // Only the first key seen for each value is kept.
        if (!dictionary.ContainsKey(entry.Value))
            dictionary.Add(entry.Value, entry.Key);
    }
    return dictionary;
}

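Note that this keeps only the first key for each value. If you need all of them, you could store the keys as a comma-separated list?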
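As an alternative to a comma-separated string, the reverse index could keep a list of keys per value. The following is only a sketch under that assumption; the class name `ReverseIndex` and the method `Build` are hypothetical and not part of the original answer.

```csharp
using System.Collections.Generic;

public static class ReverseIndex
{
    // Hypothetical helper: maps each value to every key that holds it,
    // so no key is lost when the same value occurs more than once.
    public static Dictionary<TValue, List<TKey>> Build<TKey, TValue>(
        IEnumerable<KeyValuePair<TKey, TValue>> source)
    {
        var index = new Dictionary<TValue, List<TKey>>();
        foreach (var entry in source)
        {
            // Create the key list the first time a value is seen.
            if (!index.TryGetValue(entry.Value, out var keys))
            {
                keys = new List<TKey>();
                index.Add(entry.Value, keys);
            }
            keys.Add(entry.Key);
        }
        return index;
    }
}
```

With this, `ReverseIndex.Build(source)` and `ReverseIndex.Build(target)` could replace the two `Reverse()` calls, and each match would give access to all keys on both sides.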

You can create HashSets from the values of both dictionaries.

HashSet<string> HashesSourceSet = new HashSet<string>(DictHashesSource.Values);
HashSet<string> HashesTargetSet = new HashSet<string>(DictHashesTarget.Values);

Then do the following:

var result1 = HashesSourceSet.Where(x => HashesTargetSet.Contains(x)).ToList();
var result2 = HashesTargetSet.Where(x => HashesSourceSet.Contains(x)).ToList();

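This reduces the complexity to O(n): each HashSet.Contains call is O(1) on average, whereas Values.Contains in the original query scans the whole collection for every element.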
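Since the question says the keys are needed later, one possible variation is to filter the dictionary entries themselves against a set built from the other dictionary's values. This is a minimal sketch, not the answerer's code; `HashMatcher` and `MatchingEntries` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HashMatcher
{
    // Hypothetical helper: returns the entries of `first` whose value
    // also appears among the values of `second`, keeping their keys.
    public static List<KeyValuePair<int, string>> MatchingEntries(
        Dictionary<int, string> first, Dictionary<int, string> second)
    {
        // Building the set is one pass over `second`; each lookup is O(1) on average.
        var secondValues = new HashSet<string>(second.Values);
        return first.Where(entry => secondValues.Contains(entry.Value)).ToList();
    }
}
```

Calling `HashMatcher.MatchingEntries(DictHashesSource, DictHashesTarget)` and then swapping the arguments would cover both directions. Unlike filtering one HashSet against the other, this keeps an entry for every key, even when several keys share the same value.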

-----------------Update--------------------

Since you mentioned that you need the count for each hash, you can do the following:


Dictionary<string, int> HashesWithCount = new Dictionary<string, int>();
foreach (var x in DictHashesSource.Values)
{
    HashesWithCount[x] = HashesWithCount.ContainsKey(x) ? (HashesWithCount[x] + 1) : 1;
}

Now you have the count for each value, including the repeated ones.
