Removing duplicated values from Vectors stored in a Julia Dict



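I am looking for a way to remove duplicated values from the vectors stored as values in a Julia dictionary: any number that occurs more than once across the vectors should be dropped from all of them.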

Here is my dictionary:

x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [3,4,5], "C" => [5,6,7])

Here is the expected output:

Dict{AbstractString, Vector{Integer}} with 3 entries:
"A" => [1, 2]
"B" => [4]
"C" => [6, 7]

Here is a relatively short way to do it (I have not optimized it for full speed, in order to keep the solution short):

julia> using StatsBase
julia> x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [3,4,5], "C" => [5,6,7])
Dict{AbstractString, Vector{Integer}} with 3 entries:
"B" => [3, 4, 5]
"A" => [1, 2, 3]
"C" => [5, 6, 7]
julia> dups = [k for (k, v) in countmap(Iterators.flatten(values(x))) if v > 1]
2-element Vector{Int64}:
5
3
julia> foreach(v -> setdiff!(v, dups), values(x))
julia> x
Dict{AbstractString, Vector{Integer}} with 3 entries:
"B" => [4]
"A" => [1, 2]
"C" => [6, 7]

If anything in the code is unclear, please leave a comment.

This solution updates your dictionary x in place, as I assumed that is what you wanted.
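
If you would rather leave the original dictionary untouched, a non-mutating variant could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name `without_shared` is made up here (it is not part of Base or StatsBase), and it counts occurrences with a plain `Dict` instead of `countmap`, so it needs no extra package.

```julia
# Hypothetical helper (not from Base or StatsBase): returns a new dictionary
# in which every value occurring more than once across all vectors is removed.
function without_shared(d::AbstractDict)
    # Count how often each value occurs across all vectors.
    counts = Dict{Any,Int}()
    for v in values(d), item in v
        counts[item] = get(counts, item, 0) + 1
    end
    # filter allocates new vectors, so the vectors in d are left unchanged.
    return Dict(k => filter(item -> counts[item] == 1, v) for (k, v) in d)
end

x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [3,4,5], "C" => [5,6,7])
y = without_shared(x)
```

Here `y` holds the filtered vectors while `x` still contains the original ones. The key and value types of `y` are inferred from its contents, so they may be narrower than those declared for `x`.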

Alternatively, in a more pedestrian style:

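# Concatenate all value vectors of the dictionary into a single vector.
allvalues(x::AbstractDict) = reduce(vcat, collect(values(x)))

# Return the items of x that have already occurred earlier in x.
function finddups(x::AbstractArray)
    seen = Int[]
    filter(x) do item
        item in seen && return true   # seen before: keep it as a duplicate
        push!(seen, item)             # first occurrence: remember it
        return false
    end
end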

x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [3,4,5], "C" => [5,6,7])
dups = x |> allvalues |> finddups
foreach(v -> setdiff!(v, dups), values(x))
x
