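I suppose this question qualifies as an entry-level Clojure question. I'm basically having trouble processing a Clojure map several times over and extracting different kinds of data from it.
Given a map like this, I'm trying to count entries based on multiple nested keys: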
[
{
"a": "X",
"b": "M",
"c": 188
},
{
"a": "Y",
"b": "M",
"c": 165
},
{
"a": "Y",
"b": "M",
"c": 313
},
{
"a": "Y",
"b": "P",
"c": 188
}
]
First, I want to group the entries by the value of the a key:
{
"X" : [
{
"b": "M",
"c": 188
}
],
"Y" : [
{
"b": "M",
"c": 165
},
{
"b": "M",
"c": 313
},
{
"b": "P",
"c": 188
}
]
}
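A minimal sketch of this first step in Clojure, assuming the JSON above has already been parsed into maps with string keys (the `data` var below is just that sample written as a literal, and `grouped` is a hypothetical name). `group-by` on its own keeps the "a" entry inside each map; the `dissoc` is only there to match the shape shown above.

```clojure
;; The sample rows, assumed already parsed into maps with string keys
(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])

;; Group the rows by their "a" value, then drop "a" from each grouped row
(def grouped
  (into {}
        (map (fn [[k rows]] [k (mapv #(dissoc % "a") rows)]))
        (group-by #(get % "a") data)))

grouped
;; => {"X" [{"b" "M", "c" 188}],
;;     "Y" [{"b" "M", "c" 165} {"b" "M", "c" 313} {"b" "P", "c" 188}]}
```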
Second, I want to treat repeated values of the b key as duplicates and ignore the rest of the keys:
{
"X" : [
{
"b": "M"
}
],
"Y" : [
{
"b": "M"
},
{
"b": "P"
}
]
}
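The second step can be sketched the same way, again assuming string-keyed maps parsed from the JSON (`distinct-bs` is a hypothetical name): `select-keys` keeps only the "b" entry of each row, and `distinct` removes the repeated `{"b" "M"}` in the "Y" group.

```clojure
;; The sample rows, assumed already parsed into maps with string keys
(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])

;; Group by "a", keep only the "b" entry of each row, then drop duplicates
(def distinct-bs
  (into {}
        (map (fn [[k rows]]
               [k (vec (distinct (map #(select-keys % ["b"]) rows)))]))
        (group-by #(get % "a") data)))

distinct-bs
;; => {"X" [{"b" "M"}], "Y" [{"b" "M"} {"b" "P"}]}
```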
Then, simply count all the remaining instances of the b key:
{
"X" : 1,
"Y" : 2
}
Since I fetch the data through monger, I defined:
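;; mg = monger.core, mc = monger.collection; db-name is assumed to be defined elsewhere
(require '[monger.core :as mg]
         '[monger.collection :as mc])
(defn db-query
  ([coll-name]
   (with-open [conn (mg/connect)]
     (doall (mc/find-maps (mg/get-db conn db-name) coll-name)))))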
Then I hit a roadblock:
(defn get-sums [request]
  (->> (db-query "data")
       (group-by :a)
       (into {})
       keys))
How do I proceed from here?
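One possible way to continue that `->>` pipeline, sketched on an in-memory stand-in for `(db-query "data")` so it runs without MongoDB. It assumes the documents come back with keyword keys, as the question's own `(group-by :a)` suggests; `rows` and `count-distinct-b` are hypothetical names. Instead of taking `keys`, each group is reduced to the number of distinct `:b` values.

```clojure
;; Stand-in for (db-query "data"): rows with keyword keys
(def rows [{:a "X", :b "M", :c 188}
           {:a "Y", :b "M", :c 165}
           {:a "Y", :b "M", :c 313}
           {:a "Y", :b "P", :c 188}])

(defn count-distinct-b
  "Maps each :a value to the number of distinct :b values in its group."
  [rows]
  (->> rows
       (group-by :a)
       (reduce-kv (fn [m k vs]
                    (assoc m k (count (distinct (map :b vs)))))
                  {})))

(count-distinct-b rows)
;; => {"X" 1, "Y" 2}
```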
Here is a naive approach. I'm sure there are better ways, but it may be what you need to figure this out.
(into {}
      (map
       ; f
       (fn [[k vs]] ; [k `unique count`]
         [k (count (into #{} (map #(get % "b") vs)))])
       ; coll
       (group-by #(get % "a") DATA))) ; "a"s as keys
;user=> {"X" 1, "Y" 2}
Explanation:
; I am using your literal data as DATA, just removed the , and :
(def DATA [{...
(group-by #(get % "a") DATA) ; groups by "a" as keys
; so I get a map {"X":[{},...] "Y":[{},{},{},...]}
; then I map over each [k vs] pair where
; k is the map key and
; vs are the grouped maps in a vector
(fn [[k vs]]
  ; here `k` is e.g. "Y" and `vs` are the maps {a _, b _, c _}
  ; now `(map #(get % "b") vs)` gets me all the b values
  ; putting them `into` a set makes them unique
  ; `count` counts them
  ; finally I return a vector with the same key `k`,
  ; but the value is the count of the distinct `b`s
  [k (count (into #{} (map #(get % "b") vs)))])
; at the end I just put the result `[["X" 1] ["Y" 2]]` `into` a map {}
; so you get a map
(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])
;; Borrowing data from @leetwinski
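If you are defining the data yourself, one thing you might want to consider is using keywords rather than strings as keys. The benefit is that keywords can be used as functions to look things up in the map, i.e. (get my-map "a") becomes (:a my-map).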
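If the data already arrives with string keys (for example from a JSON parser), a minimal sketch of switching to keywords uses `clojure.walk/keywordize-keys`, which recursively converts string map keys into keywords; `kw-data` is a hypothetical name.

```clojure
(require '[clojure.walk :as walk])

(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])

;; Convert every string map key to a keyword (values are left untouched)
(def kw-data (walk/keywordize-keys data))

(get (first data) "a")  ;; => "X"
(:a (first kw-data))    ;; => "X"
(map :a kw-data)        ;; => ("X" "Y" "Y" "Y")
```

Only the keys change; the values such as "X" and "M" stay strings.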
To get the data grouped by the "a" key, do the following:
(defn by-a-key [data]
  (group-by #(get % "a") data))
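I think you can actually skip your second step if it only serves to get you to the third one, since it isn't needed for that. On a second reading, I can't tell whether you want to keep only one element for each distinct "b" value. I'll assume not, since you didn't say how to choose which one to keep, and the elements seem to differ quite a bit.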
(reduce-kv
 (fn [m k v]
   (assoc m k
          (count (filter #(contains? % "b") v))))
 {}
 (by-a-key data))
You could also do the whole thing like this:
(frequencies (map #(get % "a") (filter #(contains? % "b") data)))
Since you can filter on the presence of the "b" key before grouping, you can rely on frequencies to do the grouping and counting for you.
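As written, this counts every row that has a "b" key, so "Y" comes out as 3 rather than the 2 in the question. If the deduplicating reading is what you want, one possible tweak (a sketch, not part of the original answer; `all-b-counts` and `distinct-b-counts` are hypothetical names) is to drop repeated ("a", "b") pairs before calling `frequencies`:

```clojure
(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])

;; Counts every row that has a "b" key: "Y" has three such rows
(def all-b-counts
  (frequencies (map #(get % "a") (filter #(contains? % "b") data))))
;; => {"X" 1, "Y" 3}

;; Deduplicate the [a b] pairs first, so each distinct "b" counts once per "a"
(def distinct-b-counts
  (->> data
       (filter #(contains? % "b"))
       (map (juxt #(get % "a") #(get % "b")))
       distinct
       (map first)
       frequencies))
;; => {"X" 1, "Y" 2}
```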
You can use reduce:
(def data [{"a" "X", "b" "M", "c" 188}
           {"a" "Y", "b" "M", "c" 165}
           {"a" "Y", "b" "M", "c" 313}
           {"a" "Y", "b" "P", "c" 188}])
(def processed (reduce #(update % (%2 "a") (fnil conj #{}) (%2 "b"))
                       {} data))
;; {"X" #{"M"}, "Y" #{"M" "P"}}
;; you create a map of "a" values to sets of "b" values in one pass
;; and then you just create a new map with counts
(reduce-kv #(assoc %1 %2 (count %3)) {} processed)
;; {"X" 1, "Y" 2}
So it uses the same logic as @birdspider's solution, but makes fewer passes over the collection.
Or in a single function:
(defn process [data]
  (->> data
       (reduce #(update % (%2 "a") (fnil conj #{}) (%2 "b")) {})
       (reduce-kv #(assoc %1 %2 (count %3)) {})))
user> (process data)
;; {"X" 1, "Y" 2}