Today I spent quite some time writing an answer to a Stack Overflow question. The question looked quite easy to me, but my solution turned out to be complex.
The question can be found here and this is my answer.
If I understood the question correctly, it came down to obtaining all values for the keyWeWant keys from a data structure like this:

{
  "agg": {
    "agg1": [
      {
        "keyWeWant": "*-20.0",
        "asdf": 0,
        "asdf": 20,
        "asdf": 14,
        "some_nested_agg": [
          {
            "keyWeWant2": 20,
            "to": 25,
            "doc_count": 4,
            "some_nested_agg2": {
              "count": 7,
              "min": 2,
              "max": 5,
              "keyWeWant3": 2.857142857142857,
              "sum": 20
            }
          },
          {
            "keyWeWant2": 25,
            "to": 30,
            "doc_count": 10,
            "some_nested_agg2": {
              "count": 16,
              "min": 2,
              "max": 10,
              "keyWeWant3": 6.375,
              "sum": 102
            }
          }
        ]
      }
    ]
  }
}

The parsed structure should look like this:

[
  { "keyWeWant" : "*-20", "keyWeWant2" : 20, "keyWeWant3" : 2.857142857142857 },
  { "keyWeWant" : "*-20", "keyWeWant2" : 25, "keyWeWant3" : 6.375 },
  { ... },
  { ... }
]

In the question it is requested that the function looks something like this:

function_name(data_map, {
  "keyWeWant" : ['agg', 'agg1'],
  "keyWeWant2" : ['agg', 'agg1', 'some_nested_agg'],
  "keyWeWant" : ['agg', 'agg1', 'some_nested_agg', 'some_nested_agg2']
})

I took on the challenge of solving it with exactly the data structure provided in the question. This was probably not the best approach, and I don't think my solution is optimal.
Here's my solution (selectively copied from the answer):
I placed this test data in a file called data.json. The Cheshire JSON library then parses the data into a Clojure data structure:

(require '[cheshire.core :as cheshire])

(def my-data (-> "data.json" slurp cheshire/parse-string))

Next the paths to get are defined as follows:

(def my-data-map
  {"keyWeWant"  ["agg", "agg1"],
   "keyWeWant2" ["agg", "agg1", "some_nested_agg"],
   "keyWeWant3" ["agg", "agg1", "some_nested_agg", "some_nested_agg2"]})

It is the data_map of the question without the ":", with single quotes changed to double quotes and the last "keyWeWant" changed to "keyWeWant3".
find-nested below has semantics similar to Clojure's get-in, except that it also works on maps containing vectors of maps, and it returns all matching values instead of one. Given a search vector, find-nested finds all values in a nested map in which some values may be a vector of maps. Every map in the vector is checked.
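To see why plain get-in is not enough here, consider a cut-down version of the data written as a Clojure literal. This is only an illustrative sketch: the name sample and the trimmed structure are assumptions made for the example, not part of the original answer.

```clojure
;; A cut-down version of the post's data (hypothetical name: sample),
;; with string keys, as Cheshire's parse-string produces by default.
(def sample
  {"agg" {"agg1" [{"keyWeWant" "*-20.0"
                   "some_nested_agg" [{"keyWeWant2" 20}
                                      {"keyWeWant2" 25}]}]}})

;; A vector can only be looked up by integer index, so a string key
;; finds nothing there and get-in falls through to nil.
(get-in sample ["agg" "agg1" "keyWeWant"])
;; => nil

;; With explicit indices get-in works, but it yields one value per path.
(get-in sample ["agg" "agg1" 0 "keyWeWant"])
;; => "*-20.0"
(get-in sample ["agg" "agg1" 0 "some_nested_agg" 1 "keyWeWant2"])
;; => 25
```

Collecting both 20 and 25 in one call therefore needs something that descends into every element of a vector, which is the gap find-nested is meant to fill.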

(defn find-nested
  "Finds all values in a coll consisting of maps and vectors.

  All values are returned in a tree structure:
  i.e, in your problem it returns (20 25) if you call it with
  (find-nested ['agg', 'agg1', 'some_nested_agg', 'keyWeWant2'] my-data).

  Returns nil if not found."
  [ks c]
  (let [k (first ks)]
    (cond (nil? k)    c
          (map? c)    (find-nested (rest ks) (get c k))
          (vector? c) (if-let [e (-> c first (get k))]
                        (if (string? e) e ; do not map over chars in str
                            (map (partial find-nested (rest ks)) e))
                        (find-nested ks (into [] (rest c)))) ; create vec again
          :else       nil)))

find-nested finds the values for a search path:

(find-nested ["agg", "agg1", "some_nested_agg", "keyWeWant2"] my-data) ; => (20 25)

If all the paths to the "keyWeWant" keys are mapped over my-data, these are the slices of a tree:
(*-20.0
(20 25)
(2.857142857142857 6.375))
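Each slice above corresponds to one full search path, that is, a path from the data map with its wanted key appended. A minimal sketch of building those paths follows; the names data-map and search-paths are illustrative and do not come from the answer.

```clojure
;; Same shape as my-data-map in the post: wanted key -> path to its parent.
(def data-map
  {"keyWeWant"  ["agg" "agg1"]
   "keyWeWant2" ["agg" "agg1" "some_nested_agg"]
   "keyWeWant3" ["agg" "agg1" "some_nested_agg" "some_nested_agg2"]})

;; conj on a vector appends at the end, so each key lands after its path.
;; Destructuring [k path] takes the key and value of every map entry.
(def search-paths
  (map (fn [[k path]] (conj path k)) data-map))
```

Each element of search-paths is then a vector such as ["agg" "agg1" "some_nested_agg" "keyWeWant2"], which is the form find-nested takes as its first argument.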
The requested structure (all end results together with the paths leading to them) can be obtained from this tree in function-name like this:

(defn function-name
  "Transforms data d by finding (nested keys) via data-map m in d and flattening the structure."
  [d m]
  (let [tree               (map #(find-nested (conj (second %) (first %)) d) m)
        leaves             (last tree)
        leaf-indices       (range (count leaves))
        results            (for [index leaf-indices]
                             (map (fn [slice]
                                    (if (string? slice)
                                      slice
                                      (loop [node (nth slice index)]
                                        (if node
                                          node
                                          (recur (nth slice (dec index)))))))
                                  tree))
        results-with-paths (mapv #(zipmap (keys m) %) results)
        json               (cheshire/encode results-with-paths)]
    json))

results uses a loop to step back if a leaf index is larger than the size of that particular slice. I think it will work for deeper nested structures as well (it should work as long as each next slice is either the same size as the previous slice or double its size), but I have not tested it.
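The last step, turning the slices into one map per leaf, can be looked at in isolation. The sketch below is a simplified illustration, not the code from the answer: it hard-codes the three slices shown earlier, assumes every non-string slice has exactly one entry per leaf (so it leaves out the step-back loop), and uses the hypothetical names wanted-keys, row and rows.

```clojure
;; The three slices from the post, written as literals so the snippet
;; runs without data.json or Cheshire.
(def tree ["*-20.0" [20 25] [2.857142857142857 6.375]])
(def wanted-keys ["keyWeWant" "keyWeWant2" "keyWeWant3"])

(defn row
  "Hypothetical helper: picks the value for leaf number index from every
  slice. A string slice is shared by all leaves and is repeated as-is."
  [tree index]
  (map (fn [slice] (if (string? slice) slice (nth slice index))) tree))

;; One map per leaf; zipmap pairs each key with the value from its slice.
(def rows
  (mapv #(zipmap wanted-keys (row tree %))
        (range (count (last tree)))))
```

Here rows holds two maps, one for keyWeWant2 = 20 and one for keyWeWant2 = 25, each repeating the shared "*-20.0" value.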
Calling (function-name my-data my-data-map) returns a JSON string in the requested format:
[{
  "keyWeWant": "*-20.0",
  "keyWeWant2": 20,
  "keyWeWant3": 2.857142857142857 },
 {
  "keyWeWant": "*-20.0",
  "keyWeWant2": 25,
  "keyWeWant3": 6.375 }]
To improve my Clojure code (e.g., succinctness) I would love to be pointed towards better solutions.