Given a dataset such as dataDS below, with "letter" as the chosen header, how do I convert it into an indexed dataset, i.e. an association of associations?
That is, how do I define f such that f[dataset_,columnHeader_] produces a result like data2DS below?
Note that GroupBy comes close, but its result cannot be indexed with Part to extract column data, because each group is a list of rows rather than a single association. E.g.:

    data = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>,
       <|"letter" -> "b", "foo" -> 3, "bar" -> 4|>,
       <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>};
    dataDS = Dataset[data];
    dataDSg = GroupBy[dataDS, Key["letter"]];
    dataDSg[All, "foo"] (* <- produces an error *)

Whereas data in the format of an association of associations works fine:

    data2 = <|"a" -> <|"foo" -> 1, "bar" -> 2|>,
       "b" -> <|"foo" -> 3, "bar" -> 4|>,
       "c" -> <|"foo" -> 5, "bar" -> 6|>|>;
    data2DS = data2 // Dataset;
    data2DS[All, "foo"] (* <- returns a dataset with 1,3,5 *)

Update
Some timing comparisons

    (* make dataset to test *)
    colHeader = CharacterRange["a", "z"];
    colHeader[[1]] = "letter";
    data = RandomReal[{-1, 1}, {100000, 26}];
    table = Insert[data, colHeader, 1];
    dataDS = Dataset[AssociationThread[table[[1]], #] & /@ table[[2 ;;]]];

Anton Antonov answer

    f[ds_Dataset, ch_] :=
      Dataset@Association@Normal@ds[All, #[ch] -> KeyDrop[#, ch] &]
    fAns = f[dataDS, "letter"]; // RepeatedTiming (* 0.934 *)

kglr answer

    f0 = GroupBy[##, Association@*KeyDrop[#2]] &;
    f0ans = f0[dataDS, "letter"]; // RepeatedTiming (* 1.85 *)
    f1 = #[GroupBy[#2] /* Map[Association@*KeyDrop[#2]]] &;
    f1ans = f1[dataDS, "letter"]; // RepeatedTiming (* 1.714 *)

Sjoerd Smit answer

    groupByKey[ds_, key_String] :=
      GroupBy[ds, Function[Slot[key]] -> KeyDrop[key], First];
    groupByKeyAns = groupByKey[dataDS, "letter"]; // RepeatedTiming (* 1.2 *)

Some other timings that don't produce an answer, but help to put the times above into context:

    GroupBy[dataDS, "letter"]; // RepeatedTiming (* 0.25 *)
    Dataset[Normal[dataDS]]; // RepeatedTiming (* 0.38 *)
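
For reference, here is a minimal sketch of the same key -> row-without-key idea applied to the small three-row example from the top of the question, using a plain list of associations rather than a Dataset. The names dataSmall and indexedSmall are illustrative and not from any of the answers. It assumes the values in the chosen column are unique: with duplicates, First would silently keep only the first row of each group.

```mathematica
(* the three-row example from the question, under an illustrative name *)
dataSmall = {<|"letter" -> "a", "foo" -> 1, "bar" -> 2|>,
   <|"letter" -> "b", "foo" -> 3, "bar" -> 4|>,
   <|"letter" -> "c", "foo" -> 5, "bar" -> 6|>};

(* group by the "letter" value, drop that key from each row, and
   collapse each one-element group to its single row with First *)
indexedSmall =
  GroupBy[dataSmall, Key["letter"] -> KeyDrop["letter"], First];

(* compare against the hand-written association of associations *)
indexedSmall === <|"a" -> <|"foo" -> 1, "bar" -> 2|>,
   "b" -> <|"foo" -> 3, "bar" -> 4|>,
   "c" -> <|"foo" -> 5, "bar" -> 6|>|>
(* expected: True *)

(* column extraction should now work once wrapped in Dataset *)
Dataset[indexedSmall][All, "foo"]
```

Running the same check on the output of each timed function (after Normal) is a cheap way to confirm that they all return the same structure before comparing speeds.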







