There are two issues under discussion: 1) the distinct dataset visualizations for the same data and 2) ways to update dataset subelements in place. We will discuss these separately.
Distinct Dataset Visualizations
The way a dataset is displayed is sensitive to the data type of the dataset. That type, in turn, is sensitive to the history of the dataset. This is discussed at length in (143551). For the case at hand, we can see how the data type evolves with each AppendTo operation:
    Needs["Dataset`"]
    Needs["TypeSystem`"]

    { db = Dataset[{}]
    , AppendTo[db, <|"a" -> 1, "b" -> 2|>]
    , AppendTo[db, <|"a" -> 2, "b" -> 5|>]
    , AppendTo[db, <|"x" -> 2, "y" -> 5|>]
    } // Unevaluated // Map[{#, GetType[#]}&] // Grid[#, Frame->All, Alignment->Left]&

The principal data type is a Vector of Assoc. The last row shows how adding the incompatible keys "x" and "y" switched the key type from Enumeration to the generic AnyType.
Now contrast this with db2:
    db2 = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>}]
    db2 // GetType
    (* Vector[Struct[{"a", "b"}, {Atom[Integer], Atom[Integer]}], 2] *)

The principal data type is now a Vector of Struct. A "struct" represents the case where the dataset is known to contain associations of consistent type. The type system deduced this when db2 was constructed.
For db, which was built incrementally, the type system infers the final data type from a combination of the initial data type and the type transformations of any applied operators (e.g. AppendTo). Such type inference is generally less specific than the type deduction that occurs at construction time. We can use Dataset as a query operator to force reconstruction of a dataset and thereby deduce its data type anew:
    db = Dataset[{}];
    AppendTo[db, <|"a" -> 1, "b" -> 2|>];
    AppendTo[db, <|"a" -> 2, "b" -> 5|>]
    db // GetType
    (* Vector[Assoc[Atom[Enumeration["a", "b"]], Atom[Integer], 2], 2] *)

    db = db[Dataset]
    db // GetType
    (* Vector[Struct[{"a", "b"}, {Atom[Integer], Atom[Integer]}], 2] *)

Updating Subelements of Datasets
There are presently very few ways to update a Dataset in place. See, for example, the discussion in (54491) or the work-around sketched in (141916). In particular, the kinds of update contemplated in the question are not presently supported.
The way to achieve such alterations presently is through query operators. For example, we can append a new key "c" to element 1:
    db3 = db2[{1 -> Append["c" -> 7]}]

Note that by adding the key "c" to only one of the associations, the data type switched from Struct to Assoc and gave us the vertical key/value pair visualization we saw earlier. If we had added "c" to all associations, we would have retained the Struct tabular visualization:
    db3 = db2[{1 -> Append["c" -> 7], 2 -> Append["c" -> 8]}]

The closest thing to updating a dataset in place is expressed as db = db[...ops...].
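As a minimal sketch of that rebinding idiom, the following appends a computed key to every row in one query and assigns the result back to the same variable. The name db5 and the choice of "c" as the sum of "a" and "b" are illustrative assumptions, not part of the original question.

```wolfram
(* db5 is a hypothetical working copy with the same rows as db2 above. *)
db5 = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>}];

(* Apply Append to every row (All), then rebind db5 to the new dataset. *)
(* The illustrative value of "c" is the sum of "a" and "b". *)
db5 = db5[All, Append[#, "c" -> #a + #b] &];

db5 // Normal
(* {<|"a" -> 1, "b" -> 2, "c" -> 3|>, <|"a" -> 2, "b" -> 5, "c" -> 7|>} *)
```

Because every row receives the same new key, the rows remain consistent with one another, which is the situation in which the tabular display was retained above.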
It is possible to update a simple list of associations in place:
    $list = db2 // Normal
    (* {<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 5|>} *)

    $list[[1, "c"]] = 7;
    $list
    (* {<|"a" -> 1, "b" -> 2, "c" -> 7|>, <|"a" -> 2, "b" -> 5|>} *)

Closing Comments
Beware that performing large numbers of incremental changes to datasets will likely get progressively slower. This is the dataset analog to repeatedly applying AppendTo to a list, a strategy which exhibits a slow-down proportional to the square of the length of the list. The dataset infrastructure is best suited for operators that are applied to significant subsets of the dataset all at once (e.g. one or more complete columns).
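One way to avoid the incremental slow-down, sketched here under the assumption that all rows can be produced before the dataset is needed, is to collect the rows in an ordinary list and construct the Dataset once. The names rows and dbBulk and the generated values are hypothetical.

```wolfram
(* Hypothetical data: 1000 rows with keys "a" and "b". *)
rows = Table[<|"a" -> i, "b" -> i^2|>, {i, 1000}];

(* A single construction, so the data type is deduced once from the full data. *)
dbBulk = Dataset[rows];

(* An operator applied to a complete column at once: total of column "b". *)
dbBulk[Total, "b"]
```

This replaces a thousand AppendTo calls with one constructor call and one column-wide query.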
The operation of the dataset type system is discussed in (89080). Choosing between datasets and associations is discussed in (87360).





