Interrresting things:
- Transform to manipulate Rows and update the schema automatically. Example:
pcoll.apply( TransformRows.create() .withField("a", TypeDescriptors.longs(), r -> r.getInt64("a")) .withField("b", TypeDescriptors.strings(), r -> r.getString("b") + " " + r.getInt64("a")) .withField("aIsEven", TypeDescriptors.booleans(), r -> r.getInt64("a") % 2 == 0) .withField("c", TypeDescriptors.longs(), generateC) .withField("d", TypeDescriptors.longs(), r -> 2 * generateC.apply(r)) .withField("numbers", TypeDescriptors.lists(TypeDescriptors.longs()), r -> List.of(r.getInt64("key"), r.getInt64("a"))) .withFieldRenamed("b", "name") .withFieldRenamed("d", "doubleC") .drop("key") )- Simple framework to unit test beam pipelines as is, without rewriting them to replace / remove sources and sinks. Example: