12 events
when what by license comment
Oct 28, 2016 at 3:48 comment added WReach Oh, and I should mention that JoinAcross is seriously broken in version 11.0.0 but works just fine in 11.0.1 and 10.x releases -- see the comments to (129122).
Oct 28, 2016 at 3:43 history edited WReach CC BY-SA 3.0
added 3 characters in body
Oct 28, 2016 at 3:39 comment added WReach I added a section demonstrating the use of Fold across multiple datasets. The revised approach is more robust in the face of duplicate keys. ## references all arguments; in this case it is shorthand for #, #2. I don't use delayed assignment -- perhaps you mean :>? I use it instead of -> to ensure that n is a local variable. The notation <| ... |> & simply defines a pure function that returns an association. The term "inner" here comes from relational joins. I'm happy to continue the discussion, but perhaps it should be in chat.
Oct 28, 2016 at 3:30 history edited WReach CC BY-SA 3.0
added the section concerning multiple datasets
Oct 27, 2016 at 9:40 comment added SumNeuron @WReach So if I encapsulate the KeyValueMap in the operator form of Query as follows: data[All, KeyValueMap[...]/*Merge[Mean]], it works. Could you possibly break down that function, though? I get the string replace patterns. What I do not understand is 1.) why you use a delayed assignment, and 2.) why you wrap the string replace in an association and then use a pure function. It stands to reason that #2 is the value associated with the given Key, correct? But why this notation? Also, where can I read more about inner?
Oct 27, 2016 at 9:08 comment added SumNeuron @WReach Also what is with the double slot?
Oct 27, 2016 at 6:14 comment added SumNeuron @WReach Unfortunately, that doesn't seem to work for Dataset objects. Why are some methods unable to work with Dataset if it is just an association wrapped with a different head?
Oct 21, 2016 at 14:56 comment added WReach If the join criteria are identical for all datasets, we can use something like Fold[JoinAcross[##, "Gene"] &, {d1, d2, d3}]. Often the join criteria are not identical, or there are key collisions between the datasets. In such cases we need to explicitly nest the JoinAcross expressions. /* essentially chains operators together so that they are applied in order. In queries, its use can be subtle -- see (98193) for discussion.
Oct 21, 2016 at 5:44 comment added SumNeuron @WReach I have many questions about your answer. In no particular order: JoinAcross. My own answer, which appears to be an excessively verbose equivalent to yours, works with an arbitrary number of Dataset objects. JoinAcross requires that the first two arguments be separate lists. If you had a variable d={d1,d2,...}, how could you alter JoinAcross to handle that? I feel like this calls for Fold, but I have never been able to get Fold to work as I wanted. Also, could you possibly elaborate on your use of composition /*?
Oct 13, 2016 at 6:35 comment added Kuba The very first answer with JoinAcross that makes sense to me. I was wondering where this function may be useful. Up to now I considered it a retarded sister of GroupBy+Merge. :) +1
Oct 13, 2016 at 4:36 history edited WReach CC BY-SA 3.0
minor corrections
Oct 13, 2016 at 4:25 history answered WReach CC BY-SA 3.0