Timeline for Dataset Processing: efficient ways to clean and merge sets for Life Sciences
Current License: CC BY-SA 3.0
12 events
| When | Event | Action | By | License | Comment |
|---|---|---|---|---|---|
| Oct 28, 2016 at 3:48 | comment | added | WReach | | Oh, and I should mention that `JoinAcross` is seriously broken in version 11.0.0 but works just fine in 11.0.1 and 10.x releases; see the comments to (129122). |
| Oct 28, 2016 at 3:43 | history | edited | WReach | CC BY-SA 3.0 | added 3 characters in body |
| Oct 28, 2016 at 3:39 | comment | added | WReach | | I added a section demonstrating the use of `Fold` across multiple datasets. The revised approach is more robust in the face of duplicate keys. `##` references all arguments; in this case it is shorthand for `#, #2`. I don't use delayed assignment; perhaps you mean the delayed rule `:>`? I use it instead of `->` to ensure that `n` is a local variable. The notation `<\| ... \|> &` simply defines a pure function that returns an association. The term "inner" here comes from relational joins. I'm happy to continue the discussion, but perhaps it should be in chat. |
| Oct 28, 2016 at 3:30 | history | edited | WReach | CC BY-SA 3.0 | added the section concerning multiple datasets |
| Oct 27, 2016 at 9:40 | comment | added | SumNeuron | | @WReach So if I encapsulate the `KeyValueMap` in the operator form of `Query` as follows: `data[All, KeyValueMap[...]/*Merge[Mean]]`, it works. Could you possibly break down that function, though? I get the string replace patterns. What I do not understand is 1.) why you use a delayed assignment, and 2.) why you wrap the string replace in an association and then use a pure function. It stands to reason that `#2` is the value associated with the given key, correct? But why this notation? Also, where can I read more about "inner"? |
| Oct 27, 2016 at 9:08 | comment | added | SumNeuron | | @WReach Also, what is with the double slot? |
| Oct 27, 2016 at 6:14 | comment | added | SumNeuron | | @WReach Unfortunately, that doesn't seem to work for `Dataset` objects. Why are some methods unable to work with `Dataset` if it is just an association wrapped with a different head? |
| Oct 21, 2016 at 14:56 | comment | added | WReach | | If the join criteria are identical for all datasets, we can use something like `Fold[JoinAcross[##, "Gene"] &, {d1, d2, d3}]`. Often, though, the join criteria are not identical or there are key collisions between the datasets. In such cases we need to nest the `JoinAcross` expressions explicitly. `/*` essentially chains operators together so that they are applied in order. In queries, its use can be subtle; see (98193) for discussion. |
| Oct 21, 2016 at 5:44 | comment | added | SumNeuron | | @WReach I have many questions about your answer. In no particular order, `JoinAcross`. My own answer, which appears to be an excessively verbose equivalent to yours, works with an arbitrary number of `Dataset` objects. `JoinAcross` requires the first two arguments to be separate lists. If you had a variable `d={d1,d2,...}`, how could you alter `JoinAcross` to handle that? I feel like this calls for `Fold`, but I have never been able to get `Fold` to work as I wanted. Also, could you possibly elaborate on your use of composition (`/*`)? |
| Oct 13, 2016 at 6:35 | comment | added | Kuba | | The very first answer with `JoinAcross` that makes sense to me. I was wondering where this function might be useful. Up to now I considered it a retarded sister of `GroupBy`+`Merge`. :) +1 |
| Oct 13, 2016 at 4:36 | history | edited | WReach | CC BY-SA 3.0 | minor corrections |
| Oct 13, 2016 at 4:25 | history | answered | WReach | CC BY-SA 3.0 | |
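SumNeuron asks where the term "inner" comes from, and WReach points to relational joins. As a hedged illustration, assuming two small hypothetical tables `a` and `b` (lists of associations with an invented `"Gene"` column), the sketch below contrasts the default inner join, which keeps only genes present in both tables, with an outer join requested through `JoinAcross`'s optional fourth argument, which keeps unmatched rows from both sides. The exact placeholder values for missing fields are not shown here.

```mathematica
(* Hypothetical tables: gene "A" appears in both, "B" only in a, "C" only in b *)
a = {<|"Gene" -> "A", "x" -> 1|>, <|"Gene" -> "B", "x" -> 2|>};
b = {<|"Gene" -> "A", "y" -> 3|>, <|"Gene" -> "C", "y" -> 4|>};

(* Default join type is "Inner": only rows whose "Gene" occurs in both tables *)
inner = JoinAcross[a, b, "Gene"];

(* "Outer" keeps matched rows plus unmatched rows from either table;
   fields an unmatched row lacks are filled with Missing[...] values *)
outer = JoinAcross[a, b, "Gene", "Outer"];

Dataset /@ {inner, outer}
```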
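WReach's Oct 21 comment suggests folding `JoinAcross` over a list of datasets when every join uses the same key. A minimal sketch, assuming three hypothetical tables `d1`, `d2`, `d3` given as lists of associations that share a `"Gene"` column (the column names and values are invented for illustration): `Fold[f, list]` with no initial value starts from the first element, so the fold evaluates `JoinAcross[JoinAcross[d1, d2, "Gene"], d3, "Gene"]`, and `##` splices the accumulated result and the next table into `JoinAcross`.

```mathematica
(* Hypothetical example tables: lists of associations sharing a "Gene" key *)
d1 = {<|"Gene" -> "A", "x" -> 1|>, <|"Gene" -> "B", "x" -> 2|>};
d2 = {<|"Gene" -> "A", "y" -> 3|>, <|"Gene" -> "B", "y" -> 4|>};
d3 = {<|"Gene" -> "A", "z" -> 5|>, <|"Gene" -> "B", "z" -> 6|>};

(* ## stands for all arguments, so each fold step calls
   JoinAcross[accumulated, next, "Gene"]; Fold[f, {d1, d2, d3}]
   evaluates f[f[d1, d2], d3] *)
joined = Fold[JoinAcross[##, "Gene"] &, {d1, d2, d3}];

(* Each row of the inner join now carries the columns of all three tables *)
Dataset[joined]
```

This only fits the case WReach describes, where every step joins on the same key. When the key differs between tables or non-key columns collide, the `JoinAcross` calls have to be nested by hand, as he notes. Per his Oct 28 comment, this relies on a release other than 11.0.0.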
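The Oct 27 and Oct 28 comments discuss a row function of the shape `KeyValueMap[...] /* Merge[Mean]`. The answer's exact patterns are not reproduced on this page, so the following is a hypothetical sketch: it assumes replicate columns named like `"ctrl_1"`, `"ctrl_2"` that should be averaged under a base name. `KeyValueMap` calls the pure function with the key as `#1` and the value as `#2`, the `<| ... |> &` function returns a one-entry association, and `/*` (right composition) passes the resulting list of associations to `Merge[Mean]`, which averages the values whose keys coincide. The second half shows why `:>` rather than `->` keeps a pattern name such as `n` local.

```mathematica
(* Hypothetical row with two replicates per condition *)
row = <|"ctrl_1" -> 1., "ctrl_2" -> 3., "treat_1" -> 5., "treat_2" -> 7.|>;

(* #1 is the key and #2 the value; <| ... |> & is a pure function returning
   a one-entry association whose key has the replicate suffix stripped *)
collapse = KeyValueMap[<|StringReplace[#1, "_" ~~ DigitCharacter .. -> ""] -> #2|> &] /* Merge[Mean];

averaged = collapse[row]
(* <|"ctrl" -> 2., "treat" -> 6.|> *)

(* :> (RuleDelayed) leaves its right-hand side unevaluated until a match,
   so n refers to the matched digits even if n has a global value;
   with -> the right-hand side is evaluated first and uses the global n *)
n = "global";
StringReplace["sample_12", "_" ~~ n : DigitCharacter .. :> "#" <> n]
(* "sample#12" *)
StringReplace["sample_12", "_" ~~ n : DigitCharacter .. -> "#" <> n]
(* "sample#global" *)
Clear[n];
```

In a `Dataset`, the same operator can be applied row by row as `data[All, collapse]`, which matches the form SumNeuron reports as working.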