I have data that is given as a list of ordered pairs mixed with scalars. The pairs can contain infinite bounds. My goal is to convert the data into an index used in future computations.

    data = {{1, ∞}, {-∞, 2}, 3, {2, 2}, {2, 3}};

This gives me all of the unique values present in data.

    udata = Sort[DeleteDuplicates[Flatten@data], Less]
    (* ==> {-∞, 1, 2, 3, ∞} *)

Now I use Dispatch to create replacement rules based on the unique values.

    dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];

Finally I replace the values with their indices and expand scalars a such that they are also pairs {a, a}. This results in a matrix of indices, which is what I'm after.

    Replace[data /. dsptch, a_Integer :> {a, a}, 1]
    (* ==> {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}} *)

NOTES:
- The number of unique values is generally small compared to the length of data, but this doesn't have to be the case.
- Any real numbers are possible. The data I've shown simply gives a sense of the structural possibilities.
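For comparing alternatives, the steps above can be wrapped in a single function. This is only a minimal sketch: the name toIndexMatrix is hypothetical (it does not appear in the original post), and the second input is made-up real-valued data. It illustrates that the a_Integer pattern still applies to real inputs, because every value has already been replaced by its integer index when the pattern is tested.

```mathematica
(* toIndexMatrix is a hypothetical name; the body is the same three steps as above *)
toIndexMatrix[data_List] := Module[{udata, dsptch},
  (* unique values in increasing numeric order, infinities included *)
  udata = Sort[DeleteDuplicates[Flatten[data]], Less];
  (* rules mapping each unique value to its position *)
  dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
  (* replace values by indices, then expand bare indices a to {a, a} *)
  Replace[data /. dsptch, a_Integer :> {a, a}, {1}]
]

toIndexMatrix[{{1, Infinity}, {-Infinity, 2}, 3, {2, 2}, {2, 3}}]
(* {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}} *)

(* made-up real-valued input, for illustration only *)
toIndexMatrix[{{0.5, Infinity}, {-Infinity, 2.25}, 1.75, {0.5, 2.25}}]
(* {{2, 5}, {1, 4}, {3, 3}, {2, 4}} *)
```

One caveat with mixed exact and machine-precision input: DeleteDuplicates compares with SameQ, so 2 and 2. would be treated as two distinct values and receive different indices.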
Question: Is there a way to create the final matrix of indices that is much faster than what I'm doing here?
Edit: To test how potential solutions scale, I recommend using the following data. It is fairly representative of a true-to-life case.

    inf = {#, ∞} & /@ RandomChoice[Range[1000], 3*10^5];
    neginf = {-∞, #} & /@ RandomChoice[Range[1000], 10^5];
    int = Sort /@ RandomChoice[Range[1000], {10^5, 2}];
    num = RandomChoice[Range[1000], 5*10^5];
    testData = RandomSample[Join[inf, neginf, int, num]];

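To see which step dominates on data of this size, each stage of the approach in the question can be timed separately. This is a rough sketch, not a rigorous benchmark: stageTimings is a hypothetical helper name, and single-run AbsoluteTiming figures will vary between runs and machines.

```mathematica
(* stageTimings is a hypothetical helper; timings are in seconds *)
stageTimings[data_List] := Module[{t1, t2, t3, udata, dsptch, result},
  (* stage 1: collect and sort the unique values *)
  {t1, udata} = AbsoluteTiming[Sort[DeleteDuplicates[Flatten[data]], Less]];
  (* stage 2: build the value -> index dispatch table *)
  {t2, dsptch} = AbsoluteTiming[Dispatch[Thread[udata -> Range[Length[udata]]]]];
  (* stage 3: substitute indices and expand scalars to pairs *)
  {t3, result} = AbsoluteTiming[Replace[data /. dsptch, a_Integer :> {a, a}, {1}]];
  {"unique" -> t1, "dispatch" -> t2, "replace" -> t3, "result" -> result}
]

(* usage with the test data defined above; Most drops the bulky "result" entry *)
Most[stageTimings[testData]]
```

Keeping the result in the returned list makes it possible to check a candidate replacement for one stage against the original output before comparing timings.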
Sort@DeleteDuplicates@Flatten is practically unbeatable. I tried.

Since beating Sort...Flatten was going to be next to impossible, I tried using Reap and Sow to simultaneously collect the unique terms and substitute in a function that would later return the index. It was twice as slow as your method. I also tried an implementation of a binary tree, but it can't handle $10^6$ terms; for example, its run on $10^5$ terms is on par with your implementation running $10^6$ terms. So I don't know exactly how to optimize the bottleneck any further.

    If[Length[#] == 0, {#, #}, #] & /@ ArrayComponents[testData];