The Question
I'm not very familiar with the names of common algorithms in data science, but I suspect this one is widely used and therefore has a name. I'd like to refer to it by its proper name so I can document it correctly in a codebase. I've implemented an algorithm that is somewhat like TF-IDF (the only similar algorithm I know by name). It runs on a dataset with a string column and a float column. Here is how it works on an example table:
| Input (str) | Output (float) |
|---|---|
| a | 2.0 |
| b | 0.0 |
| a | 1.0 |
| a | 6.0 |
| c | 8.0 |
| c | 4.0 |
Step 1
Group by Input and take the mean of Output.
| Input (str) | Output Mean (float) |
|---|---|
| a | 3.0 |
| b | 0.0 |
| c | 6.0 |
Step 2
Rank the Inputs by their mean Output, in ascending order (the smallest mean gets rank 1).
| Input (str) | Rank (float) |
|---|---|
| a | 2.0 |
| b | 1.0 |
| c | 3.0 |
Step 3
Then replace each input string in the original table with its rank.
| Input (float) | Output (float) |
|---|---|
| 2.0 | 2.0 |
| 1.0 | 0.0 |
| 2.0 | 1.0 |
| 2.0 | 6.0 |
| 3.0 | 8.0 |
| 3.0 | 4.0 |
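For concreteness, the three steps above can be sketched in pandas roughly as follows. This is a minimal sketch rather than my exact implementation: the column names `Input` and `Output` are taken from the tables, and the choice of `method="average"` for tied means is an assumption, since the example has no ties.

```python
import pandas as pd

df = pd.DataFrame({
    "Input": ["a", "b", "a", "a", "c", "c"],
    "Output": [2.0, 0.0, 1.0, 6.0, 8.0, 4.0],
})

# Step 1: mean of Output for each distinct Input value
means = df.groupby("Input")["Output"].mean()

# Step 2: rank the Input values by their mean, ascending and 1-based
# (assumption: tied means share their average rank)
ranks = means.rank(method="average")

# Step 3: replace each Input string with its rank
encoded = df.assign(Input=df["Input"].map(ranks))
print(encoded)
```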
Follow-up Question
Assuming the answer to the first question does not already cover this: what is this called for an arbitrary aggregation method, for example using the median or the max instead of the mean in the first step?
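To make "arbitrary aggregation" concrete, here is a rough sketch where the aggregation is a parameter. The function name `rank_encode` and its arguments are hypothetical names made up for illustration, not from any library, and ties are again assumed to share their average rank.

```python
import pandas as pd

def rank_encode(df, key="Input", target="Output", agg="mean"):
    """Map each key value to the rank of its aggregated target.

    `agg` is anything pandas' GroupBy.agg accepts, e.g. "mean",
    "median", "max", or a callable. (Hypothetical helper.)
    """
    stats = df.groupby(key)[target].agg(agg)
    ranks = stats.rank(method="average")  # ascending, 1-based
    return df[key].map(ranks)

df = pd.DataFrame({
    "Input": ["a", "b", "a", "a", "c", "c"],
    "Output": [2.0, 0.0, 1.0, 6.0, 8.0, 4.0],
})

by_median = rank_encode(df, agg="median")
by_max = rank_encode(df, agg="max")
```

On the example table, the median and the max happen to produce the same ordering of a, b and c as the mean, so the encoded column comes out identical in all three cases.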