I have 3 models, and each model solves a set of tasks (say task 1 and task 2).
Once these tasks (all of the same type) are solved by the models, I collect 3 numerical features (say feature1 to feature3) for each task and each model.
Model A:
Feature1-task1-modelA = 20
Feature2-task1-modelA = 40
Feature3-task1-modelA = 55
Feature1-task2-modelA = 77
Feature2-task2-modelA = 30
Feature3-task2-modelA = 22

Model B:
Feature1-task1-modelB = 10
Feature2-task1-modelB = 70
Feature3-task1-modelB = 33
Feature1-task2-modelB = 88
Feature2-task2-modelB = 79
Feature3-task2-modelB = 97

Model C:
Feature1-task1-modelC = 45
Feature2-task1-modelC = 65
Feature3-task1-modelC = 75
Feature1-task2-modelC = 30
Feature2-task2-modelC = 40
Feature3-task2-modelC = 99

These features eventually will be used for a classification problem to determine which model will be selected for solving these tasks.
I am in the process of feature selection, where I am trying to select only the top features that will be beneficial for the model selection.
My thinking is to rank the features by their chi-square score and p-value, similar to the following:
Feature    Chi2 Score   p-value
feature2   3.89         0.1427
feature3   2.70         0.2592
feature1   2.41         0.2992

So here if I am selecting top 2 features then I will be using only feature2 and feature3 in the classification problem.
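To make the ranking step concrete, here is a minimal sketch of how such a table could be computed in plain Python. It rests on assumptions that are not settled in my setup: each (model, task) pair is treated as one sample, and the model name is used as the class label. The statistic compares the per-class sum of each feature against the sum expected from the class frequencies, which is, as far as I understand, what scikit-learn's `chi2` scorer also reports. The names `samples`, `chi2_scores`, and `top2` are made up for illustration, and the p-value shortcut only holds for 3 classes (2 degrees of freedom).

```python
import math

FEATURES = ["feature1", "feature2", "feature3"]

# One sample per (model, task) pair, using the dummy values from above.
# Assumption: the model name is the class label of each sample.
samples = [
    ("modelA", [20, 40, 55]),
    ("modelA", [77, 30, 22]),
    ("modelB", [10, 70, 33]),
    ("modelB", [88, 79, 97]),
    ("modelC", [45, 65, 75]),
    ("modelC", [30, 40, 99]),
]


def chi2_scores(samples, n_features):
    """Return (chi-square score per feature, degrees of freedom)."""
    labels = sorted({label for label, _ in samples})
    observed = {label: [0.0] * n_features for label in labels}
    class_count = {label: 0 for label in labels}

    # Observed value: sum of each (non-negative) feature within each class.
    for label, values in samples:
        class_count[label] += 1
        for j, value in enumerate(values):
            observed[label][j] += value

    totals = [sum(observed[label][j] for label in labels)
              for j in range(n_features)]

    scores = []
    for j in range(n_features):
        score = 0.0
        for label in labels:
            # Expected value: feature total split by class frequency.
            expected = totals[j] * class_count[label] / len(samples)
            score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores, len(labels) - 1


scores, dof = chi2_scores(samples, len(FEATURES))

# For exactly 2 degrees of freedom the chi-square tail probability
# is exp(-x / 2); other cases would need e.g. scipy.stats.chi2.sf.
assert dof == 2
p_values = [math.exp(-score / 2) for score in scores]

ranked = sorted(zip(FEATURES, scores, p_values),
                key=lambda row: row[1], reverse=True)
top2 = [name for name, _, _ in ranked[:2]]

for name, score, p_value in ranked:
    print(f"{name}  chi2={score:.2f}  p={p_value:.4f}")
```

Since the inputs are dummy values, the resulting scores are not expected to match the table above; the sketch only shows the mechanics of scoring, ranking, and keeping the top 2.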
My question is: How can I aggregate these feature values from the different tasks and then select the top features?
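To illustrate what I mean by aggregating, here is a minimal sketch of one naive option: averaging each feature over the tasks, separately for every model. This is only an assumption about what the aggregation could look like, not something I have settled on, and the names `rows` and `aggregated` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# (model, task, [feature1, feature2, feature3]) with the dummy values above.
rows = [
    ("modelA", "task1", [20, 40, 55]),
    ("modelA", "task2", [77, 30, 22]),
    ("modelB", "task1", [10, 70, 33]),
    ("modelB", "task2", [88, 79, 97]),
    ("modelC", "task1", [45, 65, 75]),
    ("modelC", "task2", [30, 40, 99]),
]

# Group the feature vectors by model.
by_model = defaultdict(list)
for model, _task, features in rows:
    by_model[model].append(features)

# Average each feature column across the tasks of a model.
aggregated = {
    model: [mean(column) for column in zip(*feature_rows)]
    for model, feature_rows in by_model.items()
}

for model, features in aggregated.items():
    print(model, features)
```

Averaging this way leaves a single row per model (3 rows in total here), so it discards the per-task variation; keeping one row per (model, task) pair may be the alternative to weigh against it.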
I could be wrong in my overall approach. How can I do this? Are there any other ideas to select top features?
*Note: Please ignore the actual numbers; all values used above are dummy ones.