
I have a couple of hundred categories, and each category has a specific set of attributes whose values vary over time (historical data).

The problem I need to solve is to select, from a smaller group of these categories, the best set of categories that meets some constraints.

I'm new to data science and was wondering how to go about solving this problem.

One option I thought of was to use multiple regression on the different attributes to assign a weight to each category, and then use these weights to train and test a random forest on the historical groups of categories.

Does this make sense?

  • A more concrete description, maybe with a toy example, would be helpful. But "select a subset... meet some constraints" sure sounds like an integer program. Commented Jul 10, 2019 at 12:38
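To make the integer-programming suggestion concrete, here is a minimal sketch that treats the question as a 0/1 selection problem: each candidate category is either picked or not, the goal is to maximize a total score, and the picks must satisfy a budget and a maximum-count constraint. The category names, scores, costs, `BUDGET`, and `MAX_ITEMS` are all hypothetical. Checking every subset is only feasible for a small candidate group; with more categories, a real integer-programming solver (for example SciPy's `scipy.optimize.milp`) would replace the enumeration.

```python
from itertools import combinations

# Hypothetical toy data: each candidate category has a score to maximize
# and a cost that counts against a budget constraint.
CANDIDATES = {
    "A": {"score": 8, "cost": 5},
    "B": {"score": 6, "cost": 4},
    "C": {"score": 5, "cost": 3},
    "D": {"score": 3, "cost": 2},
}
BUDGET = 9     # hypothetical constraint: total cost <= BUDGET
MAX_ITEMS = 3  # hypothetical constraint: pick at most MAX_ITEMS categories


def best_subset(candidates, budget, max_items):
    """Exhaustively solve the 0/1 selection problem.

    Every subset of size 1..max_items is checked, so this only scales to
    small candidate groups. Ties keep the first optimum found.
    """
    best, best_score = (), 0
    names = sorted(candidates)
    for k in range(1, max_items + 1):
        for subset in combinations(names, k):
            cost = sum(candidates[n]["cost"] for n in subset)
            if cost > budget:
                continue  # violates the budget constraint
            score = sum(candidates[n]["score"] for n in subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


if __name__ == "__main__":
    print(best_subset(CANDIDATES, BUDGET, MAX_ITEMS))
```

With these numbers the best pick is A and B (total score 14, cost 9). B, C, and D reach the same score within budget, but they are found later, so the first optimum is kept.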

1 Answer


If I understand your question correctly, the problem you are trying to solve is a "multiclass classification" problem, so I'd suggest reading up on that. Possible approaches range from Support Vector Classification (simpler) to deep neural networks (harder). In most machine learning problems, it's best to start with a simple approach. That way:

  • You learn about simple patterns in your data
  • You get a benchmark
  • If the result from your simple model is good enough, you can stop there :)
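A minimal, dependency-free sketch of the "start simple and get a benchmark" advice: a majority-class baseline is compared with a nearest-centroid classifier on a hypothetical three-class toy dataset. The nearest-centroid model is a stand-in chosen to keep the example self-contained; it is not the Support Vector Classification mentioned above. In scikit-learn, `DummyClassifier(strategy="most_frequent")` and `SVC` would play these two roles.

```python
from collections import Counter
from math import dist

# Hypothetical toy data: (attribute vector, class label) pairs.
TRAIN = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "mid"), ((5.2, 4.8), "mid"),
    ((9.0, 9.0), "high"), ((8.8, 9.2), "high"),
]
TEST = [((1.1, 0.9), "low"), ((4.9, 5.1), "mid"), ((9.1, 8.9), "high")]


def majority_baseline(train):
    """Benchmark: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_centroid(train):
    """Simple model: predict the class whose mean attribute vector is closest."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    # Average each attribute column within a class to get its centroid.
    centroids = {
        y: tuple(sum(col) / len(col) for col in zip(*xs))
        for y, xs in groups.items()
    }
    return lambda x: min(centroids, key=lambda y: dist(x, centroids[y]))


def accuracy(model, data):
    """Fraction of (x, y) pairs for which model(x) == y."""
    return sum(model(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    print("baseline:", accuracy(majority_baseline(TRAIN), TEST))
    print("nearest centroid:", accuracy(nearest_centroid(TRAIN), TEST))
```

On this toy data the baseline scores 1/3 and the nearest-centroid model scores 1.0. A more complex model is only worth its extra cost if it clearly beats such a baseline.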

I recommend checking out scikit-learn's documentation on classification models. Also, if you're a beginner, you might find this cheatsheet handy for similar questions in the future.
