2
$\begingroup$

I would like to predict a list of products (>1000 in the dataset) within a particular product category (>100 in the dataset).

Example:

  1. The user selects one or more product categories, e.g. fruits, vegetables;
  2. The model makes predictions and returns, for example:
  • Fruits: Banana, Apple, Strawberry;
  • Vegetables: Carrot, Onion, Potato;

Assumptions:

  • The product category or categories are entered by the user, and prediction happens only within this group or these groups (it is important to return all of them in the results);

Potential solution:

  • For product classification I am considering multi-class classification (One-vs-Rest or One-vs-One); however, I am afraid that it will require far too much computing power and time.
  • The tricky part is making predictions within a category. Does this mean that I need to build a separate model for each category?
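
To make the "one model vs. one model per category" question concrete, here is a minimal sketch of the single-model option: score every product once, then discard anything outside the selected categories before ranking. The mapping `PRODUCT_CATEGORY`, the function `top_products_per_category`, and the scores are all hypothetical and made up for illustration; the scores stand in for whatever a trained multi-class classifier would output.

```python
from collections import defaultdict

# Hypothetical product -> category mapping (names taken from the example above).
PRODUCT_CATEGORY = {
    "Banana": "Fruits",
    "Apple": "Fruits",
    "Strawberry": "Fruits",
    "Carrot": "Vegetables",
    "Onion": "Vegetables",
    "Potato": "Vegetables",
}


def top_products_per_category(scores, selected_categories, k=3):
    """Return the k highest-scoring products for each selected category.

    `scores` maps product name -> score from a single multi-class model
    (for example one row of class probabilities). Products outside the
    selected categories are dropped before ranking.
    """
    by_category = defaultdict(list)
    for product, score in scores.items():
        category = PRODUCT_CATEGORY[product]
        if category in selected_categories:
            by_category[category].append((score, product))
    return {
        category: [product for _, product in sorted(items, reverse=True)[:k]]
        for category, items in by_category.items()
    }


# Made-up scores standing in for the output of one trained classifier.
example_scores = {
    "Banana": 0.30, "Apple": 0.05, "Strawberry": 0.10,
    "Carrot": 0.25, "Onion": 0.20, "Potato": 0.10,
}
result = top_products_per_category(example_scores, {"Vegetables"}, k=2)
print(result)
```

Because the filter is applied after scoring, a product from an unselected category can never appear in the result, even if it has the highest overall score (Banana here). Whether this is accurate enough compared with per-category models would still have to be checked on the real data.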
$\endgroup$
3
  • $\begingroup$ "it is important to return all of them in the results" Does this mean returning all products in the same category? $\endgroup$ Commented Dec 13, 2021 at 18:26
  • $\begingroup$ It seems the model just needs to predict elements from subcategories. In that case you only need to train one model. Please share an example of your data table format to make this clearer. $\endgroup$ Commented Dec 13, 2021 at 18:29
  • $\begingroup$ Thank you @EnesKuz -> I have posted the data and a description in the answer below. $\endgroup$ Commented Dec 14, 2021 at 10:09

1 Answer

1
$\begingroup$

Below is some sample data, limited to a few examples out of >1000. The first picture is a color-coded screenshot of the table that highlights the different category levels (main, macro-, and micro-categories); the raw data follows below the picture. The objective is:

  1. The user enters two macro-categories (it could also be one, or even all of them), e.g. Citrus and Leafy green.
  2. For the macro-categories that were entered, the model returns the top best matches.
  3. I am afraid that if I train a model (RF or SVM) over the whole dataset, using macro- and micro-categories as different features, the model could predict a micro-category unrelated to the requested macro-category (e.g. I would get Oranges (Citrus) as a result even though I entered only Berries and Melons).

[Image: color-coded screenshot of the table below, highlighting main, macro-, and micro-categories]

Main_class  Macro_class  Micro_class       Performance  Usage  Experience
Fruits      Berries      Blueberries       1            0.5    6.05
Fruits      Berries      Raspberries       2            1      12.1
Fruits      Berries      Goji berries      3            1.5    18.15
Fruits      Berries      Strawberries      4            2      24.2
Fruits      Berries      Bilberries        5            2.5    30.25
Fruits      Berries      Açaí berries      6            3      36.3
Fruits      Berries      Cranberries       7            3.5    42.35
Fruits      Berries      Grapes            8            4      48.4
Fruits      Citrus       oranges           21           10.5   127.05
Fruits      Citrus       grapefruits       23           11.5   139.15
Fruits      Citrus       mandarins         25           12.5   151.25
Fruits      Citrus       limes             27           13.5   163.35
Fruits      Stone fruit  nectarines        6            3      36.3
Fruits      Stone fruit  apricots          7            3.5    42.35
Fruits      Stone fruit  peaches           8            4      48.4
Fruits      Stone fruit  plums             9            4.5    54.45
Fruits      Melons       watermelons       2            1      12.1
Fruits      Melons       rockmelons        3            1.5    18.15
Fruits      Melons       honeydew melons   4            2      24.2
Vegetables  Leafy green  lettuce           25           12.5   151.25
Vegetables  Leafy green  spinach           27           13.5   163.35
Vegetables  Leafy green  silverbeet        6            3      36.3
Vegetables  Cruciferous  cabbage           7            3.5    42.35
Vegetables  Cruciferous  cauliflower       8            4      48.4
Vegetables  Cruciferous  Brussels sprouts  25           12.5   151.25
Vegetables  Cruciferous  broccoli          27           13.5   163.35
Vegetables  Marrow       pumpkin           6            3      36.3
Vegetables  Marrow       cucumber          7            3.5    42.35
Vegetables  Marrow       zucchini          8            4      48.4
Vegetables  Allium       onion             5            2.5    30.25
Vegetables  Allium       garlic            6            3      36.3
Vegetables  Allium       shallot           7            3.5    42.35
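
To illustrate the objective with the data above, here is a minimal sketch of "filter by the entered macro-categories first, then rank". It uses the observed `Performance` values in place of model predictions and assumes that a higher `Performance` means a better match, which is my assumption and not something the data states. The function name `rank_within_macro` and the `top_n` parameter are hypothetical.

```python
# A few rows copied from the table above: (macro_class, micro_class, performance).
ROWS = [
    ("Berries", "Blueberries", 1),
    ("Berries", "Strawberries", 4),
    ("Berries", "Grapes", 8),
    ("Citrus", "oranges", 21),
    ("Citrus", "limes", 27),
    ("Melons", "watermelons", 2),
    ("Melons", "rockmelons", 3),
    ("Melons", "honeydew melons", 4),
    ("Leafy green", "lettuce", 25),
    ("Leafy green", "spinach", 27),
    ("Leafy green", "silverbeet", 6),
]


def rank_within_macro(rows, selected_macros, top_n=3):
    """Return the top_n micro-classes per selected macro-class.

    Rows from other macro-classes are filtered out before ranking, so an
    unrelated micro-class can never be returned.
    Assumption: a higher performance value means a better match.
    """
    ranked = {}
    for macro in selected_macros:
        candidates = [(perf, micro) for m, micro, perf in rows if m == macro]
        candidates.sort(key=lambda item: item[0], reverse=True)
        ranked[macro] = [micro for _, micro in candidates[:top_n]]
    return ranked


best = rank_within_macro(ROWS, ["Berries", "Melons"], top_n=2)
print(best)
```

With Berries and Melons entered, Citrus rows such as oranges are removed before any ranking happens, which addresses the concern in point 3. In a real setup the `performance` column would be replaced by the model's predicted score for each product.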
$\endgroup$
3
  • $\begingroup$ Please explain "model returns top best matches". What does that mean? $\endgroup$ Commented Dec 14, 2021 at 19:27
  • $\begingroup$ Also, as far as I can see, you don't need the main class at all. And you are right: you shouldn't train your network on mixed micro- and macro-categories. $\endgroup$ Commented Dec 14, 2021 at 19:28
  • $\begingroup$ "model returns top best matches" -> I assume that the model will predict two dependent variables: micro_class and performance. Predicting performance would give me a numerical value to rank products by and, for example, return the top 3 'best matches'. Also, to predict micro_class I would probably need to filter the dataset and create a separate classifier for each macro_category? $\endgroup$ Commented Dec 14, 2021 at 20:49
