
I have a dataset with about 20,000 rows containing mean values for different groups. Each group is defined by 4 categorical columns, and for every row I have the week number, the number of samples in that week, and the mean value.

So my data could look something like this:

AFFILIATE  TASK  INDUSTRY  STATE  WEEK  NUMBER_OF_SAMPLES  MEAN
A1         T1    I1        NY     1     50                 20.3
A1         T1    I1        NY     2     40                 20.1
A1         T1    I1        NY     3     60                 30.9
A2         T1    I1        NY     10    10                 200.3
A2         T1    I1        NY     12    20                 199.9
A2         T1    I1        NY     15    15                 201.5

There are around 700 unique combinations of the categorical columns. The number of weeks differs between groups, and the weeks are not necessarily consecutive; every group has between 10 and 24 entries. In the future this could grow to at least 50 weeks per group.

I now want to build a model that detects outliers in this data. Since I plan to use an active learning approach later, I am looking for a semi-supervised model. I have already built a small set of pseudo-labels for training it.

My questions specifically are:

  • What model do you suggest?
  • What input features do you suggest (z-score, moving average, ...)?
  • How many pseudo-labels should I expect to need to get decent results?
  • How should I prepare my categorical and numerical data (one-hot encoding, scaling, ...)?
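For the feature question, here is a minimal sketch of the kind of per-week, group-relative features mentioned (z-score, moving average), assuming pandas and the column names from the table above. The helper `add_group_features`, the window length, and the choice of a robust (median/MAD) z-score are illustrative assumptions, not an established recipe.

```python
import numpy as np
import pandas as pd

# Column names taken from the example table; the window length is an assumption.
GROUP_COLS = ["AFFILIATE", "TASK", "INDUSTRY", "STATE"]


def add_group_features(df: pd.DataFrame, window: int = 4) -> pd.DataFrame:
    """Hypothetical helper: add per-week features computed within each group."""
    df = df.sort_values(GROUP_COLS + ["WEEK"]).copy()
    g = df.groupby(GROUP_COLS, sort=False)["MEAN"]

    # Robust z-score: distance from the group median in units of scaled MAD,
    # so a group around 20 and a group around 200 become comparable.
    med = g.transform("median")
    mad = g.transform(lambda s: (s - s.median()).abs().median())
    df["ROBUST_Z"] = (df["MEAN"] - med) / (1.4826 * mad.replace(0, np.nan))

    # Deviation from the moving average of the *previous* rows, so a week does
    # not dilute its own baseline. Uses rows, not calendar weeks: gaps are ignored.
    prev_ma = g.transform(lambda s: s.shift(1).rolling(window, min_periods=2).mean())
    df["DEV_FROM_MA"] = df["MEAN"] - prev_ma

    # Relative change versus the previous row of the same group.
    df["REL_CHANGE"] = g.transform(lambda s: s / s.shift(1) - 1)
    return df


# The example rows from the question.
example = pd.DataFrame({
    "AFFILIATE": ["A1"] * 3 + ["A2"] * 3,
    "TASK": ["T1"] * 6,
    "INDUSTRY": ["I1"] * 6,
    "STATE": ["NY"] * 6,
    "WEEK": [1, 2, 3, 10, 12, 15],
    "NUMBER_OF_SAMPLES": [50, 40, 60, 10, 20, 15],
    "MEAN": [20.3, 20.1, 30.9, 200.3, 199.9, 201.5],
})
features = add_group_features(example)
print(features[["AFFILIATE", "WEEK", "ROBUST_Z", "DEV_FROM_MA", "REL_CHANGE"]])
```

Because every feature is computed relative to the row's own group, the categorical columns act only as grouping keys here rather than being one-hot encoded; whether they should also be fed to the model is a separate design choice.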

I tried DeepSAD from the deepod package, but the results were very poor: the model seemed to give every entry in a group the same anomaly score, whereas I want each week to be scored separately.
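One possible reason for identical scores within a group is that the model input mostly describes the group rather than the week. The sketch below illustrates the alternative: feed only per-week, group-standardised features so each row can receive its own score. It deliberately swaps in scikit-learn's `IsolationForest` as an unsupervised stand-in; it is not DeepSAD, and the helper `score_weeks` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

GROUP_COLS = ["AFFILIATE", "TASK", "INDUSTRY", "STATE"]


def score_weeks(df: pd.DataFrame, random_state: int = 0) -> pd.Series:
    """Hypothetical helper: one anomaly score per row (week); higher = more anomalous."""
    df = df.sort_values(GROUP_COLS + ["WEEK"])
    g = df.groupby(GROUP_COLS, sort=False)["MEAN"]
    std = g.transform("std")
    # Standardise within the group so each week gets its own value.
    z = ((df["MEAN"] - g.transform("mean")) / std).fillna(0.0)
    # Change versus the previous week of the same group (0 for the first week).
    step = (g.diff() / std).fillna(0.0)
    X = np.column_stack([z.to_numpy(), step.to_numpy()])
    model = IsolationForest(random_state=random_state).fit(X)
    # score_samples is higher for normal points, so negate it.
    return pd.Series(-model.score_samples(X), index=df.index).sort_index()


# Synthetic demo: three groups at very different levels, one injected spike.
rng = np.random.default_rng(0)
rows = []
for aff, level in [("A1", 20.0), ("A2", 200.0), ("A3", 50.0)]:
    for week in range(1, 21):
        rows.append({"AFFILIATE": aff, "TASK": "T1", "INDUSTRY": "I1", "STATE": "NY",
                     "WEEK": week, "MEAN": level + rng.normal(0, 0.01 * level)})
demo = pd.DataFrame(rows)
spike = demo.index[(demo["AFFILIATE"] == "A1") & (demo["WEEK"] == 8)][0]
demo.loc[spike, "MEAN"] += 2.0  # roughly 10 noise standard deviations for A1
scores = score_weeks(demo)
print(scores.nlargest(5))
```

The same two feature columns could in principle be passed to a deepod detector instead; that substitution is untested here.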

I also have a manually labelled validation set of 1,000 entries (25 anomalies and 975 non-anomalies), which I use to check the model's performance.
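With only 25 anomalies in 1,000 rows, accuracy is not informative (predicting "normal" everywhere already gives 97.5%). A sketch of ranking-based metrics, assuming the model outputs a continuous anomaly score where higher means more anomalous; `evaluate` is a hypothetical helper, and precision at 25 is chosen only because the set has 25 anomalies.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def evaluate(y_true, scores, k=25):
    """Hypothetical helper: ranking metrics for an imbalanced anomaly validation set."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # Indices of the k highest-scoring rows.
    top_k = np.argsort(scores)[::-1][:k]
    return {
        "average_precision": average_precision_score(y_true, scores),
        "roc_auc": roc_auc_score(y_true, scores),
        f"precision_at_{k}": float(y_true[top_k].mean()),
    }


# Sanity check with a perfect scorer on a set shaped like the validation set.
y_val = np.array([1] * 25 + [0] * 975)
perfect = evaluate(y_val, y_val.astype(float))
print(perfect)
```

For reference, a random scorer's average precision is about the anomaly rate, 25 / 1000 = 0.025, which is a more honest baseline than accuracy.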

Thank you!

  • Can you define what an outlier is for your data? Commented Nov 25 at 11:37
  • Yes. We can have an innovative outlier, an additive outlier or a temporary outlier. Here is an example (researchgate.net/figure/…). In the future we might want to detect seasonal outliers as well. Commented Nov 25 at 12:52
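Since pseudo-labels are scarce, one option is to inject synthetic outliers of the types named above into series believed to be clean. A minimal sketch of the usual additive and temporary-change formulations, with a level shift as the special case delta = 1; the helper name and the default delta = 0.7 are assumptions, and innovative outliers are not covered because they depend on the series' own ARMA dynamics.

```python
import numpy as np


def inject_outlier(series, t0, omega, kind="additive", delta=0.7):
    """Hypothetical helper: copy of `series` with one synthetic outlier from index t0.

    additive:  effect omega at t0 only.
    temporary: effect omega * delta**(t - t0) for t >= t0, decaying back to normal.
    level:     delta = 1, i.e. a permanent shift.
    """
    y = np.asarray(series, dtype=float).copy()
    t = np.arange(len(y))
    if kind == "additive":
        effect = np.where(t == t0, omega, 0.0)
    elif kind in ("temporary", "level"):
        d = 1.0 if kind == "level" else delta
        # Clip the exponent so rows before t0 never produce huge powers.
        effect = np.where(t >= t0, omega * d ** np.clip(t - t0, 0, None), 0.0)
    else:
        raise ValueError(f"unknown outlier kind: {kind}")
    return y + effect


base = np.full(10, 20.0)
ao = inject_outlier(base, 3, 5.0)
tc = inject_outlier(base, 3, 5.0, kind="temporary", delta=0.5)
ls = inject_outlier(base, 3, 5.0, kind="level")
```

The injected rows can then be labelled as anomalies, which gives extra training or validation examples with known positions.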
