
Imagine I have time series measurements from a single tool. Each measurement is labeled with one of 3 ordinal categories, and the measurements are non-overlapping.

What I want to do is test for a significant difference between the groups based on aggregated statistics computed for each measurement (standard deviation, kurtosis, etc.). For example, I have 1000 measurements, and each measurement has 10000 points. For each measurement I calculate the aggregated statistics, and based on these I want to test for significant differences between the labels (one test per aggregated statistic).

What would be the best tests to use? I am mostly interested in tests between a pair of groups (for example, the standard deviation of Label 1 vs. the standard deviation of Label 2). I was thinking about Mann–Whitney, but I am not sure.

The aggregated statistics are not normally distributed.
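For concreteness, here is a minimal sketch of what I have in mind, assuming the data sit in a 2-D array `data` of shape (1000, 10000) and a label vector `labels`; the names and the simulated values below are just placeholders, not my actual data:

```python
# Minimal sketch (placeholder names, simulated values):
# per-measurement aggregated statistics, then a pairwise
# Mann-Whitney U test between Label 1 and Label 2.
import numpy as np
from scipy.stats import kurtosis, mannwhitneyu

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 10_000))   # stand-in for 1000 measurements x 10000 points
labels = rng.integers(1, 4, size=1000)   # stand-in for the 3 ordinal labels

# One aggregated statistic per measurement (one value per row).
agg = {
    "std": data.std(axis=1, ddof=1),
    "kurtosis": kurtosis(data, axis=1),
}

# Pairwise comparison: Label 1 vs Label 2, one test per aggregated statistic.
for name, values in agg.items():
    g1 = values[labels == 1]
    g2 = values[labels == 2]
    res = mannwhitneyu(g1, g2, alternative="two-sided")
    print(f"{name}: U = {res.statistic:.1f}, p = {res.pvalue:.3g}")
```

If I test several aggregated statistics this way, I suppose I would also need some multiple-comparison correction, but my main question is whether Mann–Whitney is an appropriate choice here at all.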

  • Because your question has been acceptably answered, deleting or substantially modifying the question damages the thread. Please visit our help center for guidance on how this site works. (Commented Sep 8 at 13:27)

1 Answer


Your description of your data and objectives is not very clear. This would normally be handled in comments, but there are so many questions that I have to use an answer. So this is not really an answer, just a long comment.

From what I can understand,

  • You have 1000 metrics (different “things” being measured), each metric can belong to 1 of 3 possible categories (3 groups), and each metric is measured 10000 times over time. I will assume ~300 metrics per group. I am not sure how to interpret “measurements are non-overlapping”, or its relevance...
  • But what are these metrics? Are they identically distributed, or not? Are they independent? Does it make sense to aggregate them? For example, you could have a temperature, atmospheric pressure, humidity, etc. all measured at one location, over time. There it makes no sense to lump them together inside a group; what you need to do is compare temperatures from location 1 to temperatures from location 2, and using aggregate statistics is of no use. But maybe you are only measuring temperature, at multiple sites, but in the same general location (your 3 categories)? Then you could aggregate the data across categories, and compute aggregate statistics per site. Can you clearly describe your scenario?
  • For each metric, you compute several statistics (mean, standard deviation, kurtosis, etc.). How many exactly? And which ones specifically? (There are standard tests for some statistics, but not for others...) And why? (I see no advantage of comparing ~300 averages, or SDs, etc., over directly comparing ~300,000 measurements.) Can you explain the reason for using aggregated statistics?
  • So what you have now are multiple samples (of size ~= 300) from sampling distributions of multiple statistics.
  • You want to compare groups based on these statistics. But that is very vague; what does “compare” mean? That they have the exact same distribution? That they have the same location? That the time series have the same periodicity (frequency analysis)? Can you also clarify this? (The sketch after this list illustrates the location-vs-distribution distinction.)
  • And last, why are you doing this analysis? What will you do if Category 1 is “different” (whatever that means) from Category 2? What decision/conclusion will be made from the results?
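To make the “same location” vs. “same distribution” point concrete, here is a minimal sketch using simulated stand-ins for the ~300 per-measurement standard deviations in two of your groups (all values below are made up purely for illustration):

```python
# Illustration only: "same location" vs. "same distribution".
# sd_group1 / sd_group2 are simulated stand-ins for the ~300
# per-measurement standard deviations in two of the groups.
import numpy as np
from scipy.stats import mannwhitneyu, ks_2samp

rng = np.random.default_rng(1)
sd_group1 = rng.gamma(shape=2.0, scale=1.0, size=300)
sd_group2 = rng.gamma(shape=2.0, scale=1.2, size=300)

# Mann-Whitney U is (mostly) sensitive to a shift in location / stochastic dominance.
mw = mannwhitneyu(sd_group1, sd_group2, alternative="two-sided")

# Kolmogorov-Smirnov compares the whole empirical distributions
# (location, spread, and shape all matter).
ks = ks_2samp(sd_group1, sd_group2)

print(f"Mann-Whitney: U = {mw.statistic:.1f}, p = {mw.pvalue:.3g}")
print(f"Kolmogorov-Smirnov: D = {ks.statistic:.3f}, p = {ks.pvalue:.3g}")
```

Which of these (if either) is the right notion of “different” depends entirely on your answers to the questions above.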

As you can see, as it stands, it is very hard to provide helpful answers to your post.

  • Thank you for the answer! I accepted the answer and you are right about all of your questions. I created another post: stats.stackexchange.com/questions/670074/… I hope I made it clearer. (Commented Sep 7 at 9:06)
  • @ML_specialist, a better practice on this site is to edit the original question, rather than creating a new one... (Commented Sep 7 at 17:16)
