Filtering out bursts with a consistent range from a time series

Question

I have a time series of bursts that looks like this:

… and zoomed in:

Now, there are also spurious bursts (which I call noise) in the data, which look like this:

… and zoomed in:

As you can see, the main characteristic of the noise is that it has a more consistent range. I want to filter out this noise.

So far, I defined a moving window of length 10 and whenever the standard deviation of the data within the window was smaller than some threshold, I suppressed it. However, this method could not filter out all noise and also filtered out some real bursts.

I am looking to use some kind of statistical model (and not machine learning) to accomplish this.

Optimization can be done by first using more of the prior information. Why do you regard the noise as more 'consistent' in range? What model do you have behind this?
– Sextus Empiricus, Commented Aug 9, 2018 at 15:37
"I defined a moving window of length 10 and whenever the standard deviation of the data within the window was smaller than some threshold, I suppressed it." I do not see how this works and why it is done. How does this filter out the peak at 68600 and how does this not filter the peak at 4560?
– Sextus Empiricus, Commented Aug 9, 2018 at 15:47
@Martijn Weterings, the moving window method did not work well. I basically just suppressed the motion when it was not significantly different from the previous one. I considered noise bursts more consistent based on observation, whereas any other movement would disturb the pattern. Please see my latest comment, as I adopted the KS test to improve the results.
– CodeLover, Commented Aug 9, 2018 at 16:00
What do you mean by 'the motion'? How do you get to your observation of noise bursts (how do you know that it is noise)? What pattern (that is being disturbed) should I see? What latest comment do you refer to (and what is the KS test)?
– Sextus Empiricus, Commented Aug 9, 2018 at 16:58

Wrzlprmft · Accepted Answer · 2018-08-07 17:12:59Z

Here is how I would approach your problem:

Preparation

Split the time series into bursts. In particular, do not use moving windows, since they are poor at capturing both the transition between zero and a burst and longer bursts.
Collect a test dataset $\mathcal{B} = \{B_1,\ldots,B_n\}$ of clear real bursts and a test dataset $\mathcal{C} = \{C_1,\ldots,C_m\}$ of noise bursts. The more, the better.
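As a rough illustration of the splitting step, here is a minimal sketch. It assumes that the signal sits exactly at a known baseline value (zero here) between bursts, which may not hold for your data; the function name `split_into_bursts` and the `min_length` parameter are made up for this example.

```python
def split_into_bursts(series, baseline=0.0, min_length=1):
    """Return a list of bursts, each a list of consecutive non-baseline samples.

    Assumption: samples between bursts are exactly equal to `baseline`.
    Runs shorter than `min_length` (which should be >= 1) are discarded.
    """
    bursts = []
    current = []
    for value in series:
        if value != baseline:
            # Still inside a burst: keep collecting samples.
            current.append(value)
        else:
            # Back at the baseline: close the burst collected so far, if any.
            if len(current) >= min_length:
                bursts.append(current)
            current = []
    # Close a burst that runs up to the end of the series.
    if len(current) >= min_length:
        bursts.append(current)
    return bursts


series = [0, 0, 3, 5, 4, 0, 0, 7, 7, 6, 7, 0]
print(split_into_bursts(series))  # [[3, 5, 4], [7, 7, 6, 7]]
```

If the baseline is noisy rather than exactly zero, the comparison `value != baseline` would have to be replaced by a tolerance check.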

General procedure

You want to find some characteristic derived from the test datasets that tells you whether a given burst is noise or not (forgive me if I am stating the obvious here). As the noise bursts are more uniform, what suggests itself is using some similarity to $\mathcal{C}$ as a measure. Ideally your measure indicates higher similarity for any $C_1,\ldots,C_m$ than for any $B_1,\ldots,B_n$. However, be prepared for not obtaining such a perfect separation. You can use receiver operating characteristics (ROC) to quantify the quality of your separation and to find a threshold that fits your desires (i.e., is a good compromise between false positives and false negatives).
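To make the ROC idea concrete, here is a small sketch in plain Python. It assumes you already have one similarity score per burst, with higher scores meaning "more noise-like"; the scores below are invented. Picking the threshold by Youden's J (true positive rate minus false positive rate) is only one possible compromise, not something prescribed above.

```python
def roc_points(noise_scores, real_scores):
    """ROC points for the rule 'call a burst noise if score >= threshold'.

    Returns (threshold, false_positive_rate, true_positive_rate) tuples,
    where a 'positive' is a burst labelled as noise.
    """
    thresholds = sorted(set(noise_scores) | set(real_scores), reverse=True)
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in noise_scores) / len(noise_scores)
        fpr = sum(s >= t for s in real_scores) / len(real_scores)
        points.append((t, fpr, tpr))
    return points


def auc(noise_scores, real_scores):
    """Area under the ROC curve, computed as the probability that a random
    noise burst scores higher than a random real burst (ties count half)."""
    wins = 0.0
    for c in noise_scores:
        for b in real_scores:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(noise_scores) * len(real_scores))


def best_threshold(noise_scores, real_scores):
    """Threshold maximising Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(noise_scores, real_scores), key=lambda p: p[2] - p[1])[0]


# Invented similarity scores, for illustration only.
noise_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
real_scores = [0.5, 0.3, 0.2, 0.1]
print("AUC:", auc(noise_scores, real_scores))
print("threshold:", best_threshold(noise_scores, real_scores))
```

An AUC of 1 would correspond to the perfect separation described above, and 0.5 to a score that carries no information. If missing a noise burst and discarding a real burst have different costs, weight the two rates accordingly instead of using Youden's J.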

Be aware that fine-tuning your characteristic may lead to it being overly specific to your test dataset (in-sample optimisation). You can avoid this by collecting two pairs of test datasets and assessing your method's separation capabilities on the respective other one.
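The two-pair check could look roughly like the sketch below: a threshold is tuned on one pair of labelled score sets and then evaluated on the other, and vice versa. All scores are invented, higher scores are assumed to mean "more noise-like", and the helper names `error_rates` and `fit_threshold` are hypothetical.

```python
def error_rates(noise_scores, real_scores, threshold):
    """Return (missed_noise_rate, lost_real_rate) for the rule
    'call a burst noise if score >= threshold'."""
    missed = sum(s < threshold for s in noise_scores) / len(noise_scores)
    lost = sum(s >= threshold for s in real_scores) / len(real_scores)
    return missed, lost


def fit_threshold(noise_scores, real_scores):
    """Pick the candidate threshold with the smallest summed error rate.
    Candidates are the observed scores; the first minimum wins on ties."""
    candidates = sorted(set(noise_scores) | set(real_scores))
    return min(candidates, key=lambda t: sum(error_rates(noise_scores, real_scores, t)))


# Two independently collected pairs of labelled scores (made-up numbers).
pair_a = {"noise": [0.9, 0.8, 0.7, 0.6], "real": [0.5, 0.3, 0.2, 0.1]}
pair_b = {"noise": [0.85, 0.75, 0.45, 0.65], "real": [0.55, 0.35, 0.25, 0.15]}

for train, test, name in ((pair_a, pair_b, "A -> B"), (pair_b, pair_a, "B -> A")):
    t = fit_threshold(train["noise"], train["real"])
    in_sample = error_rates(train["noise"], train["real"], t)
    out_of_sample = error_rates(test["noise"], test["real"], t)
    print(name, "threshold:", t, "in-sample:", in_sample, "out-of-sample:", out_of_sample)
```

A large gap between the in-sample and out-of-sample error rates suggests that the characteristic or its threshold has been tuned too closely to one dataset.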

Possible Characteristics

The easiest approach is to accumulate all noise bursts ($\mathcal{C}$) into one distribution and use something like the Kolmogorov–Smirnov test statistic or the Mann–Whitney test to quantify similarity between a burst's distribution of values and the distribution of values from the known noise bursts $\mathcal{C}$. This makes most sense if the values of all noise bursts are roughly sampled from the same distribution, but this is not a strict requirement; it may still allow for a good separation.
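A minimal sketch of the pooled comparison, using a hand-written two-sample Kolmogorov–Smirnov statistic so that it runs without third-party packages. The burst values are invented. In practice, `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so it suffices to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Known noise bursts (made-up values), pooled into one distribution.
noise_bursts = [[7, 7, 6, 7], [6, 7, 7, 8], [7, 6, 7, 7]]
pooled_noise = [value for burst in noise_bursts for value in burst]

candidate_1 = [7, 6, 7, 7, 8]    # looks like the noise above
candidate_2 = [1, 12, 3, 20, 5]  # spread over a much wider range

# A small distance means the burst's values resemble the pooled noise values.
print(ks_statistic(candidate_1, pooled_noise))
print(ks_statistic(candidate_2, pooled_noise))
```

The distance (or one minus the distance, if you prefer "higher means more noise-like") can then serve as the score that you evaluate with ROC on your labelled bursts.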

An example of a more complicated characteristic would be to count the number of members of $\mathcal{C}$ your burst complies with.
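What "complies with" means is left open above. One possible reading, used purely as an assumption in the sketch below, is that a burst complies with a noise burst $C_i$ if their two-sample Kolmogorov–Smirnov distance is at most some tolerance. The tolerance of 0.3, the function name `count_compatible`, and the data are all invented.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def count_compatible(burst, noise_bursts, tolerance=0.3):
    """Count the known noise bursts whose KS distance to `burst` is at most
    `tolerance`. Both the criterion and the default tolerance are assumptions."""
    return sum(ks_statistic(burst, c) <= tolerance for c in noise_bursts)


# Known noise bursts and two candidates (made-up values).
noise_bursts = [[7, 7, 6, 7], [6, 7, 7, 8], [7, 6, 7, 7]]
print(count_compatible([7, 6, 7, 7, 8], noise_bursts))    # 3
print(count_compatible([1, 12, 3, 20, 5], noise_bursts))  # 0
```

Unlike pooling, this keeps the noise bursts separate, so it may cope better if they do not all come from one distribution. The tolerance is a further parameter that would need the same out-of-sample checking as the threshold.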

Thank you for your response! I tried to cluster distinct bursts into one distribution and compared it against the uniform distribution. However, I still got a p-value of 0 for the noise bursts, which is less than 5%, meaning I can reject that the data is uniform. Do you think I should eliminate any outliers before the KS test? FYI, I tried the kstest function from scipy.stats. Thanks!
– CodeLover, Commented Aug 8, 2018 at 15:11
@CodeLover: Why would you compare against the uniform distribution? (Sidenote: Real data is almost never uniformly distributed, unless you are looking at something like angles between $0$ and $2\pi$.) If anything, compare noise bursts against regular bursts.
– Wrzlprmft, Commented Aug 8, 2018 at 16:10
Thank you @Wrzlprmft! I tried to compare it against real bursts and it worked quite successfully. Right now I just need to adjust the p-value threshold to increase sensitivity.
– CodeLover, Commented Aug 8, 2018 at 20:25
