0
$\begingroup$

I am reading a paper on wind power forecasting and the authors present a plot of the data before outliers are removed and a plot after. However, they don't actually say what method was employed to remove the outliers. I was hoping someone might offer some guesses or hints on how one would go about obtaining plot (b) from plot (a).

[Figure: plot (a), the data before outlier removal; plot (b), the data after outlier removal]

$\endgroup$
3
  • 6
    $\begingroup$ Whatever we guess, it's the authors' job to properly explain what they're doing, so if this is not explained, it doesn't reflect well on what's in the paper. (Of course they could just look at the first plot and then throw away whatever they don't like...) $\endgroup$ Commented Nov 27, 2023 at 17:16
  • $\begingroup$ Agreed. I asked for clarification but haven't received a response so I figured I would see if I could find a reasonable approach myself. So far my efforts have come up empty so I am asking here. $\endgroup$ Commented Nov 27, 2023 at 17:19
  • 1
    $\begingroup$ Is there a reason for the time series tag? I don't see any time series aspect here. $\endgroup$ Commented Nov 27, 2023 at 17:25

1 Answer

0
$\begingroup$

the authors ... don't actually say what method was employed to remove the outliers.

You didn't cite the authors' paper :-(


A significant number of readings produce zero power. This likely corresponds to scheduled or unscheduled downtime where the turbine is disconnected from the grid or otherwise not generating power on a windy day.

Most of the other filtered readings correspond to low-efficiency operation. This could be caused by wind direction being off-axis from the turbine, deliberate use of a mechanical brake, poorly maintained lubrication, and similar mechanical issues.

A small number of outliers are also discarded on the "unusually high efficiency" side, so there's a decent chance the authors computed the standard deviation of the readings about a fitted curve and retained only those points within about 2 sigma of it. This would also explain the filtering of the "zero power" outliers.
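To make that guess concrete, here is a minimal sketch of a 2-sigma filter. It is only an illustration of the idea, not the paper's method. Everything in it is an assumption: the function name `sigma_filter`, the parameters `bin_width` and `k`, and above all the use of a per-bin mean of power (binned by wind speed) as a stand-in for the "fitted curve".

```python
import numpy as np

def sigma_filter(wind_speed, power, bin_width=0.5, k=2.0):
    """Return a boolean mask: True where a reading lies within k standard
    deviations of the mean power of its wind-speed bin.

    Hypothetical helper: the binned mean stands in for a fitted power curve.
    """
    wind_speed = np.asarray(wind_speed, dtype=float)
    power = np.asarray(power, dtype=float)

    # Assign every reading to a wind-speed bin of width `bin_width`.
    bin_idx = np.floor(wind_speed / bin_width).astype(int)

    keep = np.zeros(power.shape, dtype=bool)
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        mu = power[in_bin].mean()   # "curve" value for this bin
        sd = power[in_bin].std()    # conditional SD within this bin
        # Keep readings whose deviation from the bin mean is at most k * sd.
        keep[in_bin] = np.abs(power[in_bin] - mu) <= k * sd
    return keep
```

Usage would be along the lines of `mask = sigma_filter(v, p)` followed by `v[mask], p[mask]`. Note that the mean and SD are themselves pulled around by the outliers, so a bin with many zero-power readings may not be cleaned by a single pass; repeating the filter, or swapping in the median and a robust spread measure, are possible variations.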

$\endgroup$
4
  • $\begingroup$ Thank you for your insight. I actually started with something similar to what you are suggesting (2 sigma or similar) but it kept giving me weird results. I will revisit and see if I can make it work now that I have some assurance that it is a valid approach. $\endgroup$ Commented Nov 27, 2023 at 19:56
  • $\begingroup$ Clearly the 0 power data were removed. That could be from something like what JH suggested, but it also might be that those are data entry errors. $\endgroup$ Commented Nov 27, 2023 at 20:02
  • $\begingroup$ A very close look dissuaded me from offering this suggestion, because a lower cutoff based on conditional SD that cleans the graph so neatly near the bottom left would not leave so many low stragglers in the middle. I suspect something a little more sophisticated might have happened, such as removing outlying residuals from some kind of robust logistic regression. But without seeing the original paper it seems all we can do is speculate. $\endgroup$ Commented Nov 27, 2023 at 20:16
  • $\begingroup$ Sorry, here is the paper: mdpi.com/1996-1073/16/6/2688 $\endgroup$ Commented Nov 27, 2023 at 21:18
