Let's say I'm a dairy farmer in Wisconsin. I look at some data about milk output from my dairy and find that output is down 50% from last year. This is a remarkable drop, and it must be explained! What on earth is causing this? I need to call a vet to check that there isn't some infection afflicting the cows. I need to send our feed for analysis and make sure it's properly balanced. I need to have our milking machinery serviced and checked. This is all going to be very time-consuming and expensive, but it's very important to get to the bottom of it.
If our Stack Exchange cows are our answerers, the analysis presented in the OP suggests that our very best cows, the ones that produce 3 or more answers in a given week, are being particularly affected. They're dropping faster than the overall answers! The proportion of answers by the >=3 answerers is down! But before we go about an expensive way of figuring out what is targeting our best producers, is there any simpler explanation?
I run the simulation 1000 times. The mean total answers in the simulation is 21000, 62.6% of the original count (good reality check there).
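For anyone who wants to try this kind of check, here is a minimal sketch of one way to simulate an across-the-board drop. It assumes each answer from the baseline week is kept independently with probability 0.626. That thinning model, the function names, and the baseline counts are all illustrative assumptions, not the code or data behind the numbers quoted in this answer.

```python
import random

def thin_counts(counts, keep_prob, rng):
    """Keep each individual answer independently with probability keep_prob."""
    return [sum(rng.random() < keep_prob for _ in range(n)) for n in counts]

def simulate(counts, keep_prob, threshold=3, runs=1000, seed=0):
    """Return (mean total answers, mean number of users at or above threshold)."""
    rng = random.Random(seed)
    total_sum = 0
    heavy_sum = 0
    for _ in range(runs):
        thinned = thin_counts(counts, keep_prob, rng)
        total_sum += sum(thinned)
        heavy_sum += sum(1 for n in thinned if n >= threshold)
    return total_sum / runs, heavy_sum / runs

# SYNTHETIC per-user answer counts for a made-up baseline week (not SEDE data).
baseline = [1] * 600 + [2] * 200 + [3] * 100 + [5] * 60 + [10] * 30 + [25] * 10
keep_prob = 0.626  # ratio of answer totals quoted in the text

mean_total, mean_heavy = simulate(baseline, keep_prob, runs=200)
baseline_heavy = sum(1 for n in baseline if n >= 3)

print(f"total answers kept: {mean_total / sum(baseline):.1%} of baseline")
print(f"simulated users with >=3 answers: {mean_heavy:.1f}")
print(f"naive proportional estimate:      {baseline_heavy * keep_prob:.1f}")
```

The point of the comparison at the end is that a uniform drop in answers does not translate into the same proportional drop in users above a threshold: someone who posted exactly 3 answers falls below the cutoff if they lose even one.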
It does look to me like the reduction in top answerers, at least comparing just these two weeks of data, is more than what you'd expect just from an across-the-board drop. The simulation predicts 1461 users with >=3 answers, but the actual April data only had 1360 such users. If you were just expecting the number to drop in proportion to the number of answers, though, you would have predicted 2399 * 0.626 = 1502 users, so some of the drop is accounted for by the general effect filtered through the threshold, rather than the specific one.
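The split between the threshold effect and the remaining gap can be worked out directly from the figures quoted above. A small sketch, with illustrative variable names:

```python
baseline_heavy = 2399   # users with >=3 answers in the earlier week
observed_heavy = 1360   # users with >=3 answers in the April week
simulated_heavy = 1461  # simulation's prediction under an across-the-board drop
answer_ratio = 0.626    # April answers as a fraction of the earlier count

# Naive expectation: heavy answerers shrink in proportion to total answers.
naive_heavy = round(baseline_heavy * answer_ratio)

# Part of the shortfall explained by the general drop passing through the threshold.
threshold_effect = naive_heavy - simulated_heavy

# Part the across-the-board drop does not explain.
unexplained = simulated_heavy - observed_heavy

print(naive_heavy, threshold_effect, unexplained)  # 1502 41 101
```

On these numbers, roughly 41 of the shortfall relative to the naive estimate comes from the threshold itself, and roughly 101 users remain unexplained by the across-the-board drop.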
I am very open to feedback and criticism on this approach. I didn't have a lot of time to play around in SEDE, but if someone wants to make a data set that has week-by-week data instead of just one week extracted, I'd be happy to tweak the rest of my code to plot this out over time.
If we plot out the actual April versus simulated data, we can get a better idea of what's going on. It looks like there are really no fewer answerers than predicted among the very top answerers, those producing over 20 answers in a week. Rather, it looks like there's a proportional drop in people posting 3-4 answers and an increase in those posting 1. I think these data would make me look for reasons that people aren't posting more than 1 answer, rather than only focusing on why people who post a lot of answers might be going away. Some people have pointed to the 30-minute timeout as a possible cause. To guess at that impact, one might look at how many of the people who historically post 3-4 answers in a week are new accounts posting in rapid succession. You might also look at how many people experience the 30-minute block in a week. If the number seeing the block is similar to or greater than the 100-150 missing people who'd normally post >=3 answers, that might support it as a cause.