I've found a flaw in my analysis.
It turns out that I ignored most posts with zero comments. Including those posts significantly changes the picture. I'm not sure that my theory is wrong, but I need to do some more analysis in the database. There's a really good chance I was thinking too fast.
That's a good story, but how can we test it? A split test where group A sees comments as they are now and for group B all comments are hidden would probably do the trick. If people in the test group (B) vote (up or down) more often than people in the control group (A), we can be pretty certain that displaying comments is a drag on our reputation-based economy.
Short of doing an experiment, we can check the database:

The x-axis is the number of comments and the y-axis is score. I've limited the number of comments to 5 since that's the maximum we display, and I'm only looking at positively scored posts. (For the curious, I've selected posts with ViewCount > 0 because during one test I tried dividing score by views and got a divide-by-zero error. I haven't investigated that yet.)
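For those who want to follow along, a SEDE query roughly like this one would produce the breakdown. Consider it a sketch rather than the exact query I ran; the filters are reconstructed from the description above:

```sql
-- Average score by comment count for positively scored, viewed posts.
-- Sketch: a reconstruction of the analysis, not the original query.
SELECT CommentCount,
       AVG(CAST(Score AS FLOAT)) AS AvgScore,
       COUNT(*)                  AS NumPosts
FROM Posts
WHERE Score > 0          -- only positively scored posts
  AND ViewCount > 0      -- avoids the divide-by-zero mentioned above
  AND CommentCount <= 5  -- 5 is the maximum number of comments we display
GROUP BY CommentCount
ORDER BY CommentCount;
```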
One way to interpret the graph is that showing even one comment costs the author of a good post an average of 1.75 in score, and the second comment reduces voting by a tiny bit more. The third and subsequent comments seem to give some of that back, but likely that's because both commenting and voting are functions of views. When you expand the graph to look at longer comment threads, I think the trend is clear:

Beyond about 15 comments, the data gets messy since there are so few posts that attract that much attention. And those posts tend to be the very best on the site and naturally get upvoted often as well. Since views include both active users and people who find the post via Google (and can't vote), factoring in ViewCount does not clear up the picture.
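For the record, "factoring in ViewCount" means something like the query below. Again a sketch, with threads capped at 15 comments since that's where the data gets messy:

```sql
-- Votes per view by comment count, capping long threads at 15 comments.
-- Sketch only: this normalization didn't clarify the trend.
SELECT CASE WHEN CommentCount > 15 THEN 15 ELSE CommentCount END AS Comments,
       AVG(CAST(Score AS FLOAT) / ViewCount) AS ScorePerView,
       COUNT(*)                              AS NumPosts
FROM Posts
WHERE Score > 0
  AND ViewCount > 0  -- required: ViewCount is the divisor
GROUP BY CASE WHEN CommentCount > 15 THEN 15 ELSE CommentCount END
ORDER BY Comments;
```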
Now what?
I can think of a few alternate explanations (good posts get fewer comments, or comments are more likely to be purged on good posts). A split test should be able to disprove the theory that comments cost votes. If we see evidence of causation and not simply correlation, we should consider displaying comments harmful because they steal votes. Any algorithm for displaying comments should err on the side of hiding them. If the test shows no behaviour difference, then we can't safely change our system until we better understand the dynamic between comments and voting.
Appendix
To address the comments:
If your theory is correct, then long answers should also cost votes because by the time people get to the end they're far away from the voting buttons. Does the data say anything about that? – Monica Cellio
I assumed that one line of a post is about 50 characters. That may or may not be accurate since some lines are much shorter (specifically code and embedded images). Mostly I had to bin the data and that's as good a way as any. Here's how body length correlates to score (assuming a positive score):

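A query roughly like this would reproduce the binning. It's a sketch: I'm assuming answers only, and note that Body includes HTML markup, so the character counts (and therefore the ~50-character lines) are approximate:

```sql
-- Average score by approximate line count, assuming ~50 characters per line.
-- Sketch: Body includes HTML markup, so these bins are only approximate.
SELECT LEN(Body) / 50             AS ApproxLines,
       AVG(CAST(Score AS FLOAT))  AS AvgScore,
       COUNT(*)                   AS NumAnswers
FROM Posts
WHERE PostTypeId = 2  -- answers only (an assumption on my part)
  AND Score > 0       -- assuming a positive score, as above
GROUP BY LEN(Body) / 50
ORDER BY ApproxLines;
```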
As you can see, when a one-liner answer is any good, it's much more likely to be upvoted than a longer answer. Not all sites show this pattern. On English.SE, short answers are much less valued on the whole than longer ones:

I happen to think length is correlated with quality in general. Great one-liners are common on Stack Overflow, but are much harder to pull off on sites such as ELU. However, I'm not sure we are looking at the same effect. The real problem with comments, in terms of post voting, is that reading comments causes a mental context switch. The drop-off in score between 0 and 1 comments seems to hold on every site I've checked.
Does it matter if comments cost people reputation? I'm not sure I care, though lower reputation users might more. – ben is uǝq backwards
Yes. I see voting as the currency of the Stack Exchange economy. New users need votes to be able to do things on the site. Equally concerning is that votes are the primary measure of post quality. If fewer people vote on answers, it's harder for readers to separate good answers from poor ones. We spend a considerable amount of effort making sure people vote fairly, so if we have a systematic bias in our system, it would be good to know about it.
I'm not sure that I agree with your analysis of the relationship between comments and votes. A question that needs clarification often gets a comment. All custom close messages are comments. Trying to point out where something is wrong in an answer (often in the critical early visibility) is a comment. All of these are issues that cost votes and comments are the symptoms of the problem (low vote scores are also a symptom), not the causes. – MichaelT
There are several comments that make more or less the same point. Determining cause and effect is often quite difficult when querying a database. The purpose of looking at past data is to see if there's any indication that a split test is worthwhile. I'm not at all sure what the results of such a test might be.
What is worth noting on this point is that if we were certain that comments pointed to legitimate problems with a post, then I'd agree that comments are more likely to be symptoms of the problem than causes. In the course of examining comment quality, I found that comments can be divided into three categories:
- Valuable meta-information about a post.
- Suggested edits, answers, or new questions.
- Comments that could be flagged "not constructive", "obsolete", or "too chatty".
Quite a few comments in the third category ought to correlate with increased score ("+1", "Thanks for the answer.", etc.). Without measuring user interaction directly, we can't really know what the effect of hiding comments will be.
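To get a rough feel for how common those comments are, one could pattern-match the comment text. The patterns below are illustrative stand-ins, not a real classifier:

```sql
-- Rough count of likely "too chatty" comments by text pattern.
-- Sketch: the LIKE patterns are illustrative, not the real categorization.
SELECT COUNT(*) AS LikelyChatty
FROM Comments
WHERE Text LIKE '+1%'
   OR Text LIKE 'Thanks%'
   OR Text LIKE 'Thank you%';
```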