34
\$\begingroup\$

If you didn't know already, the Stack Exchange Data Explorer (SEDE) now includes Code Review and all public Stack Exchange beta sites.

Bookmark: http://data.stackexchange.com/codereview/queries

This is our most powerful available tool to tap into our site's data and fetch everything we need to understand exactly where we're headed, and what we need to fix.


Let this Q&A be our Red Pill Lab.

This is your last chance. After this, there is no turning back. You take the blue pill – the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill – you stay in Wonderland, and I show you how deep the rabbit hole goes. Remember, all I'm offering is the truth – nothing more.

-- Morpheus

Post queries here as answers, vote for interesting and useful queries, share your findings!

Let's stay in Wonderland and see how deep the rabbit hole goes!

\$\endgroup\$
4
  • 3
    \$\begingroup\$ Someone might want to run the query which Grace Note described (and thought was important) in this chat. \$\endgroup\$ Commented Jan 16, 2014 at 13:18
  • 2
    \$\begingroup\$ None of the links listed work for me. It appears they decided to change "code%20review%20stack%20exchange" to simply "codereview" in the urls. \$\endgroup\$ Commented Jan 23, 2014 at 16:54
  • 1
    \$\begingroup\$ I think I've updated all of them now. I only tested most of them if I broke your URL be assured it was already broken, and I'm sorry. \$\endgroup\$ Commented Jan 23, 2014 at 17:01
  • \$\begingroup\$ @DanielCook Thanks! \$\endgroup\$ Commented Jan 23, 2014 at 17:01

24 Answers 24

14
\$\begingroup\$

Here's my re-creation of Grace Note's query.

The numbers, as of 2014-01-16, are:

Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 1095 51.7241 51.7241 200 793 37.4587 89.1828 500 132 6.2352 95.418 1000 51 2.4091 97.8271 2000 30 1.4171 99.2442 4000 16 0.7558 100 

Status Update, 2014-02-16: There's a bit of progress over the past month. It takes a lot of votes to promote a user to the next threshold!

Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 1182 51.2798 51.2798 200 857 37.18 88.4599 500 158 6.8547 95.3145 1000 58 2.5163 97.8308 2000 26 1.128 98.9588 4000 24 1.0412 100 

Status Update, 2014-03-23: It's no use! The percentages are barely budging. Although we have nearly doubled the number of ≥4k users since mid-Janurary, we're also gaining a lot of new avid users. The increased upvote activity has undoubtedly been beneficial, but I'm not convinced that this query is a useful metric of the vitality of the community.

Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 1299 51.50670000000 51.5067000 200 924 36.63760000000 88.1443000 500 167 6.62170000000 94.7661000 1000 75 2.97380000000 97.7399000 2000 27 1.07060000000 98.8105000 4000 30 1.18950000000 100.0000000 

Status Update, 2014-06-22:

Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 1545 51.346 51.346 200 1109 36.8561 88.2021 500 188 6.2479 94.45 1000 93 3.0907 97.5407 2000 36 1.1964 98.7371 4000 38 1.2629 100 

Here's one way to do something about the problem:

Welcome intermediate users by reviewing their answers or their questions

These queries finds posts from users with between 150 and 500 reputation points who have been seen recently. Those posts are likely to be at least somewhat valuable, since those users are known to have made positive contributions already. Please help bump up these members to 500!


There is an updated query available that now includes graduated privilege thresholds here

\$\endgroup\$
5
  • \$\begingroup\$ Compared to webapps.stackexchange.com, we have more users languishing in the <500 tiers. \$\endgroup\$ Commented Jan 16, 2014 at 23:25
  • 7
    \$\begingroup\$ I think the lesson to learn from Occupy CR is that every single vote counts. Voting for the top 1% users will help towards bringing the regulars where they need to be to sustain a graduated CR, and voting for any of the other 99% users will help bring their average rep above 500. Targeting users between 150 and 500 will work too, but because every single vote counts, we should be voting for questions & answers, not users. \$\endgroup\$ Commented Jan 19, 2014 at 17:39
  • \$\begingroup\$ Thanks for the update! Once thing sticks out to me: it looks like we've lost four 2K users, yet we've still gained eight 4K users. Am I interpreting this data correctly? \$\endgroup\$ Commented Feb 16, 2014 at 9:39
  • \$\begingroup\$ @Jamal That should mean that 8 users got promoted from the [2000,4000) range to the [4000,∞) range, and 4 users got promoted from [1000,2000) to [2000,4000), resulting in a net loss of four 2k users. \$\endgroup\$ Commented Feb 16, 2014 at 9:46
  • 1
    \$\begingroup\$ Oh, okay. I thought I was looking at it incorrectly. Either way, we've made quite some improvements. \$\endgroup\$ Commented Feb 16, 2014 at 9:48
13
\$\begingroup\$

It is very exciting to get badges on Stack Exchange, but, how many badges are you one vote away from?

Badges can be somewhat complicated to understand, but, sometimes you are just one vote away from that Enlightened.... but, were you the first answer? Are there badges you missed?

Is there a vote you can make which will let someone else earn a badge?

All these questions, and more, can be answered with this simple query:

One Vote Short

It currently looks something like this:

eq Badge User Link User Name Post Link Score Name --- ------------- ----------------------------- ----------------------------- ------------------------------------------------------------------------------------------------------------------------------ ----- -------- 1 Enlightened 200_success 200_success Can someone review my implementation of a KeyValuePair? 9 Answer 2 Enlightened 200_success 200_success Brute Force Algorithm in C 9 Answer 3 Enlightened Adrian Panasiuk Adrian Panasiuk Can someone review my code please? 9 Answer 4 Enlightened ANeves ANeves Replace strings in a file 9 Answer 5 Enlightened Anton Golov Anton Golov How can I improve this implementation of Hangman? 9 Answer 6 Enlightened Betamos Betamos Please check my code and coding style. 9 Answer 7 Enlightened c_maker c_maker What do you think about this smelly test? 9 Answer 8 Enlightened Charles Bailey Charles Bailey C++ avoid duplicate code in similar methods 9 Answer 9 Enlightened ChrisWue ChrisWue vb.net select case string comparison 9 Answer 10 Enlightened ChrisWue ChrisWue How do I make this duplicate sort algorithm run faster? 9 Answer 11 Enlightened ChrisWue ChrisWue Simple specific classes design 9 Answer 12 Enlightened Fanis Fanis Why is my web scraping script so slow? 9 Answer 13 Enlightened Gareth Rees Gareth Rees Simpler method to detect int overflow 9 Answer 14 Enlightened Gareth Rees Gareth Rees Append nones to list if is too short 9 Answer 15 Enlightened greatwolf greatwolf Is this implementation of Shamos-Hoey Algorithm OK? 9 Answer 16 Enlightened Groo Groo Can anyone help me in optimizing my code in C#? 9 Answer 17 Enlightened Jamal Jamal Printing certain number of words from file 9 Answer 18 Enlightened Jeff Mercado Jeff Mercado Java Implementation of Quick Sort 9 Answer 19 Enlightened Jeff Mercado Jeff Mercado Avoid duplicate conditional checks in multiple boolean conditions 9 Answer 20 Enlightened Jesse C. Slicer Jesse C. Slicer I there a better way to handle try in C# 9 Answer 21 Enlightened Joonas Pulakka Joonas Pulakka Getting a Variable Array into the Scope of another Method in Java 9 Answer 
\$\endgroup\$
11
\$\begingroup\$

Naruto (accepted answer with zero score)

Accepted answers without upvotes seem to be an anomaly: when would you accept an answer but not upvote it? Perhaps the OP didn't have enough reputation to upvote, or inexperienced in the community? It could be good to take a look here, some of the posts here probably deserve a vote.

\$\endgroup\$
1
  • 2
    \$\begingroup\$ "I will be upvoted, BELIEVE IT!" \$\endgroup\$ Commented Jan 8, 2016 at 16:01
10
\$\begingroup\$

Occupy Code Review (CR)

A while back I had favorited this query for Stack Overflow (SO). Now that CR is on the SEDE, I wondered how we'd compare:

How Rich Are the 1% (Active Users)


With this week's data...

SO's breakdown is:

  • 27.54% to the 1% (1,610 users)
  • 72.46% to the 99%

CR's breakdown is:

  • 17.33% to the 1% (11 users, 104,607 rep | average 9,509.72)
  • 82.67% to the 99% (1,011 users, 499,066 rep | average 493.64)

I think @200_success is right, Web Applications is a very much comparable, graduated site.

Web Applications' breakdown is:

  • 18.79% to the 1% (11 users, 115,301 rep | average 10,481.91)
  • 81.21% to the 99% (994 users, 498,403 rep | average 501.41)

I think we can easily aim at getting the 1%'s average rep to 10K, and the 99%'s average rep to 500 points. All it takes is some more voting.


02/16/2015

Looks like we're very comparable to SO now (by this metric at least).

Who Rep Users PctShareOfWealth AvgUserRep The 1% 483487 21 28.76140597 23023.19 The 99% 1197540 2018 71.23859403 593.43 

12/13/2016

Looks like we've gone past SO now. The 1% are getting richer, and the 99% are getting...richer?

Who Rep Users PctShareOfWealth AvgUserRep The 1% 1051683 35 32.98290456223 30048.09 The 99% 2136887 3379 67.01709543776 632.4 
\$\endgroup\$
3
  • \$\begingroup\$ Added average user rep to the query; top 1% is now 9696.55, bottom 99% is now 496.63. \$\endgroup\$ Commented Jan 21, 2014 at 19:31
  • \$\begingroup\$ Just checked results for this one; 1% is above 10k and 99% is above 500; the 1% owns a little more than 18% of total rep. \$\endgroup\$ Commented Feb 10, 2014 at 1:38
  • \$\begingroup\$ 1%-ers are at 33% of total rep. \$\endgroup\$ Commented Dec 13, 2016 at 6:49
7
\$\begingroup\$

As an out-of-interest discussion about avid users, and avid user retentions... I modified 200_success's avid-user query to exclude users who are dormant. Consider this result:

  • Avid Users of all time
Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 1207 51.38360000000 51.3836000 200 872 37.12220000000 88.5057000 500 161 6.85400000000 95.3597000 1000 59 2.51170000000 97.8714000 2000 24 1.02170000000 98.8931000 4000 26 1.10690000000 100.0000000 

and compare it to

  • avid users who were active (question, answer, or comment) in the past 2 months
Reputation Threshold # Users % of Avid Users Cumulative % of Avid Users -------------------- ------- --------------- -------------------------- 150 195 36.38060000000 36.3806000 200 199 37.12690000000 73.5075000 500 70 13.05970000000 86.5672000 1000 31 5.78360000000 92.3507000 2000 17 3.17160000000 95.5224000 4000 24 4.47760000000 100.0000000 
\$\endgroup\$
7
\$\begingroup\$

Some discussion about answer ratios, vote ratios, and other small-metric numbers was discussed in The 2nd Monitor recently. This motivated the following report. This report cannot be run on the Stack Overflow database - it times out, but it is OK to run on any of the other sites (the data volumes are much lower).

Enter image description here

\$\endgroup\$
0
6
\$\begingroup\$

Generalist badge, and your progression

Generalist is a badge that is awarded if you have a score of 15 for at least 20 of the top 40 tags on the site.

Small print:

  • The top-40 tags are ranked by counting the number of not-closed questions that have that tag.
  • There needs to be more than 200 questions for all top-40 tags (Code Review Recently crossed this threshold, and, when we did, the first Generals badges were awarded )
  • the Score of 15 is calculated by counting the up votes, and subtracting the down votes for answers you have (not questions). So for a score of 15 you need 15 up votes (150 reputation) for that tag, and no down votes.

Analysis:

Because you need 20 tags of score 15, and because questions can have, at most, 5 tags... it is possible that you can get Generalist by answering just 4 questions (if they are tagged just right), and have a reputation of 200 (200 is minimum in case you rep-maxed...). If you have fewer than 4 answers, or less than 200 rep, you will not show up...

Two queries:

  1. This query ranks all users on the site who could potentially qualify for Generalist.

    Note: only people with 20 or more tags will get Generalist badges.

    You can click on the column titles to sort the data. The "User-Link" column does not sort very well, so there is the "User Name" column if you want to find your name.

  2. This second query is for tracking an individual's ( YOUR ) progress toward Generalist.

    For this second query, you will need your Code Review UserID (the link above points to Community). Your UserID is part of the URL when you click on your profile. Copy that number, and enter it in to the box on the query, and run the query.

    The way to understand this report is that it outputs all of the top 40 tags. Then, it calculates the tag popularity (number of questions). The tags are ranked by their popularity. Your score in each tag is also calculated. If your score is less than 15, there will be an exclamation mark in the right-most column. If you have no exclamation mark in your top 20 tags (ranked in the ID column), then you are a Generalist!

\$\endgroup\$
6
\$\begingroup\$

Here is a slightly different way to approach the lack of voting pointed out by @tomdemuyt:

Questions with only zero-score answers

Zero-vote answers on questions with no accepted answer, and all answers with at most zero score

Please use these queries and do your part to upvote zombie answers!

\$\endgroup\$
4
  • 2
    \$\begingroup\$ Filter for any specific tag: data.stackexchange.com/code%20review%20stack%20exchange/query/… \$\endgroup\$ Commented Jan 17, 2014 at 3:33
  • \$\begingroup\$ When I run this for [Java] the first question I see [codereview.stackexchange.com/questions/24275/… threaded circular buffer) has one answer (score 1) and another deleted answer (score 0). But I don't see anything wrong with the NOT EXISTS in the query. Is the query affected because it's my answer? \$\endgroup\$ Commented Jan 19, 2014 at 23:39
  • \$\begingroup\$ @DavidHarkness It's probably because SEDE operates on a reporting database rather than live data. For the purpose of finding candidates to vote on, that's good enough. \$\endgroup\$ Commented Jan 20, 2014 at 2:49
  • \$\begingroup\$ @lol.upvote They changed the url pathing, the url for your link is now. data.stackexchange.com/codereview/query/160341/… Perhaps it should be it's own answer? \$\endgroup\$ Commented Jan 23, 2014 at 17:03
6
\$\begingroup\$

Tumbleweed Prevention

Most of us probably know how it feels to have a question that gets very little attention.

Imagine how it would be to get essentially no attention. No votes. No comments. Very few views.

All you get is a Tumbleweed badge. Yay.

Help prevent the "Tumbleweed" badge by giving the right questions some attention! A question with a single upvote is better than a question with no votes.

(See also a more strict fork of the same base query, used by @RoboSanta.)

\$\endgroup\$
1
6
\$\begingroup\$

I put together a Tag Answer Times query; looking at all the answers posted under the top 10 tags over the last 60 days, I was interested in knowing the hour of the day answers were posted.

The query adds 1 to the hour when the minutes are greater than 30.

Chart rendition of the query results

By removing the date filter, I get this:

All-time top 10 tags / answer times

It looks like the best time to ask a question is around 3 PM UTC.

\$\endgroup\$
5
\$\begingroup\$

Question/Answer ratio

The answers-per-question ratio was not brought up as being a problem in the Grace Note Review, but it is a persistent 'concern' on Area-51. Currently CodeReview is at 1.8 answers/quesion, which is 'OK'. Area51 claims that 2.5 is a good ratio... So, how to improve?

Well, the answer to that is to answer questions multiple times, but, you also want to be rewarded for that, so which questions would be good candidates for a second review? Well, that would be the:

  • Good Questions
  • Only one answer though....
  • and it's not a 'great' answer (low-votes)
  • and it is not an accepted answer
  • and the question-owner is still active on CR so she/he can vote/accept your new answer

Questions like the above would be good candidates for a second answer attempt.

And, now you can:

Run this query to get questions that are good Answer Ratio Candidates

\$\endgroup\$
5
\$\begingroup\$

Categorized lists of tags

I took the top 500 tags and (with the help of a few fellow contributors) made some loosely categorized CREATE TABLE statements that you can drop into your SEDE queries and JOIN to the Tags table.

Example usage:

CREATE TABLE #DatabaseTags (Name VARCHAR(100)); GO INSERT INTO #DatabaseTags VALUES [...] GO -- Top scored database posts SELECT [Post Link] = Posts.Id , [Score] = Posts.Score FROM Posts INNER JOIN PostTags ON PostTags.PostId = PostsTags.TagId INNER JOIN Tags ON PostTags.TagId = Tags.Id INNER JOIN #DatabaseTags ON #DatabaseTags.Name = Tags.Name ORDER BY Posts.Score DESC; 

Of course, feel welcome to inspire your own lists from this, and point out any errors or improvements that could be made in chat :)

\$\endgroup\$
5
\$\begingroup\$

Ripe zombies

Open questions with answers, at least one answer having score 0, no answer having score > 0 (a.k.a. zombies ready for killing). If you have some spare votes burning your pockets, this could be a good place to put them.

\$\endgroup\$
5
\$\begingroup\$

It was asked: Anyone know how to formulate the voting data as a pivot:

Here is a first stab at the critical voting factors for Code Review since 1 Jan 2013.

This does a pivot table on the voting data, and plots the results as a graph over time.

Enter image description here

\$\endgroup\$
4
\$\begingroup\$

"Call of Duty - We're on a mission" says,

Your ammunition: 40 votes per day, unlimited answers.

  • Every unanswered question is a zombie.
  • Every unvoted good answer is a target.

If one answers a question, it is encouraging to get at least one vote (or a comment about how to improve your answer); if you post and get no feedback at all, that's discouraging.

So:

  1. The following is a useful query (highly upvoted, on the front page of the SEDE 'featured' queries):

    Find interesting unanswered questions

    Looks at unanswered questions in your top 20 tags and sorts them by a combined weight which takes into account: score, askers reputation and how well you do on that particular tag

  2. I wrote a second query, copying the same criterion for "interesting" as the one above, adding the tag names, and sorting by reverse date:

    Interesting zero scored answers

    If you run this you'll find answers on topics which you know, which haven't received any votes (or, which hadn't received any at the time when the SEDE data was last collected).

For both queries, type in your numeric User ID, if SEDE can't resolve your ##UserId## automatically.

Good hunting!

\$\endgroup\$
3
\$\begingroup\$

We've discussed Unsung Hero for Best of 2013. Here is some data to go with that:

Unsung Heroes

\$\endgroup\$
2
  • \$\begingroup\$ You might want to come up with some other ORDER BY clause, if the top four users of all time show up as entries 10, 11, 12, and 13 of your Unsung Heroes query. \$\endgroup\$ Commented Jan 17, 2014 at 1:35
  • \$\begingroup\$ The top four users are sung heroes ;) \$\endgroup\$ Commented Jan 17, 2014 at 1:45
3
\$\begingroup\$

Vote Analysis

Compare year-to-year monthly votes, by type:

Answer votes

  • AcceptedByOriginator
  • UpMod
  • DownMod
  • BountyClose
  • Deletion
  • Undeletion
  • ModeratorReview
  • ApproveEditSuggestion

Question votes

  • UpMod
  • DownMod
  • Favorite
  • Close
  • Reopen
  • BountyStart
  • BountyClose
  • Deletion
  • Undeletion
  • ModeratorReview
  • ApproveEditSuggestion

TagWiki votes

  • ApproveEditSuggestion

TagWikiExcerpt votes

  • ApproveEditSuggestion

VoteType Analysis Over Time, By Month/Year

Answer Upvotes

Here's a table that illustrates the kind of analysis I see with this dataset:

 2011 2012 2013 2014 January 1043 1705 +63.5% 1190 -30.2% 1689 +41.9% February 1015 1287 +26.8% 1657 +28.7% March 971 1483 +52.7% 1435 -3.2% April 718 1208 +68.2% 1155 -4.4% May 819 987 +19.8% 1280 +29.7% June 489 1304 +166.7% 1199 -8.1% July 659 1553 +135.7% 1284 -17.3% August 846 1417 +67.5% 1436 +1.3% September 667 1639 +145.7% 1557 -5.0% October 980 1475 +50.5% 1812 +22.8% November 996 1485 +49.1% 2823 +90.1% December 1420 1031 -27.4% 3712 +260.0% ========================================================== 10623 16574 +56.0% 20540 +23.9% 

Monthly Upvotes (Q+A)

enter image description here

Monthly Votes Projection

enter image description here

\$\endgroup\$
8
  • \$\begingroup\$ Nothing like Excel for pivoting :) \$\endgroup\$ Commented Jan 17, 2014 at 15:45
  • 1
    \$\begingroup\$ Strange, it seemed to be going strong throughout 2012 until December, and it never really recovered until the big push started on Meta in October. \$\endgroup\$ Commented Jan 19, 2014 at 23:31
  • \$\begingroup\$ @David that would have to be investigated, but I suspect a high abandonment rate in the last quarter of 2012, of the then-regulars; that would explain the non-recovery until a new base of regulars formed. \$\endgroup\$ Commented Jan 20, 2014 at 14:44
  • \$\begingroup\$ @David looking at your profile... do you recall of anything special in 2012.Q4? \$\endgroup\$ Commented Jan 20, 2014 at 23:22
  • 1
    \$\begingroup\$ @lol.upvote I've only visited the site on 16 days--10 in the past month. I could have sworn I visited a lot more before that. However, at that time I was knee deep in a rewrite of a major backend app at work. I'd be hard-pressed to recall anything else from that time. :p \$\endgroup\$ Commented Jan 21, 2014 at 2:46
  • \$\begingroup\$ @DavidHarkness 16 days, 74 answers? Did you even sleep at all?! \$\endgroup\$ Commented Jan 21, 2014 at 2:53
  • 1
    \$\begingroup\$ @lol.upvote Oh, that's specific to meta. It shows my rep from the main site so I didn't notice. 293 days sounds more like it! :) \$\endgroup\$ Commented Jan 21, 2014 at 3:10
  • \$\begingroup\$ @200_success took a stab at the pivot table \$\endgroup\$ Commented Jan 21, 2014 at 5:22
3
\$\begingroup\$

Time to Answer

This query breaks down the average standard deviation of time to answer by month (questions' date asked), for the top 5 tags:

Average time to answer per tag

average time to answer history chart, per tag

http://data.stackexchange.com/codereview/query/170387/time-to-answer#graph

Stdev time to answer per tag

stdev time to answer history chart, per tag

Stdev @ StackOverflow

stdev metric, same query as above, for StackOverflow

Looks like the metric should be revolving at around 50 hours, per Stack Overflow.

\$\endgroup\$
3
\$\begingroup\$

Answers that elicited a "wow" comment from the original poster

(I've sorted the results by ascending answer score, to help identify which good candidates to upvote.)

\$\endgroup\$
3
\$\begingroup\$

FizzBuzz on FizzBuzz

This one is just for fun! It returns all questions, as well as performing FizzBuzz on several values related to the questions and users!

\$\endgroup\$
2
  • 2
    \$\begingroup\$ Aaww... missing row_id over creation date for the ultimate meta-fizzbuzz! \$\endgroup\$ Commented Jun 29, 2015 at 18:12
  • 1
    \$\begingroup\$ Feel welcome to add that in and update, I don't know how to do that! \$\endgroup\$ Commented Jun 29, 2015 at 20:24
3
\$\begingroup\$

Code-only answers

Code-only answers (also known as code dumps) don't make good code reviews in general, and require attention, to get improved, or possibly deleted.

\$\endgroup\$
3
\$\begingroup\$

Bad Naruto (*)

Selfie accepted answers with zero or negative score, when there are other answers with positive score. Such answers seem suspicious, and might even need moderator attention.

Warning: these are potential bad Narutos. Legitimate false positives may exist.

(*) Naruto (a hat in Winter Bash 2014): accepted answer with zero score

\$\endgroup\$
1
\$\begingroup\$

For bragging rights ;-)

Top answerers for each Tag

(Credit to @rolfl)

\$\endgroup\$
1
\$\begingroup\$

Look for zombies by tag

Note that you need to use quote marks in the tag field since it is VARCHAR. Like 'java', etc. Otherwise you'll get an error like "Invalid column: java".

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.