Parsing a complicated CSV generated on election nights

Question

I have a Python script which parses a complicated CSV generated on election nights. Each row of the CSV represents a race. As I loop through the races, I store the candidates for each race into a list called cnds. The other variable to note is called num_win, and it holds the number of people who will be elected for that particular race. Usually it's just 1, but in cases like school boards, it can be much higher.

For illustration purposes, here's some sample data we might process:

num_win = 6 cnds = [ { 'cnd' : 'Christine Matthews', 'votes' : 200, 'winner': False }, { 'cnd' : 'Dexter Holmes', 'votes' : 123, 'winner': False }, { 'cnd' : 'Gerald Wheeler', 'votes' : 123, 'winner': False }, { 'cnd' : 'Timothy Hunter', 'votes' : 100, 'winner': False }, { 'cnd' : 'Sheila Murray', 'votes' : 94, 'winner': False }, { 'cnd' : 'Elisa Banks', 'votes' : 88, 'winner': False }, { 'cnd' : 'John Park', 'votes' : 88, 'winner': False }, { 'cnd' : 'Guadalupe Bates', 'votes' : 76, 'winner': False }, { 'cnd' : 'Lynne Austin', 'votes' : 66, 'winner': False } ]

First attempt:

My initial version was pretty straightforward. Make a copy of cnds, sort it in order of vote count, and limit to all but the num_win number of candidates. These are the winners. Then loop through cnds and mark the winners.

winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:num_win] for cnd in cnds: for winner in winners: if cnd['cnd'] == winner['cnd']: cnd['winner'] = True

This works great -- except I realized later that it doesn't account for ties.

Since this script is for election night when the results are unofficial, I only want to mark as winners the candidates I am sure are winners. In the data above, the clear winners would be: Christine Matthews, Dexter Holmes, Gerald Wheeler, Timothy Hunter, and Sheila Murray. There is a tie for the sixth spot. Depending on the type of race, etc, it might be settled later by a runoff or some other mechanism. So, on election night, I simply wouldn't mark anyone else after those 5 as being a winner.

Here's the new code I've written, which accounts for tie situations:

# Make list of unique vote totals, with number of candidates who had those vote totals # This code uses collections.Counter to make the list of uniques. # http://stackoverflow.com/a/15816111/566307 uniques = Counter(cnd['votes'] for cnd in cnds).iteritems() # Now convert the Counter() output into a sorted list of tuples. uniquesCount = sorted( uniques, reverse=True )[0:num_win] # How many candidates are there in this list? # http://stackoverflow.com/a/14180875/566307 cndsInUniques = map(sum,zip(*uniquesCount))[1] # There's too many candidates. Must be one or more ties if cndsInUniques > num_win: adjusted_num_win = num_win # We need to remove items from the uniques list until we get the # num of candidates below or equal to the num_win threshold. while len(uniquesCount) > 0: # delete last item del uniquesCount[-1] cndsInUniques = map(sum,zip(*uniquesCount))[1] if cndsInUniques <= num_win: adjusted_num_win = cndsInUniques break winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:adjusted_num_win] # Right number of candidates means no ties. Proceed as normal. else: # Make list of candidates, sorted by vote totals winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:num_win] # loop through all candidates and mark the ones who are winners for cnd in cnds: for winner in winners: if cnd['cnd'] == winner['cnd']: cnd['winner'] = True

This code is working for me, but I feel like it's a lot of work to reach the adjusted_num_win number that I need. Can anyone suggest an alternative, or ways I might simplify this?

vnp · Accepted Answer · 2014-05-12 22:46:58Z

# Make one more candidate than necessary into winners list winners = sorted(cnds, key=lambda k: int(k['votes'], reverse=True)[0:num_win + 1] # A tie to be resolved happens when two last candidates have equal vote count. # if so, go backwards removing everybody with the same vote count. # Binary search would work faster, of course. If a standard library # provides something similar to std::lower_bound from STL - its even better. index = num_win while index > 0 and winners[index - 1]['votes'] == winners[num_win]['votes']: index -= 1 # Finally, adjust your list winners = winners[0:index]

PS: One more thing to mention. The final nested loop is not really the best approach. You should decorate the original list with the sequential numbers (or use some other method to remember initial ordering), sort it, mark the winners which are at the beginning of the list, and sort it by sequence numbers back to the original state.

Interesting. Will this handle cases where more than two people are tied? Also, will it handle cases where there are more than one set of ties? (For example, num_win = 3; there are 4 candidates; two candidates tie with 12 votes; the other two tie with 16 (happened in an election last month)) — Kirkman14
– Kirkman14, Commented May 12, 2014 at 22:34
Yes. You only need to resolve tie which crosses the number of winners. The guys with 16 votes are winners anyway, right? — vnp
– vnp, Commented May 12, 2014 at 22:36
Well, that's definitely a better way. At first it wasn't clear to me that this would work in the case of a three-way tie, but after playing with it, I think I get it. — Kirkman14
– Kirkman14, Commented May 12, 2014 at 22:52
heapq.nlargest may be more efficient way to get the num_win+1 highest vote getters. At least, it has better computational complexity than sorted (though for small lists or large num_wins the sorting may be faster anyway). — Blckknght
– Blckknght, Commented May 13, 2014 at 11:17

Sean Perry · Accepted Answer · 2014-05-12 22:59:56Z

Simplify your logic by using a class and a multiple value for win/tied/lost. You could use an Enum here as well.

from collections import Counter from itertools import groupby from operator import attrgetter from random import shuffle HAS_LOST, HAS_TIED, HAS_WON = (0, 1, 2) class Candidate(object): """Simple wrapper around CSV data for an election candidate.""" name = None votes = None result = None # election result def __init__(self, name, votes): self.name = name self.votes = votes self.result = HAS_LOST def markWinner(self): self.result = HAS_WON def markTied(self): self.result = HAS_TIED def __repr__(self): return "Candidate({}, {})".format(self.name, self.votes) def __str__(self): if self.result == HAS_WON: result_string = "WINNER" elif self.result == HAS_TIED: result_string = "TIED" else: result_string = "LOST" return "{}: {} ({})".format(self.name, result_string, self.votes) num_win = 6 cnds = [ Candidate('Christine Matthews', 200), Candidate('Dexter Holmes', 123), Candidate('Gerald Wheeler', 123), Candidate('Timothy Hunter', 100), Candidate('Sheila Murray', 94), Candidate('Elisa Banks', 88), Candidate('John Park', 88), Candidate('Guadalupe Bates', 76), Candidate('Lynne Austin', 66) ] shuffle(cnds) # just to prove it works cnds.sort(key=attrgetter('votes'), reverse=True) wins_so_far = 0 for votes, group in groupby(cnds, key=attrgetter('votes')): group = list(group) # groups is a generator still so force the list if (wins_so_far + len(group)) < num_win: for c in group: c.markWinner() wins_so_far += len(group) else: for c in group: c.markTied() break # every one else remains marked HAS_LOST for cnd in sorted(cnds, key=attrgetter('result'), reverse=True): print cnd

valenterry · Accepted Answer · 2014-05-12 23:36:39Z

I want to offer a more functional approach.

We start with your sorted list:

 candidatesSorted = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)

then we group the candidates by their votes:

groups = groupby(candidatesSorted , key=lambda k: int(k['votes']))

(currently not sure if these groups have to be ordered again or if the order remains) Now we want to add each group of candidates to the winnerlist, unless we already have six or more candidates in our winnerlist already:

reduce( lambda winners, group: len(winners) >= 6 ? winners : winners.extend(candidateGroup), groups)

this returns the (flat) list of 5 winners if there are no ties and a list of all winners including ties where the last tie group lies within the given limit (here 6).

This code is untested yet and im not that familiar with python so please feel free to edit and correct any mistakes that I sure have mad. I hope though that my comments help to understand the code even if it has errors.

I'm getting "invalid syntax" on the "?" in that final statement. — Kirkman14
– Kirkman14, Commented May 12, 2014 at 23:37

Stack Exchange Network

Parsing a complicated CSV generated on election nights

3 Answers 3

You must log in to answer this question.

Hot Network Questions

Parsing a complicated CSV generated on election nights

3 Answers 3

You must log in to answer this question.

Related

Hot Network Questions