I have a Python script which parses a complicated CSV generated on election nights. Each row of the CSV represents a race. As I loop through the races, I store the candidates for each race into a list called cnds. The other variable to note is called num_win, and it holds the number of people who will be elected for that particular race. Usually it's just 1, but in cases like school boards, it can be much higher.
For illustration purposes, here's some sample data we might process:
```python
num_win = 6
cnds = [
    {'cnd': 'Christine Matthews', 'votes': 200, 'winner': False},
    {'cnd': 'Dexter Holmes',      'votes': 123, 'winner': False},
    {'cnd': 'Gerald Wheeler',     'votes': 123, 'winner': False},
    {'cnd': 'Timothy Hunter',     'votes': 100, 'winner': False},
    {'cnd': 'Sheila Murray',      'votes': 94,  'winner': False},
    {'cnd': 'Elisa Banks',        'votes': 88,  'winner': False},
    {'cnd': 'John Park',          'votes': 88,  'winner': False},
    {'cnd': 'Guadalupe Bates',    'votes': 76,  'winner': False},
    {'cnd': 'Lynne Austin',       'votes': 66,  'winner': False},
]
```

First attempt:
My initial version was pretty straightforward: make a copy of cnds, sort it by vote count (highest first), and keep only the first num_win candidates. These are the winners. Then loop through cnds and mark the winners.
```python
winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:num_win]
for cnd in cnds:
    for winner in winners:
        if cnd['cnd'] == winner['cnd']:
            cnd['winner'] = True
```

This works great, except that I realized later it doesn't account for ties.
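To make the tie problem concrete, here is a minimal sketch that wraps the first attempt in a hypothetical `mark_top_n` function and runs it on the sample data. It swaps the nested loop for a set of winner names, which marks the same candidates as long as names are unique within a race (the original comparison by name assumes that too).

```python
def mark_top_n(cnds, num_win):
    """First-attempt logic: flag the num_win highest vote-getters as winners.

    Hypothetical wrapper for illustration; ties at the cutoff are not handled.
    """
    winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:num_win]
    # A set of names replaces the nested loop; it marks the same candidates.
    winner_names = {w['cnd'] for w in winners}
    for cnd in cnds:
        if cnd['cnd'] in winner_names:
            cnd['winner'] = True
    return cnds


# Sample data from the question, as (name, votes) pairs.
sample = [
    ('Christine Matthews', 200), ('Dexter Holmes', 123), ('Gerald Wheeler', 123),
    ('Timothy Hunter', 100), ('Sheila Murray', 94), ('Elisa Banks', 88),
    ('John Park', 88), ('Guadalupe Bates', 76), ('Lynne Austin', 66),
]
cnds = [{'cnd': name, 'votes': votes, 'winner': False} for name, votes in sample]

mark_top_n(cnds, 6)
print([c['cnd'] for c in cnds if c['winner']])
```

With num_win = 6 this marks the five clear winners plus Elisa Banks. John Park has the same 88 votes but is left out: Python's sort is stable (also with `reverse=True`), so candidates with equal totals keep their original row order, and the slice simply cuts between them.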
Since this script is for election night, when the results are unofficial, I only want to mark as winners the candidates I am sure are winners. In the data above, the clear winners would be: Christine Matthews, Dexter Holmes, Gerald Wheeler, Timothy Hunter, and Sheila Murray. Elisa Banks and John Park are tied for the sixth spot with 88 votes each. Depending on the type of race, that might be settled later by a runoff or some other mechanism. So, on election night, I simply wouldn't mark anyone after those 5 as a winner.
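One way to state that rule directly (a sketch of how it could be formalized, not the only way): a candidate with v votes is a sure winner when no more than num_win candidates, counting that candidate, received v or more votes. `clear_winners` is a hypothetical helper; its pairwise count is quadratic, which should be fine for races with a handful of candidates.

```python
def clear_winners(cnds, num_win):
    """Return names of candidates who win no matter how ties are resolved.

    A candidate with v votes is a clear winner if at most num_win candidates
    (including that candidate) received v or more votes.
    """
    totals = [int(c['votes']) for c in cnds]
    return [
        c['cnd']
        for c in cnds
        if sum(1 for t in totals if t >= int(c['votes'])) <= num_win
    ]
```

On the sample data with num_win = 6 this returns the five names listed above. Elisa Banks and John Park are held back because 7 candidates received 88 or more votes, one more than the number of seats.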
Here's the new code I've written, which accounts for tie situations:
```python
from collections import Counter

# Make list of unique vote totals, with number of candidates who had those vote totals.
# This code uses collections.Counter to make the list of uniques.
# http://stackoverflow.com/a/15816111/566307
# (.items() instead of Python 2's .iteritems() so this also runs on Python 3;
# int() so vote counts read from the CSV as strings sort numerically.)
uniques = Counter(int(cnd['votes']) for cnd in cnds).items()

# Now convert the Counter() output into a list of (votes, count) tuples,
# highest vote total first, keeping only the top num_win totals.
uniquesCount = sorted(uniques, reverse=True)[0:num_win]

# How many candidates are there in this list?
# http://stackoverflow.com/a/14180875/566307
# (list() because map() returns an iterator on Python 3.)
cndsInUniques = list(map(sum, zip(*uniquesCount)))[1] if uniquesCount else 0

# There are too many candidates. Must be one or more ties.
if cndsInUniques > num_win:
    adjusted_num_win = num_win
    # We need to remove items from the uniques list until we get the
    # num of candidates below or equal to the num_win threshold.
    while len(uniquesCount) > 0:
        # delete last item
        del uniquesCount[-1]
        # An empty list means even the top vote total has more candidates
        # than seats, so nobody is a sure winner yet.
        cndsInUniques = list(map(sum, zip(*uniquesCount)))[1] if uniquesCount else 0
        if cndsInUniques <= num_win:
            adjusted_num_win = cndsInUniques
            break
    winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:adjusted_num_win]
# Right number of candidates means no ties. Proceed as normal.
else:
    # Make list of candidates, sorted by vote totals
    winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:num_win]

# loop through all candidates and mark the ones who are winners
for cnd in cnds:
    for winner in winners:
        if cnd['cnd'] == winner['cnd']:
            cnd['winner'] = True
```

This code is working for me, but I feel like it's a lot of work to reach the adjusted_num_win number that I need. Can anyone suggest an alternative, or ways I might simplify this?
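One possible simplification, sketched under the assumption that vote counts can be compared as integers: sort the vote totals once and look at the first candidate who misses the cut (index num_win). Everyone with strictly more votes than that candidate is a sure winner, and that count matches the adjusted_num_win the while loop above arrives at, because the candidates above that cutoff always form whole vote-total groups. `clear_winner_count` and `mark_winners` are hypothetical names.

```python
def clear_winner_count(cnds, num_win):
    """Number of top vote-getters who are winners no matter how ties resolve.

    Rank the vote totals highest first and look at the first candidate who
    misses the cut. Everyone with strictly more votes than that candidate is
    a sure winner; anyone tied with them is held back.
    """
    totals = sorted((int(c['votes']) for c in cnds), reverse=True)
    if len(totals) <= num_win:
        # Every candidate fits in the available seats.
        return len(totals)
    first_loser = totals[num_win]
    return sum(1 for t in totals if t > first_loser)


def mark_winners(cnds, num_win):
    """Mark sure winners in place, reusing the question's sort-and-slice step."""
    adjusted_num_win = clear_winner_count(cnds, num_win)
    winners = sorted(cnds, key=lambda k: int(k['votes']), reverse=True)[0:adjusted_num_win]
    winner_names = {w['cnd'] for w in winners}
    for cnd in cnds:
        if cnd['cnd'] in winner_names:
            cnd['winner'] = True
    return cnds
```

With the sample data and num_win = 6, `clear_winner_count` returns 5. If two candidates tie for the only seat, it returns 0, which is the case the original while loop had to special-case.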