Firstly, use PyPy.

 $ python2 p.py
 Total: 29.8 seconds

 $ pypy p.py
 Total: 5.1 seconds

PyPy actually does inlining, so the `timeit` overhead makes up a big part of that. Removing it gives

 $ pypy p.py
 Total: 4.1 seconds

That's a factor-of-7 improvement already.

Often PyPy 3 is faster, so to try that. To do you should

* change the `print`s to functions and use `from __future__ import print_function` to keep Python 2 compatibility and

* change `debugger` to use `def debugger(times, n_runs):` and call it with `debugger(*main(...))`.

It averages a tiny bit faster:

 $ pypy3 p.py
 Total: 4.1 seconds

`INDEX_TO_VALUE` should be a tuple, not a dictionary:

 INDEX_TO_VALUE = ((0, 0, 0), (0, 1, 0), (0, 2, 0),
 (0, 3, 1), (0, 4, 1), (0, 5, 1),
 (0, 6, 2), (0, 7, 2), (0, 8, 2),
 ...)

The comment above it is bad

 # index:(row,column,group,value)

The spacing is a bit cramped, but primarily I'm concerned that there are the wrong number of elements! Remove the last one.

Further, a comprehension is much cleaner:

 INDEX_TO_VALUE = tuple((row, col, row//3 + col//3*3) for row in range(9) for col in range(9))

although you might want to make a function to help:

 def group_at(row, col):
 return row // 3 + col // 3 * 3

 INDEX_TO_VALUE = tuple((row, col, group_at(row, col)) for row in range(9) for col in range(9))

Your `timeit` function should really avoid the globals by using lexical scoping. This strikes me as a good comromise:

 debugged = set()

 def timeit(func):
 @functools.wraps(func)
 def wrapper(*args, **kwargs):
 t_start = time.clock()
 ret = func(*args, **kwargs)
 t_end = time.clock() - t_start

 wrapper.calls += 1
 wrapper.time += t_end

 return ret

 wrapper.calls = 0
 wrapper.time = 0
 debugged.add(wrapper)
 return wrapper

You still have global data but you localize it to the function. To me, this seems a sensible trade. Note that `debugged` and the original `DEBUGGER_TIMER` were not constants so should not be in ALL_CAPS.

Another thing that irks me is your use of strings to represent the board. Tuples of integers makes more sense and `0` can replace your `.`. Strings might be neglibily faster, but it seems like a poor readability trade to me.

You don't use `graph` as far as I can tell. Just remove it.

You do:

 if vertex not in visited:
 visited.add(vertex)
 ...
 stack.extend(graph[vertex] - visited)

The `- visited` is redundant with the `if vertex not in visited` check. Remove it. Further, you don't need any of the checks anyway; the cost of doing them exceeds the benefit, at least after moving to lists of integers. You might need to change `solutions` to a set.

Your

 list(results.add("%s%s%s" % (puzzle[:index], move, puzzle[index + 1:])) for move in moves[1])

is a bit crazy; just do

 for move in moves[1]: results.add("%s%s%s" % (puzzle[:index], move, puzzle[index + 1:]))

`Cell.board` is a global; this is a really bad idea as it wrecks the ability to use this extensibly (say, have two boards). Speed-wise it's very problematic because you rebuild the whole grid each time; it would be better to mutate as you go and undo when backtracking. However, it is possible to improve. I tried something like

 @timeit
 def get_missing_values(row, column, group, board):
 missing = [1, 2, 3, 4, 5, 6, 7, 8, 9]

 for c_row, c_column, c_group, c_value in board:
 if c_row == row or c_column == column or c_group == group:
 if c_value in missing:
 missing.remove(c_value)

 return missing

 @timeit
 def create_cells(puzzle):
 taken = []
 untaken = []

 for index, value in enumerate(puzzle):
 row, column, group = INDEX_TO_VALUE[index]
 if value:
 taken.append((row, column, group, value))
 else:
 untaken.append((row, column, group))

 return taken, untaken

 @timeit
 def get_next_moves(puzzle):
 taken, untaken = create_cells(puzzle)
 other_option = None
 len_option = 9

 for row, column, group in untaken:
 missing_values = get_missing_values(row, column, group, taken)

 if len(missing_values) == 1:
 return 9*row+column, missing_values

 elif len(missing_values) < len_option:
 len_option = len(missing_values)
 other_option = 9*row+column, missing_values

 return other_option

This gets rid of the class for convenience (tuples are easier here). `get_next_moves` now generates the board and passes that to `get_missing_values`. This improves times:

 $ python3 p.py
 HARD - 3 RUNS
 Total: 19.8 seconds

 $ python2 p.py
 HARD - 3 RUNS
 Total: 12.7 seconds

 $ pypy3 p.py
 HARD - 3 RUNS
 Total: 2.3 seconds

 $ pypy p.py
 HARD - 3 RUNS
 Total: 2.3 seconds

But `get_missing_values` expectedly still takes majority time. Using a set instead of a list speeds up CPython but slows down PyPy.

This suggests effort should be solely on a more efficient representation for that. Here's one idea:

 @timeit
 def create_cells(puzzle):
 rows = [set() for _ in range(9)]
 columns = [set() for _ in range(9)]
 groups = [set() for _ in range(9)]
 untaken = []

 for index, value in enumerate(puzzle):
 row, column, group = INDEX_TO_VALUE[index]
 if value:
 rows[row].add(value)
 columns[column].add(value)
 groups[group].add(value)
 else:
 untaken.append((row, column, group))

 return rows, columns, groups, untaken

 @timeit
 def get_next_moves(puzzle):
 rows, columns, groups, untaken = create_cells(puzzle)
 other_option = None
 len_option = 9

 for row, column, group in untaken:
 missing_values = {1, 2, 3, 4, 5, 6, 7, 8, 9} - rows[row] - columns[column] - groups[group]

 if len(missing_values) == 1:
 return 9*row+column, missing_values

 elif len(missing_values) < len_option:
 len_option = len(missing_values)
 other_option = 9*row+column, missing_values

 return other_option

This gives, after disabling `timeit`,

 $ python3 p.py
 Total: 8.1 seconds

 $ python2 p.py
 Total: 6.7 seconds

 $ pypy3 p.py
 Total: 2.0 seconds

 $ pypy p.py 
 Total: 2.0 seconds

which is a massive boost for CPython.

It so happens that `get_next_moves` is still taking the most time, although `create_cells` is catching up. One idea is to change the sets to bitmasks:

 @timeit
 def create_cells(puzzle):
 rows = [0] * 9
 columns = [0] * 9
 groups = [0] * 9
 untaken = []

 for index, value in enumerate(puzzle):
 row, column, group = INDEX_TO_VALUE[index]
 if value:
 rows[row] |= 1<<(value-1)
 columns[column] |= 1<<(value-1)
 groups[group] |= 1<<(value-1)
 else:
 untaken.append((row, column, group))

 return rows, columns, groups, untaken

 decode_bits = [tuple(i+1 for i in range(9) if 1<<i & bits) for bits in range(512)]

 @timeit
 def get_next_moves(puzzle):
 rows, columns, groups, untaken = create_cells(puzzle)
 other_option = None
 len_option = 9

 for row, column, group in untaken:
 missing_values = decode_bits[0b111111111 & ~rows[row] & ~columns[column] & ~groups[group]]

 if len(missing_values) == 1:
 return 9*row+column, missing_values

 elif len(missing_values) < len_option:
 len_option = len(missing_values)
 other_option = 9*row+column, missing_values

 return other_option

This gives a very good boost in speed to PyPy and improves CPython noticeably:

 $ python3 p.py
 HARD - 3 RUNS
 Total: 8.0 seconds
 Mean: 2.66720 seconds
 Max: 2.67445 seconds
 Min: 2.66322 seconds

 create_cells()
 Called: 47680 times per run (143040 total)
 Running for 6.101s (in 3 runs) / 2.03374s per run
 get_next_moves()
 Called: 47680 times per run (143040 total)
 Running for 1.189s (in 3 runs) / 0.39625s per run
 possible_moves()
 Called: 47680 times per run (143040 total)
 Running for 0.443s (in 3 runs) / 0.14770s per run
 depth_first_search()
 Called: 1 times per run (3 total)
 Running for 0.268s (in 3 runs) / 0.08946s per run
 win()
 Called: 1 times per run (3 total)
 Running for 0.000s (in 3 runs) / 0.00004s per run
 main()
 Called: 1 times per run (1 total)
 Running for 0.000s (in 3 runs) / 0.00003s per run

and for PyPy:

 $ pypy3 p.py 
 HARD - 4 RUNS
 Total: 1.0 seconds
 Mean: 0.26078 seconds
 Max: 0.35315 seconds
 Min: 0.21801 seconds

 possible_moves()
 Called: 47680 times per run (190720 total)
 Running for 0.519s (in 4 runs) / 0.12972s per run
 create_cells()
 Called: 47680 times per run (190720 total)
 Running for 0.339s (in 4 runs) / 0.08473s per run
 depth_first_search()
 Called: 1 times per run (4 total)
 Running for 0.094s (in 4 runs) / 0.02351s per run
 get_next_moves()
 Called: 47680 times per run (190720 total)
 Running for 0.091s (in 4 runs) / 0.02272s per run
 win()
 Called: 1 times per run (4 total)
 Running for 0.000s (in 4 runs) / 0.00009s per run
 main()
 Called: 1 times per run (1 total)
 Running for 0.000s (in 4 runs) / 0.00005s per run

So the next thing to tackle is `create_cells` or `possible_moves`, depending on which interpreter you care about most. Going with `create_cells`, I currently have:

 @timeit
 def create_cells(puzzle):
 rows = [0] * 9
 columns = [0] * 9
 groups = [0] * 9
 untaken = []

 for index, value in enumerate(puzzle):
 row, column, group = INDEX_TO_VALUE[index]
 if value:
 rows[row] |= 1<<(value-1)
 columns[column] |= 1<<(value-1)
 groups[group] |= 1<<(value-1)
 else:
 untaken.append((row, column, group))

 return rows, columns, groups, untaken

CPython doesn't deduplicate repeated constants like PyPy does, so one should deduplicate it manually. We should also move to using `zip` instead of `enumerate`:

 @timeit
 def create_cells(puzzle):
 rows = [0] * 9
 columns = [0] * 9
 groups = [0] * 9
 untaken = []

 for position, value in zip(INDEX_TO_VALUE, puzzle):
 if value:
 row, column, group = position
 bit = 1<<(value-1)
 rows[row] |= bit
 columns[column] |= bit
 groups[group] |= bit
 else:
 untaken.append(position)

 return rows, columns, groups, untaken

CPython is noticably faster:

 $ python3 p.py
 Total: 5.9 seconds

 $ python2 p.py
 Total: 4.1 seconds

CPython is now as fast as PyPy was at the start! I didn't get times for PyPy.

It's even a bit faster if we carry the change through; make all operations act on shifted bits and unshift them at the end; eg.

 bit_counts = tuple(bin(i).count("1") for i in range(512))

 @timeit
 def get_next_moves(puzzle):
 rows, columns, groups, untaken = create_cells(puzzle)
 other_option = None
 len_option = 9

 for row, column, group in untaken:
 missing_values = 0b111111111 & ~rows[row] & ~columns[column] & ~groups[group]
 set_bits = bit_counts[missing_values]

 if set_bits == 1:
 return 9*row+column, missing_values

 elif set_bits < len_option:
 len_option = set_bits
 other_option = 9*row+column, missing_values

 return other_option

and similar for the rest of the code.

 $ pypy3 p.py
 Min: 0.11042 seconds
 
 $ pypy p.py 
 Min: 0.13626 seconds
 
 $ python3 p.py
 Min: 1.70156 seconds
 
 $ python2 p.py
 Min: 1.24454 seconds

I'm using minimum times now because PyPy gets below the minimum runtime overall. We can see that both interpreters are much faster than previously.

Personally, I would write `possible_moves` as a generator:

 @timeit
 def possible_moves(puzzle):
 index_moves = get_next_moves(puzzle)

 if not index_moves:
 return

 index, moves = index_moves
 for bit in bits:
 if moves & bit:
 yield puzzle[:index] + (bit,) + puzzle[index + 1:]

Instead of your timing code, I would have used `cProfile`. It's built-in and far simpler to use. For CPython I would also sometimes use `line_profiler`, which gives line-by-line timings. In other words, it's the best thing evah. I would use the `time` utility to get a time of the whole code when fine-grained times aren't needed.

These get rid of a nontrivial portion of the code.

I would be very careful to stick to PEP8 spacing and add line breaks where they help. Your code is too dense.

I would extract the grid printing code into another function.

`main` shouldn't really be returning things; stick everything under `if __name__ == '__main__'` into `main`.

`depth_first_search` will only ever return exactly one solution, so there's no need to return a set. Further,

* Try returning early,
* Raise an exception `if not win`, as you would have entered an invalid state.
* Don't use the top-level `Exception` type; use more precise variants.

<!-- -->

 def depth_first_search(start):
 stack = [start]
 solution = None

 while stack:
 vertex = stack.pop()

 if 0 not in vertex:
 assert win(vertex)

 if solution and vertex != solution:
 raise ValueError("More than one solution")

 solution = vertex

 else:
 stack.extend(possible_moves(vertex))

 if solution is None:
 raise ValueError("No solution found")

 return solution

`win` doesn't really check if you've won; rename it to, say, `validate`.

`INDEX_TO_VALUE` should really be renamed by this point. I would go with `POSITIONS`.

`validate` can just be:

 def validate(puzzle):
 return Counter(puzzle) == dict.fromkeys(BITS, 9)

In my opinion, `depth_first_search` should yield its solutions including duplicates and the callee should be responsible for checking that the right number of solutions are present and removing duplicates.

There should be a `puzzle name: puzzle` dictionary that is used to search for puzzles. I would also try and format the grid more explicitly.

I would go with:

 _ = 0

 puzzles = {
 'easy': [
 5,3,_, _,7,_, _,_,_,
 6,_,_, 1,9,5, _,_,_,
 _,9,8, _,_,_, _,6,_,

 8,_,_, _,6,_, _,_,3,
 4,_,_, 8,_,3, _,_,1,
 7,_,_, _,2,_, _,_,6,

 _,6,_, _,_,_, 2,8,_,
 _,_,_, 4,1,9, _,_,5,
 _,_,_, _,8,_, _,7,9,
 ],

 'hard': [
 8,_,_, _,_,_, _,_,_,
 _,_,3, 6,_,_, _,_,_,
 _,7,_, _,9,_, 2,_,_,

 _,5,_, _,_,7, _,_,_,
 _,_,_, _,4,5, 7,_,_,
 _,_,_, 1,_,_, _,3,_,

 _,_,1, _,_,_, _,6,8,
 _,_,8, 5,_,_, _,1,_,
 _,9,_, _,_,_, 4,_,_,
 ]
 }

Here's the full code:

 # encoding: utf8

 """
 A puzzle-solving masterpiece!

 Solves Soduko.
 """

 from __future__ import print_function

 import functools
 import time

 from collections import Counter


 def group_at(row, col):
 return row // 3 + col // 3 * 3

 # The row, column and group number for each item in the grid
 POSITIONS = tuple((row, col, group_at(row, col)) for row in range(9) for col in range(9))

 # The number of bits for each value 0 <= i < 512
 BIT_COUNTS = tuple(bin(i).count("1") for i in range(512))

 # For looping
 BITS = tuple(1<<i for i in range(9))

 # Inverse of above for printing
 DECODE_BIT = {1<<i: i+1 for i in range(9)}
 DECODE_BIT[0] = 0


 def find_taken(puzzle):
 """
 Return three lists of what numbers are taken in each row, column and group and
 one list of which positions (row, column, group) are untaken.
 """

 rows = [0] * 9
 columns = [0] * 9
 groups = [0] * 9
 untaken = []

 for position, bit in zip(POSITIONS, puzzle):
 if bit:
 row, column, group = position
 rows[row] |= bit
 columns[column] |= bit
 groups[group] |= bit
 else:
 untaken.append(position)

 return rows, columns, groups, untaken


 def get_next_moves(puzzle):
 """
 Return the (index, missing_values) pair with the fewest possible moves.
 index is an integer 0 <= index < 81 and missing_values is a bitset
 of length 9.

 Returns None if there are no candidate moves.
 """

 rows, columns, groups, untaken = find_taken(puzzle)
 other_option = None
 len_option = 9

 for row, column, group in untaken:
 missing_values = 0b111111111 & ~rows[row] & ~columns[column] & ~groups[group]
 set_bits = BIT_COUNTS[missing_values]

 if set_bits == 1:
 return 9*row+column, missing_values

 elif set_bits < len_option:
 len_option = set_bits
 other_option = 9*row+column, missing_values

 return other_option


 def possible_moves(puzzle, index, moves):
 """
 Yield the states of the grid for after taking the given moves at index on puzzle.

 index is an integer 0 <= index < 81 and moves is a bitset of length 9.
 """

 for bit in BITS:
 if moves & bit:
 yield puzzle[:index] + (bit,) + puzzle[index + 1:]


 def validate(puzzle):
 """
 Validate that the puzzle contains 9 of each number and is length 81.

 This does not fully validate that the solution is valid.
 """

 return Counter(puzzle) == dict.fromkeys(BITS, 9)


 def depth_first_search(puzzle):
 """
 Do a depth-first search of the solution space for the input Soduku puzzle.
 Yields solutions. May yield duplicates.
 """

 stack = [puzzle]

 while stack:
 vertex = stack.pop()

 if 0 not in vertex:
 assert validate(vertex)
 yield vertex

 else:
 stack.extend(possible_moves(vertex, *get_next_moves(vertex)))


 def print_grid(puzzle):
 """
 Print a pretty representation of the input Soduku puzzle.
 """

 for i, bit in enumerate(puzzle, 1):
 value = DECODE_BIT[bit] or "·"

 if i % 9 == 0:
 print(value)
 else:
 print(value, end="")

 if i % 9 and not i % 3:
 print(" ", end="")

 if i == 27 or i == 54:
 print()


 def main(puzzle_name):
 _ = 0

 puzzles = {
 'easy': [
 5,3,_, _,7,_, _,_,_,
 6,_,_, 1,9,5, _,_,_,
 _,9,8, _,_,_, _,6,_,

 8,_,_, _,6,_, _,_,3,
 4,_,_, 8,_,3, _,_,1,
 7,_,_, _,2,_, _,_,6,

 _,6,_, _,_,_, 2,8,_,
 _,_,_, 4,1,9, _,_,5,
 _,_,_, _,8,_, _,7,9,
 ],

 'hard': [
 8,_,_, _,_,_, _,_,_,
 _,_,3, 6,_,_, _,_,_,
 _,7,_, _,9,_, 2,_,_,

 _,5,_, _,_,7, _,_,_,
 _,_,_, _,4,5, 7,_,_,
 _,_,_, 1,_,_, _,3,_,

 _,_,1, _,_,_, _,6,8,
 _,_,8, 5,_,_, _,1,_,
 _,9,_, _,_,_, 4,_,_,
 ]
 }

 grid = tuple(i and 1<<(i-1) for i in puzzles[puzzle_name])

 print("Puzzle:")
 print_grid(grid)

 [result] = set(depth_first_search(grid))

 print()
 print("Result:")
 print_grid(result)


 if __name__ == '__main__':
 main('hard')