15
\$\begingroup\$

My attempt at stating this question, but with a more objective solving criterion.

Your task is to build a program or function that takes a solved Sudoku grid S in the format of your choice and attempts to generate a problem grid with as few clues as possible that has S as its unique solution. (It doesn't matter what method S is the unique solution by, including brute force, as long as the solution is provably unique.)


Your program will be scored by running it through a set of 100,000 solution grids found in this file (7.82 MB download), and adding up the number of clues in all 100,000 problem grids that your solution produces.

The Sudoku solutions in the above test file are expressed as an 81-character string, from left-to-right, then top-to-bottom. The code required to turn the input in the test file into a usable solution will not count towards your solution's byte count.

As in my Flood Paint challenge, your program must actually produce a valid output for all 100,000 puzzles to be considered a valid solution. The program that outputs the fewest total clues for all 100,000 test cases is the winner, with shorter code breaking a tie.


Current scoreboard:

  1. 2,361,024 - nutki, C
  2. 2,580,210 - es1024, PHP
  3. 6,000,000 - CarpetPython, Python 2
  4. 7,200,000 - Joe Z., Python
\$\endgroup\$
1
  • \$\begingroup\$ Also, you can be sure that any solution claiming less than 1,700,000 solutions is a phony, but I want to see just how low these can go. \$\endgroup\$ Commented Apr 7, 2015 at 7:51

5 Answers 5

8
\$\begingroup\$

C - 2,361,024 2,509,949 clues

Remove clues starting from the last cell if a brute force solver finds only one unique solution.

Second try: use heuristic to decide in which order to remove clues instead of starting from the last. This makes the code run much slower (20 minutes instead of 2 to calculate the result). I could make the solver faster, to experiment with different heuristics, but for now it will do.

#include <stdio.h> #include <string.h> char ll[100]; short b[81]; char m[81]; char idx[81][24]; int s; char lg[513]; void pri2() { int i; for(i=0;i<81;i++) putchar(lg[b[i]]); putchar('\n'); } void solve(pos){ int i,p; if (s > 1) return; if (pos == 81) { s++; return; } if (b[pos]) return solve(pos+1); for (p=i=0;i<24;i++) p |= b[idx[pos][i]]; for (i = 0; i < 9; i++) if (!(p&(1<<i))) { b[pos] = 1 << i; solve(pos + 1); } b[pos] = 0; } int main() { int i,j,t; for(i=0;i<9;i++) lg[1<<i]='1'+i; lg[0] = '.'; for(i=0;i<81;i++) { t = 0; for(j=0;j<9;j++) if(i/9*9 + j != i) idx[i][t++] = i/9*9 + j; for(j=0;j<9;j++) if(i%9 + j*9 != i) idx[i][t++] = i%9 + j*9; for(j=0;j<81;j++) if(j/27 == i/27 && i%9/3 == j%9/3 && i!=j) idx[i][t++] = j; } while(scanf("%s ",ll)>0) { memset(m, 0, sizeof(m)); for(i=0;i<81;i++) b[i] = 1 << (ll[i]-'1'); for(i=0;i<81;i++) { int j,k,l = 99; for(k=0;k<81;k++) if (m[k] <= l) l = m[k], j = k; m[j] = 24; t = b[j]; b[j] = 0; s = 0; solve(0); if (s > 1) b[j] = t; else for(k=0;k<24;k++) m[idx[j][k]]++; } pri2(); } return 0; } 
\$\endgroup\$
2
\$\begingroup\$

C++ / Tdoku - 2,081,604 clues

Below are results from running random minimization using tdoku on each puzzle N times and taking the result with the least clues. Results are shown for various N as powers of 2. No heuristics are involved at all. Each minimization just takes a random permutation of range(81) and then tries to remove clues in that order, putting a clue back whenever its removal results in multiple solutions.

Of note is the fact that the curve isn't flattening quickly and lower numbers can clearly be achieved without being any smarter just by running for larger N.

# (1) fetch tdoku git clone https://github.com/t-dillon/tdoku.git cd tdoku # (2) pick your compiler (recent clang is best for tdoku): export CC=clang-8 export CXX=clang++-8 # (3) place ppcg_sudoku_testing.txt, minimize.cc (below), # and run.sh (below) in the current directory # (4) build the tdoku solver library ./BUILD.sh # (5) build minimize.cc $CXX -std=c++11 -O3 -march=native -o minimize minimize.cc build/libtdoku.a # (6) run with concurrency appropriate for your system. # results below were run on a 32core/64thread TR-2990WX chmod a+x run.sh ./run.sh 64 1: 2438970 0 sec 2: 2377595 1 sec 4: 2329366 2 sec 8: 2288410 3 sec 16: 2254275 7 sec 32: 2223342 14 sec 64: 2195642 27 sec 128: 2174050 55 sec 256: 2153351 109 sec 512: 2127972 218 sec 1024: 2105589 436 sec 2048: 2091738 871 sec 4096: 2081604 1740 sec 

minimize.cc:

#include "include/tdoku.h" #include <cstdio> #include <cstring> #include <fstream> int NumClues(const char *puzzle) { int num_clues = 0; for (int i = 0; i < 81; i++) { if (puzzle[i] != '.') num_clues++; } return num_clues; } int main(int argc, char **argv) { int limit = std::stoi(argv[1]); std::ifstream file; file.open(argv[2]); std::string line; while (getline(file, line)) { char puzzle[81], min_puzzle[81]; int min_clues = 81; for (int j = 0; j < limit; j++) { memcpy(puzzle, line.c_str(), 81); TdokuMinimize(false, puzzle); int clues = NumClues(puzzle); if (clues <= min_clues) { min_clues = clues; memcpy(min_puzzle, puzzle, 81); } } printf("%.81s %d\n", min_puzzle, min_clues); } } 

run.sh

#!/bin/bash concurrency=$1 killall minimize 2>/dev/null rm -f ppcg_?? ppcg_??.done split -n l/$concurrency ppcg_sudoku_testing.txt ppcg_ for p in {0..20}; do t1=$(date +%s) lim=$((2**p)) for f in ppcg_??; do ./minimize $lim $f > $f.done & done wait t2=$(date +%s) cat *.done | awk -F' ' -v t=$((t2 - t1)) -v n=$lim '{ x += $2 } END { printf("%5d: %10d %d sec\n",n,x,t); }' done 
\$\endgroup\$
2
\$\begingroup\$

Python — 7,200,000 clues

As usual, here is a last-place reference solution:

def f(x): return x[:72] + "." * 9 

Removing the bottom row of numbers provably leaves the puzzle solvable in all cases, as each column still has 8 of 9 numbers filled, and each number in the bottom row is simply the ninth number left over in the column.

If any serious contender manages to legally score worse than this one, I will be astonished.

\$\endgroup\$
4
  • \$\begingroup\$ I mean, you could remove just the last one. \$\endgroup\$ Commented Apr 7, 2015 at 6:55
  • \$\begingroup\$ You could also just leave the whole thing solved, but neither of those would be a serious contender. \$\endgroup\$ Commented Apr 7, 2015 at 7:11
  • \$\begingroup\$ So why is this a serious contender? \$\endgroup\$ Commented Apr 7, 2015 at 22:34
  • \$\begingroup\$ It's not. That's why I said I'd be astonished if any serious contender managed to score worse than this non-serious contender. \$\endgroup\$ Commented Apr 7, 2015 at 22:41
1
\$\begingroup\$

Python 2 - 6,000,000 clues

A simple solution that uses the 3 common methods of solving these puzzles:

def f(x): return ''.join('.' if i<9 or i%9==0 or (i+23)%27 in (0,3) else c for i,c in enumerate(x)) 

This function produces clue formats like this:

......... .dddddddd .dddddddd .ddd.dd.d .dddddddd .dddddddd .ddd.dd.d .dddddddd .dddddddd 

This can always be solved. The 4 3x3 parts are solved first, then the 8 columns, then the 9 rows.

\$\endgroup\$
1
\$\begingroup\$

PHP — 2,580,210 clues

This first removes the last row and column, and bottom right corner of every box. It then tries to clear out each cell, running the board through a simple solver after each change to ensure the board is still unambiguously solvable.

Much of the code below was modified from one of my old answers. printBoard uses 0s for empty cells.

<?php // checks each row/col/block and removes impossible candidates function reduce($cand){ do{ $old = $cand; for($r = 0; $r < 9; ++$r){ for($c = 0; $c < 9; ++$c){ if(count($cand[$r][$c]) == 1){ // if filled in // remove values from row and col and block $remove = $cand[$r][$c]; for($i = 0; $i < 9; ++$i){ $cand[$r][$i] = array_diff($cand[$r][$i],$remove); $cand[$i][$c] = array_diff($cand[$i][$c],$remove); $br = floor($r/3)*3+$i/3; $bc = floor($c/3)*3+$i%3; $cand[$br][$bc] = array_diff($cand[$br][$bc],$remove); } $cand[$r][$c] = $remove; } }} }while($old != $cand); return $cand; } // checks candidate list for completion function done($cand){ for($r = 0; $r < 9; ++$r){ for($c = 0; $c < 9; ++$c){ if(count($cand[$r][$c]) != 1) return false; }} return true; } // board format: [[1,2,0,3,..],[..],..], $b[$row][$col] function solve($board){ $cand = [[],[],[],[],[],[],[],[],[]]; for($r = 0; $r < 9; ++$r){ for($c = 0; $c < 9; ++$c){ if($board[$r][$c]){ // if filled in $cand[$r][$c] = [$board[$r][$c]]; }else{ $cand[$r][$c] = range(1, 9); } }} $cand = reduce($cand); if(done($cand)) // goto not really necessary goto end; // but it feels good to use it else return false; end: // back to board format $b = []; for($r = 0; $r < 9; ++$r){ $b[$r] = []; for($c = 0; $c < 9; ++$c){ if(count($cand[$r][$c]) == 1) $b[$r][$c] = array_pop($cand[$r][$c]); else $b[$r][$c] = 0; } } return $b; } function add_zeros($board, $ind){ for($r = 0; $r < 9; ++$r){ for($c = 0; $c < 9; ++$c){ $R = ($r + (int)($ind/9)) % 9; $C = ($c + (int)($ind%9)) % 9; if($board[$R][$C]){ $tmp = $board[$R][$C]; $board[$R][$C] = 0; if(!solve($board)) $board[$R][$C] = $tmp; } }} return $board; } function generate($board, $ind){ // remove last row+col $board[8] = [0,0,0,0,0,0,0,0,0]; foreach($board as &$j) $j[8] = 0; // remove bottom corner of each box $board[2][2] = $board[2][5] = $board[5][2] = $board[5][5] = 0; $board = add_zeros($board, $ind); return $board; } function countClues($board){ $str = implode(array_map('implode', $board)); return 81 - substr_count($str, '0'); } function generateBoard($board){ return generate($board, 0); } function printBoard($board){ for($i = 0; $i < 9; ++$i){ echo implode(' ', $board[$i]) . PHP_EOL; } flush(); } function readBoard($str){ $tmp = str_split($str, 9); $board = []; for($i = 0; $i < 9; ++$i) $board[] = str_split($tmp[$i], 1); return $board; } // testing $n = 0; $f = fopen('ppcg_sudoku_testing.txt', 'r'); while(($l = fgets($f)) !== false){ $board = readBoard(trim($l)); $n += countClues(generateBoard($board)); } echo $n; 
\$\endgroup\$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.