Initial versionAfter a round of optimizations, and adapted for the modified rules:
extern int atoi(const char*); extern int puts(const char*); typedef unsigned long uint64_t;u64; static int W, H, W1;S; static u64 RM, DM, NSol; static char Out[64 * 2 + 1]; static void place(uint64_tu64 cMaskcM, u64 vM, int n) { if (n) { intu64 posm = 1ul << __builtin_ctzl(~cMaskcM); uint64_t maskcM -= 1ulm; << pos; if (m cMask& |=RM) mask; { int y u64 nM = posm /<< W;1; int x =if pos(cM -& ynM) *place(cM W; - nM, vM, n uint64_t- neighMask1); = mask << 1;} if ((x < W - 1) && !(cMaskm & neighMask)DM) { Out[W1 * y +u64 x]nM = '-'; Out[W1m *<< yW; + x + 1] = '-'; vM |= m; vM |= nM; place(cMaskcM |- neighMasknM, vM, n - 1); } } else if (S) { neighMask++NSol; = mask} <<else W;{ ifchar* ((yp <= HOut; - 1) && !for (cMaskint &y neighMask)= 0; y < H; ++y) { Out[W1 * yfor +(int x]x = '|'; 0; x < W; ++x) { Out[W1*p++ *= (yvM +& 1) +? x]'|' =: '|'; '-'; vM >>= 1; } place(cMask | neighMask, n -*p++ 1);= '\n'; } } else {*p++ = '\0'; puts(Out); ++NSol; } } int main(int argc, char* argv[]) { W = atoi(argv[1]); H = atoi(argv[2]); W1S = W(argc +> 1;3); int n = W * H; if (n & 1) { return 0; } n >>= 1; for (int y = 0; y < H; ++y) { Out[W1RM *<<= yW; +RM W]|= =(1ul '\n';<< (W - 1)) - 1; } Out[W1DM = (1ul << (W * H](H =- '\0';1))) - 1; place(-1, 0, n >> 1); printf("%lu\n", NSol); return 0; } This is not much optimized yet. But optimization ofI started bumping into the actual algorithm code is largely moot because mostlength limit of the time is used for output anyway1024 characters, and compilation time is already similar to execution time.so I had a version that only countedto reduce the solutions instead of printing themreadability somewhat. Much shorter variable names, and it took about 8 milliseconds for the 8x6 size on my modest laptopetc.
Build/test instructions:
> gcc -O0O2 Code.c Run with solution output enabled:
> ./a.out 8 68 >/dev/null My development was on a Mac using clang as the compiler. But I hopeRun with only solution count:
> ./a.out 8 8 s Some comments:
- With the larger test example, I do want optimization now. While my system is different (Mac), around
-O2seems to be good. - The code has gotten slower for the case where output is generated. This was a conscious sacrifice for optimizing the "count only" mode, and for reducing the code length.
- There will be a few compiler warnings because of missing includes and external declarations for system functions. It was the easiest way to get me to below 1024 characters in the end without making the code totally unreadable.
Also note that gcc will behave similarlythe code still generates the actual solutions, even in "count only" mode. The execution timeWhenever a solution is reduced when compiling with e.g. -O3found, but the increase in compile time more than compensated for it in my tests. Ideally it could be tried with -O0vM tobitmask contains a -O31 on the benchmark system to determine which results in the lowest value for the sum.
I hope you'll also test for correctness, because I don'tpositions that have much to validate it beyond thea vertical bar, and a 4x20 casefor positions with a horizontal bar. Only the conversion of this bitmask into ASCII format, and the actual output, is skipped.