Skip to main content
Fixed with one JRCXZ
Source Link
Peter Cordes
  • 5.1k
  • 1
  • 24
  • 35

x86 machine code (unfinished, fails for length 1), 1113 bytes.

(Or 11 bytes without handling single-character strings that are trivially stringy.)

Uses the bit-position check from @attinat's C answer

Same machine code works in 16, 32, and 64-bit modes. The source is NASM for 64-bit mode.

objdumpnasm -drwCfelf64 -Mintell/dev/stdout disassembly: 0000000000001450 <string_connected>:listing   1450: 17 addr ac global string_connected  18 lods al,BYTE PTR ds:[rsi]  1451:code c0 e8 05 string_connected:  19 shrbytes al,0x5  1454: ;;; input: char *RSI, transitions to 32check=RCX  06 20 xor al,BYTE PTR [rsi]  ;;; 1456output: AL=non-zero => connected. AL=zero disconnected  24 01 21 and al,0x1  1458.loop: e0 f6  loopne; 1450do <string_connected>{ 145a:22 00000000 AC c3 lodsb ret  

The loop condition is effectively }while(--rcx && al&1);

callable from C as unsigned char string_connected(int dummy_rdi, const char *p, int dummy_rdx, size_t len); with the x86-64 System V calling convention.

RCX = len = strlen(p) - 1. i.e. the number of character-boundaries to check in the explicit-length string.


The bit-position logic is based on the observation that bit 0 of the ASCII code tells you the starting height, and bit 5 tells you the ending height.

;;; _ 1011111 ;;; \; 1011100 ;;; /al = 101111*p++ ;;; ~ 1111110 ;;; 23 00000001 ^E309 ^  ; end condition (c>>5) & 1 => jrcxz 0 =.early_exit low  ; start cond: c&1; =>transitions=0 0special =case. high  Checking before the ;loop (prev>>5)&1would ==require curr&1extra code meansto weset haveAL.  a discontinuity  24 00000003 ;C0E805 ((prev>>5) ^ curr) & 1 == 0 means we have a discontinuity 

test harness (modified from attinat's TIO link, beware the C sequence-point UB in the reference function):

Try it online!

NASM source:

globalshr string_connected string_connected: ;;; input: char *RSIal, length RCX5 ;;; output: AL=non-zero => connected.  AL=zero25 disconnected .loop:00000006 3206 xor al, [rsi] ; do {  lodsb ; compare with next char  26 00000008 2401 ; al = *p++  shrand al, 51  xor al, [rsi]  ;27 compare0000000A withE0F4 next char  and al, 1  loopne .loop ; }while(--rcx && al&1); ret 

test cases:

 reference asm 28 V V.early_exit:   __/~~\/\_/~ -> 1 1 29 0000000C C3 ret 

Callable from C as unsigned char string_connected(int dummy_rdi, const char *s, int dummy_rdx, size_t transitions); with the x86-64 System V calling convention. Not bool because the transitions=0 case returns an ASCII code, not 1.

RCX = len = strlen(s) - 1. i.e. the number of character-boundaries = transitions to check in the explicit-length string.

For transitions > 0, returns 0 (mismatch) or 1 (connected) and leaves ZF set accordingly. For transitions == 0, returns the single byte of the string (which is non-zero and thus also truthy). If not for that special case, we could drop the early-exit JRCXZ. It's inside the loop only because AL is non-zero there.


The bit-position logic is based on the observation that bit 0 of the ASCII code tells you the starting height, and bit 5 tells you the ending height.

;;; _ -> 1 0 mismatch1011111  ;;; \ -> 1 0 mismatch1011100  ;;; / -> 1 0  mismatch101111  ;;; ~ -> 1 0  mismatch1111110  ;;; ^ ___ -> 1 1^  \/ -> 1 1   ; end /\/condition ->(c>>5) 1& 1  => /\/\ ->0 1= 1low   ; start cond: c&1 ~~~=> ->0 1= 1high   ; (prev>>5)&1 == curr&1 means we have a discontinuity  ; ((prev>>5) ^ ...curr) & 1 == all0 other testcasesmeans matchwe thehave referencea implementationdiscontinuity 

Test harness (modified from attinat's TIO link, beware the C sequence-point UB in that C reference function). Try it online!. This function is correct for all 30 cases. (Including the single-character cases where the return value doesn't match: both are truthy with different non-zero values in that case.)

x86 machine code (unfinished, fails for length 1), 11 bytes

Uses the bit-position check from @attinat's C answer

objdump -drwC -Mintel disassembly: 0000000000001450 <string_connected>:   1450: ac lods al,BYTE PTR ds:[rsi]  1451: c0 e8 05 shr al,0x5  1454: 32 06 xor al,BYTE PTR [rsi]  1456: 24 01 and al,0x1  1458: e0 f6  loopne 1450 <string_connected> 145a: c3 ret  

The loop condition is effectively }while(--rcx && al&1);

callable from C as unsigned char string_connected(int dummy_rdi, const char *p, int dummy_rdx, size_t len); with the x86-64 System V calling convention.

RCX = len = strlen(p) - 1. i.e. the number of character-boundaries to check in the explicit-length string.


The bit-position logic is based on the observation that bit 0 of the ASCII code tells you the starting height, and bit 5 tells you the ending height.

;;; _ 1011111 ;;; \ 1011100 ;;; / 101111 ;;; ~ 1111110 ;;; ^ ^  ; end condition (c>>5) & 1 => 0 = low  ; start cond: c&1 => 0 = high  ; (prev>>5)&1 == curr&1 means we have a discontinuity  ; ((prev>>5) ^ curr) & 1 == 0 means we have a discontinuity 

test harness (modified from attinat's TIO link, beware the C sequence-point UB in the reference function):

Try it online!

NASM source:

global string_connected string_connected: ;;; input: char *RSI, length RCX ;;; output: AL=non-zero => connected.  AL=zero disconnected .loop: ; do {  lodsb ; al = *p++  shr al, 5  xor al, [rsi]  ; compare with next char  and al, 1  loopne .loop ; }while(--rcx && al&1); ret 

test cases:

 reference asm  V V   __/~~\/\_/~ -> 1 1  _ -> 1 0 mismatch   \ -> 1 0 mismatch   / -> 1 0  mismatch   ~ -> 1 0  mismatch   ___ -> 1 1  \/ -> 1 1   /\/ -> 1 1  /\/\ -> 1 1   ~~~ -> 1 1   ... all other testcases match the reference implementation 

x86 machine code, 13 bytes.

(Or 11 bytes without handling single-character strings that are trivially stringy.)

Uses the bit-position check from @attinat's C answer

Same machine code works in 16, 32, and 64-bit modes. The source is NASM for 64-bit mode.

nasm -felf64 -l/dev/stdout listing 17 addr global string_connected  18 code string_connected:  19 bytes ;;; input: char *RSI, transitions to check=RCX  20 ;;; output: AL=non-zero => connected. AL=zero disconnected  21 .loop: ; do { 22 00000000 AC lodsb ; al = *p++ 23 00000001 E309 jrcxz .early_exit ; transitions=0 special case. Checking before the loop would require extra code to set AL.  24 00000003 C0E805 shr al, 5 25 00000006 3206 xor al, [rsi] ; compare with next char  26 00000008 2401 and al, 1 27 0000000A E0F4 loopne .loop ; }while(--rcx && al&1); 28 .early_exit: 29 0000000C C3 ret 

Callable from C as unsigned char string_connected(int dummy_rdi, const char *s, int dummy_rdx, size_t transitions); with the x86-64 System V calling convention. Not bool because the transitions=0 case returns an ASCII code, not 1.

RCX = len = strlen(s) - 1. i.e. the number of character-boundaries = transitions to check in the explicit-length string.

For transitions > 0, returns 0 (mismatch) or 1 (connected) and leaves ZF set accordingly. For transitions == 0, returns the single byte of the string (which is non-zero and thus also truthy). If not for that special case, we could drop the early-exit JRCXZ. It's inside the loop only because AL is non-zero there.


The bit-position logic is based on the observation that bit 0 of the ASCII code tells you the starting height, and bit 5 tells you the ending height.

;;; _ 1011111 ;;; \ 1011100 ;;; / 101111 ;;; ~ 1111110 ;;; ^ ^ ; end condition (c>>5) & 1 => 0 = low ; start cond: c&1 => 0 = high ; (prev>>5)&1 == curr&1 means we have a discontinuity  ; ((prev>>5) ^ curr) & 1 == 0 means we have a discontinuity 

Test harness (modified from attinat's TIO link, beware the C sequence-point UB in that C reference function). Try it online!. This function is correct for all 30 cases. (Including the single-character cases where the return value doesn't match: both are truthy with different non-zero values in that case.)

Source Link
Peter Cordes
  • 5.1k
  • 1
  • 24
  • 35

x86 machine code (unfinished, fails for length 1), 11 bytes

Uses the bit-position check from @attinat's C answer

objdump -drwC -Mintel disassembly: 0000000000001450 <string_connected>: 1450: ac lods al,BYTE PTR ds:[rsi] 1451: c0 e8 05 shr al,0x5 1454: 32 06 xor al,BYTE PTR [rsi] 1456: 24 01 and al,0x1 1458: e0 f6 loopne 1450 <string_connected> 145a: c3 ret 

The loop condition is effectively }while(--rcx && al&1);

callable from C as unsigned char string_connected(int dummy_rdi, const char *p, int dummy_rdx, size_t len); with the x86-64 System V calling convention.

RCX = len = strlen(p) - 1. i.e. the number of character-boundaries to check in the explicit-length string.


The bit-position logic is based on the observation that bit 0 of the ASCII code tells you the starting height, and bit 5 tells you the ending height.

;;; _ 1011111 ;;; \ 1011100 ;;; / 101111 ;;; ~ 1111110 ;;; ^ ^ ; end condition (c>>5) & 1 => 0 = low ; start cond: c&1 => 0 = high ; (prev>>5)&1 == curr&1 means we have a discontinuity ; ((prev>>5) ^ curr) & 1 == 0 means we have a discontinuity 

test harness (modified from attinat's TIO link, beware the C sequence-point UB in the reference function):

Try it online!

NASM source:

global string_connected string_connected: ;;; input: char *RSI, length RCX ;;; output: AL=non-zero => connected. AL=zero disconnected .loop: ; do { lodsb ; al = *p++ shr al, 5 xor al, [rsi] ; compare with next char and al, 1 loopne .loop ; }while(--rcx && al&1); ret 

test cases:

 reference asm V V __/~~\/\_/~ -> 1 1 _ -> 1 0 mismatch \ -> 1 0 mismatch / -> 1 0 mismatch ~ -> 1 0 mismatch ___ -> 1 1 \/ -> 1 1 /\/ -> 1 1 /\/\ -> 1 1 ~~~ -> 1 1 ... all other testcases match the reference implementation