Enumerate valid Brainf**k programs

Question

Golunar/Unary is a way to encode all valid Brainfuck programs, but it is not an enumeration, since most natural numbers do not correspond to a valid program.

For the purpose of this challenge, assume a doubly infinite tape and no comments, i.e., a Brainfuck program is valid if and only if it consists solely of the characters <>+-.,[] and all left and right brackets match.

For example, the empty program, ,[+][-]., [>+<[--].] and +[+[+][+[+]+]+]+. are valid Brainfuck programs, while ][, and a[] are not.

Task

Write a program or function that accepts a valid Brainfuck program as input and returns a natural number (1, 2, 3, …), with the following constraints:

The generated output must be different for all valid Brainfuck programs.
For every natural number n, there must be a valid Brainfuck program that, when provided as input, generates the output n.

Additional rules

Given a Brainfuck program of 100 or less bytes, your program or function must finish within one minute.

This means that you cannot iterate over all valid Brainfuck programs until you match the input.
Standard code-golf rules apply.

I was thinking to just encode it as octal but the matching brackets are making this tricky. — DankMemes
– DankMemes, Commented Aug 27, 2015 at 0:42
Is the empty program a valid Brainfuck program? Must it be mapped to a natural integer as well? — orlp
– orlp, Commented Aug 27, 2015 at 11:16
Why the close vote? This is a fascinating question and the problem happens to have the size and variety for a great golf. — trichoplax is on Codidact now
– trichoplax is on Codidact now, Commented Aug 27, 2015 at 11:17

TheNumberOne · Accepted Answer · 2015-08-28 02:22:49Z

Python 3, 443 158 155 154 134 131 128 124 117 116 115 bytes

c=d=C=D=0 for e in input():v='[<>,.-+]'.find(e);d=d*8+v;c+=c<0<6<v;c-=d>1>v;C,D=(c,C+1,d,D)[v>6::2] print(-~D*8**C)

Several bytes thanks to Sp3000 and Mitch Schwartz :D

How this works:

This maps all valid BF programs into all possible, valid or invalid, BF programs, that don't start with a [, in a one-to-one ratio. After that, the new program is simply converted into octal.

Here is the mapping formula:

Separate a BF program into 3 parts. The first part is the largest prefix consisting of only [ characters. The third part is the largest postfix consisting of only ] characters. The second part is the middle.
Dispose of the first part. These can be recomputed later.
Remove all ] brackets in the third part that match [ brackets in the second part. These can also be recomputed later.
Concatenate the second and third parts together.

If you don't understand this explanation, you can find an extended explanation in chat starting here.

For reference, here are the first 20 programs:

1 : 2 : < 3 : > 4 : , 5 : . 6 : - 7 : + 8 : [] 9 : <[] 10 : << 11 : <> 12 : <, 13 : <. 14 : <- 15 : <+ 16 : [<] 17 : >[] 18 : >< 19 : >> 20 : >,

Here are the first 1000 programs: http://pastebin.com/qykBWhmD
Here is the program I used to generate them: http://ideone.com/e8oTVl

Here is Hello, World!:

>>> ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++. 457711481836430915510337664562435564418569135809989841510260388418118348571803953323858180392373

@BetaDecay Yes: codegolf.stackexchange.com/questions/55363/… — TheNumberOne
– TheNumberOne, Commented Aug 27, 2015 at 13:04

Sp3000 · Accepted Answer · 2015-08-27 13:50:12Z

Python 2, 157 bytes

def f(s,o=0,d=0,D={}):T=s,o,d;x=D[T]=D[T]if T in D else~o and 0**o+sum(f(s[1:],cmp(c,"[")%-3-~o,d or cmp(c,s[0]))for c in"+,-.<>[]")if s else~d<0==o;return+x

Still looks pretty golfable, but I'm posting this for now. It uses recursion with a bit of caching. Annoyingly, D.get doesn't short circuit for the caching, so I can't save 9 bytes that way...

The mapping prioritises length first, then lexicographical order over the ordering "][><.-,+" (see output examples below). The main idea is to compare prefixes.

The variable o keeps track of the number of [ brackets still open for the current prefix, while the variable d takes one of three values indicating:

d = 1: The current prefix is lexicographically earlier than s. Add all programs with this prefix and length <= s,
d = -1: The current prefix is lexicographically later than s. Add all programs with this prefix and length < s.
d = 0: The current prefix is a prefix of s, so we might change d to 1 or -1 later.

For example, if we have s = "[-]" and our current prefix is p = "+", since p is later than s lexicographically we know only to add the programs starting with p which are strictly shorter than s.

To give a more detailed example, suppose we have an input program s = "-[]". The first recursive expansion does this:

 (o == 0) # Adds a program shorter than s if it's valid # For the first expansion, this is 1 for the empty program + f(s[1:], o=-1, d=1) # ']', o goes down by one due to closing bracket + f(s[1:], o=1, d=1) # '[', o goes up by one due to opening bracket + f(s[1:], o=0, d=1) # '>' + f(s[1:], o=0, d=1) # '<' + f(s[1:], o=0, d=1) # '.', d is set to 1 for this and the previous branches # since they are lexicographically earlier than s's first char + f(s[1:], o=0, d=0) # '-', d is still 0 since this is equal to s's first char + f(s[1:], o=0, d=-1) # ',', d is set to -1 for this and the later branches # since they are lexicographically later than s's first char + f(s[1:], o=0, d=-1) # '+'

Note how we don't actually use the prefixes in the recursion - all we care about them is captured through the variables d, o and the shrinking input program s. You'll notice a lot of repetition above - this is where caching comes in, allowing us to process 100-char programs well within the time limit.

When s is empty, we look at (d>=0 and o==0), which decides whether to return 1 (count this program because it's lexicographically early/equal and the program is valid), or 0 (don't count this program).

Any situtation with o < 0 immediately returns 0, since any programs with this prefix have more ]s than [, and are thus invalid.

The first 20 outputs are:

 1 > 2 < 3 . 4 - 5 , 6 + 7 [] 8 >> 9 >< 10 >. 11 >- 12 >, 13 >+ 14 <> 15 << 16 <. 17 <- 18 <, 19 <+ 20

Using the same Hello World example as @TheNumberOne's answer:

>>> f("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.") 3465145076881283052460228065290888888678172704871007535700516169748342312215139431629577335423L

Community · Accepted Answer · 2017-04-13 12:39:07Z

This answer is a formal proof for the answer by TheNumberOne, Enumerate valid Brainf**k programs, where it can be a bit hard to understand the fine points why the enumeration is correct. It's nontrivial to understand why there isn't some invalid program that maps to a number not covered by a valid program.

Throughout this answer capitals are used to denote programs, and lowercase variables are used for functions and integers. ~ is the concatenation operator.

Proposition 1:

Let the function f be the program described in that answer. Then for every program U there exists a valid program V such that f(U) = f(V)

Definition 1:

Let g(X) be the number of [ that appear in the program X, and let h(X) be the number of ] that appear.

Definition 2:

Define P(x) to be this function:

P(x) = "" (the empty program) when x <= 0 P(x) = "]" when x = 1 P(x) = "]]" when x = 2 etcetera

Definition 3:

Given a program X, denote X1 to be it's largest prefix of [ characters, X2 its center, and X3 its largest suffix of ] characters.

Proof of proposition 1:

If g(U) = h(U) then U is a valid program, and we can take V=U. (trivial case).

If g(U) < h(U) then we can create V by prepending n = h(U) - g(U) [ symbols. Obviously f(V) = f(U) as all [ symbols in the prefix are removed.

Now consider g(U) > h(U). Define T = U2 ~ U3. if g(T) <= h(T), then we can construct V by removing n = g(U) - h(U) [ symbols.

So we can assume that h(T) < g(T). Construct V = T ~ P(g(T) - h(T)).

We need three small facts to proceed:

Claim 1: g(U2) = g(T)

U3 does not contain any [ symbols by its definition. As T = U2 ~ U3, its [ symbols are all in the first part.

Claim 2: h(U3) < g(T)

This follows from noting that h(T) < g(T) and h(U3) < h(U3 ~ U2) = h(T).

Claim 3: h(V3) = g(U2) - h(U2)

h(V3) = h(U3) + g(T) - h(T) using the construction of V h(V3) = h(U3) + g(U2) + g(U3) - h(U2) - h(U3) apply the definition of T h(V3) = g(U2) - h(U2) *one term cancels, g(U3) is always zero, as U3 contains only `]` symbols*

Now we show that f(V) = f(U).

f(U) = U2 ~ P(h(U3) - g(U2)) = U2 claim 2, definition of P f(V) = U2 ~ P(h(V3) - g(V2)) = U2 ~ P(h(V3) - g(U2)) = U2 ~ P(g(U2) - h(U2) - g(U2)) claim 3 = U2 ~ P(-h(U2)) = U2 definition P

This completes the proof. Q.E.D.

Let's do uniqueness as well.

Proposition 2:

Let U, V be two diffrent, valid programs. Then f(U) != f(V)

This is fairly straightforward compared to the previous proposition.

Let's assume that U2 = V2. But then the only way U and V can be diffrent is by adding or removing n [ and ] symbols to U1 and U3 respectively. Yet this changes the output of f, since f will count the number of unmatched ] symbols in the suffix.

Thus U2 != V2.

Obviously, this leads to a contradiction. As U2 and V2 are literally contained in the output of f(U) and f(V) respectively, they cannot differ, except at the 'edge', the place where U2 is concatenated with U3. But the first and last symbols of U2 and V2 cannot be [ or ] by definition, while those are the only symbols allowed in U1, U3, V1, V3 respectively and respectively again. Thus we get U2 = V2. Q.E.D.

Stack Exchange Network

Enumerate valid Brainf**k programs

Task

Additional rules

3 Answers 3

Python 3, 443 158 155 154 134 131 128 124 117 116 115 bytes

How this works:

Python 2, 157 bytes

Linked

Hot Network Questions

Enumerate valid Brainf**k programs

Task

Additional rules

3 Answers 3

Python 3, 443 158 155 154 134 131 128 124 117 116 115 bytes

How this works:

Python 2, 157 bytes

Linked

Related

Hot Network Questions