13
\$\begingroup\$

You are sick of all of the codegolf challenges. Hence you decide to write a program that will automatically golf some Python code for you. There are 3 test cases:

print quickSort([0,7,3,-1,8,10,57,2]) def quickSort(arr): less = [] pivotList = [] more = [] if len(arr) <= 1: return arr else: pivot = arr[0] for i in arr: if i < pivot: less.append(i) elif i > pivot: more.append(i) else: pivotList.append(i) less = quickSort(less) more = quickSort(more) return less + pivotList + more 

for i in xrange(1, 101): if i % 15 == 0: print "FizzBuzz" elif i % 3 == 0: print "Fizz" elif i % 5 == 0: print "Buzz" else: print i 

from sys import argv def randomGenerator(seed=1): max_int32 = (1 << 31) - 1 seed = seed & max_int32 while True: seed = (seed * 214013 + 2531011) & max_int32 yield seed >> 16 def deal(seed): nc = 52 cards = range(nc - 1, -1, -1) rnd = randomGenerator(seed) for i, r in zip(range(nc), rnd): j = (nc - 1) - r % (nc - i) cards[i], cards[j] = cards[j], cards[i] return cards def show(cards): l = ["A23456789TJQK"[c / 4] + "CDHS"[c % 4] for c in cards] for i in range(0, len(cards), 8): print " ", " ".join(l[i : i+8]) if __name__ == '__main__': seed = int(argv[1]) if len(argv) == 2 else 11982 print "Hand", seed deck = deal(seed) show(deck) 

Rules:

  1. Your program must not target the code I posted specifically, and should work with any Python 2 code. I reserve the right to change the source code being codegolfed. You may assume that there are no multi-line strings (so you don't have build a full-blown parser), and that locals() is not called.

  2. The output of your program should run in an identical manner as the original source code. (Namely, it must produce the same output. Variable names and language constructs can be changed, as long as the output remains the same)

  3. You may use STDIO or a File to do your input/output of the source code.

Your score will be the sum of the bytes of your program's output.

(The code listed above has been taken from http://rosettacode.org/ under the GNU Free Documentation License 1.2)

\$\endgroup\$
17
  • 3
    \$\begingroup\$ This seems very similar to codegolf.stackexchange.com/questions/3652/write-a-code-golfer/… \$\endgroup\$ Commented Jan 7, 2015 at 16:34
  • 3
    \$\begingroup\$ Here's a bonus test case for people to try, to be devious. \$\endgroup\$ Commented Jan 7, 2015 at 16:51
  • 5
    \$\begingroup\$ What's your model for determining whether the output "[runs] in an identical manner as the original source code"? E.g. for the second example, I believe that removing if __name__ == '__main__': would affect the behaviour in some contexts but not others. For another example, if the ungolfed input assumes that it reads an int from stdin and throws one type of exception if given something else, could the golfed input throw a different type of exception if given a non-integer? \$\endgroup\$ Commented Jan 7, 2015 at 17:00
  • 2
    \$\begingroup\$ What about a program like this: random_long_variable=0;print locals()? \$\endgroup\$ Commented Jan 7, 2015 at 20:49
  • 2
    \$\begingroup\$ Multiline strings aren't the only way to make empty lines functional \$\endgroup\$ Commented Jan 8, 2015 at 1:15

3 Answers 3

4
\$\begingroup\$

Python 2.7, 794

I have been meaning to build a minifier for Python for a while, so this is a good opportunity to investigate the problem.

The program uses a mix of regular expression analysis, and Python parser operations. White space is minimised. Variable that are defined by the user are replaced by a single letter variable (which is not in use!). Finally the while True statement is put on a diet.

The three test cases all verify as running correctly. I could imagine some pathological examples which could result in errors in the generated code but the algorithm should be robust under most circumstances.

Results

228 t1.py 128 t2.py 438 t3.py 794 total 

Output

def c(a): b=[] d=[] f=[] if len(a)<=1: return a else: e=a[0] for i in a: if i<e: b.append(i) elif i>e: f.append(i) else: d.append(i) b=c(b) f=c(f) return b+d+f print c([0,7,3,-1,8,10,57,2]) for i in xrange(1,101): if i%15==0: print"FizzBuzz" elif i%3==0: print"Fizz" elif i%5==0: print"Buzz" else: print i from sys import argv def a(k=1): b=(1<<31)-1 k=k&b while 1: k=(k*214013+2531011)&b yield k>>16 def d(k): f=52 h=range(f-1,-1,-1) g=a(k) for i,r in zip(range(f),g): j=(f-1)-r%(f-i) h[i],h[j]=h[j],h[i] return h def m(h): l=["A23456789TJQK"[c/4]+"CDHS"[c%4]for c in h] for i in range(0,len(h),8): print" "," ".join(l[i:i+8]) if __name__=='__main__': k=int(argv[1])if len(argv)==2 else 11982 print"Hand",k e=d(k) m(e) 

Code

import sys import re from tokenize import generate_tokens from token import tok_name from keyword import iskeyword wr = sys.stdout.write def pyparse(text): 'Return [TYPE,TOKEN] pair list' # Use KEYWORD,NAME,NUMBER,OP,STRING,NL,NEWLINE,COMMENT,INDENT,DEDENT rawtokens = generate_tokens(text.readline) tokens = [[tok_name[n], t] for n,t,p1,p2,dx in rawtokens] for tpair in tokens: if tpair[0] == 'NAME' and iskeyword(tpair[1]): tpair[0] = 'KEYWORD' return tokens def finduservars(filename): 'Return a set of user variables that we can replace with a-zA-Z' varset = set() for line in open(filename): line = line.strip() match = re.match(r'def\s+(\w+)\s*\((.*)\)\s*:', line) if match: func, args = match.groups() varset.add(func) arglist = re.findall(r'(\w+|=)', args) for a in arglist: if a == '=': break # keyword args follow - too hard to parse varset.add(a) continue match = re.match(r'(\w+)\s*=.+', line) if match: assigned = match.group(1) varset.add(assigned) continue return set(v for v in list(varset) if len(v) > 1) filename = sys.argv[1] tokenlist = pyparse(open(filename)) # Build map for var->char conversion: varset = finduservars(filename) singles = [text for tok,text in tokenlist if tok=='NAME' and len(text)==1] allvar = [chr(n) for n in range(97,123)+range(65,91)] charvar = [c for c in allvar if c not in singles] varreplaced = list(varset)[:len(charvar)] varmap = dict((v, charvar.pop(0)) for v in varreplaced) prev = 'NONE' indent = [''] output = [] add = output.append for tok, text in tokenlist: if tok == 'NL': continue elif tok == 'INDENT': indent.append( text.replace(' ', ' ') ) output[-1] = indent[-1] elif tok == 'DEDENT': indent.pop(-1) output[-1] = indent[-1] elif tok == 'NEWLINE': add(text) add(indent[-1]) elif tok in 'KEYWORD,NAME,NUMBER': if prev in 'KEYWORD,NAME,NUMBER': add(' ') if tok == 'NAME': if output[-2] == 'while' and text == 'True': add('1') # common verbose idiom else: add(varmap.get(text, text)) else: add(text) else: add(text) prev = tok wr(''.join(output)) 
\$\endgroup\$
4
\$\begingroup\$

sed, 1074 (down from 1390)

Very mild, low-hanging-fruit type of answer, to get the ball rolling:

/^$/d # Remove empty lines /^[ <--TAB-->]*#/d # Remove whole-line comments s/ /<--TAB-->/g # Replace 4 spaces with tabs /^[^'"]*$/s/ *([|&:,<>=*/%+-]) */\1/g # Remove spaces before/after operators 

Replace <--TAB--> with real TAB characters

Obvious shortcoming:

  • Indents assumed to be exactly 4 spaces in input code.

Since we can assume no multi-line strings, then we only strip leading/trailing spaces from operators if there are no ' or " on the given line. This could be improved, but <mumbles something about sed regex always being greedy>.

Test as follows:

$ cat qs.py fizzbuzz.py cards.py | wc -c 1390 $ sed -rf pygolf.sed qs.py fizzbuzz.py cards.py | wc -c 1074 $ sed -rf pygolf.sed qs.py fizzbuzz.py cards.py | python [-1, 0, 2, 3, 7, 8, 10, 57] 1 2 Fizz ... 98 Fizz Buzz Hand 11982 AH AS 4H AC 2D 6S TS JS 3D 3H QS QC 8S 7H AD KS KD 6H 5S 4D 9H JH 9S 3C JC 5D 5C 8C 9D TD KH 7C 6C 2C TH QH 6D TC 4S 7S JD 7D 8H 9C 2H QD 4C 5H KC 8D 2S 3S $ 
\$\endgroup\$
2
  • \$\begingroup\$ You don't need to check for multi-line strings, but your last two definitely need to be updated. \$\endgroup\$ Commented Jan 7, 2015 at 20:41
  • \$\begingroup\$ @NathanMerrill yup. The operator leading/trailing space one is a bit better now, but the indent one will be much harder to generalize - and is where I get about 2/3 of the gain. \$\endgroup\$ Commented Jan 7, 2015 at 21:01
3
\$\begingroup\$

Python 3.4, 1134

This program should work alright for most programs. Strangely, Sp3000 test case is much easier to optimize for my program than your programs. Input is accepted through the file specified in the first argument. The actual file is modified.

import subprocess from sys import argv progamtext = open(argv[1]).read() if 'argv' in progamtext or 'input' in progamtext or 'open' in programtext:#Make sure the program always produces the same results. exit(0) program = subprocess.Popen(['C:\Python27\python', argv[1]], stdout=subprocess.PIPE, stderr=subprocess.PIPE) program.wait() erroroutput1 = str(program.stderr.read()) output1 = str(program.stdout.read()) program = subprocess.Popen(['C:\Python27\python', argv[1]], stdout=subprocess.PIPE, stderr=subprocess.PIPE) program.wait() erroroutput2 = str(program.stderr.read()) output2 = str(program.stdout.read()) if erroroutput1 != erroroutput2 or output1 != output2:#Make sure the program always produces the same results. exit(0) newprog = '' if erroroutput1: newprog += "import sys\n" + "sys.stderr.write("+ erroroutput1 + ')' if output1: newprog += "\n" if output1: newprog += 'print ' + output1 if len(newprog) > len(progamtext): exit(0) open(argv[1],mode='w').write(newprog) 

How it works:

First, this program checks to see if your program interacts with the user at all or uses random. If it does, the program is unmodified. Next, the program is ran. The program is then replaced with print "output". Finally, if the program is shorter than its output, it is unmodified.

Sp3000's program, optimized:

import sys sys.stderr.write(b'') print b'0.540377721372\r\n3\r\n1\r\n7\r\n99\r\nf\r\n[5, 5]\r\n53\r\n53\r\n53\r\n' 

Sp3000's super bonus program, optimized:

The optimized version is only off .001% of the time.

import sys sys.stderr.write(b'') print b'B\r\n' 
\$\endgroup\$
6
  • 1
    \$\begingroup\$ I'm sure there are other external effects than argv, input and random, which your code would break. ;) \$\endgroup\$ Commented Jan 8, 2015 at 0:41
  • 2
    \$\begingroup\$ Hah. Maybe I should have put in some non-determinism - print id(0) is a good one. \$\endgroup\$ Commented Jan 8, 2015 at 0:50
  • \$\begingroup\$ @Martin Fixed (mostly). :) \$\endgroup\$ Commented Jan 8, 2015 at 0:59
  • 3
    \$\begingroup\$ Here's a super bonus test case, just for you. \$\endgroup\$ Commented Jan 8, 2015 at 1:02
  • \$\begingroup\$ Heh, very creative. \$\endgroup\$ Commented Jan 8, 2015 at 1:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.