What general tips do you have for golfing in Python? I'm looking for ideas which can be applied to code-golf problems and which are also at least somewhat specific to Python (e.g. "remove comments" is not an answer).
Please post one tip per answer.
not in: use in with ^1 or 1-
Often, it helps to replace a not in b with (a in b)^1. By itself, this doesn't save any bytes, but it often means that you can remove whitespace:
For example, if a not in b: can just be if(a in b)^1: to save a byte.
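Here is a quick sanity check of the trick, written as a sketch with made-up sample strings. Both (a in b)^1 and 1-(a in b) give 1 exactly when a not in b is true, and the parenthesised form needs no space after if.

```python
b = "golf"                       # sample data for illustration
hits = []
for a in "gxz":
    # (a in b) is a bool; ^1 flips it to the int 0 or 1, like `not in`.
    assert (a in b)^1 == (a not in b)
    assert 1-(a in b) == (a not in b)
    if(a in b)^1:hits.append(a)  # no space needed after `if`

print(hits)  # ['x', 'z']
```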
1e_
I'm not sure if this has been mentioned before (likely), so either I'm going crazy or I'm blind.
Long numbers like 10000 can be written as 1e4 for short. Note, however, that 1e_ gives a float, not an integer: 1e4 is actually 10000.0, so be wary when doing stuff like:
for x in range(1,1e4):print(x%5)
which raises an error, because range only accepts integers.
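The following small Python 3 sketch illustrates the pitfall. 1e4 is a float, and range() raises TypeError for it. Converting back with int() costs more bytes than writing 10**4 or 10000, so the trick only helps where a float is acceptable.

```python
n = 1e4                     # scientific notation always produces a float
assert type(n) is float and n == 10000

try:
    range(1, n)             # range() only accepts integers
    raised = False
except TypeError:
    raised = True

# int(1e4) is 8 bytes, longer than 10**4 (5) or 10000 (5),
# so 1e4 only pays off where a float is acceptable.
total = sum(x%5 for x in range(1, int(n)))
print(raised, total)        # True 20000
```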
(To be clear, the error happens because range requires ints and not floats; it's not because of the modulo.)
Sometimes you can use Python's exec statement combined with string repetition to shorten loops. You can't use this often, but when you can, it gets rid of a lot of long loop constructs. Also, because exec is a statement (in Python 2), you can't use it in lambdas. eval() might work instead, although it's 2 characters longer and quite restricted in what it can do.
Here is an example of this technique in use: GCJ 2010 1B-A Codegolf Python Solution
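This is a minimal Python 3 sketch of the idea, where exec is a function. The loop body "m*=2;" is repeated ten times and run once. The variable names are made up for illustration. The namespace dict is passed explicitly because exec inside a function cannot rebind that function's local variables.

```python
# Loop version: double n ten times.
n = 1
for _ in range(10):
    n *= 2

# exec version: builds "m*=2;m*=2;...;" (10 copies) and runs it once.
ns = {"m": 1}
exec("m*=2;" * 10, ns)   # at module level, exec("m*=2;"*10) would also work
m = ns["m"]

print(n, m)  # 1024 1024
```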
You can't do eval("print 1") in Python 2, because print 1 is a statement. In Python 3, you can do eval("print(1)") since print() is now a function, and exec("print(1)") since exec() is now a function too.
Run your code through a space remover, like this one:
#Pygolfer (Python 2)
a=raw_input()
for i in [i for i in range(len(a)) if a[i]==" "][::-1]:
    try:b=a[:i]+a[i+1:];eval(b);a=b;print a
    except:pass
(This tries to remove the spaces one by one, going from right to left so that the remaining indices stay valid, and checks whether the code still works. Please still check your code manually.)
Things to try manually: print'string' works.
[str(i)for i in(1,2,3,4)] works.
etc.
If you rely on data (mostly for kolmogorov-complexity problems), in Python 2 you can use the built-in zip encoding/decoding and store the data in a file (add 1 byte to your score for the filename):
open('f','rb').read().decode('zip')
If you have to store the data in the source code, you need to encode the zipped data with base64 and do:
"base64literal".decode('base64').decode('zip')
These don't save characters in every case, though.
Instead of
isinstance(x,C)  # 15 bytes
there are several alternatives:
x.__class__==C   # 14 bytes
'a'in dir(x)     # 12 bytes, if the class has a distinguishing attribute 'a'
type(x)==C       # 10 bytes, doesn't work with old-style classes
'K'in`x`         # 8 bytes, only in Python 2, if no other classes contain 'K'
                 # (watch out for false positives from the hex address)
Some of them may save extra bytes depending on the context, because you can eliminate a space before or after the expression.
Thanks Sp3000 for contributing a couple of tips.
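Here is a rough Python 3 check of some of the alternatives above, using hypothetical classes C, D and E. Note that the type and __class__ comparisons are not exactly equivalent to isinstance, because they reject instances of subclasses.

```python
class C:             # hypothetical class with a distinguishing attribute 'a'
    a = 1

class D:             # unrelated hypothetical class
    pass

class E(C):          # hypothetical subclass of C
    pass

x, y, z = C(), D(), E()

for check in (lambda v: isinstance(v, C),   # 15-byte baseline
              lambda v: v.__class__==C,
              lambda v: 'a'in dir(v),
              lambda v: type(v)==C):
    assert check(x) and not check(y)

# Subclass instances are where the shortcuts differ from isinstance:
print(isinstance(z, C), type(z)==C, 'a'in dir(z))  # True False True
```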
Suppose we want to write a recursive function that prepends to a sequence (e.g. list, tuple) each time. For example, the Python 3 program
def f(n,k,L=[]):n and[f(n-1,k,[b]+L)for b in range(k)]or print(L) works like itertools.product, taking n,k and printing all length n lists of numbers taken from range(k). (Example thanks to @xnor)
If we don't need L to be a list specifically, we can save on the optional empty list argument by making use of unpacking, like so:
def f(n,k,*T):n and[f(n-1,k,b,*T)for b in range(k)]or print(T) where T is now a tuple instead. In the general case, this saves 3 bytes!
In Python 3.5+, this also works if we're appending to the end of a sequence, i.e. we can change f(n-1,k,L+[b]) to f(n-1,k,*T,b). The latter is a syntax error in earlier versions of Python though.
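This sketch collects the results instead of printing them; the lists pre and post are additions for checking. It compares the prepending version with the Python 3.5+ appending version against itertools.product.

```python
from itertools import product

pre = []   # collects the tuples instead of printing them
def f(n,k,*T):n and[f(n-1,k,b,*T)for b in range(k)]or pre.append(T)

post = []  # appending version: *T,b needs Python 3.5+
def g(n,k,*T):n and[g(n-1,k,*T,b)for b in range(k)]or post.append(T)

f(2,3)
g(2,3)

# Appending yields product's order; prepending yields the reversed tuples.
assert post == list(product(range(3), repeat=2))
assert pre == [t[::-1] for t in product(range(3), repeat=2)]
```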
In IDLE versions 3.1 to 3.3, the command input() reads an entire multiline string like "line1\nline2", rather than a single line at a time as per the spec. This was fixed in version 3.4.
Calling input() only once is very convenient for golfing. Whether one can take advantage of this is debatable, but I think it is an acceptable interpreter- or environment-specific behavior.
>>> a=5
>>> b=4
>>> a,b=b,a
>>> a
4
>>> b
5
To assign a tuple, you don't need parentheses. For example, a=1,2,3 assigns the tuple (1, 2, 3) to a, and b=7, assigns the tuple (7,) to b. This works in both Python 2 and Python 3.
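The swap and the parenthesis-free tuple assignments, as a runnable snippet:

```python
a, b = 5, 4
a, b = b, a            # swap, no temporary needed
assert (a, b) == (4, 5)

t = 1,2,3              # the commas make the tuple, not parentheses
u = 7,                 # one-element tuple needs the trailing comma
assert t == (1, 2, 3) and u == (7,)
```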
Warning: Python is the language which worships readability above all else; so coding this way is a Mortal Sin.
This sort of thing comes up a lot, such as here, where for a given digit 0<=d<=9 we can get the 7-bit segment value b as a hex string from the list
b=['7e','30','6d','79','33','5b','5f','70','7f','7b'][d]
If the list has more than just a few elements, you're usually better off at least using split, because you can replace each "','" with a single space as the delimiter. E.g.:
b='7e 30 6d 79 33 5b 5f 70 7f 7b'.split()[d]
This can be used for almost any list of strings (possibly at a small additional cost, by using a delimiter such as ",").
But if, in addition, the strings we are selecting all have the same length k (k==2 in our example), then with the magic of Python slicing we can write the above as:
b='7e306d79335b5f707f7b'[2*d:][:2]
This saves a lot of bytes because we don't need any delimiter characters at all. In that case, though, the following is usually even shorter:
b='7367355777e0d93bf0fb'[d::10]
(This works because the string holds the first characters of all ten entries followed by their second characters, so [d::10] picks characters d and d+10.)
If you only need the first few values in the array:
>>> a, b, *c = [1, 2, 3, 4, 5]
>>> a
1
>>> b
2
>>> c
[3, 4, 5]
The same applies when you need the last few values:
>>> *a, b, c = [1, 2, 3, 4, 5]
>>> a
[1, 2, 3]
>>> b
4
>>> c
5
Or even the first few and the last few:
>>> a, *b, c = [1, 2, 3, 4, 5]
>>> a
1
>>> b
[2, 3, 4]
>>> c
5
This also works when the starred name receives nothing, as in a,*b,c=1,2 or a,*b,c="ab" (b becomes []).
When you want to turn the result of map into a list, use * instead of list(...).
new_a=list(map(f,a))
new_a=[*map(f,a)]  # -3 chars
Moreover, you can convert other iterables to lists the same way, also saving 3 characters:
a=list("abc")
a=[*"abc"]
b=list(range(x,y))
b=[*range(x,y)]
Starred assignment is one byte shorter still: *a,=map(f,a) and *b,=range(x,y).
You can generate the nonempty contiguous substrings of a string s with a recursive function (Python 3, for [*s]).
40 bytes (TIO)
f=lambda s:[*s]and[s]+f(s[1:])+f(s[:-1])
This repeats substrings multiple times, even if each substring appears only once. You can make this a set to remove duplicates. The set version won't run if s is a list instead of a string (lists aren't hashable), but a tuple will work.
40 bytes (TIO)
f=lambda s:{*s}and{s}|f(s[1:])|f(s[:-1])
Or, you can do this to make a list where each substring appears once for each time it's present:
46 bytes (TIO)
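f=lambda s,b=-1:[*s]and[s]+f(s[1:],0)+f(s[:b])
Thanks to loopy walt for the s[:b] optimization. (With b=0, f(s[:b]) is f(s[:0]), which is empty, so the calls on s[1:] only collect suffixes. Every occurrence is then reached exactly once, as a suffix of some prefix of s.)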
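This sketch checks the three substring functions against a brute-force list of all s[i:j]. They are renamed f1, f2 and f3 here (hypothetical names) so they can coexist, and each recursive call uses its own name.

```python
f1=lambda s:[*s]and[s]+f1(s[1:])+f1(s[:-1])          # list, with repeats
f2=lambda s:{*s}and{s}|f2(s[1:])|f2(s[:-1])          # set, no repeats
f3=lambda s,b=-1:[*s]and[s]+f3(s[1:],0)+f3(s[:b])    # once per occurrence

for s in ("aba", "abcab"):
    every = [s[i:j] for i in range(len(s)) for j in range(i+1, len(s)+1)]
    assert set(f1(s)) == set(every)
    assert f2(s) == set(every)
    assert sorted(f3(s)) == sorted(every)

print(f3("aba"))  # ['aba', 'ba', 'a', 'ab', 'b', 'a']
```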
This can be used to sum some expression g over all substrings, for instance to count nonempty substrings with some Boolean property g.
48 bytes (TIO)
f=lambda s,b=-1:s>""and g(s)+f(s[1:],0)+f(s[:b])
Python 2 and 3 differences
A recent challenge pushed me to search for differences between the two major versions of Python. More precisely, code that returns different results in different versions. This might be helpful in other polyglot challenges.
| Python 2 | Python 3 |
|---|---|
| '' == b'' | '' != b'' |
| round(1*0.5) = 1.0 | round(1*0.5) = 0 |
| 10/11 = 0 | 10/11 = 0.9090909090909091 |
perm instead of factorial
The doc for perm is here:
math.perm = perm(n, k=None, /)
    Number of ways to choose k items from n items without repetition and with order.
    Evaluates to n! / (n - k)! when k <= n and evaluates to zero when k > n.
    If k is not specified or is None, then k defaults to n and the function returns n!.
perm is shorter and does the same thing as factorial when there is only 1 argument.
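Here is a quick check of the math.perm behaviour described in the docstring. It requires Python 3.8 or later, where math.perm was added.

```python
import math  # math.perm needs Python 3.8+

assert math.perm(5) == math.factorial(5) == 120   # k defaults to n, giving n!
assert math.perm(5, 2) == 20                       # 5!/(5-2)! = 5*4
assert math.perm(2, 5) == 0                        # k > n gives zero
print(len("perm(n)"), len("factorial(n)"))         # 7 12
```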
Suppose you have x as a list and y as a value. You want to append y to x.
You can do this:
x+=y,     # add y to the end of x, 5 bytes
# instead of
x+=[y]    # add y to the end of x, 6 bytes
If z is also a value and you want to append y and z to x, do this:
x+=y,z    # add y and z to the end of x, 6 bytes
# instead of
x+=[y,z]  # add y and z to the end of x, 8 bytes
With string elements, x+='c', appends 'c', and x+='cd' appends each character ('c' and 'd').
Was somewhat mentioned but I want to expand:
[a,b],[c,d]=[[1,2],[3,4]] works as well as the simple a,b=[1,2]. Another great thing is the ternary operator (similar to C's ?:):
x if x<3 else y
And no one has mentioned map. map calls the function given as its first argument on each item of its second argument. For example, assume that a is a list of strings of integers (from user input, for example):
sum(map(int,a))
will sum all the integers.
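This small example uses map with a made-up input, then shows the ternary and its shorter and/or form. The and/or version is only equivalent while the value in the middle is truthy, as the last assertion shows.

```python
a = "3 14 15".split()          # made-up input: strings of integers
total = sum(map(int, a))       # int is applied to each string
print(total)                   # 32

x, y = 2, 7
assert (x if x<3 else y) == 2
assert (x<3 and x or y) == 2
x = 0                          # falsy x: the and/or form now picks y
assert (x if x<3 else y) == 0 and (x<3 and x or y) == 7
```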
x if cond else y can often be shortened to cond and x or y, which is only equivalent when x is truthy.
You can generate pseudo-random numbers using hash.
hash('V~')%10000
This gave 2014 for the answerer, but Python 3.4.0 returns a more random number per session, like 6321, 3744, and 5566, because Python 3.3+ randomizes string hashes per process.
When your program needs to return a value, you might be able to use yield instead, saving one character:
def a(b):yield b
However, to print it you'd need to do something like
for i in a(b):print i
print next(a(b)) will work too, and in Python 3 so does print(*a(b)).
By default, submissions may be functions and functions may be anonymous. A lambda expression is often the shortest framework for input/output. Compare:
lambda s:s+s[::-1]
def f(s):return s+s[::-1]
s=input();print s+s[::-1]
(These concatenate a string with its reverse.)
The big limitation is that the body of a lambda must be a single expression, and so cannot contain assignments. For built-ins, you can do assignments like e=enumerate outside the function body or as an optional argument.
This doesn't work for expressions in terms of the inputs. But note that a lambda might still be worth it, even if it means repeating a long expression.
lambda s:s.lower()+s.lower()[::-1]
def f(s):t=s.lower();return t+t[::-1]
The lambda is shorter, even if we save a char in the named function by having it print rather than return. The break-even point for two uses is length 12.
However, if you have many assignments or complex structures like loops (which are hard to turn into recursive calls), you're probably better off taking the hit and writing a named function or program.
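The two versions from the comparison above, checked for equal output and for their byte counts. This is a sketch, and the name lam is only there so both can be defined side by side.

```python
lam = lambda s:s.lower()+s.lower()[::-1]
def f(s):t=s.lower();return t+t[::-1]

assert lam("Golf") == f("Golf") == "golfflog"
print(len("lambda s:s.lower()+s.lower()[::-1]"))     # 34
print(len("def f(s):t=s.lower();return t+t[::-1]"))  # 37
```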
cmp in Python 2
Say you want to output P if x>0, N if x<0, and Z if x==0.
"ZPN"[cmp(x,0)]
This function was removed in Python 3.0.1, although it remained in Python 3.0 by mistake.
or in lambdas
I'm surprised this isn't in here yet, but if you need a multi-statement lambda, you can chain calls with or. or evaluates its second operand whenever the first one is falsy, and functions like print and sleep return None, so every call in the chain gets executed. and, by contrast, stops as soon as an operand is falsy. For instance, here is a contrived example that prints the characters in a string one by one with an interval:
from time import sleep
list(
    map(
        (lambda i:
            sleep(.06) or
            print(i) or
            print(ord(i))  # all of these get executed
        ),
        "compiling... "
    )
)
In this case it isn't shorter, but sometimes it is.
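This sketch performs the same or-chaining without sleep or printing, appending to a list instead (log and log2 are illustrative names) so the evaluation order can be checked. It also shows that and stops at the first None.

```python
log = []
step = lambda i: log.append(i) or log.append(ord(i)) or i
assert [*map(step, "ab")] == ["a", "b"]
assert log == ["a", 97, "b", 98]     # every call in the or-chain ran

log2 = []
stop = lambda i: log2.append(i) and log2.append(ord(i))
[*map(stop, "ab")]
assert log2 == ["a", "b"]             # and stopped at the first None
```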
A list is shorter still, since every element gets evaluated: lambda i:[sleep(.06),print(i),print(ord(i))]
Python tokens only need to be separated by a space when a letter is followed by a letter or a digit.
In all other cases, the space can be omitted (with a few exceptions). Here's a table.
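| first \ second | Letter | Digit | Symbol |
|---|---|---|---|
| Letter | space | space | no space |
| Digit | no space | never happens (except within multidigit numbers) | no space |
| Symbol | no space | no space | no space |
The first token is the row and the second token is the column.
Letter followed by letter: Space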
not b
for x in l:
lambda x:
def f(s):
x in b"abc"
Letter followed by digit: Space
x or 3
while 2<x:
Letter followed by symbol: No space
c<d
if~x:
x and-y
lambda(a,b):    (Python 2 only: tuple parameters)
print"yes"      (Python 2 only)
return[x,y,z]
Digit followed by letter: No space
x+1if x>=0else 2
0in l
(Some versions of Python 2 will fail on a digit followed by else or or.)
Digit followed by digit: Never occurs
Consecutive digits make a multidigit number. I am not aware of any situation where two digits would be separated by a space.
Digit followed by symbol: No space
3<x
12+n
l=0,1,2
A space is needed in 1 .__add__ and when accessing other attributes of integer literals, since otherwise 1. is parsed as a float.
Symbol followed by letter: No space
~m
2876<<x&1
"()"in s
Symbol followed by digit: No space
-1
x!=2
Symbol followed by symbol: No space
x*(a+b)%~-y
t**=.5
{1:2,3:4}.get()
"% 10s"%"|"
Caveats: after a digit, e may be read as the start of a float exponent, so something like 1else won't work in versions of Python whose tokenizer does this. Similarly, since 0o is the prefix of an octal literal, o can follow any digit but 0. For the complete lexical rules, refer to docs.python.org/2/reference/lexical_analysis.html
assert True == 1
assert False == 0
assert 2 * True == 2
assert 3 * False == 0
assert (2>1)+(1<2) == 2
If you have an expression like [a,a+x][c] (where c is some boolean expression), you can write a+x*c instead and save a few bytes. Doing arithmetic with booleans can save you lots of bytes!
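The [a,a+x][c] rewrite, checked for both values of c with arbitrary numbers:

```python
a, x = 10, 3                      # arbitrary sample values
for c in (1 > 2, 2 > 1):          # False, then True
    # indexing a 2-list by a bool == adding x times the bool
    assert [a, a+x][c] == a+x*c

print(a+x*(2>1), a+x*(1>2))       # 13 10
```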
If you're drawing, for colors, instead of typing:
'#000' for black you can just use 0 (no apostrophes)
'#fff' for white you can simply use ~0 (no apostrophes)
'#f00' for red you can just use 'red'
Example of white being used with ~0
from PIL.ImageDraw import*
i=Image.new('RGB',(25,18),'#d72828')
Draw(i).rectangle((1,1,23,16),'#0048e0',~0)
i.show()
255 is even shorter than 'red'. Some more ideas: ~255 is '#0ff' (cyan). 1<<7 is #800000 (half-brightness red); similarly, 1<<15 and 1<<23 are half-brightness green and blue.
For checking the length of a list a:
Empty: a==[] (but just checking whether it's non-empty and swapping the if and else branches can be shorter)
Non-empty: a (assuming it is in a context where it will be interpreted as a boolean)
len(a)>i: a>a[:i] (if the list is non-empty)
[] is falsy, so if you want to check whether a list is non-empty, you can simply do if a:.
1==len(a) also works for that.
a<a[:2] is shorter.
Large hard-coded numbers can be represented in larger bases, but there is a trade-off: higher bases only become worthwhile after a certain cutoff.
The only three bases you're likely to need to worry about are 10, 16, and 36. These are the cutoffs:
1000000000000 (13 bytes) -> 0xe8d4a51000 (12 bytes)
0x10000000000000000000000000000000000000 (40 bytes) -> int("9gmd8o3gbbaz3m2ydgtgwn9qo6xog",36) (39 bytes)
a%b==a if b has a constant sign
For two expressions a and b, where each one results in an int (or long in Python 2) or float, you can replace these:
a%b==a
a==a%b
with these, if b is positive:
0<=a<b
b>a>=0
or these, if b is negative:
b<a<=0
0>=a>b
I'm presenting two expressions for each case because sometimes you may want to use one over the other to eliminate a space separating expression b from an adjacent token. They both have the same precedence, so you usually won't need to surround the second expression with () if you don't need to do so for the first one.
This is useful if expression a is more than 1 byte long or b is negative, because it removes one occurrence of a from the expression. If \$a,b\$ are the lengths of expressions a and b respectively, and \$l\$ is the length of the original expression, the resulting expression will be \$l-a+1\$ bytes long. Note that this method is always going to be shorter than assigning expression a to a separate variable.
For example,
(a+b)%c==a+b can be replaced with
0<=a+b<c for a total saving of 4 bytes.
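This brute-force sketch over small integers checks the replacements for both signs of the divisor, and also checks the worked example (a+b)%c==a+b against 0<=a+b<c.

```python
from itertools import product

# a%b==a  versus the chained comparisons, for b>0 and b<0 (b=0 skipped).
for a, b in product(range(-12, 13), repeat=2):
    if b > 0:
        assert (a%b == a) == (0 <= a < b) == (b > a >= 0)
    elif b < 0:
        assert (a%b == a) == (b < a <= 0) == (0 >= a > b)

# The worked example, with c positive: 12 bytes -> 8 bytes.
for a, b, c in product(range(-5, 6), range(-5, 6), range(1, 6)):
    assert ((a+b)%c == a+b) == (0 <= a+b < c)

print(len("(a+b)%c==a+b"), len("0<=a+b<c"))  # 12 8
```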
Let's define the operator \$x\mathbin\%y\$ for \$x,y\in\mathbb Q\$.
For any nonzero rational \$b\$, every rational number \$a\$ can be represented as \$a=bq+r\$ with \$q\in\mathbb Z\$, where the remainder \$r\$ has the same sign as \$b\$: \$0\le r<b\$ if \$b>0\$, and \$b<r\le0\$ if \$b<0\$. This representation is unique (it has \$q=\lfloor a/b\rfloor\$), so we can define an operator \$a\mathbin\%b\$ whose result has the same sign as \$b\$:
$$a=bq+r,\quad q\in\mathbb Z,\quad\begin{cases}0\le r<b&b>0\\b<r\le0&b<0\end{cases}\\a\mathbin\%b=r$$
This represents the % operator in Python, which calculates the remainder of the division of two numbers. The result always has the same sign as the divisor, b (for example, -1 % 3 == 2 and 1 % -3 == -2). For the \$a\mathbin\%b\$ operator, this equality holds:
$$(a\pm b)\mathbin\%b=a\mathbin\%b$$
Proof:
$$a=bq+r\leftrightarrow a\pm b=bq+r\pm b=(bq\pm b)+r=b(q\pm1)+r$$
Moreover, for \$b>0\$, we have:
$$a\mathbin\%b=a\leftrightarrow r=a\leftrightarrow0\le a<b$$
Proof for \$r=a\leftarrow0\le a<b\$:
$$0\le a<b\leftrightarrow0\le bq+r<b\leftrightarrow bq=0\leftrightarrow a=r$$
Similarly, for \$b<0\$, we have \$b<a\le0\$.
Therefore, \$a\mathbin\%b=a\leftrightarrow\begin{cases}\begin{align}0\le a<b\quad b>0\\b<a\le0\quad b<0\end{align}\end{cases}\$, or, equivalently, \$(0\le a<b)\lor(b<a\le0)\$.
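This sketch checks the definition numerically with fractions.Fraction, assuming q = floor(a/b) as above. The remainder a-bq matches Python's %, has the sign of b, and is unchanged by adding or subtracting b. The helper name mod and the sample values are arbitrary.

```python
from fractions import Fraction as F
from math import floor

def mod(a, b):
    """Remainder r = a - b*q with q = floor(a/b), for nonzero b."""
    q = floor(a / b)
    return a - b*q

samples = [F(7, 2), F(-7, 2), F(1, 3), F(-5, 6), F(0), F(1)]
divisors = [F(3, 2), F(-3, 2), F(2), F(-1, 4), F(-3)]

for a in samples:
    for b in divisors:
        r = mod(a, b)
        assert r == a % b                                  # same as Python's %
        assert (0 <= r < b) if b > 0 else (b < r <= 0)     # sign follows b
        assert (a + b) % b == (a - b) % b == a % b         # (a±b)%b = a%b

print(F(1) % F(-3), -1 % 3)  # -2 2
```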
Functions are allowed to print as programs do. A recursive function that prints can be shorter than both a pure function and a pure program.
Compare these Python 2 submissions to make a list of iteratively floor-halving a number while it's positive, like 10 -> [10, 5, 2, 1].
# 30 bytes: Program
n=input()
while n:print n;n/=2

# 29 bytes: Function
f=lambda n:n*[0]and[n]+f(n/2)

# 27 bytes: Function that prints
def g(n):1/n;print n;g(n/2)
The function that prints uses 1/n to terminate with an error on hitting n=0, after printing the desired numbers. This saves characters over the program's while and the pure function's base case, giving it the edge in byte count. Often, the termination can be shorter as part of the expression to print or the recursive call. It might even happen on its own for free, like terminating on an empty string when the first character is read.
The key property of our function here is that we're repeatedly applying an operation and listing the results at each step, in order. Additional variables can still be used this way by having them as optional inputs to the function that are passed in the recursive call. Moreover, because we're def'ing a function rather than writing a lambda, we can put statements such as variable assignments in its body.
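Here is a Python 3 translation of the 27-byte printing function, using // for floor division. It is run with its output captured so the error-based termination can be checked.

```python
import io
from contextlib import redirect_stdout

def g(n):1/n;print(n);g(n//2)   # 1/n raises ZeroDivisionError once n hits 0

buf = io.StringIO()
with redirect_stdout(buf):
    try:
        g(10)
    except ZeroDivisionError:     # the intended way out of the recursion
        pass

print(buf.getvalue().split())     # ['10', '5', '2', '1']
```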
(Python 3.8 added the := operator, which allows assignments inside expressions, including lambda bodies.)
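For instance, assuming Python 3.8+, an assignment expression lets the lowercase-and-reverse lambda reuse its computed value. The result is shorter than both the lambda and the named function compared earlier. This is a sketch.

```python
# Requires Python 3.8+ for the := (assignment expression) operator.
f = lambda s:(t:=s.lower())+t[::-1]

print(f("Golf"))                                  # golfflog
print(len("lambda s:(t:=s.lower())+t[::-1]"))     # 31, versus 34 and 37 bytes
```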