How can I remove the ANSI escape sequences from a string in Python?

Question

Here is a snippet that includes my string.

'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'

The string was returned from an SSH command that I executed. I can't use the string in its current state because it contains ANSI standardized escape sequences. How can I programmatically remove the escape sequences so that the only part of the string remaining is 'examplefile.zip'?

possible duplicate of Filtering out ANSI escape sequences

– fuenfundachtzig, Commented Jun 18, 2015 at 15:00
sure - same question, a few months later.

– Thomas Dickey, Commented Mar 26, 2023 at 23:43

Martijn Pieters · Accepted Answer · 2025-04-05 17:50:21Z

Delete them with a regular expression:

    import re

    # 7-bit C1 ANSI sequences
    ansi_escape = re.compile(r'''
        \x1B  # ESC
        (?:   # 7-bit C1 Fe (except CSI)
            [@-Z\\-_]
        |     # or [ for CSI, followed by a control sequence
            \[
            [0-?]*  # Parameter bytes
            [ -/]*  # Intermediate bytes
            [@-~]   # Final byte
        )
    ''', re.VERBOSE)
    result = ansi_escape.sub('', sometext)

or, without the VERBOSE flag, in condensed form:

    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    result = ansi_escape.sub('', sometext)

Demo:

    >>> import re
    >>> ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    >>> sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
    >>> ansi_escape.sub('', sometext)
    'ls\r\nexamplefile.zip\r\n'

The above regular expression covers all 7-bit ANSI C1 escape sequences, but not the 8-bit C1 escape sequence openers. The latter are never used in today's UTF-8 world, where the same range of bytes has a different meaning.

If you do need to cover the 8-bit codes too (and are then, presumably, working with bytes values) then the regular expression becomes a bytes pattern like this:

    # 7-bit and 8-bit C1 ANSI sequences
    ansi_escape_8bit = re.compile(br'''
        (?: # either 7-bit C1, two bytes, ESC Fe (omitting CSI)
            \x1B
            [@-Z\\-_]
        |   # or a single 8-bit byte Fe (omitting CSI)
            [\x80-\x9A\x9C-\x9F]
        |   # or CSI + control codes
            (?: # 7-bit CSI, ESC [
                \x1B\[
            |   # 8-bit CSI, 9B
                \x9B
            )
            [0-?]*  # Parameter bytes
            [ -/]*  # Intermediate bytes
            [@-~]   # Final byte
        )
    ''', re.VERBOSE)
    result = ansi_escape_8bit.sub(b'', somebytesvalue)

which can be condensed down to

    # 7-bit and 8-bit C1 ANSI sequences
    ansi_escape_8bit = re.compile(
        br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])'
    )
    result = ansi_escape_8bit.sub(b'', somebytesvalue)
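As a hedged illustration of the bytes variant, the sketch below applies the condensed 8-bit pattern to raw bytes, for example output captured from a subprocess before decoding. The helper name `strip_ansi_bytes` and the `eight_bit` sample are assumptions made for this example and are not part of the answer.

```python
import re

# Condensed 7-bit and 8-bit pattern, copied from the answer above
ansi_escape_8bit = re.compile(
    br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])'
)


def strip_ansi_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: remove 7-bit and 8-bit C1 sequences from raw bytes."""
    return ansi_escape_8bit.sub(b'', raw)


# The question's string as bytes, plus a made-up sample using the 8-bit CSI opener
seven_bit = b'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
eight_bit = b'\x9b1mbold\x9b0m'

print(strip_ansi_bytes(seven_bit))
print(strip_ansi_bytes(eight_bit))
```

Because the 0x80 to 0x9F range overlaps with UTF-8 continuation bytes, this variant is probably only safe on data that is known not to contain UTF-8 encoded non-ASCII text; for UTF-8 output, applying the 7-bit pattern to the decoded string is the more cautious choice.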

For more information, see:

the ANSI escape codes overview on Wikipedia
ECMA-48 standard, 5th edition (especially sections 5.3 and 5.4)

The example you gave contains 4 CSI (Control Sequence Introducer) codes, as marked by the \x1B[ or ESC [ opening bytes, and each contains an SGR (Select Graphic Rendition) code, because they each end in m. The parameters (separated by ; semicolons) in between those tell your terminal what graphic rendition attributes to use. So for each \x1B[....m sequence, the 3 codes that are used are:

0 (or 00 in this example): reset, disable all attributes
1 (or 01 in the example): bold
31: red (foreground)
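To make the breakdown above concrete, here is a small sketch that pulls the SGR parameters out of the question's string and names them. The `SGR_NAMES` table and the `describe_sgr` function are hypothetical, and the table only covers the three codes listed above.

```python
import re

# Hypothetical lookup table covering only the three codes used in the example
SGR_NAMES = {0: 'reset', 1: 'bold', 31: 'red foreground'}

# CSI opener, digits and semicolons as parameters, final byte m (SGR)
sgr = re.compile(r'\x1B\[([0-9;]*)m')


def describe_sgr(text):
    """Return one list of attribute names per SGR sequence found in text."""
    described = []
    for match in sgr.finditer(text):
        # An empty parameter is treated here as 0, the SGR default (reset)
        codes = [int(p) if p else 0 for p in match.group(1).split(';')]
        described.append([SGR_NAMES.get(c, f'unknown ({c})') for c in codes])
    return described


sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
for attributes in describe_sgr(sometext):
    print(attributes)
```

For the example string this reports a reset, then bold plus red foreground, then the same pair of sequences again.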

However, there is more to ANSI than just CSI SGR codes. With CSI alone you can also control the cursor, clear lines or the whole display, or scroll (provided the terminal supports this of course). And beyond CSI, there are codes to select alternative fonts (SS2 and SS3), to send 'private messages' (think passwords), to communicate with the terminal (DCS), the OS (OSC), or the application itself (APC, a way for applications to piggy-back custom control codes on to the communication stream), and further codes to help define strings (SOS, Start of String, ST String Terminator) or to reset everything back to a base state (RIS). The above regexes cover all of these.

Note that the above regex only removes the ANSI C1 codes, however, and not any additional data that those codes may be marking up (such as the strings sent between an OSC opener and the terminating ST code). Removing those would require additional work outside the scope of this answer.
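As a rough sketch of what that additional work could look like, the example below first removes string-type sequences together with their payload and then applies the 7-bit pattern from this answer. The names `ansi_strings` and `strip_ansi_with_payload` are hypothetical, and accepting BEL as an OSC terminator is an assumption based on common terminal behaviour rather than something stated in the answer.

```python
import re

# Strings introduced by OSC, DCS, SOS, PM or APC, including their payload.
# Assumption: OSC may end with BEL (a common terminal convention) or ST;
# the other string types end with ST (ESC \).
ansi_strings = re.compile(r'''
    \x1B
    (?:
        \]                  # OSC opener
        .*?                 # payload, as short as possible
        (?:\x07|\x1B\\)     # BEL or ST
    |
        [PX^_]              # DCS, SOS, PM or APC opener
        .*?                 # payload, as short as possible
        \x1B\\              # ST
    )
''', re.VERBOSE | re.DOTALL)

# 7-bit pattern from the answer above, for everything that remains
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')


def strip_ansi_with_payload(text):
    """Hypothetical helper: drop string sequences first, then other codes."""
    return ansi_escape.sub('', ansi_strings.sub('', text))


sample = '\x1b]0;user@laptop1:/tmp\x1b\\\x1b[01;31mexamplefile.zip\x1b[00m'
print(repr(strip_ansi_with_payload(sample)))
```

An unterminated string sequence is not matched by the first pattern, so in that case only its opener is removed and the payload stays in the text.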

JAYD3V · Accepted Answer · 2022-03-12 08:11:03Z

The accepted answer only takes into account ANSI standardized escape sequences that are formatted to alter foreground colors and text style. Many sequences do not end in 'm', such as cursor positioning, erasing, and scroll regions. The pattern below attempts to cover all cases beyond setting foreground color and text style.

Below is the regular expression for ANSI standardized control sequences:

/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/

edited Mar 12, 2022 at 8:11

JAYD3V

answered Nov 25, 2015 at 20:02

Jeff

Thomas Dickey Over a year ago

It misses OSC (both beginning and end).

Jeff Over a year ago

OSC is in ECMA-48 sec. 5.6 - what is the point of bringing that up here?

Thomas Dickey Over a year ago

OSC is an "ANSI escape sequence", is frequently used, and would begin with a different pattern. Your answer is incomplete.

Hubro Over a year ago

This doesn't work for color codes produced by bluetoothctl, example: \x1b[0;94m. Making the expression case insensitive or replacing 1B with 1b in the pattern made no difference. I'm using Python and the line re.compile(r'/(\x9b|\x1b\[)[0-?]*[ -\/]*[@-~]/', re.I). Then I'm doing pattern.sub("", my_string) which doesn't accomplish anything. Am I doing something wrong?

Martijn Pieters Over a year ago

I see three issues with this answer: 1) /.../ is not Python syntax, but rather syntax you'd use in VI or Perl or awk. 2) the \x9B opener (for CSI codes) is incompatible with UTF-8 and so now rarely used, and ESC [ is preferred and 3) your pattern only covers CSI codes, not the whole range of ANSI escapes (which not only includes OSC, which Thomas Dickey mentions, but SS2, SS3, DCS, ST, OSC, SOS, PM, APC and RIS as well)!
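The first point explains why the pattern in the earlier comment did nothing. A minimal sketch, using a made-up `sample` string with the colour code quoted in that comment:

```python
import re

sample = '\x1b[0;94mDevice\x1b[0m'

# Copied verbatim, the /.../ delimiters become literal slashes that must match
with_slashes = re.compile(r'/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/')
# The same pattern with the delimiters dropped
without_slashes = re.compile(r'(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]')

unchanged = with_slashes.sub('', sample)
cleaned = without_slashes.sub('', sample)

print(repr(unchanged))
print(repr(cleaned))
```

With the slashes in place the pattern never matches, because the sample contains no literal / characters, so the string comes back untouched.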

Édouard Lopez · Accepted Answer · 2019-07-30 09:31:28Z

Function

Based on Martijn Pieters's answer with Jeff's regexp.

    import re

    def escape_ansi(line):
        ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
        return ansi_escape.sub('', line)

Test

    def test_remove_ansi_escape_sequence(self):
        line = '\t\u001b[0;35mBlabla\u001b[0m \u001b[0;36m172.18.0.2\u001b[0m'

        escaped_line = escape_ansi(line)

        self.assertEqual(escaped_line, '\tBlabla 172.18.0.2')

Testing

If you want to run it yourself, use Python 3 (better Unicode support, and so on). Here is how the test file should look:

    import unittest
    import re

    def escape_ansi(line):
        …

    class TestStringMethods(unittest.TestCase):
        def test_remove_ansi_escape_sequence(self):
            …

    if __name__ == '__main__':
        unittest.main()

Why have you left the / escaped in the second to last character set [ -\/]?
My regex has long since been expanded to cover all of ANSI C1 codes (7 bits) and I added a separate 8-bit variant as well today.
shouldn't the function be called remove_ansi instead of escape_ansi?

kfir · Accepted Answer · 2017-08-01 21:47:25Z

The suggested regex didn't do the trick for me so I created one of my own. The following is a python regex that I created based on the spec found here

    import re

    ansi_regex = r'\x1b(' \
                 r'(\[\??\d+[hl])|' \
                 r'([=<>a-kzNM78])|' \
                 r'([\(\)][a-b0-2])|' \
                 r'(\[\d{0,2}[ma-dgkjqi])|' \
                 r'(\[\d+;\d+[hfy]?)|' \
                 r'(\[;?[hf])|' \
                 r'(#[3-68])|' \
                 r'([01356]n)|' \
                 r'(O[mlnp-z]?)|' \
                 r'(/Z)|' \
                 r'(\d+)|' \
                 r'(\[\?\d;\d0c)|' \
                 r'(\d;\dR))'
    ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)

I tested my regex on the following snippet (basically a copy paste from the ascii-table.com page)

\x1b[20h Set \x1b[?1h Set \x1b[?3h Set \x1b[?4h Set \x1b[?5h Set \x1b[?6h Set \x1b[?7h Set \x1b[?8h Set \x1b[?9h Set \x1b[20l Set \x1b[?1l Set \x1b[?2l Set \x1b[?3l Set \x1b[?4l Set \x1b[?5l Set \x1b[?6l Set \x1b[?7l Reset \x1b[?8l Reset \x1b[?9l Reset \x1b= Set \x1b> Set \x1b(A Set \x1b)A Set \x1b(B Set \x1b)B Set \x1b(0 Set \x1b)0 Set \x1b(1 Set \x1b)1 Set \x1b(2 Set \x1b)2 Set \x1bN Set \x1bO Set \x1b[m Turn \x1b[0m Turn \x1b[1m Turn \x1b[2m Turn \x1b[4m Turn \x1b[5m Turn \x1b[7m Turn \x1b[8m Turn \x1b[1;2 Set \x1b[1A Move \x1b[2B Move \x1b[3C Move \x1b[4D Move \x1b[H Move \x1b[;H Move \x1b[4;3H Move \x1b[f Move \x1b[;f Move \x1b[1;2 Move \x1bD Move/scroll \x1bM Move/scroll \x1bE Move \x1b7 Save \x1b8 Restore \x1bH Set \x1b[g Clear \x1b[0g Clear \x1b[3g Clear \x1b#3 Double-height \x1b#4 Double-height \x1b#5 Single \x1b#6 Double \x1b[K Clear \x1b[0K Clear \x1b[1K Clear \x1b[2K Clear \x1b[J Clear \x1b[0J Clear \x1b[1J Clear \x1b[2J Clear \x1b5n Device \x1b0n Response: \x1b3n Response: \x1b6n Get \x1b[c Identify \x1b[0c Identify \x1b[?1;20c Response: \x1bc Reset \x1b#8 Screen \x1b[2;1y Confidence \x1b[2;2y Confidence \x1b[2;9y Repeat \x1b[2;10y Repeat \x1b[0q Turn \x1b[1q Turn \x1b[2q Turn \x1b[3q Turn \x1b[4q Turn \x1b< Enter/exit \x1b= Enter \x1b> Exit \x1bF Use \x1bG Use \x1bA Move \x1bB Move \x1bC Move \x1bD Move \x1bH Move \x1b12 Move \x1bI \x1bK \x1bJ \x1bZ \x1b/Z \x1bOP \x1bOQ \x1bOR \x1bOS \x1bA \x1bB \x1bC \x1bD \x1bOp \x1bOq \x1bOr \x1bOs \x1bOt \x1bOu \x1bOv \x1bOw \x1bOx \x1bOy \x1bOm \x1bOl \x1bOn \x1bOM \x1b[i \x1b[1i \x1b[4i \x1b[5i

Hopefully this will help others :)

That spec is also not complete, the standard allows for a lot of expansion that VT100 didn't use but other terminals do, and your regex is overly verbose for the purpose.
Your pattern has several weird discrepancies as well; ESC-O (SS3) 'shifts' the terminal into an alternate font mode, and the next byte is interpreted in that specific mode. The possible values in that mode are not limited to m, n, l, or p through z. I'd not even strip the byte following SS3. SS2 is basically the same functionality (just a different font), but your regex doesn't pull in the next byte.
Last but not least, your regex fails to actually remove the full ANSI codes in the question example, as it leaves behind the m final byte.
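The last comment can be checked with a short sketch that runs the answer's pattern against the string from the question. The pattern is copied from the answer; `question_text` and `leftover` are names assumed for this example.

```python
import re

# Pattern copied from the answer above
ansi_regex = r'\x1b(' \
             r'(\[\??\d+[hl])|' \
             r'([=<>a-kzNM78])|' \
             r'([\(\)][a-b0-2])|' \
             r'(\[\d{0,2}[ma-dgkjqi])|' \
             r'(\[\d+;\d+[hfy]?)|' \
             r'(\[;?[hf])|' \
             r'(#[3-68])|' \
             r'([01356]n)|' \
             r'(O[mlnp-z]?)|' \
             r'(/Z)|' \
             r'(\d+)|' \
             r'(\[\?\d;\d0c)|' \
             r'(\d;\dR))'
ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)

question_text = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
leftover = ansi_escape.sub('', question_text)
print(repr(leftover))
```

The single-parameter sequences are removed in full, but for the two-parameter sequences only the part up to the second number matches, so their final m stays in the output.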

milahu · Accepted Answer · 2022-04-08 13:34:02Z

None of the regex solutions worked in my case with OSC sequences (\x1b]).

To actually render the visible output, you will need a terminal emulator like pyte:

    #!/usr/bin/env python3

    import pyte  # terminal emulator: render terminal output to visible characters

    pyte_screen = pyte.Screen(80, 24)
    pyte_stream = pyte.ByteStream(pyte_screen)

    bytes_ = b''.join([
        b'$ cowsay hello\r\n',
        b'\x1b[?2004l',
        b'\r',
        b' _______\r\n',
        b'< hello >\r\n',
        b' -------\r\n',
        b' \\ ^__^\r\n',
        b' \\ (oo)\\_______\r\n',
        b' (__)\\ )\\/\\\r\n',
        b' ||----w |\r\n',
        b' || ||\r\n',
        b'\x1b]0;user@laptop1:/tmp\x1b\\',
        b'\x1b]7;file://laptop1/tmp\x1b\\',
        b'\x1b[?2004h$ ',
    ])
    pyte_stream.feed(bytes_)

    # pyte_screen.display always has 80x24 characters, padded with whitespace
    # -> use rstrip to remove trailing whitespace from all lines
    text = ("".join([line.rstrip() + "\n" for line in pyte_screen.display])).strip() + "\n"
    print("text", text)
    print("cursor", pyte_screen.cursor.y, pyte_screen.cursor.x)
    print("title", pyte_screen.title)

Do NOT add a poorly tested dependency to your code, github.com/selectel/pyte/issues/56
I guess the impact of a broken terminal emulator is rather small: the user will see broken output immediately. vt100-parser has more tests than pyte.

Rory · Accepted Answer · 2019-04-18 00:22:18Z

If it helps future Stack Overflowers: I was using the crayons library to give my Python output a bit more visual impact, which is advantageous as it works on both Windows and Linux. However, I was both displaying the output onscreen and appending it to log files, and the escape sequences were hurting the legibility of the log files, so I wanted to strip them out. Passing the values produced by crayons to the regex raised an error:

expected string or bytes-like object

The solution was to cast the parameter to a string, so only a tiny modification to the commonly accepted answer was needed:

    import re

    def escape_ansi(line):
        ansi_escape = re.compile(r'(\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')
        # str() turns the wrapper object from crayons into a plain string
        return ansi_escape.sub('', str(line))

That's not really the same problem though. There are loads of different libraries that might produce custom objects that wrap a string, we don't need answers here for every variant that needs conversion to string before a regex works on them.
That's exactly what I was searching for. If you do subprocess control you get bytes; out.decode("utf-8") will clash with ANSI control codes, raising UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 13894: invalid start byte, and the regex won't work on the bytes object.

y.tk · Accepted Answer · 2023-10-17 08:38:54Z

My case, when using pexpect:

    child.sendline("ls backup")
    child.expect(r"[0-9]{8}_[0-9]{4}.*")
    print(child.after.split())
    # -> NG
    # ['20231016_1603\x1b[0m', '\x1b[01;34m20231016_1606\x1b[0m']

    # Add the ls option --color=never
    child.sendline("ls --color=never backup")
    child.expect(r"[0-9]{8}_[0-9]{4}.*")
    print(child.after.split())
    # -> OK!
    # ['20231016_1603', '20231016_1606']

s3dev · Accepted Answer · 2024-03-06 13:15:47Z

Just for a different approach from regex, this function iterates over the characters of a string (more specifically, an io.StringIO stream) in search of the ESC character \x1b that opens a sequence, and 'fast-forwards' until the final byte 'm' is found. All other characters are yielded by the function, which is a generator.

This is designed for ANSI colour sequences only.

Code:

    import io

    def strip_ansi_colour(text: str) -> iter:
        """Strip ANSI colour sequences from a string.

        Args:
            text (str): Text string to be stripped.

        Returns:
            iter[str]: A generator for each returned character. Note,
            this will include newline characters.

        """
        buff = io.StringIO(text)
        while (b := buff.read(1)):
            if b == '\x1b':
                # Skip ahead to the final byte 'm'; also stop at end of
                # input so an unterminated sequence cannot loop forever.
                while (b := buff.read(1)) and b != 'm':
                    continue
            else:
                yield b

Example using OP's string:

    >>> s = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
    >>> ''.join(strip_ansi_colour(s))[2:].strip()  # Trim ls and newlines
    'examplefile.zip'

Example 2:

    >>> s = '\x1b[93;m\x1b[40;m\x1b[22;m\nFoo, spam and eggs.\x1b[0m\n'
    >>> ''.join(strip_ansi_colour(s))
    '\nFoo, spam and eggs.\n'

Community · Accepted Answer · 2017-05-23 12:34:22Z

if you want to remove the \r\n bit, you can pass the string through this function (written by sarnold):

    def stripEscape(string):
        """ Removes all escape sequences from the input string """
        delete = ""
        i = 1
        while (i < 0x20):
            delete += chr(i)
            i += 1
        t = string.translate(None, delete)
        return t

Careful though, this will lump together the text in front and behind the escape sequences. So, using Martijn's filtered string 'ls\r\nexamplefile.zip\r\n', you will get lsexamplefile.zip. Note the ls in front of the desired filename.

I would use the stripEscape function first to remove the escape sequences, then pass the output to Martijn's regular expression, which would avoid concatenating the unwanted bit.

The question doesn't ask for whitespace to be removed, only ANSI escape codes. Your translation of sarnold's string.translate() option is not exactly idiomatic either (why use while when for over xrange() would do, e.g. ''.join([chr(i) for i in range(0x20)])), and not applicable to Python 3 (where you could just use dict.fromkeys(range(0x20)) as the string.translate() map).
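A minimal Python 3 sketch of the dict.fromkeys() suggestion in the comment above. The name `strip_control_chars` is hypothetical, and the range starts at 1 to mirror the answer's loop rather than the comment's range(0x20).

```python
# Map every control character from 0x01 to 0x1F to None, which str.translate deletes
control_chars = dict.fromkeys(range(1, 0x20))


def strip_control_chars(text: str) -> str:
    """Hypothetical Python 3 counterpart of stripEscape."""
    return text.translate(control_chars)


print(repr(strip_control_chars('ls\r\nexamplefile.zip\r\n')))
```

As the answer warns, this joins the text on either side of the removed characters, so the filtered string collapses to lsexamplefile.zip.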

Vova Ignatov · Accepted Answer · 2020-12-14 19:45:52Z

For 2020 with Python 3.5, it is as easy as string.encode().decode('ascii')

    ascii_string = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
    decoded_string = ascii_string.encode().decode('ascii')
    print(decoded_string)

    >ls
    >examplefile.zip
    >

Leonardo Over a year ago

This code doesn't do anything: repr(decoded_string) yields "'ls\\r\\n\\x1b[00m\\x1b[01;31mexamplefile.zip\\x1b[00m\\r\\n\\x1b[01;31m'", while using the \x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]) regex yields "'ls\\r\\nexamplefile.zip\\r\\n'"

Vova Ignatov Over a year ago

There were no requests for a change of string representation in the original post. It is enough for printing or passing to some API method.
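The disagreement in these comments can be settled with a quick check, sketched here with the variable names from the answer: the encode/decode round trip returns an equal string, so the escape characters are still in it and the terminal is simply interpreting them when the string is printed.

```python
ascii_string = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
decoded_string = ascii_string.encode().decode('ascii')

# The round trip gives back an equal string
print(decoded_string == ascii_string)
# The ESC characters are therefore still present
print('\x1b' in decoded_string)
```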
