72

How can I split a byte string into a list of lines?

In Python 2 I had:

    rest = "some\nlines"
    for line in rest.split("\n"):
        print line

The code above is simplified for brevity. Now, after some regex processing, rest holds a bytes object rather than a string, and I need to iterate over its lines.

  • Do you have rest = "some\nlines" or rather rest = b"some\nlines" in Python 3? Commented Dec 13, 2012 at 10:35
  • @Flavius Then try to identify the point in the process where your string becomes a bytes object; then you can improve that point. Commented Dec 13, 2012 at 10:39

3 Answers

133

There is no reason to convert to a string. Just pass a bytes argument to split: split strings with strings, and bytes with bytes.

    >>> a = b'asdf\nasdf'
    >>> a.split(b'\n')
    [b'asdf', b'asdf']

Also, since you're splitting on newlines, you could slightly simplify that by using splitlines() (available for both str and bytes):

    >>> a = b'asdf\nasdf'
    >>> a.splitlines()
    [b'asdf', b'asdf']
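
The two calls are not always interchangeable. A minimal sketch of where they differ, using a made-up sample value (data is not from the question) that mixes \r\n and \n endings and ends with a newline:

```python
# Hypothetical sample input: mixed line endings plus a trailing newline.
data = b"first\r\nsecond\nthird\n"

# split(b"\n") cuts only on the LF byte, so the CR stays attached to
# the first line and the trailing newline produces a final empty element.
by_split = data.split(b"\n")

# splitlines() treats \n, \r\n and \r as line boundaries and does not
# add an empty element for a trailing newline.
by_splitlines = data.splitlines()

for line in by_splitlines:
    print(line)
```

If the input is known to use only \n and never ends with one, both calls give the same list.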

2 Comments

Clean and easy. Still, I see no reason why Python does not automatically treat a str as bytes when it is passed to a method of a bytes object.
@gies0r For the same reason Python doesn't do implicit type coercion in general: it enables sloppy code and would force Python to have a "native" encoding.
25

Decode the bytes into unicode (str) and then use str.split:

    Python 3.2.3 (default, Oct 19 2012, 19:53:16)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = b'asdf\nasdf'
    >>> a.split('\n')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Type str doesn't support the buffer API
    >>> a = a.decode()
    >>> a.split('\n')
    ['asdf', 'asdf']
    >>>

You can also split by b'\n', but I guess you have to work with strings rather than bytes anyway. So convert all your input data to str as early as possible, work only with unicode inside your code, and convert back to bytes as late as possible, only when it is needed for output.
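
A rough sketch of that decode-early, encode-late pattern. The sample bytes, the UTF-8 encoding, and the upper-casing step are assumptions made for illustration; the real encoding depends on where the data comes from:

```python
# Hypothetical input: UTF-8 bytes as they might arrive from a file or socket.
raw = b"caf\xc3\xa9\nna\xc3\xafve\n"

# Decode once at the input boundary (UTF-8 is assumed here).
text = raw.decode("utf-8")

# Inside the program, work only with str.
lines = [line.upper() for line in text.splitlines()]

# Encode once at the output boundary.
out = "\n".join(lines).encode("utf-8")
```

Keeping the conversions at the edges means the rest of the code never has to mix str and bytes arguments.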


10

Try this:

    rest = b"some\nlines"
    rest = rest.decode("utf-8")

Then you can call rest.split("\n").
