How to split a string at line breaks in python?

Question

I want to copy some tabular data from Excel into a python array. That is, user willselect a range in an Excel table, press "Copy" (CTRL+C) so that the range will be copied to clipboard. Then I will get this clipboard data into a python array (list). I use win32clipboard from pywin32 to get clipboard data into an array:

import win32clipboard def getClip(): win32clipboard.OpenClipboard() data = win32clipboard.GetClipboardData() win32clipboard.CloseClipboard() return data

I copy the following range A1:B5 from Excel:

enter image description here

When I use the function above, I get a string like:

'365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n'

How to split this string into a list, so that the list will look like:

[(365,179), (96, -90), (48, -138), (12, -174), (30, -156)]

I use split method, but it doesn't give me what I want.

data.split("\n") ['365\t179\r', '96\t-90\r', '48\t-138\r', '12\t-174\r', '30\t-156\r', '']

poke · Accepted Answer · 2014-01-18 14:22:19Z

There’s actually a str.splitlines method which will split the string by line breaks, regardless of which line breaks are used. So this works on Unix systems with just an \n, on Windows with \r\n and even on old Mac systems where the line break was just an \r.

>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n' >>> s.splitlines() ['365\t179', '96\t-90', '48\t-138', '12\t-174', '30\t-156']

Once you have this result, you can split by tabs to get the individual cells. So you essentially have to call cell.split('\t') on each cell. This is best done with a list comprehension:

>>> [row.split('\t') for row in s.splitlines()] [['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As an alternative, you could also use map to apply the splitting operation on each cell:

>>> list(map(lambda cell: cell.split('\t'), s.splitlines())) [['365', '179'], ['96', '-90'], ['48', '-138'], ['12', '-174'], ['30', '-156']]

As the copied data in the clipboard will always have the rows separated by newlines, and the columns separated by tabs, this solution is also safe to use for any range of cells you copied.

If you further want to convert integers or float to its correct datatypes in Python, I guess you could add some more conversion logic by calling int() on all cells that only have digits in them, float() on all cells that have digits and the dot in them ., leaving the rest as strings:

>>> def convert (cell): try: return int(cell) except ValueError: try: return float(cell) except ValueError: return cell >>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()] [(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

For a different string:

>>> s = 'Foo\tbar\r\n123.45\t42\r\n-85\t3.14' >>> [tuple(map(convert, row.split('\t'))) for row in s.splitlines()] [('Foo', 'bar'), (123.45, 42), (-85, 3.14)]

Community · Accepted Answer · 2017-05-23 10:30:25Z

>>> s = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n' >>> [map(int, x.split('\t')) for x in s.rstrip().split('\r\n')] [[365, 179], [96, -90], [48, -138], [12, -174], [30, -156]]

Using the code from my other answer, you can also handle other types as well:

from ast import literal_eval def solve(x): try: return literal_eval(x) except (ValueError, SyntaxError): return x s = '365\tFoo\r\nBar\t-90.01\r\n48\tspam\r\n12e10\t-174\r\n30\t-156\r\n' print [map(solve, x.split('\t')) for x in s.rstrip().split('\r\n')] #[[365, 'Foo'], ['Bar', -90.01], [48, 'spam'], [120000000000.0, -174], [30, -156]]

What if the copied Excel range has 5 rows and 10 columns? Then the list should have 5 inner-lists, and each inner-list should have 10 items. Will your solution provide this?
OK, but what if the range contains strings and integers together?
Right, I should have included some strings in that range. But wouldn't a separate question be a duplicate of this one?
@alwbtc In that case pass the inner items to ast.literal_eval instead of map(int..

thefourtheye · Accepted Answer · 2014-01-18 13:59:39Z

d = '365\t179\r\n96\t-90\r\n48\t-138\r\n12\t-174\r\n30\t-156\r\n' print [tuple(map(int,item.split(","))) for item in d.replace("\t", ",").split()]

Output

[(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

unutbu · Accepted Answer · 2014-01-18 13:58:30Z

In [85]: zip(*[iter(map(int, data.split()))]*2) Out[85]: [(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

Here is a breakdown of how it works:

First, split on whitespaces:

In [86]: data.split() Out[86]: ['365', '179', '96', '-90', '48', '-138', '12', '-174', '30', '-156']

Convert the strings into ints:

In [87]: map(int, data.split()) Out[87]: [365, 179, 96, -90, 48, -138, 12, -174, 30, -156]

Use the grouper recipe to group every 2 items:

In [88]: zip(*[iter(map(int, data.split()))]*2) Out[88]: [(365, 179), (96, -90), (48, -138), (12, -174), (30, -156)]

What if the copied Excel range has 5 rows and 10 columns? Then the list should have 5 tuples, and each tuple should have 10 items. Will your solution provide this?
@alwbtc: It could, but you would have to change the 2 to 10 in the grouper recipe. If you don't know how many items are supposed to be in each tuple, a solution which splits on \t would be better.

tayfun · Accepted Answer · 2014-01-18 13:59:32Z

[line.split() for line in my_str.split("\n")]

This just breaks the data into lines and then splits it according to white space. Check it out and modify according to your data.

Collectives™ on Stack Overflow

How to split a string at line breaks in python?

5 Answers 5

Comments

8 Comments

Comments

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

Comments

8 Comments

Comments

2 Comments

Comments

Linked

Related