1

My input is in the form of pairs of comma-separated values, e.g.,

805i,9430 3261i,9418 3950i,9415 4581i,4584i 4729i,9421 6785i,9433 8632i,9434 9391i,9393i 

and I want to read them into a list of pairs of strings. The code below does the job for a given line in open(<filename>, 'r'):

bs = line.strip().split()
bss = []
for b in bs:
    x, y = b.split(',')
    bss.append((x, y))

However, is there a way I can do this in one line with a list comprehension? Note that I could do [(b.split(',')[0], b.split(',')[1]) for b in bs], but this unnecessarily calls the split function twice.

2
  • If split() could return more than two values and you need only the first two, you could use slicing: tuple(b.split(',')[:2]) Commented Jun 16 at 9:57
  • I also want to ensure (sanity check) that the input is in the form of pairs of comma-separated values (see comment below by ShadowRanger) Commented Jun 23 at 6:34
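
The slicing suggestion from the first comment can be sketched as follows. The sample items are made up for illustration. Note that slicing only trims extra fields; it does not reject an item with fewer than two fields, so it is not a sanity check by itself.

```python
# Hypothetical items: one well-formed, one with an extra field, one with a missing field.
items = ["805i,9430", "1,2,3", "lonely"]

# Slicing keeps at most the first two fields of each item.
sliced = [tuple(b.split(',')[:2]) for b in items]
print(sliced)
```

The last item still produces a one-element tuple, so a length check or unpacking is needed if malformed input must be rejected.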

4 Answers

3

You're just looking for:

[tuple(b.split(',')) for b in bs] 

str.split returns a list; feeding that into the tuple constructor will make it a tuple. "Unpacking" is an unnecessary red herring to chase here.


1 Comment

Note that without the unpacking, you'll happily accept inputs that aren't exactly length two. The unpacking makes sure such errors don't go silently unnoticed (you'll get an immediate exception) until some later point where that assumption is tested, and fails.
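
A minimal sketch of the difference this comment describes, using a deliberately malformed input (the extra field is an assumption added for illustration):

```python
# The second item has three fields on purpose.
bs = "805i,9430 3261i,9418,extra".split()

# tuple() accepts whatever length split() produces, so the bad item passes silently.
loose = [tuple(b.split(',')) for b in bs]
print(loose)

# Unpacking into exactly two names raises ValueError on the malformed item.
try:
    strict = [(x, y) for x, y in (b.split(',') for b in bs)]
except ValueError as exc:
    strict = None
    print("rejected:", exc)
```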
2

You can use an assignment expression to hold some partial state.

bss = [(v[0], v[1]) for b in bs if (v := b.split(','))] 

Another alternative is to use a nested generator expression to create the value.

bss = [(v[0], v[1]) for v in (b.split(',') for b in bs)] 

If you know there are always two values, then you can simply write:

bss = [(x, y) for x, y in (b.split(',') for b in bs)] 
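
A related idiom, not part of the answer above, binds the split result by iterating over a one-element list. It is sketched here under the same assumption that every item has exactly two fields; the shortened sample line is for illustration only.

```python
line = "805i,9430 3261i,9418 3950i,9415"
bs = line.strip().split()

# The inner `for` iterates over a one-element list, so split() runs once per item
# and its result is unpacked into x and y.
bss = [(x, y) for b in bs for x, y in [b.split(',')]]
print(bss)
```

As with the nested generator version, an item without exactly two fields raises ValueError.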

For the final application, I would add an additional empty line/entry check. At that point, it's best to not jam this all on one line.

The following example uses generator variables (assigned from generator expressions), which provide an efficient way to break up large comprehensions without the memory overhead you would have when storing temporary computations in a standard container.

with open("data.text") as f:
    pairs = (word.split(',') for line in f for word in line.split())
    bss = [tuple(pair) for pair in pairs if len(pair) == 2]
print(bss)

1 Comment

in particular, the 3rd code block---thanks!
2

You can use csv.reader to split comma-separated values:

import csv

bss = list(csv.reader(line.split()))

With your sample input, bss would become:

[['805i', '9430'], ['3261i', '9418'], ['3950i', '9415'], ['4581i', '4584i'], ['4729i', '9421'], ['6785i', '9433'], ['8632i', '9434'], ['9391i', '9393i']] 

Demo: https://ideone.com/ogJes0
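
Note that csv.reader yields lists rather than tuples. If a list of tuples is needed, as in the question, one possible follow-up step is sketched below (the shortened sample line is for illustration only):

```python
import csv

line = "805i,9430 3261i,9418 3950i,9415"

# csv.reader accepts any iterable of strings and yields one list per string.
rows = list(csv.reader(line.split()))

# Convert each row to a tuple if a list of pairs is required.
bss = [tuple(row) for row in rows]
print(bss)
```

Like the plain tuple(b.split(',')) approach, this does not check that each row has exactly two fields.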


0

You can avoid multiple calls to str.split() by making one call to re.split() as follows:

import re
from itertools import pairwise

line = "805i,9430 3261i,9418 3950i,9415 4581i,4584i 4729i,9421 6785i,9433 8632i,9434 9391i,9393i"
tokens = re.split(r"[\s,]+", line)
output = [t for (i, t) in enumerate(pairwise(tokens)) if i % 2 == 0]
print(output)

Output:

[('805i', '9430'), ('3261i', '9418'), ('3950i', '9415'), ('4581i', '4584i'), ('4729i', '9421'), ('6785i', '9433'), ('8632i', '9434'), ('9391i', '9393i')] 
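
itertools.pairwise requires Python 3.10 or later. On older versions, the same pairing can be sketched with zip over two slices; this is an alternative to the code above, not the answer's own method, and the shortened sample line is for illustration only.

```python
import re

line = "805i,9430 3261i,9418 3950i,9415"

# Split on any run of whitespace or commas; strip() avoids empty tokens at the ends.
tokens = re.split(r"[\s,]+", line.strip())

# Pair each even-indexed token with the odd-indexed token that follows it.
output = list(zip(tokens[::2], tokens[1::2]))
print(output)
```

Either way, splitting on both separators at once discards the pair boundaries, so an item such as "a,b,c" would be regrouped with its neighbours rather than reported as malformed.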
