Let's say I have a list like this:

[(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5), (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3), (9600002, 42, 6), (9600002, 42, 1)] 

The first number is the user_id, the second is the tv_program_code, and the third is the season_id.

My question

How can I find the tv_program_codes for which a user has subscribed to more than one season, and then print the user_id and the tv_program_code? For example:

9600001 17 

Alternatively, can you suggest a data structure I should use for this?
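On the data-structure question, one option is a dictionary keyed by the (user_id, tv_program_code) pair whose values are sets of season_ids. This is a sketch, not taken from either answer below; the set is an assumption that a repeated (user, program, season) row should count as one season rather than two, and names such as multi_season are illustrative.

```python
from collections import defaultdict

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]

# Map each (user_id, tv_program_code) pair to the set of distinct season_ids.
seasons = defaultdict(set)
for user_id, program_code, season_id in data:
    seasons[(user_id, program_code)].add(season_id)

# Keep only the pairs with more than one distinct season.
multi_season = {pair: s for pair, s in seasons.items() if len(s) > 1}

# Print "user_id tv_program_code", one pair per line.
for user_id, program_code in multi_season:
    print(user_id, program_code)
# On Python 3.7+ (insertion-ordered dicts) this prints:
# 9600002 42
# 9600001 17
# 9600003 11
```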

2 Answers

One method is to use collections.Counter.

The idea is to count the number of seasons per (user, program) combination using a dictionary.

Then keep only the combinations whose count is greater than 1, via a dictionary comprehension.

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
       (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
       (9600002, 42, 6), (9600002, 42, 1)]

c = Counter()
for user, program, season in lst:
    c[(user, program)] += 1

print(c)
# Counter({(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2,
#          (9600002, 14): 1, (9600001, 14): 1})

res = {k: v for k, v in c.items() if v > 1}

print(res)
# {(9600002, 42): 3, (9600001, 17): 3, (9600003, 11): 2}

print(res.keys())
# dict_keys([(9600002, 42), (9600001, 17), (9600003, 11)])
```
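The question asks for each result as "user_id tv_program_code" on one line. A minimal sketch of that last step: it passes a generator of (user, program) keys straight to Counter, which counts the same way as the loop above, and then unpacks each key tuple when printing.

```python
from collections import Counter

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
       (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
       (9600002, 42, 6), (9600002, 42, 1)]

# Count seasons per (user, program); equivalent to the explicit loop above.
c = Counter((user, program) for user, program, season in lst)
res = {k: v for k, v in c.items() if v > 1}

# Each key is a (user, program) tuple, so it can be unpacked directly.
for user, program in res:
    print(user, program)
# On Python 3.7+ this prints:
# 9600002 42
# 9600001 17
# 9600003 11
```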

Note on Counter versus defaultdict(int)

In the benchmark below, Counter is about twice as slow as defaultdict(int). If performance matters and you don't need any of the following Counter features, you can easily switch to defaultdict(int):

  1. Looking up a missing key in a Counter returns 0 without adding the key; defaultdict(int) inserts it with value 0.
  2. You can add / subtract Counter objects.
  3. Counter offers additional methods, e.g. elements, most_common.
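A small sketch of the three differences listed above; the toy keys ("a", 1) and ("b", 2) are made up for illustration.

```python
from collections import Counter, defaultdict

c = Counter({("a", 1): 2})
d = defaultdict(int, {("a", 1): 2})

# 1. Looking up a missing key: Counter returns 0 without storing the key,
#    while defaultdict(int) inserts the key with value 0.
print(c[("b", 2)], ("b", 2) in c)  # 0 False
print(d[("b", 2)], ("b", 2) in d)  # 0 True

# 2. Counters support arithmetic; subtraction drops non-positive counts.
total = c + Counter({("a", 1): 1, ("b", 2): 1})
print(total)                           # Counter({('a', 1): 3, ('b', 2): 1})
print(total - Counter({("b", 2): 5}))  # Counter({('a', 1): 3})

# 3. Extra methods such as most_common() and elements().
print(total.most_common(1))      # [(('a', 1), 3)]
print(sorted(total.elements()))  # [('a', 1), ('a', 1), ('a', 1), ('b', 2)]
```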

Benchmark on Python 3.6.2 (run in IPython, since %timeit is an IPython magic command):

```python
from collections import defaultdict, Counter

lst = lst * 100000

def counter(lst):
    c = Counter()
    for user, program, season in lst:
        c[(user, program)] += 1
    return c

def dd(lst):
    d = defaultdict(int)
    for user, program, season in lst:
        d[(user, program)] += 1
    return d

%timeit counter(lst)  # 900 ms
%timeit dd(lst)       # 450 ms
```
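Outside IPython, roughly the same comparison can be run with the standard timeit module. This is a sketch, not the answer's original setup: it keeps the 100000× multiplier from above, first checks that both functions produce identical counts, and prints the best of three runs. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit
from collections import Counter, defaultdict

base = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]
big = base * 100000  # 1,000,000 rows, as in the benchmark above

def counter(rows):
    c = Counter()
    for user, program, season in rows:
        c[(user, program)] += 1
    return c

def dd(rows):
    d = defaultdict(int)
    for user, program, season in rows:
        d[(user, program)] += 1
    return d

# Both functions must produce the same counts before timing them.
assert dict(counter(big)) == dict(dd(big))

for func in (counter, dd):
    # Best of 3 single runs, in milliseconds.
    seconds = min(timeit.repeat(lambda: func(big), number=1, repeat=3))
    print(f"{func.__name__}: {seconds * 1000:.0f} ms")
```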
Comments

This looks good, but you aren't actually using the Counter interface at all, really. Might as well be a defaultdict.
@miradulo, I have this discussion every time :). I don't see why one should, by default, prefer defaultdict(int) over Counter. Can you please explain further? (There is a related question on this.)
Thanks! There is just one thing I don't understand: in res = {k: v for k, v in c.items() if v > 1}, what is the k in "k: v for k" doing?
c.items() iterates over the key-value pairs of your Counter. k is the tuple key (the combination of user and program), and v is its count; k: v simply copies each pair into the new dictionary. You don't need to modify or filter by the key in this case.
@jpp The defaultdict class is written in C, whereas a collections.Counter is written in Python. But beyond that, just on principle, why use a class giving you a far larger interface that you don't actually need?
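To make the roles of k and v from the comment thread more explicit, the key tuple can be unpacked inside the comprehension itself. A sketch that goes one step further and collects, per user, the programs with more than one season; the name by_user is illustrative, not from the answer.

```python
from collections import Counter, defaultdict

lst = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
       (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
       (9600002, 42, 6), (9600002, 42, 1)]
c = Counter((user, program) for user, program, season in lst)

# k is a (user, program) tuple and v is its count; here k is unpacked in place.
res = {(user, program): count for (user, program), count in c.items() if count > 1}

# Group the qualifying programs by user.
by_user = defaultdict(list)
for user, program in res:
    by_user[user].append(program)

print(dict(by_user))  # {9600002: [42], 9600001: [17], 9600003: [11]}
```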

There are several ways to do this.

First, using defaultdict:

```python
import collections

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]

d = collections.defaultdict(list)
for i in data:
    d[(i[0], i[1])].append(i)

print(list(filter(lambda x: len(x) > 1, d.values())))
```

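Output (the order of the groups depends on dict iteration order, which is arbitrary before Python 3.7; on Python 3.7+ the groups come out in the order their keys first appear in data):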

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]] 

Second, using itertools.groupby:

```python
import itertools

groups = [list(j) for i, j in itertools.groupby(sorted(data), key=lambda x: (x[0], x[1]))]
print(list(filter(lambda x: len(x) > 1, groups)))
```

Output:

[[(9600001, 17, 1), (9600001, 17, 3), (9600001, 17, 4)], [(9600002, 42, 1), (9600002, 42, 3), (9600002, 42, 6)], [(9600003, 11, 1), (9600003, 11, 4)]] 
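itertools.groupby only merges consecutive items with equal keys, which is why the data has to be sorted first. A sketch of the same approach that uses operator.itemgetter for the key, sorts by that same key, and keeps the (user_id, tv_program_code) pair along with the seasons; the result list is an illustrative name.

```python
from itertools import groupby
from operator import itemgetter

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]

key = itemgetter(0, 1)  # (user_id, tv_program_code)

result = []
# groupby only merges adjacent rows, so sort by the same key first.
# sorted() is stable, so seasons keep their original relative order.
for (user, program), rows in groupby(sorted(data, key=key), key=key):
    seasons = [season for _, _, season in rows]
    if len(seasons) > 1:
        result.append((user, program, seasons))

print(result)
# [(9600001, 17, [3, 1, 4]), (9600002, 42, [3, 6, 1]), (9600003, 11, [1, 4])]
```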

Third approach

Finally, you can also do it manually, without any imports:

```python
d = {}
for i in data:
    if (i[0], i[1]) not in d:
        d[(i[0], i[1])] = [i]
    else:
        d[(i[0], i[1])].append(i)

print(list(filter(lambda x: len(x) > 1, d.values())))
```

Output:

[[(9600003, 11, 1), (9600003, 11, 4)], [(9600001, 17, 3), (9600001, 17, 1), (9600001, 17, 4)], [(9600002, 42, 3), (9600002, 42, 6), (9600002, 42, 1)]] 
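Still without imports, dict.setdefault removes the explicit "not in d" branch of the manual approach. A sketch of that variant, with a list comprehension in place of filter and lambda:

```python
data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]

d = {}
for row in data:
    # setdefault returns the existing list for this key,
    # or stores an empty list under the key and returns it.
    d.setdefault((row[0], row[1]), []).append(row)

groups = [rows for rows in d.values() if len(rows) > 1]
print(groups)
```

On Python 3.7+ the groups appear in first-seen key order, starting with the (9600002, 42) rows.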

Comments

Ayodhyankit Paul, one more thing: every item in my list is an instance of another class, so each one is an object. When I try these methods I always get the error "object is not iterable". How can I fix this?
@ShaneFAN Can you show me an example? I don't understand your issue.
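Without seeing the asker's class this is a guess, but errors like "object is not iterable" typically appear when code unpacks an object (for user, program, season in lst) whose class does not support iteration; indexing it (i[0]) fails similarly with "not subscriptable". A sketch assuming a hypothetical Subscription class with attributes user_id, program_code and season_id (substitute the real class and attribute names): build the key from attributes instead.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Subscription:  # hypothetical stand-in for the asker's class
    user_id: int
    program_code: int
    season_id: int

data = [(9600002, 42, 3), (9600001, 17, 3), (9600003, 11, 1), (9600002, 14, 5),
        (9600001, 17, 1), (9600003, 11, 4), (9600001, 17, 4), (9600001, 14, 3),
        (9600002, 42, 6), (9600002, 42, 1)]
subs = [Subscription(*row) for row in data]

# Read attributes instead of unpacking or indexing the objects.
c = Counter((s.user_id, s.program_code) for s in subs)
res = [pair for pair, n in c.items() if n > 1]
print(res)  # [(9600002, 42), (9600001, 17), (9600003, 11)]
```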