How to move list items to the front if they are in another list while keeping the order

Question

I have a two example lists,

vals = ["a", "c", "d", "e", "f", "g"] xor = ["c", "g"]

I want to sort vals list based on the xor list, i.e., the values in xor should be placed first in vals list in exact order. The rest of the values present in vals should stay in the same order.

Also, it is possible that values in xor might not be in vals in those cases just ignore those values. And also, in the case of duplicates, I only need one value.

Desired output:

vals = ["c", "g", "a", "d", "e", "f"] # here a, d, e, f are not in xor so we keep them in same order as found in vals.

My Approach:

new_list = [] for x in vals: for y in xor: if x == y: new_list.append(x) for x in vals: if x not in xor: new_list.append(x)

The vals list currently has around 800k words or phrases. The xor list has 300k words or phrases but it might increase later on. Some phrases are a bit long as well. What would be the most efficient way to address the problem?

user2390182 · Accepted Answer · 2020-12-20 07:44:59Z

Build a order dict of indeces in xor and use it as a sorting key:

order = {n: i for i, n in enumerate(xor)} sorted(vals, key=lambda x: order.get(x, len(xor))) # ['c', 'g', 'a', 'd', 'e', 'f']

Using len(vals) as a default value ensures all values not in xor will end up at the back. This of course assumes, that you want the values that are in xor to be sorted based on their order in xor (making the process O(M+NlogN)). Otherwise you can be faster (O(M+N)) with:

from operator import contains from functools import partial s = set(xor) result = list(filter(partial(contains, s), vals)) result.extend(v for v in vals if v not in s)

Or in a more readable fashion:

s = set(xor) result = [v for v in vals if v in s] result += (v for v in vals if v not in s)

What is the time complexity of this approach? I have a huge list.
O(M+NlogN) with M the size of xor and N the size of vals.
I don't want to sort the values in` vals`, other values in vals that are not found in xor should stay in the same order
@schwobaseggl then something like partial(operator.contains, a)) would be better. I took the liberty of editing

dreamcrash · Accepted Answer · 2020-12-19 21:48:06Z

Append all the values from xor that exist also in vals, with the list of all values of vals that do not exist on xor:

sorted_list = [v for v in xor if v in vals] + [v for v in vals if v not in xor]

another approach :

output = list(filter(lambda x: x in vals, xor)) + list(filter(lambda x: x not in xor, vals))

Stuart · Accepted Answer · 2020-12-22 18:14:48Z

Given the clarifications added to the question, several of the answers posted here do not actually provide the expected answer, and others are not efficient.

One solution is to take @dreamcrash's answer but using sets for the look-up:

def order_by1(vals, xor): set_vals = set(vals) set_xor = set(xor) return [v for v in xor if v in set_vals] + [v for v in vals if v not in set_xor]

This should increase efficiency by removing the time-consuming (O(n)) list lookup on each iteration of the loop.

Since you don't care about the order of the values that are not found in xor, a variant of this is to replace the 2nd list comprehension with a set operation:

def order_by2(vals, xor): set_vals = set(vals) return [v for v in xor if v in set_vals] + list(set_vals - set(xor))

Another solution is to use ordered sets

def order_by3(vals, xor): vals_set = OrderedSet(vals) return (OrderedSet(xor) & vals_set) | vals_set

Ordered sets use dictionaries under the hood. We can instead use dictionaries directly, making use of the fact that they are ordered:

def order_by4(vals, xor): """ Order list a by each item's position in list xor, if it is found in xor, otherwise by its position in list vals """ d = {k: False for k in xor} for k in vals: d[k] = True return [k for k, v in d.items() if v]

The only other solution that actually does what is required is the first one in @schwobaseggl's answer. Timing these with lists of 800,000 and 300,000 random phrases of which 150,000 overlap:

import random import string import timeit def random_phrase(): return "".join(random.choices(string.ascii_letters, k=10)) def generate_lists(M, N, overlap): a = [random_phrase() for _ in range(M)] b = ([random_phrase() for _ in range(N - overlap)] + random.sample(a, k=overlap)) random.shuffle(b) return a, b def dreamcrash(vals, xor): return [v for v in xor if v in vals] + [v for v in vals if v not in xor] def schwobaseggl(vals, xor): order = {n: i for i, n in enumerate(xor)} len_xor = len(xor) # saves some time when sorting return sorted(vals, key=lambda x: order.get(x, len_xor)) vals, xor = generate_lists(800000, 300000, overlap=150000) # this takes a while for f in [dreamcrash, order_by1, order_by2, order_by3, order_by4, schwobaseggl]: print(f, end='...') print(timeit.timeit(stmt=f"{f.__name__}(vals, xor)", number=1, globals=globals()))

I gave up on timing dreamcrash because it was taking too long. order_by2 seems fastest, followed by order_by4, order_by1, and schwobaseggl, each taking around .5 - 1.5s on my computer. The ordered set solution is a lot slower. Checking whether a set contains an item, and setting an item in a dictionary, are both O(1) in the average case, O(n) in the worst case which explains why the dictionary and set based versions have similar performance.

Not sure but think it would be similar to ordinary sets - wiki.python.org/moin/TimeComplexity - so O(aM+bN) (a and b depending on overlap between xor and vals) on average, O(MN) worst case. In practice the implementation of ordered sets in python may slow things down.

danyroza · Accepted Answer · 2020-12-19 21:17:11Z

0

This should also be one line solution for you:

vals = [1, 2, 5, 4, 3, 2, 11, 6] xor = [10, 11] new_list = xor + [elem for elem in vals if elem not in xor]

(edit: without sorted and correct variables)

edited Dec 19, 2020 at 21:17

answered Dec 19, 2020 at 21:11

danyroza

1741 silver badge10 bronze badges

2 Comments

danyroza Over a year ago

Should be fixed now. It is a matter of simple list generation then.

danyroza Over a year ago

From what he states, it is not really clear whether he wants them in the final list or not. Technically you could achieve this with another list generator instead of xor list at the start.

superb rain · Accepted Answer · 2020-12-19 23:36:03Z

Another way to do it with sorting:

vals.sort(key=set(xor).__contains__, reverse=True)

I suspect it's faster than the others, but don't feel like trying to create test data that might resemble your actual data. (Partly because your question is still not entirely clear. I'm going with what your reference implementation does, i.e., both groups retaining their order in vals.)

Collectives™ on Stack Overflow

How to move list items to the front if they are in another list while keeping the order

5 Answers 5

18 Comments

Comments

4 Comments

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

18 Comments

Comments

4 Comments

2 Comments

Comments

Linked

Related