Random sampling from a 2d list w/o numpy

Question

I have a huge dataset, but to simplify the problem, think of a 4x4 2D list.

I want to make a new list of 5 elements chosen at random from the 4x4 list, without using numpy or any additional libraries.

edit:

[['j', 'k', 'z','p'], [1,6,8,9], [8,True,0,'a'], [66,'False', '12', '5]]

I want to pick 5 elements of that list randomly, using functions from the random library.

The output may be a new list of the chosen elements, like:

['j', 'False', 'a', 8, 66]

hope that it's clear enough.

sadly it's not clear enough yet... do you want to choose any item in a nested list? can you post what you have tried?
– adir abargil, Commented Nov 15, 2020 at 20:02
I'd suggest giving a rough size rather than saying "huge"; order of magnitude estimates are enough. e.g. you want this to work with 2d arrays of approx 1 billion by 100, i.e. 10¹¹ elements. note that you'd almost certainly not be using the standard library at this point, but never mind
– Sam Mason, Commented Nov 15, 2020 at 20:30
you're right. i'll follow your suggestion from now on. thank you @SamMason
– nous, Commented Nov 15, 2020 at 21:05

adir abargil · Accepted Answer · 2020-11-15 20:14:42Z

With the random module you can use a list comprehension:

    import random

    list_2d = [...]  # your 4x4 list goes here
    # pick a random row, then a random element of that row, 5 times
    randomly_chosen = [random.choice(random.choice(list_2d)) for _ in range(5)]
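Note that the comprehension above draws with replacement, so the same position can be picked more than once, and it gives every row the same chance of being chosen. That makes each element equally likely only when all rows have the same length, as in the 4x4 case. A minimal sketch for rows of unequal length, assuming a hypothetical helper name sample_nested (not part of the random module), weights the row choice by row length:

```python
import random

def sample_nested(list_2d, k):
    """Draw k elements with replacement, each element equally likely.

    sample_nested is a hypothetical helper name used for illustration.
    """
    # Weight each row by its length so longer rows are chosen proportionally more often.
    weights = [len(row) for row in list_2d]
    rows = random.choices(list_2d, weights=weights, k=k)
    # Then pick one element uniformly from each chosen row.
    return [random.choice(row) for row in rows]

list_2d = [['j', 'k', 'z', 'p'], [1, 6, 8, 9], [8, True, 0, 'a'], [66, 'False', '12', '5']]
picked = sample_nested(list_2d, 5)
print(picked)
```

For a rectangular list such as the 4x4 example, the weights are all equal and this behaves like the nested random.choice version.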

Scott Staniewicz · Accepted Answer · 2020-11-15 20:12:52Z

(Your list right now isn't quite valid syntax, but fixing that up):

    >>> import itertools
    >>> import random
    >>> list_2d = [['j', 'k', 'z', 'p'],
    ...            [1,6,8,9],
    ...            [8,True,0,'a'],
    ...            [66,'False', '12', '5']]
    >>> random.choices(list(itertools.chain(*list_2d)), k=5)
    ['False', '12', 8, 6, 'k']

The itertools.chain call flattens the nested list, and list() turns the result into a 1d list, which the random.choices function can sample from with replacement: https://docs.python.org/3/library/random.html#random.choices

Also, without more detail about the full problem, it's hard to know whether this solution will work well for the "huge dataset": it creates a new copy of the full list as a 1d list, so if that uses too much memory, we'd need to see where the data originally comes from, and whether you could avoid the list-of-lists structure in the first place.
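If the flattened copy is the concern, one possible workaround for a rectangular list of lists is to sample flat positions rather than elements and map each position back to a row and column. The sketch below assumes every row has the same length; sample_positions is a hypothetical helper name. It uses random.sample over a range, so it picks distinct positions (without replacement) and never builds the 1d list:

```python
import random

def sample_positions(list_2d, k):
    """Pick k elements at distinct positions without flattening list_2d.

    sample_positions is a hypothetical helper; it assumes a rectangular list of lists.
    """
    n_rows = len(list_2d)
    n_cols = len(list_2d[0])
    # random.sample accepts a range, so no flattened copy of the data is made.
    positions = random.sample(range(n_rows * n_cols), k)
    # divmod turns a flat position into (row index, column index).
    return [list_2d[r][c] for r, c in (divmod(p, n_cols) for p in positions)]

list_2d = [['j', 'k', 'z', 'p'], [1, 6, 8, 9], [8, True, 0, 'a'], [66, 'False', '12', '5']]
print(sample_positions(list_2d, 5))
```

To sample with replacement instead, as random.choices does, the positions could be drawn with random.choices(range(n_rows * n_cols), k=k).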

mathfux · Accepted Answer · 2020-11-15 20:19:19Z

The idea of flattening ls can still be borrowed from numpy without using any of its methods:

    >>> ls = [['j', 'k', 'z', 'p'], [1,6,8,9], [8,True,0,'a'], [66,'False', '12', '5']]
    >>> flat_ls = []
    >>> for n in ls:
    ...     flat_ls.extend(n)
    ...
    >>> from random import shuffle
    >>> shuffle(flat_ls)  # shuffles flat_ls in place
    >>> flat_ls[:5]
    ['12', 66, 8, 'j', 'a']
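Shuffling the flattened list and slicing it draws 5 elements without replacement. When only a few elements are needed, random.sample gives the same kind of draw without reordering flat_ls. A minimal sketch, reusing the same flattening loop:

```python
import random

ls = [['j', 'k', 'z', 'p'], [1, 6, 8, 9], [8, True, 0, 'a'], [66, 'False', '12', '5']]

# Flatten the list of lists into a single 1d list.
flat_ls = []
for row in ls:
    flat_ls.extend(row)

# Draw 5 elements from distinct positions; flat_ls itself is left unchanged.
picked = random.sample(flat_ls, 5)
print(picked)
```

Because the value 8 occurs twice in the example data, it can still show up twice in the result; "without replacement" refers to positions, not values.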
