0

I have a huge dataset, but to simplify the problem, think of it as a 4x4 2D list.

I want to make a new list of 5 elements chosen at random from the 4x4 list, without using numpy or any additional libraries.

Edit:

[['j', 'k', 'z','p'], [1,6,8,9], [8,True,0,'a'], [66,'False', '12', '5]] 

I want to pick 5 elements of that list randomly, using functions from the random library.

The output might be a new list of the chosen elements, like:

['j', 'False', 'a', 8, 66] 

I hope that's clear enough.

  • Sadly, it's not clear enough yet. Do you want to choose any item in a nested list? Can you post what you have tried? Commented Nov 15, 2020 at 20:02
  • I'd suggest giving a rough size rather than saying "huge"; order-of-magnitude estimates are enough, e.g. you want this to work with 2D arrays of approximately 1 billion by 100, i.e. 10¹¹ elements. Note that you'd almost certainly not be using the standard library at this point, but never mind. Commented Nov 15, 2020 at 20:30
  • You're right. I'll follow your suggestion from now on. Thank you @SamMason. Commented Nov 15, 2020 at 21:05

3 Answers

1

With the random module you can use a list comprehension that picks a random row and then a random element from that row, five times:

    import random

    list_2d = [... your list 4X4]
    randomly_chosen = [random.choice(random.choice(list_2d)) for _ in range(5)]
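
Note that the nested random.choice draws with replacement, and it is only uniform over all elements when every row has the same length, as in the 4x4 example. If the real data has rows of different lengths, one possible variation is to weight each row by its length. This is a minimal sketch, not part of the original answer; the helper name pick_uniform is hypothetical:

```python
import random

def pick_uniform(grid, k):
    """Draw k elements with replacement, each element equally likely.

    Hypothetical helper: weights rows by length so ragged rows are handled.
    """
    weights = [len(row) for row in grid]
    # Pick k rows, longer rows proportionally more often.
    rows = random.choices(grid, weights=weights, k=k)
    # Then pick one element from each chosen row.
    return [random.choice(row) for row in rows]

list_2d = [['j', 'k', 'z', 'p'],
           [1, 6, 8, 9],
           [8, True, 0, 'a'],
           [66, 'False', '12', '5']]
picked = pick_uniform(list_2d, 5)
print(picked)
```

For a rectangular list this behaves like the comprehension above; the weights only matter when row lengths differ.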

0

(Your list right now isn't quite valid syntax, but fixing that up):

    >>> import itertools
    >>> import random
    >>> list_2d = [['j', 'k', 'z', 'p'],
    ...            [1,6,8,9],
    ...            [8,True,0,'a'],
    ...            [66,'False', '12', '5']]
    >>> random.choices(list(itertools.chain(*list_2d)), k=5)
    ['False', '12', 8, 6, 'k']

The itertools.chain call iterates over the rows one after another, and wrapping it in list() produces a flat 1D list, which the random.choices function can then sample from with replacement: https://docs.python.org/3/library/random.html#random.choices

Also, without more detail about the full problem, it is hard to know whether this solution will be good for the "huge dataset": it creates a new copy of the full data as a 1D list, so if that uses too much memory, we'd need to see where the data originally comes from, and whether you could avoid the list-of-lists structure in the first place.
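
If the copy is the concern, one option is to sample positions instead of elements, so nothing gets flattened. The following is only a sketch under the assumption that the data is rectangular (every row has the same length); the function name sample_without_copy is made up for illustration. Unlike random.choices, random.sample draws without replacement, so no position is picked twice:

```python
import random

def sample_without_copy(grid, k):
    """Pick k elements at distinct positions without building a flat copy.

    Assumes a non-empty rectangular grid (all rows the same length).
    """
    n_rows = len(grid)
    n_cols = len(grid[0])
    # range() is a lazy sequence, so no list of n_rows * n_cols items is built.
    flat_indices = random.sample(range(n_rows * n_cols), k)
    result = []
    for index in flat_indices:
        row, col = divmod(index, n_cols)  # map flat index back to (row, col)
        result.append(grid[row][col])
    return result

list_2d = [['j', 'k', 'z', 'p'],
           [1, 6, 8, 9],
           [8, True, 0, 'a'],
           [66, 'False', '12', '5']]
print(sample_without_copy(list_2d, 5))
```

Since positions are sampled, a value that occurs at two positions (such as 8 here) can still appear twice in the result.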


0

The idea of flattening ls can still be borrowed from numpy without using any of its methods:

    from random import shuffle

    ls = [['j', 'k', 'z', 'p'], [1,6,8,9], [8,True,0,'a'], [66,'False', '12', '5']]
    flat_ls = []
    for n in ls:
        flat_ls.extend(n)
    shuffle(flat_ls)  # shuffles flat_ls in place
    >>> flat_ls[:5]
    ['12', 66, 8, 'j', 'a']
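
If you would rather keep the flattened list in its original order, random.sample from the standard library picks 5 distinct positions without shuffling everything. A small sketch of that alternative (not the approach above, just a variation on it):

```python
import random
from itertools import chain

ls = [['j', 'k', 'z', 'p'], [1, 6, 8, 9], [8, True, 0, 'a'], [66, 'False', '12', '5']]

# Flatten once; chain.from_iterable walks the rows without unpacking them.
flat_ls = list(chain.from_iterable(ls))

# sample() draws without replacement and leaves flat_ls unmodified.
chosen = random.sample(flat_ls, 5)
print(chosen)
```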
