Shuffle the rows of specific columns of table in Perl

Question

I have a table file and I want to shuffle the rows of specific columns in Perl.

For example, I have this table:

    a 1
    b 2
    c 3
    d 4
    e 5
    f 6

and I want to shuffle the second column to get something like this:

    a 2
    b 1
    c 3
    d 4
    e 5
    f 6

TLP · Accepted Answer · 2020-11-25 19:04:27Z

Using List::Util::shuffle might be a good idea. I used a Schwartzian transform to create a list of random numbers, sort them, and insert the column data based on the array index.

    use strict;
    use warnings;
    use feature 'say';

    my @col;
    while (<DATA>) {
        push @col, [ split ];
    }

    my @shuffled = map { $col[$_->[0]][1] }    # map to @col values
        sort { $a->[1] <=> $b->[1] }           # sort based on rand() value
        map { [ $_, rand() ] }                 # each index mapped into array of index and rand()
        0 .. $#col;                            # list of indexes of @col

    for my $index (0 .. $#col) {
        say join " ", $col[$index][0], $shuffled[$index];
    }

    __DATA__
    a 1
    b 2
    c 3
    d 4
    e 5
    f 6
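
For comparison, the List::Util::shuffle approach mentioned at the top of this answer could look like the sketch below. It assumes the same whitespace-separated input as the question; the helper name shuffle_column and the inline sample rows are illustrative and not part of the original answer.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: returns a copy of the rows with one column's values
# permuted across rows. $col is a 0-based column index.
sub shuffle_column {
    my ($rows, $col) = @_;
    my @values = shuffle(map { $_->[$col] } @$rows);   # shuffled column values
    my @out;
    for my $i (0 .. $#$rows) {
        my @row = @{ $rows->[$i] };   # copy so the input is left untouched
        $row[$col] = $values[$i];
        push @out, \@row;
    }
    return \@out;
}

# Sample rows matching the table in the question.
my @table = map { [ split ' ', $_ ] } ("a 1", "b 2", "c 3", "d 4", "e 5", "f 6");
print "@$_\n" for @{ shuffle_column(\@table, 1) };
```

The first column keeps its order while the second column is permuted; the output differs from run to run.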

Nice solution. It uses a sort, which may be more CPU-intensive than a simple shuffle (I think it is O(n log n) vs. O(n)), but it works. This script would also need to be adapted to work on any table text file.
@JeanPaul Thank you, it's mainly just a demonstration of how cool Schwartzian transforms are.

Jean Paul · Accepted Answer · 2020-11-27 17:01:13Z

I can use this script to do the job:

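    #!/usr/bin/env perl
    use strict;
    use warnings;
    use List::Util qw/shuffle/;

    # First argument: comma-separated, 1-based column numbers to shuffle
    my @c = split /,/, $ARGV[0];
    $_-- for @c;    # convert to 0-based indexes
    shift;          # drop the column argument so <> reads the file(s)

    my @lines;
    my @v;
    while ( <> ) {
        my @items = split;
        $v[$.-1] = [@items[@c]];      # selected column values (not used below)
        $lines[$.-1] = [@items];      # full row, stored by line number
    }

    # Pick a random source row for each output row
    my @order = shuffle (0..$#lines);
    for my $l (0..$#lines) {
        my @items = @{ $lines[$l] };
        @items[@c] = @{ $lines[$order[$l]] }[@c];   # take the selected columns from the source row
        print "@items\n";
    }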

This script uses List::Util, which has been a Perl core module since perl v5.7.3 (check with: corelist List::Util).

It can be launched with: perl shuffle.pl 2 test.txt. The first argument is a comma-separated list of 1-based column numbers to shuffle.

Thanks for pointing this out. I know that Perl is not the only language to use 0 as the first index for arrays, but I find it unnatural, especially since Perl is not C, and an array is more than a pointer to a slot in memory.
I don't know. New Perl versions have arrived, but the default one shipped with Unix systems has not changed for a long time, and I'm not sure that will happen soon.
I don't know why the comment by the other guy above was removed, but I looked more closely at the documentation about $[ (perldoc.perl.org/perlvar#$%5B) and saw that it no longer works starting from Perl v5.30.0, even without doing use v5.16, so I had to remove the use of $[=1 from my script :(.
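
Since $[ is no longer an option, one way to keep 1-based column numbers on the command line is to convert them once, up front. The sketch below assumes a comma-separated argument such as 1,3, as in the script above; the helper name parse_columns and its validation rule are hypothetical additions.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a 1-based, comma-separated column list
# (e.g. "1,3") into 0-based array indexes (0, 2).
sub parse_columns {
    my ($spec) = @_;
    my @cols = split /,/, $spec;
    for my $c (@cols) {
        # Reject anything that is not a positive integer.
        die "Invalid column number '$c'\n" unless $c =~ /\A[1-9][0-9]*\z/;
    }
    return map { $_ - 1 } @cols;
}

my @items = qw(a 1 x);
my @idx   = parse_columns("1,3");
print "@items[@idx]\n";    # prints "a x"
```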

Polar Bear · Accepted Answer · 2020-11-25 02:16:12Z

Demo code for a case when external modules are not permitted.

    use strict;
    use warnings;
    use feature 'say';

    my %data;

    while( <DATA> ) {
        my($l,$d) = split;
        $data{$l} = $d;
    }

    say '- Loaded --------------------';
    say "$_ => $data{$_}" for sort keys %data;

    for( 0..5 ) {
        @data{ keys %data} = @{ shuffle([values %data]) };
        say "-- $_ " . '-' x 24;
        say "$_ => $data{$_}" for sort keys %data;
    }

    sub shuffle {
        my $data = shift;

        my($seen, $r, $i);

        my $n = $#$data;

        for ( 0..$n ) {
            do {
                $i = int(rand($n+1));
            } while defined $seen->{$i};

            $seen->{$i} = 1;
            $r->[$_] = $data->[$i];
        }

        return $r;
    }

    __DATA__
    a 1
    b 2
    c 3
    d 4
    e 5
    f 6

Output

    - Loaded --------------------
    a => 1
    b => 2
    c => 3
    d => 4
    e => 5
    f => 6
    -- 0 ------------------------
    a => 5
    b => 4
    c => 2
    d => 6
    e => 1
    f => 3
    -- 1 ------------------------
    a => 3
    b => 6
    c => 2
    d => 4
    e => 1
    f => 5
    -- 2 ------------------------
    a => 4
    b => 5
    c => 6
    d => 1
    e => 3
    f => 2
    -- 3 ------------------------
    a => 6
    b => 4
    c => 1
    d => 2
    e => 3
    f => 5
    -- 4 ------------------------
    a => 3
    b => 4
    c => 6
    d => 5
    e => 1
    f => 2
    -- 5 ------------------------
    a => 6
    b => 5
    c => 3
    d => 4
    e => 2
    f => 1

You can't use hash keys to store table data. What if there are duplicates? You can't even retain the original order of column 1, because hashes are not ordered. Just because the keys happen to be unique and sortable in the sample data doesn't mean they will be in a real scenario. A shuffle function with a while loop that waits for a free array index will be extremely inefficient for larger arrays. Imagine a file with 10000 lines: with 9999 indexes already taken, each roll has only a 1/10000 chance of hitting the last free index.
@TLP - The OP should describe the expected input data better. Your points about the index computation algorithm are valid. The hash was used for demonstration purposes only; two arrays could be used to store the elements instead.
The table I provided in my question was just an example; I want a script that can work on any table text file as input.
Also note that the shuffling algorithm you proposed is not optimal, since it can take a long time to find the last indices. See how it is implemented in List::Util, where the complexity is reduced to O(n): stackoverflow.com/a/5168324/4374441
@JeanPaul, what would you do if no modules could be installed on the system for security reasons? Did you take UTF-8 keys and values in your tables into account? At the very least, you should describe the problem in more detail. No single solution fits all possible cases (such a solution grows exponentially in size and becomes inefficient).
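
Both concerns raised in these comments (hash storage and the retry loop) can be avoided without loading any module. The sketch below is not the code from this answer: it swaps in an in-place Fisher-Yates shuffle, which performs one swap per element, and keeps the rows in an array so that input order and duplicate keys survive. The sub name fisher_yates_shuffle and the sample rows are illustrative.

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle of the array behind $array (an array ref).
# Each position is swapped once with a randomly chosen position at or
# before it, so the work is linear in the array length.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int(rand($i + 1));           # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];   # swap via array slices
    }
    return $array;
}

# Rows stay in an array, so input order and duplicate keys are preserved.
my @rows = map { [ split ' ', $_ ] } ("a 1", "b 2", "b 3", "c 4");

my @second = map { $_->[1] } @rows;           # pull out column 2
fisher_yates_shuffle(\@second);               # permute it
$rows[$_][1] = $second[$_] for 0 .. $#rows;   # write it back

print "@$_\n" for @rows;
```

The first column, including the repeated key b, is printed in its original order; only the second column changes between runs.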
