
I want to create a script that randomly shuffles the rows and columns of a large CSV file. For example, given an initial file f.csv:

    a, b, c ,d
    e, f, g, h
    i, j, k, l

First, we shuffle the rows to obtain f1.csv:

    e, f, g, h
    a, b, c ,d
    i, j, k, l

Then we shuffle the columns to obtain f2.csv:

    g, e, h, f
    c, a, d, b
    k, i, l, j
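The example applies two permutations. The first reorders the rows (the original rows come out as 2, 1, 3). The second reorders the column positions (3, 1, 4, 2), and every row uses that same column order. Below is a minimal Ruby sketch that hard-codes both permutations as 0-based indices and reproduces f2.csv from f.csv. The helper name permute is illustrative only and does not come from the question:

```ruby
# Hypothetical helper: reorder the rows, then reorder the columns of every row.
# row_order and col_order are 0-based index permutations.
def permute(rows, row_order, col_order)
  rows.values_at(*row_order).map { |row| row.values_at(*col_order) }
end

f = [%w[a b c d], %w[e f g h], %w[i j k l]]

f1 = permute(f, [1, 0, 2], [0, 1, 2, 3])   # rows only (identity column order)
f2 = permute(f, [1, 0, 2], [2, 0, 3, 1])   # rows, then one column order for all rows

f2.each { |row| puts row.join(", ") }
# prints:
# g, e, h, f
# c, a, d, b
# k, i, l, j
```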

To shuffle the rows, I can use this awk script (from here):

    awk 'BEGIN { srand() }
    { lines[++d] = $0 }
    END {
        while (1) {
            if (e == d) { break }
            RANDOM = int(1 + rand() * d)
            if (RANDOM in lines) {
                print lines[RANDOM]
                delete lines[RANDOM]
                ++e
            }
        }
    }' f.csv > f1.csv
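The awk loop above keeps drawing random line numbers. When it draws a line that was already printed, it draws again, so the number of retries grows as fewer lines remain. A Fisher-Yates shuffle gives a uniformly random order in one pass with no retries. Here is a minimal Ruby sketch. The method name fisher_yates is hypothetical, and in practice Ruby's built-in Array#shuffle already does this job:

```ruby
# Hypothetical helper: Fisher-Yates (Knuth) shuffle on a copy of `items`.
# `rng` can be any Random instance; pass a seeded one for reproducible runs.
def fisher_yates(items, rng = Random.new)
  a = items.dup
  (a.size - 1).downto(1) do |i|
    j = rng.rand(i + 1)        # uniform index in 0..i
    a[i], a[j] = a[j], a[i]    # move the chosen element into slot i
  end
  a
end

lines = ["a, b, c ,d", "e, f, g, h", "i, j, k, l"]
puts fisher_yates(lines, Random.new(42))
```

To do the f.csv to f1.csv step on a real file, you could read the lines with File.readlines("f.csv", chomp: true) and write the shuffled array back out.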

But how can I shuffle the columns?

  • Just do the same thing, but on the fields of $0 when populating lines[]. Try to code it yourself; the logic is all there in your script. Commented Sep 15, 2014 at 17:50
  • Got this error: "awk: 13: unexpected character '.'". What have I done wrong? Commented Feb 5, 2016 at 19:15

2 Answers


Here is a way to shuffle columns using awk:

    awk '
    BEGIN { FS = " *, *"; srand() }
    {
        # store every field of every row
        for (col = 1; col <= NF; col++) {
            lines[NR, col] = $col
            columns[col]
        }
    }
    END {
        # draw one random order of the column indices
        while (1) {
            if (fld == NF) { break }
            RANDOM = int(1 + rand() * NF)
            if (RANDOM in columns) {
                order[++seq] = RANDOM
                delete columns[RANDOM]
                ++fld
            }
        }
        # print every row using that same column order
        for (nr = 1; nr <= NR; nr++) {
            for (fld = 1; fld <= seq; fld++) {
                printf "%s%s", lines[nr, order[fld]], (fld == seq ? RS : ", ")
            }
        }
    }' f.csv

Output:

    b, a, c, d
    f, e, g, h
    j, i, k, l
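In this answer, the random column order (order[]) is drawn only once, in the END block, and every row is printed with it. If each row were shuffled independently, the columns would no longer line up. The FS of " *, *" also trims spaces around the commas, which is why the output comes back normalized to ", ". Here is the same idea as a minimal Ruby sketch. The name shuffle_columns is hypothetical, and the split regex is an assumption chosen to mirror the awk FS:

```ruby
# Hypothetical helper mirroring the awk answer: split each line on commas with
# optional surrounding spaces (like FS = " *, *"), draw ONE random column order,
# and apply that same order to every row so the columns stay aligned.
def shuffle_columns(lines, rng = Random.new)
  rows = lines.map { |line| line.split(/ *, */) }
  return [] if rows.empty?
  order = (0...rows.first.size).to_a.shuffle(random: rng)  # shared by all rows
  rows.map { |row| row.values_at(*order).join(", ") }
end

input = ["a, b, c ,d", "e, f, g, h", "i, j, k, l"]
out = shuffle_columns(input, Random.new(2014))
puts out
```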

If you're open to other languages, here's a Ruby solution:

    $ ruby -rcsv -e 'CSV.read(ARGV.shift).shuffle.transpose.shuffle.transpose.each {|row| puts row.to_csv}' f.csv
    j, k, l,i
    f, g, h,e
    b, c ,d,a

Ruby has a lot of built-in functionality, including the shuffle and transpose methods on Array, which fit this problem exactly.
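The one-liner works because transpose turns columns into rows. Shuffling the rows of the transposed table therefore reorders the original columns, and a second transpose restores the original orientation. Below is a minimal sketch of the same pipeline on in-memory arrays. It takes an explicit Random so runs can be reproduced; the method name shuffle_table and the seed are illustrative assumptions. Array#transpose raises an IndexError when rows have different lengths, so this assumes a rectangular CSV:

```ruby
# Hypothetical helper: shuffle the rows, then the columns, of a rectangular table.
def shuffle_table(rows, rng = Random.new)
  shuffled_rows = rows.shuffle(random: rng)  # 1. shuffle the rows
  columns = shuffled_rows.transpose          # 2. columns become rows
  columns.shuffle(random: rng).transpose     # 3. shuffle them, then transpose back
end

table = [%w[a b c d], %w[e f g h], %w[i j k l]]
shuffle_table(table, Random.new(42)).each { |row| puts row.join(",") }
```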
