Perl Optimization Recommendation

Question

I'm looking for suggestions on how to optimize this perl script.

I have this script to do some minor reformatting of a file. The script does the following:

Reads a "|"-delimited file from STDIN.
Removes leading and trailing whitespace from each field.
Removes the "NULL" text string.
Converts date columns from "YYYY-MM-DD hh:mm" format to "YYYYMMDD" format.
Prints to STDOUT, using a kluge to keep from losing the last column of data when it is NULL. The number of columns needs to be the same for each line.

Sample Input:

    .091590.S |CHF|SWX|2011-05-23 00:00| 77.25| NULL| NULL| 78.620000000000005| NULL
    .091590.S |CHF|SWX|2011-05-24 00:00| 77.599999999999994| NULL| NULL| 77.25| NULL
    .091590.S |CHF|SWX|2011-05-25 00:00| 77.760000000000005| NULL| NULL| 77.599999999999994| NULL
    .091590.S |CHF|SWX|2011-05-26 00:00| 77.430000000000007| NULL| NULL| 77.760000000000005| NULL
    .091590.S |CHF|SWX|2011-05-27 00:00| 77.909999999999997| NULL| NULL| 77.430000000000007| NULL
    .091590.S |CHF|SWX|2011-05-30 00:00| 78.060000000000002| NULL| NULL| 77.909999999999997| 3506

    FormattingScript.pl [col]

Where [col] is a single column number or a comma-separated list of column numbers (zero-based, since they index the fields returned by split). It determines which column or columns need date conversion; passing 999 skips the conversion.

    @updcol = split(',',@ARGV[0]);
    while (<STDIN>) {
        s/.$/|DATAEND/g;  ## USING THIS TO KEEP FROM TRUNCATING NULL LAST COLUMN
        s/^\s*//g;
        s/\s*$//g;
        s/\s*\|/\|/g;
        s/\|\s*/\|/g;
        s/\|NULL\|/\|\|/g;
        s/\|NULL\s*$/\|/g;
        s/\|NULL\s*/\|/g;
        s/\|NULL$/\|/g;
        @dataline = split('\|',$_);
        if (@updcol[0] != 999) {  ## REFORMAT DATES IF PARAM IS NOT 999
            foreach my $col (@updcol) {
                $dataline[$col]=substr($dataline[$col],0,4).substr($dataline[$col],5,2).substr($dataline[$col],8,2);
            }
        }
        $dataline[-1]="";
        $line=join('|',@dataline);
        print substr($line,0,-1)."\n";
    }
    exit 0;

Sample Output:

    .091590.S|CHF|SWX|2011-05-23 00:00|77.25|||78.620000000000005|
    .091590.S|CHF|SWX|2011-05-24 00:00|77.599999999999994|||77.25|
    .091590.S|CHF|SWX|2011-05-25 00:00|77.760000000000005|||77.599999999999994|
    .091590.S|CHF|SWX|2011-05-26 00:00|77.430000000000007|||77.760000000000005|
    .091590.S|CHF|SWX|2011-05-27 00:00|77.909999999999997|||77.430000000000007|
    .091590.S|CHF|SWX|2011-05-30 00:00|78.060000000000002|||77.909999999999997|3506

Please remember the rules of Optimization Club. Why do you need to optimize this program?
– user554546, Commented Jul 24, 2012 at 21:23

ikegami · Accepted Answer · 2012-07-25 14:23:05Z

Any optimisations are going to be micro, which means you'll need to pull out the Benchmark module and start testing different ways of doing the same thing.
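For example, a rough comparison of the two date conversions in this thread, the question's three substr calls versus a single capturing substitution, might look like the sketch below. It uses only the core Benchmark module; the sub names and the sample date are made up for illustration, and the timings depend on your Perl and hardware, so run it yourself rather than trusting any one result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample field in the question's "YYYY-MM-DD hh:mm" format.
my $date = '2011-05-23 00:00';

# The question's approach: concatenate three substr() slices.
sub by_substr {
    my ($d) = @_;
    return substr($d, 0, 4) . substr($d, 5, 2) . substr($d, 8, 2);
}

# The alternative: one capturing substitution on a copy of the field.
sub by_regex {
    my ($d) = @_;
    $d =~ s/^(....)-(..)-(..).*/$1$2$3/s;
    return $d;
}

# Only time implementations that agree on the result.
by_substr($date) eq by_regex($date)
    or die "implementations disagree\n";

# A negative count means: run each sub for at least that many CPU seconds,
# then print a table of rates and relative speeds.
cmpthese(-1, {
    three_substr => sub { by_substr($date) },
    one_regex    => sub { by_regex($date) },
});
```

Whichever wins, the difference per line is likely to be small, which is the point being made here.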

You would benefit more from cleaning up the code than from optimising it.

    my @date_cols = split(/,/, shift(@ARGV));
    while (<>) {
        #chomp;  # Redundant.
        my @fields = split(/\|/, $_, -1);
        for (@fields) {
            s/^\s+//;
            s/\s+\z//;
            s/^NULL\z//;
        }
        for (@fields[@date_cols]) {
            s/^(....)-(..)-(..).*/$1$2$3/s;
        }
        print(join('|', @fields), "\n");
    }
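The -1 limit on split is what replaces the question's |DATAEND kluge: with no limit (or a limit of 0), split drops trailing empty fields, so a line whose last column is empty comes back one field short. A small demonstration with a made-up record:

```perl
use strict;
use warnings;

# Hypothetical record whose last column is empty (a NULL that was blanked out).
my $line = "ABC|CHF|";

# No LIMIT: trailing empty fields are stripped, so only 2 fields come back.
my @default = split /\|/, $line;

# Negative LIMIT: trailing empty fields are kept, so all 3 fields come back.
my @kept = split /\|/, $line, -1;

# Without chomp, the newline simply ends up inside the last field.
my @unchomped = split /\|/, "$line\n", -1;

printf "default: %d fields\n", scalar @default;    # prints "default: 2 fields"
printf "with -1: %d fields\n", scalar @kept;       # prints "with -1: 3 fields"
printf "last unchomped field is %s\n",
    $unchomped[-1] eq "\n" ? 'a newline' : 'something else';
```

This is also why chomp is redundant in the loop: the newline stays in the last field, and s/\s+\z// removes it along with any other trailing whitespace.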

Thanks @ikegami. This was a very helpful exercise. It reminds me how much I still have to learn. I did have to use <STDIN> instead of <> for the script to recognize both the parameter I passed and the input stream.

throughnothing · Accepted Answer · 2012-07-25 04:47:32Z

You may be able to optimize your regexes using Regexp::Assemble. This will enable you to combine all your regexes into one regex that will likely execute faster than running multiple regexes.
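Regexp::Assemble is a CPAN module, so the sketch below does not use its API; it illustrates the same idea in core Perl by merging the question's separate whitespace and NULL substitutions into one substitution by hand. normalize_line is a hypothetical name, and whether the merged version is actually faster is something to check with Benchmark, not assume. One behavioral difference to be aware of: the lookahead only blanks a field that is exactly NULL, whereas the question's s/\|NULL\s*/\|/g would also strip the prefix of a value such as NULLABLE.

```perl
use strict;
use warnings;

# Hypothetical helper: one merged substitution standing in for the question's
# s/\s*\|/\|/g, s/\|\s*/\|/g and the four s/\|NULL.../ variants.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/^\s+//;     # leading whitespace on the line
    $line =~ s/\s+\z//;    # trailing whitespace, including the newline
    # Whitespace around each "|", plus a following field that is exactly NULL
    # (NULL must be followed by the next "|" or the end of the line).
    $line =~ s/\s*\|\s*(?:NULL\s*(?=\||\z))?/|/g;
    return $line;
}

my $in = ".091590.S |CHF|SWX|2011-05-23 00:00| 77.25| NULL| NULL| 78.620000000000005| NULL \n";
print normalize_line($in), "\n";
# prints ".091590.S|CHF|SWX|2011-05-23 00:00|77.25|||78.620000000000005|"
```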
