
I'm looking for suggestions on how to optimize this Perl script.

I have this script to do some minor reformatting of a file. The script does the following:

  1. Reads a "|"-delimited file from STDIN.
  2. Removes leading and trailing whitespace from each field.
  3. Removes the "NULL" text string.
  4. Converts date columns from "YYYY-MM-DD hh:mm" format to "YYYYMMDD" format.
  5. Prints to STDOUT, using a kludge to keep from losing the last column of data when it is NULL. The number of columns needs to be the same for each line.

Sample Input:

    .091590.S |CHF|SWX|2011-05-23 00:00| 77.25| NULL| NULL| 78.620000000000005| NULL
    .091590.S |CHF|SWX|2011-05-24 00:00| 77.599999999999994| NULL| NULL| 77.25| NULL
    .091590.S |CHF|SWX|2011-05-25 00:00| 77.760000000000005| NULL| NULL| 77.599999999999994| NULL
    .091590.S |CHF|SWX|2011-05-26 00:00| 77.430000000000007| NULL| NULL| 77.760000000000005| NULL
    .091590.S |CHF|SWX|2011-05-27 00:00| 77.909999999999997| NULL| NULL| 77.430000000000007| NULL
    .091590.S |CHF|SWX|2011-05-30 00:00| 78.060000000000002| NULL| NULL| 77.909999999999997| 3506

FormattingScript.pl [col]

Where [col] is a single column number or a comma-delimited list of column numbers. This argument determines which column or columns need date conversion.

    @updcol = split(',',@ARGV[0]);
    while (<STDIN>) {
        s/.$/|DATAEND/g;   ## USING THIS TO KEEP FROM TRUNCATING NULL LAST COLUMN
        s/^\s*//g;
        s/\s*$//g;
        s/\s*\|/\|/g;
        s/\|\s*/\|/g;
        s/\|NULL\|/\|\|/g;
        s/\|NULL\s*$/\|/g;
        s/\|NULL\s*/\|/g;
        s/\|NULL$/\|/g;
        @dataline = split('\|',$_);
        if (@updcol[0] != 999) {   ## REFORMAT DATES IF PARAM IS NOT 999
            foreach my $col (@updcol) {
                $dataline[$col]=substr($dataline[$col],0,4).substr($dataline[$col],5,2).substr($dataline[$col],8,2);
            }
        }
        $dataline[-1]="";
        $line=join('|',@dataline);
        print substr($line,0,-1)."\n";
    }
    exit 0;

Sample Output:

    .091590.S|CHF|SWX|2011-05-23 00:00|77.25|||78.620000000000005|
    .091590.S|CHF|SWX|2011-05-24 00:00|77.599999999999994|||77.25|
    .091590.S|CHF|SWX|2011-05-25 00:00|77.760000000000005|||77.599999999999994|
    .091590.S|CHF|SWX|2011-05-26 00:00|77.430000000000007|||77.760000000000005|
    .091590.S|CHF|SWX|2011-05-27 00:00|77.909999999999997|||77.430000000000007|
    .091590.S|CHF|SWX|2011-05-30 00:00|78.060000000000002|||77.909999999999997|3506
    Please remember the rules of Optimization Club. Why do you need to optimize this program? Commented Jul 24, 2012 at 21:23

2 Answers


Any optimisations are going to be micro-optimisations, which means you'll need to pull out the Benchmark module and start testing different ways of doing the same thing.

You would benefit more from cleaning up the code than from optimising it.

    my @date_cols = split(/,/, shift(@ARGV));
    while (<>) {
        #chomp;  # Redundant.
        my @fields = split(/\|/, $_, -1);
        for (@fields) {
            s/^\s+//;
            s/\s+\z//;
            s/^NULL\z//;
        }
        for (@fields[@date_cols]) {
            s/^(....)-(..)-(..).*/$1$2$3/s;
        }
        print(join('|', @fields), "\n");
    }
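If you do want to measure, here is a minimal sketch of the kind of comparison the core Benchmark module allows. The sub names `clean_by_field` and `clean_by_line` and the 1000-row test data are made up for illustration; the first variant mirrors the per-field cleanup above, and the second is one possible whole-line alternative, not the code from the question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 copies of one row from the sample input.
my @rows = (".091590.S |CHF|SWX|2011-05-23 00:00| 77.25| NULL| NULL| 78.620000000000005| NULL \n") x 1000;

# Variant 1: split first, then trim and blank out NULL field by field.
sub clean_by_field {
    my ($line) = @_;
    my @fields = split /\|/, $line, -1;   # -1 keeps trailing empty fields
    for (@fields) {
        s/^\s+//;
        s/\s+\z//;
        s/^NULL\z//;
    }
    return join '|', @fields;
}

# Variant 2: run the substitutions over the whole line without splitting.
sub clean_by_line {
    my ($line) = @_;
    $line =~ s/^\s+//;
    $line =~ s/\s+\z//;
    $line =~ s/\s*\|\s*/|/g;                  # trim around every delimiter
    $line =~ s/(?<![^|])NULL(?![^|])//g;      # NULL that fills a whole field
    return $line;
}

# Make sure both variants agree before timing them.
die "variants disagree\n"
    unless clean_by_field($rows[0]) eq clean_by_line($rows[0]);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    by_field => sub { clean_by_field($_) for @rows },
    by_line  => sub { clean_by_line($_)  for @rows },
});
```

The numbers will depend on your Perl version and on how closely the test rows resemble the real data, so benchmark against a representative sample of the actual file.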

3 Comments

Fixed a small bug. Removed redundant chomp.
Thanks @ikegami. This was a very helpful exercise. It reminds me how much I still have to learn. I did have to add <STDIN> instead of <> in order for the script to recognize both the parameter I passed and the stream.
@DataTsra, Oops, fixed by shifting from @ARGV

You may be able to optimize your regexes using Regexp::Assemble. This will enable you to combine all your regexes into one regex that will likely execute faster than running multiple regexes.
