3

I am working on a project in which I need to remove all formatting from a text file, including whitespace and line breaks, then replace any colons with pipes. I've made some headway, but I cannot find a way to mask out the parts that need to be ignored. I am new to sed and only at novice level with Bash scripting, and I am, in fact, not entirely sure sed is the right tool for the job (maybe vi? I typically use Nano). The file that I am trying to format is similar to this:

== LUN mysql05-dbdat02 ==

 LUNName: mysql05-dbdat02 CollectionStartTime: 2012-09-20T15:43:03-04:00 CollectionEndTime: 2012-09-20T15:43:34-04:00 Capacity CurrentCapacity: 512 IOOperations Reads: 100 Writes: 0 ReadsPerSecond: 0.000000 WritesPerSecond: 0.000000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 TotalMBPerSecond: 0.000 NonOptimizedIOPerSecond: 0.000000 CacheHitPercentage: 0.000 PerformanceMetrics TotalIOsPerSecond: 0.000 ReadIOsPerSecond: 0.000 WriteIOsPerSecond: 0.000 TotalMBPerSecond: 0.000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 Performance 

== LUN mysql05-dbdat02 ==

 LUNName: mysql05-dbdat02 CollectionStartTime: 2012-09-20T15:43:03-04:00 CollectionEndTime: 2012-09-20T15:43:34-04:00 Capacity CurrentCapacity: 512 IOOperations Reads: 100 Writes: 0 ReadsPerSecond: 0.000000 WritesPerSecond: 0.000000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 TotalMBPerSecond: 0.000 NonOptimizedIOPerSecond: 0.000000 CacheHitPercentage: 0.000 PerformanceMetrics TotalIOsPerSecond: 0.000 ReadIOsPerSecond: 0.000 WriteIOsPerSecond: 0.000 TotalMBPerSecond: 0.000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 Performance 

and the output needs to be something like this,

cm-data-unity01|LUNNam=cm-data-unity01|CollectionStartTim=2012-09-20T15:43:03-04:00|CollectionEndTim=2012-09-20T15:43:34-04:00|Capacity|CurrentCapacit=2048|IOOperations|Read=10|Write=90|ReadsPerSecon=8.000000|WritesPerSecon=76.000000|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|TotalMBPerSecon=0.973|NonOptimizedIOPerSecon=85.000000|CacheHitPercentag=0.000|PerformanceMetrics|TotalIOsPerSecon=84.000|ReadIOsPerSecon=8.000|WriteIOsPerSecon=76.000|TotalMBPerSecon=0.973|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|Performance| 

or, all on one line.

I have written a very simple Bash script to format it, like so:

# Author Christopher George Bollinger
# Comments: This script will modify the snippet.txt file.
# This script is meant to, first, take a specific bit of unformatted data and remove all line breaks and non-printable characters.
# Following this, the script is to replace any appropriate colons (those being used as delimiters) and replace them with the equals (=) character.

#!/bin/bash

echo "This script will remove line breaks, remove non-printable characters, and will replace colons used as field delimiters with the equals '(=)' character."

cp snippet.txt snippetwork.txt

RmLB () {
    tr -d '\n' < snippetwork.txt > snippetwork1.txt
}

RmNonPrint () {
    tr -cd "[:print:]" < snippetwork1.txt > snippetwork2.txt
}

RplcW () {
    sed 's/: /=/g' snippetwork2.txt > snippetwork3.txt
}

RmWtSpc () {
    tr -s ' ' '|' < snippetwork3.txt > snippetgood.txt
    sed 'd/(?:[a-z]=) /'
}

QuChek () {
    cat snippetgood.txt
    read -p "Is this satisfactory? (Y/n)" Choice
    case $Choice in
        Y|y)
            mv snippetgood.txt snippet.txt
            rm -f snippetwork*
            rm -f snippetgood.txt
            ;;
        N|n)
            exit
            ;;
        *)
            echo "Invalid Input."
            ;;
    esac
}

read -p "Would you like to begin? (Y/n)" YorN
case $YorN in
    Y|y)
        RmLB
        RmNonPrint
        RplcW
        RmWtSpc
        QuChek
        ;;
    N|n)
        exit
        ;;
    *)
        echo "Invalid Selection"
        ;;
esac

This works, except that the output is not quite right. It gives:

==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance|==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance| 

The problem is the pipes that appear after the equals signs. If anyone could point me in the right direction on getting this right, or even to an online resource for some clarification, I would be immensely grateful.

The funny thing is that, while the immediate request is to format the data like the example above, the end goal is to feed it into a Unix CLI graphing tool (my guess is gnuplot). From what I understand, gnuplot requires its input to be formatted in columns. As mentioned, this is new territory for me and I would greatly appreciate any advice given.

1
  • This really would be simpler to understand if you showed your desired output based on one of the input files shown. Where is the first cm-data-unity01 supposed to come from? What parts do you want to "mask" and how? Do you want : to become | or =? Commented May 7, 2014 at 16:36

2 Answers

3

I am not quite sure what you're trying to do. Using your first input file, I create this output:

LUNName=mysql05-dbdat02|CollectionStartTime=2012-09-20T15:43:03-04:00|CollectionEndTime=2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=512|IOOperations|Reads=100|Writes=0|ReadsPerSecond=0.000000|WritesPerSecond=0.000000|ReadMBPerSecond=0.000|WriteMBPerSecond=0.000|TotalMBPerSecond=0.000|NonOptimizedIOPerSecond=0.000000|CacheHitPercentage=0.000|PerformanceMetrics|TotalIOsPerSecond=0.000|ReadIOsPerSecond=0.000|WriteIOsPerSecond=0.000|TotalMBPerSecond=0.000|ReadMBPerSecond=0.000|WriteMBPerSecond=0.000|Performance| 

With this perl one liner:

perl -pe 's/\n/|/;s/\s*//g; s/:/=/; END{print "\n"}' file 

You could also do it with this:

sed -r 's/\s*//g; s/:/=/;' file | tr '\n' '|' 
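As a rough illustration of the per-line approach, here is a minimal sketch. It assumes the real report has one "Name: value" pair per line; the sample text and the function name flatten_fields are made up for this example, and [[:space:]] is swapped in for \s so that it does not depend on GNU sed's -r option.

```shell
# Hypothetical sample: one "Name: value" pair per line (an assumption
# about the layout of the real report).
sample='LUNName:            mysql05-dbdat02
CollectionStartTime: 2012-09-20T15:43:03-04:00
Capacity
CurrentCapacity:     512'

# Strip all whitespace on each line, turn only the FIRST colon on the
# line into "=", then join the lines with "|".
flatten_fields() {
    sed 's/[[:space:]]//g; s/:/=/' | tr '\n' '|'
}

result=$(printf '%s\n' "$sample" | flatten_fields)
printf '%s\n' "$result"
```

Because the second substitution has no g flag, only the colon that ends the field name is replaced, so the colons inside the timestamp survive.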
  • 1
    You could do sed 's/:/=/;a|' | tr -dc '[:graph:]' and just let tr do basically all of the work. Commented Aug 20, 2014 at 1:09
  • @mikeserv nice! I hadn't thought of that. Commented Aug 22, 2014 at 11:21
  • Well - i only thought of it because you already had tr in there. When i read the question i was puzzling about how it might be done, then i looked at your answer and thought - no answer here from me. vote - comment. Commented Aug 22, 2014 at 11:59
1
 sed -e ':a;N;$!ba;s/\n/\|/g;s/:  */=/g;s/ *//g' '<yourinputfilehere>' > '<youroutputfilehere>' 

Explanation: the first part, :a;N;$!ba;s/\n/\|/g, joins all lines and replaces every line break with |. A better explanation of the syntax is here: https://stackoverflow.com/questions/1251999/sed-how-can-i-replace-a-newline-n

The second part, ;s/:  */=/g (note the two spaces before the asterisk), replaces every colon followed by one or more spaces with =.

The third part, ;s/ *//g, removes all remaining spaces, whether single or repeated.

Obviously, the input and output file names need to be replaced with your own. If you want to suppress any error messages, you can add 2> '/dev/null' at the end.

I didn't really understand what your plan was with your input, but you should be able to implement it from here.
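The three parts described above can be tried on a small sample. This is only a sketch: the sample text is invented, and the script is split into separate -e expressions (an assumption made so that the label syntax also works with sed implementations that do not accept ";" after a label).

```shell
# Hypothetical multi-line sample with padding after the colons.
sample='LUNName:   mysql05-dbdat02
CollectionEndTime:   2012-09-20T15:43:34-04:00
IOOperations
  Reads:   100'

# 1. :a / N / $!ba   read the whole input into the pattern space
# 2. s/\n/|/g        turn every line break into a pipe
# 3. s/:  */=/g      colon plus at least one space becomes "="
# 4. s/ *//g         drop whatever spaces are left
joined=$(printf '%s\n' "$sample" |
    sed -e ':a' -e 'N' -e '$!ba' \
        -e 's/\n/|/g' -e 's/:  */=/g' -e 's/ *//g')

printf '%s\n' "$joined"
```

Unlike the tr-based pipeline, this leaves no trailing pipe, since only the line breaks between lines are replaced.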

  • s/: */==/g will replace all colons, irrespective of whether they are followed by a space. * means zero or more not one or more. Commented May 7, 2014 at 16:42
  • It would do that, which is why I used two spaces between the colon and the asterisk, i.e. :..*. That matches one space followed by zero or more spaces, hence at least one. Commented May 7, 2014 at 16:44
  • Ah, yes, indeed you did. Sorry, missed that. Have a +1! Commented May 7, 2014 at 16:44
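
The difference discussed in these comments can be seen on a single line. A small sketch, using one field from the question's sample as input; the variable names are made up for the example:

```shell
line='CollectionStartTime: 2012-09-20T15:43:03-04:00'

# One space before "*": the space is optional (zero or more),
# so every colon matches, including those inside the timestamp.
zero_or_more=$(printf '%s\n' "$line" | sed 's/: */=/g')

# Two spaces before "*": one mandatory space followed by zero or more,
# so only the colon that ends the field name matches.
one_or_more=$(printf '%s\n' "$line" | sed 's/:  */=/g')

printf '%s\n' "$zero_or_more"   # CollectionStartTime=2012-09-20T15=43=03-04=00
printf '%s\n' "$one_or_more"    # CollectionStartTime=2012-09-20T15:43:03-04:00
```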
