I am working on a project in which I need to remove all formatting from a text file, including whitespace and line breaks, and then replace any colons with pipes. I've made some headway, but I cannot find a way to mask out the parts that need to be ignored. I am new to sed and only at novice level with Bash scripting, and I am, in fact, not entirely sure sed is the right tool for the job (maybe vi? I typically use Nano). The file that I am trying to format is similar to this:
== LUN mysql05-dbdat02 ==
LUNName: mysql05-dbdat02 CollectionStartTime: 2012-09-20T15:43:03-04:00 CollectionEndTime: 2012-09-20T15:43:34-04:00 Capacity CurrentCapacity: 512 IOOperations Reads: 100 Writes: 0 ReadsPerSecond: 0.000000 WritesPerSecond: 0.000000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 TotalMBPerSecond: 0.000 NonOptimizedIOPerSecond: 0.000000 CacheHitPercentage: 0.000 PerformanceMetrics TotalIOsPerSecond: 0.000 ReadIOsPerSecond: 0.000 WriteIOsPerSecond: 0.000 TotalMBPerSecond: 0.000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 Performance
== LUN mysql05-dbdat02 ==
LUNName: mysql05-dbdat02 CollectionStartTime: 2012-09-20T15:43:03-04:00 CollectionEndTime: 2012-09-20T15:43:34-04:00 Capacity CurrentCapacity: 512 IOOperations Reads: 100 Writes: 0 ReadsPerSecond: 0.000000 WritesPerSecond: 0.000000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 TotalMBPerSecond: 0.000 NonOptimizedIOPerSecond: 0.000000 CacheHitPercentage: 0.000 PerformanceMetrics TotalIOsPerSecond: 0.000 ReadIOsPerSecond: 0.000 WriteIOsPerSecond: 0.000 TotalMBPerSecond: 0.000 ReadMBPerSecond: 0.000 WriteMBPerSecond: 0.000 Performance

The output needs to be something like this:
cm-data-unity01|LUNNam=cm-data-unity01|CollectionStartTim=2012-09-20T15:43:03-04:00|CollectionEndTim=2012-09-20T15:43:34-04:00|Capacity|CurrentCapacit=2048|IOOperations|Read=10|Write=90|ReadsPerSecon=8.000000|WritesPerSecon=76.000000|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|TotalMBPerSecon=0.973|NonOptimizedIOPerSecon=85.000000|CacheHitPercentag=0.000|PerformanceMetrics|TotalIOsPerSecon=84.000|ReadIOsPerSecon=8.000|WriteIOsPerSecon=76.000|TotalMBPerSecon=0.973|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|Performance|

In other words, everything on one line.
I have written a very simple Bash script to format it, like this:
    #!/bin/bash
    # Author Christopher George Bollinger
    # Comments: This script will modify the snippet.txt file.
    # This script is meant to, first, take a specific bit of unformatted data and remove all line breaks and non-printable characters.
    # Following this, the script is to replace any appropriate colons (those being used as delimiters) and replace them with the equals (=) character.

    echo "This script will remove line breaks, remove non-printable characters, and will replace colons used as field delimiters with the equals '(=)' character."

    cp snippet.txt snippetwork.txt

    RmLB () {
        tr -d '\n' < snippetwork.txt > snippetwork1.txt
    }

    RmNonPrint () {
        tr -cd "[:print:]" < snippetwork1.txt > snippetwork2.txt
    }

    RplcW () {
        sed 's/: /=/g' snippetwork2.txt > snippetwork3.txt
    }

    RmWtSpc () {
        tr -s ' ' '|' < snippetwork3.txt > snippetgood.txt
        sed 'd/(?:[a-z]=) /'
    }

    QuChek () {
        cat snippetgood.txt
        read -p "Is this satisfactory? (Y/n)" Choice
        case $Choice in
            Y|y)
                mv snippetgood.txt snippet.txt
                rm -f snippetwork*
                rm -f snippetgood.txt
                ;;
            N|n)
                exit
                ;;
            *)
                echo "Invalid Input."
                ;;
        esac
    }

    read -p "Would you like to begin? (Y/n)" YorN
    case $YorN in
        Y|y)
            RmLB
            RmNonPrint
            RplcW
            RmWtSpc
            QuChek
            ;;
        N|n)
            exit
            ;;
        *)
            echo "Invalid Selection"
            ;;
    esac

This runs, but the output is not quite right. It gives:
==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance|==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance|

The problem is the pipes that appear after the equals signs. If anyone could point me in the right direction on getting this right, or even to an online resource for some clarification, I would be immensely grateful.
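For what it is worth, a likely cause of the stray pipes can be sketched as follows. This is only a minimal sketch under an assumption: that in the real file each delimiter colon is followed by more than one space, so `s/: /=/g` consumes just the first space and `tr -s ' ' '|'` turns the rest into a pipe. The function name `flatten_report` is hypothetical and not part of the script above. Matching the colon together with all of the spaces after it avoids the extra pipe, and because at least one space is required, the colons inside the timestamps are left alone.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): read a report on stdin
# and write it to stdout as a single pipe-delimited line.
flatten_report() {
    tr '\n' ' ' |             # join lines, keeping a space where each break was
    tr -cd '[:print:]' |      # drop non-printable characters
    sed 's/:  */=/g' |        # a colon plus ALL following spaces (one or more) becomes "="
    tr -s ' ' '|'             # each remaining run of spaces becomes a single "|"
}

# Example usage (file names taken from the script above):
#   flatten_report < snippet.txt > snippetgood.txt
```

Note that the `== LUN ... ==` header still comes through as `==|LUN|name|==`; turning it into just the name would need one more substitution, which is left out here.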
The funny thing is that, while the immediate request is to format the data like the above example, the end goal is to feed it into a Unix CLI graphing tool (my guess is gnuplot). From what I understand, gnuplot requires its input to be formatted in columns. As mentioned, this is new territory for me, and I would greatly appreciate any advice given.
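If columns really are the end goal, something along these lines might be closer to what a plotting tool expects than the one-line form. This is a rough sketch, assuming the original file has one `Key: value` pair per line and that each record starts with a `== LUN name ==` line. The function name `report_to_columns` and the choice of fields (start time, reads, writes) are purely illustrative.

```shell
#!/bin/bash
# Hypothetical helper: print one whitespace-separated row per LUN record
# with the columns: name, CollectionStartTime, Reads, Writes.
# Assumes one "Key: value" pair per input line (an assumption about the file).
report_to_columns() {
    awk '
        # Header line of the form: == LUN <name> ==
        $1 == "==" && $2 == "LUN" {
            if (name != "") print name, start, reads, writes   # flush previous record
            name = $3; start = reads = writes = ""
            next
        }
        $1 == "CollectionStartTime:" { start  = $2 }
        $1 == "Reads:"               { reads  = $2 }
        $1 == "Writes:"              { writes = $2 }
        END { if (name != "") print name, start, reads, writes }   # flush last record
    '
}

# Example usage (output file name is illustrative):
#   report_to_columns < snippet.txt > snippet.dat
```

Each output row holds one record, so the columns can be referred to by position when plotting.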
Where is cm-data-unity01 supposed to come from? What parts do you want to "mask" and how? Do you want : to become | or =?