
This is my file:

    ...
    </script>
    <!--START: Google Analytics --->
    <script type="text/javascript" src="../src/goog/ga_body.js"></script>
    <!--END: Google Analytics --->
    </body>
    </html>
    ...

How do I delete everything from <!--START: Google Analytics ---> through <!--END: Google Analytics --->, inclusive? Effectively, this:

    <!--START: Google Analytics --->
    <script type="text/javascript" src="../src/goog/ga_body.js"></script>
    <!--END: Google Analytics --->

will be gone, and this will be left, i.e. the 4 lines will be replaced with nothing:

    </script>
    <nothing here 4 lines deleted>
    </body>
    </html>

I'd like to do this in bash, so sed or awk might be my best bet, although Python might be better.



EDIT1

This is something I wrote before. It's probably poor code, but I will work from this find2PatternsAndDeleteTextInBetween.sh:

    # Here I want to find 2 patterns and delete what's in between
    # this example works
    # these are the 2 patterns I want to find: Start and End
    # have to use some escape characters here for this to show properly
    # have to use \n for it to appear in this format
    #<!-- Start of StatCounter Code for DoYourOwnSite -->
    # text would go here
    #<!-- End of StatCounter Code for DoYourOwnSite -->>
    #b="<!-- Start of StatCounter Code for DoYourOwnSite -->"
    #b2="<!-- End of StatCounter Code for DoYourOwnSite -->"
    #p1="PATTERN-1"
    #p2="PATTERN-2"
    p1="<!-- Start of StatCounter Code for DoYourOwnSite -->"
    p2="<!-- End of StatCounter Code for DoYourOwnSite -->"
    fname="*.html"
    num_of_files_pattern1=ls #grep $p1 fname
    echo "fname(s) to apply the sed to:"
    echo $fname
    echo "num_of_files_pattern1 is:"
    echo $num_of_files_pattern1
    echo "Pattern1 is equal to:"
    echo $p1
    echo "Pattern2 is equal to:"
    echo $p2
    # this is the current dir where the script is
    DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
    echo "DIR is equal to:"
    echo $DIR
    # cd to the dir where I want to copy the files to:
    cd "$DIR"
    # this will find the pattern <\head> in all the .html files and place "This should appear before the closing head tag" before it
    # it will also make a backup with .bak extension
    #sed -i.bak '/<\\head>/i\This should appear before the closing head tag' *.html
    echo "sed on the file"
    # this does the head part
    #sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
    #sed "/$p1/,/$p2/d" *.txt # this works
    #sed "/$p1/,/$p2/d" $fname # this works
    sed -i.bak "/$p1/,/$p2/d" $fname # this works
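As written, `num_of_files_pattern1=ls` assigns the literal string `ls`, not a count. If the intent (as the variable name and the `#grep $p1 fname` comment suggest) is the number of `.html` files that contain the start marker, a minimal sketch could use `grep -l` with `-F` so the marker is matched as a fixed string. `count_files_with` is a hypothetical helper name:

```bash
# Hypothetical helper: print how many of the given files contain the
# literal string $1. -F disables regex interpretation, -l prints each
# matching file name once, and wc -l counts those names.
count_files_with() {
  pattern=$1
  shift
  # "--" guards against file names starting with "-"; errors for unreadable
  # files are discarded. tr strips the padding some wc versions add.
  grep -lF -e "$pattern" -- "$@" 2>/dev/null | wc -l | tr -d ' '
}

# Usage (assumes the .html files are in the current directory):
#   p1="<!-- Start of StatCounter Code for DoYourOwnSite -->"
#   num_of_files_pattern1=$(count_files_with "$p1" *.html)
```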


EDIT2

This is what I ended up with, but there is a more robust answer below:

    # ------------------------------------------------------------------
    # [author] find2PatternsAndDeleteTextInBetween.sh
    # Description
    # Here I want to find 2 patterns and delete what's in between
    # this example works
    #
    # EXAMPLE:
    # these are the 2 patterns I want to find: Start and End
    # <!-- Start of StatCounter Code for DoYourOwnSite -->
    # text would go here
    # <!-- End of StatCounter Code for DoYourOwnSite -->>
    #
    # ------------------------------------------------------------------
    p1="<!--START: Google Analytics --->"
    p2="<!--END: Google Analytics --->"
    fname=".html"
    echo "fname(s) to apply the sed to:"
    echo *"$fname"
    echo -e "\n"
    echo "Pattern1 is equal to:"
    echo -e "$p1\n"
    echo "Pattern2 is equal to:"
    echo -e "$p2\n"
    echo -e "PWD is: $PWD\n"
    echo "sed on the file"
    #sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
    #sed "/$p1/,/$p2/d" *.txt # this works
    #sed "/$p1/,/$p2/d" $fname # this works
    sed -i.bak "/$p1/,/$p2/d" *"$fname" # this works
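Before running `sed -i` over many files, it can help to preview exactly which lines the range address selects. A minimal sketch: the same `/$p1/,/$p2/` range with `sed -n` and the `p` command prints the lines that `d` would delete, without modifying anything. `preview_range` is a hypothetical helper name:

```bash
# Hypothetical helper: show, per file, the lines that
# sed "/$start/,/$end/d" would remove. Nothing is modified.
preview_range() {
  start=$1
  end=$2
  shift 2
  for f in "$@"; do
    printf '== %s ==\n' "$f"
    # -n suppresses automatic printing; p prints only the lines in the range.
    sed -n "/$start/,/$end/p" "$f"
  done
}

# Usage, with p1 and p2 as set in the script above:
#   preview_range "$p1" "$p2" *.html
```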

3 Answers

2

sed is made for this task:

$ sed -i'.bak' '/<!--START/,/<!--END/d' file 

If you have other lines with similar tags, use more of the marker text in the patterns so that only the intended lines match.
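Two properties of sed range addresses matter here. The short patterns `<!--START` and `<!--END` match any comment that begins that way, and if a start line is never followed by a matching end line, sed deletes from the start line through the end of the input. A minimal sketch that matches the full Google Analytics markers from the question; `strip_ga` is a hypothetical helper name:

```bash
# Hypothetical helper: delete the Google Analytics block (markers inclusive)
# from standard input. Using the full marker text leaves other
# <!--START ... --> / <!--END ... --> comments alone.
strip_ga() {
  sed '/<!--START: Google Analytics --->/,/<!--END: Google Analytics --->/d'
}

# Caveat demo: with no END marker, everything from START onward is deleted.
#   printf 'a\n<!--START: Google Analytics --->\nb\nc\n' | strip_ga   # prints only "a"
```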

For multiple files, for example file1, ..., file4:

$ for f in file{1..4}; do sed -i'.bak' '/<!--START/,/<!--END/d' "$f"; done 
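The loop works, but `sed -i` also accepts several files in one invocation and treats each file separately (GNU sed's `-i` implies `-s`, and BSD sed's `-i` likewise limits address ranges to the current file), so a range opened in one file cannot swallow lines of the next. A sketch assuming the same short markers; `strip_ranges_in_place` is a hypothetical helper name, and the `file{1..4}` brace expansion in the usage line requires bash:

```bash
# Hypothetical helper: strip the marked range from every file given,
# editing each in place and keeping the original as <file>.bak.
strip_ranges_in_place() {
  # One sed process for all files; with -i, each file is processed on its
  # own, so a START without an END only affects the file it appears in.
  sed -i'.bak' '/<!--START/,/<!--END/d' "$@"
}

# Usage (bash):
#   strip_ranges_in_place file{1..4}
```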

2

Something to consider:

    $ awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' file
    ...
    </script>
    </body>
    </html>
    ...
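The awk program flips the flag `f` on each marker line and skips that line (`next`); the bare pattern `!f` then prints a line only while the flag is off. POSIX awk has no in-place option, so to change the files themselves the output has to go through a temporary file. A sketch, assuming `mktemp` is available; `awk_strip_in_place` is a hypothetical helper name:

```bash
# Hypothetical helper: apply the toggle-based awk filter to each file in place
# by writing to a temporary file and copying the result back.
awk_strip_in_place() {
  for file in "$@"; do
    tmp=$(mktemp) || return 1
    # Copy back only if awk succeeded; redirecting into "$file" (rather than
    # using mv) keeps the original file's permissions.
    if awk '/<!--(START|END): Google Analytics --->/ { f = !f; next } !f' "$file" > "$tmp" &&
       cat -- "$tmp" > "$file"; then
      rm -f -- "$tmp"
    else
      rm -f -- "$tmp"
      return 1
    fi
  done
}

# Usage:
#   awk_strip_in_place *.html
```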


1

Judging by the script in your question, it sounds like you already know how to use sed to remove the range of interest from a single file (sed -i.bak "/$p1/,/$p2/d" $fname) but are looking for a robust way to process multiple files in a script (assumes bash):

    #!/usr/bin/env bash

    # cd to the dir. in which this script is located.
    # CAVEAT: Assumes that the script wasn't invoked through a *symlink*
    #         located in a different dir.
    cd -- "$(dirname -- "$BASH_SOURCE")" || exit

    fpattern='*.html' # specify source-file globbing pattern

    shopt -s nullglob # make sure that globbing expands to nothing if nothing matches
    fnames=( $fpattern ) # expand to matching files and store in array
    num_of_files_matching_pattern=${#fnames[@]} # count matching files
    (( num_of_files_matching_pattern > 0 )) || exit # abort, if no files match

    printf '%s\n%s\n' "Running from:" "$PWD"
    printf '%s\n%s\n' "Pattern matching the files to process:" "$fpattern"
    printf '%s\n%s\n' "# of matching files:" "$num_of_files_matching_pattern"

    # Determine the range-endpoint-identifier-line regular expressions.
    # CAVEAT: Make sure you escape any regular-expression metacharacters you want
    #         to be treated as *literals*.
    p1='^<!--START: Google Analytics --->$'
    p2='^<!--END: Google Analytics --->$'

    # Remove the range identified by its endpoints from all matching input files
    # and save the original files with extension '.bak'
    sed -i'.bak' "/$p1/,/$p2/d" "${fnames[@]}" || exit
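Regarding the caveat about metacharacters in the script above: one way to use arbitrary literal marker text in a sed address is to escape it first. A minimal sketch, assuming POSIX basic regular expressions (sed's default) and `/` as the address delimiter; it escapes the BRE special characters `\ . [ * ^ $` plus `/`. `escape_bre` is a hypothetical helper name, not part of the answer's script:

```bash
# Hypothetical helper: escape a literal string for use inside a "/"-delimited
# sed BRE address.
escape_bre() {
  # 1) double every backslash first, 2) backslash-escape . [ * ^ $,
  # 3) escape the "/" delimiter.
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/[*.^$[]/\\&/g' -e 's|/|\\/|g'
}

# Usage:
#   p1=$(escape_bre '<!--START: Google Analytics --->')
#   p2=$(escape_bre '<!--END: Google Analytics --->')
#   sed -i'.bak' "/^$p1\$/,/^$p2\$/d" *.html
```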

As an aside: I suggest not using the suffix .sh in your script filename:

  • The shebang line inside the file is sufficient to tell the system what shell/interpreter to pass the script to.

  • Not specifying a suffix leaves you free to change the implementation later (e.g., to Python), without breaking existing programs that rely on your scripts.

  • In the case at hand, assuming that use of bash is actually acceptable, .sh would be misleading, because it suggests a sh-features-only script.


Determining the running script's true directory, even when the script is invoked via a symlink located in a different directory:

  • If you can assume a Linux platform (or at least GNU readlink), use:

    dirname -- "$(readlink -e -- "$BASH_SOURCE")" 
  • Otherwise, a more elaborate solution with a helper function is required - see this answer of mine.
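Where GNU `readlink -e` is not available (for example, on older macOS), the symlink chain can be followed manually with plain `readlink`, which both GNU coreutils and BSD/macOS provide. This is a sketch under that assumption, not the helper the linked answer refers to; `script_dir` is a hypothetical name, and it does no symlink-loop detection:

```bash
# Hypothetical helper: print the physical directory that contains the file $1,
# following any chain of symlinks. Relative link targets are resolved against
# the directory of the link itself. The ( ... ) body runs in a subshell, so
# the cd calls do not change the caller's working directory.
# Note: a symlink cycle would make this loop forever.
script_dir() (
  target=$1
  while [ -L "$target" ]; do
    link_dir=$(cd -P -- "$(dirname -- "$target")" && pwd) || exit 1
    target=$(readlink -- "$target") || exit 1
    case $target in
      /*) ;;                          # absolute target: use as-is
      *) target=$link_dir/$target ;;  # relative target: relative to the link
    esac
  done
  cd -P -- "$(dirname -- "$target")" && pwd
)

# Usage at the top of a bash script:
#   cd -- "$(script_dir "${BASH_SOURCE[0]}")" || exit
```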

2 Comments

Thanks, I like the robustness compared with mine in EDIT2 above. Many takeaways: symlinks, escaping any regular-expression metacharacters you want treated as literals, and not using the .sh suffix. ++
Could you expand on the symlink point as it relates to what I want? Currently I have to put my script in the directory with all the files and then run ./script.sh. It's probably a whole other question, but I would like to be able to run it from anywhere.
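On running the script from anywhere: instead of locating the files relative to the script, one option is to pass the target directory as an argument (defaulting to the current directory) and save the script, without a suffix, in a directory on your `PATH` such as `~/bin`. A POSIX-sh sketch of the core logic, assuming the exact marker lines from the question; `strip_ga_dir` and the script name `strip-ga` are hypothetical:

```bash
# Hypothetical helper: remove the Google Analytics block from every *.html
# file in directory $1 (default: current directory), keeping .bak copies.
# The ( ... ) body runs in a subshell, so the cd does not affect the caller.
strip_ga_dir() (
  cd -- "${1:-.}" || exit 1
  set -- *.html
  # If nothing matched, the pattern stays unexpanded: nothing to do.
  [ -e "$1" ] || exit 0
  sed -i'.bak' '/^<!--START: Google Analytics --->$/,/^<!--END: Google Analytics --->$/d' "$@"
)

# Usage, e.g. as the body of ~/bin/strip-ga (with ~/bin on PATH):
#   strip_ga_dir "$1"
# Then, from any directory:  strip-ga /path/to/site
```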
