
I have a file with multiple "paragraphs" like this:

<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>

I need to replace EACH occurrence of this paragraph with:

<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/File Name</filename>
<transname/>

I have tried MANY times to replace each occurrence:

  • breaking the stanzas apart with awk (working)
  • storing the original as $ORIGTEXT (working)
  • replacing the filename/transname lines (working)
  • storing the NEW multiline text as $NEWTEXT (working, replacement of single lines)
  • sed original file to replace every paragraph with new (NOT working)

I'm doing a sed -i "s|$ORIGTEXT|$NEWTEXT|g" but this does not work. I'm guessing it has something to do with all the special characters in the file, along with the newline characters in the NEW/ORIG variables. Can somebody help?
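A quick way to confirm the newline part of that guess: sed loads one line at a time into its pattern space, so a regexp that spans lines never matches. The second command below assumes GNU sed, whose -z option splits input on NUL bytes instead of newlines. Even with -z, the "special characters" part remains: regexp metacharacters such as the \ in \File Name would still need escaping in $ORIGTEXT, as would & and \ in $NEWTEXT.

```bash
#!/usr/bin/env bash
# sed reads one line into its pattern space at a time, so a regexp that
# contains a newline (\n) never matches: this prints "a" and "b" unchanged.
printf 'a\nb\n' | sed 's/a\nb/X/'

# GNU sed only: -z splits input on NUL bytes instead of newlines, so a
# NUL-free file is read as one pattern space and the same s/// matches.
# This prints "X".
printf 'a\nb\n' | sed -z 's/a\nb/X/'
```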

  • It looks like you're trying to use Regex tools to parse XML. That way lies madness and despair. Try a DOM parser like xmlstarlet perhaps? Commented Jun 29, 2024 at 4:59
  • Not sure how big your file is, but this is a very simple task for the vim editor. Use the right tool for the right job. Commented Jun 29, 2024 at 7:06
  • sed is fundamentally a line editor. It has options for holding part of a pattern space for later use, but that syntax is somewhat inflexible, and quickly gets out of hand. If you have coaxed awk into breaking up the stanzas and repairing the lines, it should be trivial to update the file in awk at the same time. Maybe post the awk you have so far, and we can provide the final steps. Commented Jun 29, 2024 at 8:04
  • Do you have parts of your input file where you do NOT want to change <transname>\File Name</transname> into <filename>FILE_PATH/File Name</filename> (e.g. in a different type of block, a different FILE Name, or a block with a <type> that isn't TRANS)? If so, please edit your question to include those cases in your sample input/output. Replacing the desired text is always much easier than not replacing similar text you do NOT want to be replaced, so it's important to include the latter in examples. Commented Jun 29, 2024 at 11:23
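Following the comment about doing the whole update in awk, here is a minimal sketch (not code from this thread) that replaces a literal multi-line block with index() and substr() instead of a regexp, so nothing in the old or new text needs escaping. replace_literal is a hypothetical name; the sketch assumes the whole file fits in memory and that the old and new blocks are passed without a trailing newline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every literal (non-regexp) occurrence of OLD
# with NEW, where both may span several lines.
# Usage: replace_literal OLD NEW [FILE...]   (reads stdin if no FILE)
replace_literal() {
    local old="$1" new="$2"
    shift 2
    # Pass the strings through the environment: with -v, awk would process
    # backslash escapes such as the \F in \File Name.
    old="$old" new="$new" awk '
        { rec = (NR > 1 ? rec ORS : "") $0 }   # slurp the whole input
        END {
            old = ENVIRON["old"]
            new = ENVIRON["new"]
            out = ""
            # index() does plain string matching, so characters such as
            # \ . [ & | in old/new need no escaping.
            while (old != "" && (pos = index(rec, old)) > 0) {
                out = out substr(rec, 1, pos - 1) new
                rec = substr(rec, pos + length(old))
            }
            print out rec
        }
    ' "$@"
}
```

For example, replace_literal "$ORIGTEXT" "$NEWTEXT" file > file.new, assuming the two variables hold the blocks without trailing newlines.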

3 Answers


After adding an enclosing root tag <r> to your XML:

xmlstarlet ed -u '//filename' -v 'FILE_PATH/File Name' \
              -u '//transname' -v '' file.xml

<?xml version="1.0"?>
<r>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/File Name</filename>
<transname/>
</r>
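The answer doesn't show how the root tag was added. One minimal way, assuming the file is a bare fragment with no XML declaration and no existing root element, is to wrap it before running xmlstarlet. wrap_xml and file.xml are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print FILE wrapped in a single <r> root element so
# an XML tool will accept it.
wrap_xml() {
    printf '<r>\n'      # opening root tag, as in the answer's example
    cat -- "$1"         # the original fragment, unchanged
    printf '</r>\n'     # matching closing tag
}

# Example (assumes the fragment is in ./file and xmlstarlet is installed):
#   wrap_xml file > file.xml
#   xmlstarlet ed -u '//filename' -v 'FILE_PATH/File Name' \
#                 -u '//transname' -v '' file.xml
```

The edited output then carries the <?xml ...?> declaration and the <r> wrapper, which you would strip again if the original file must stay a bare fragment.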
  • Fixed with updated command. Commented Jun 29, 2024 at 11:55
  • I think it'd still modify any block that contained a transname or filename tag, not just the target block which has <type>TRANS</type>, <attributes/>, <specification_method>rep_name</specification_method>, <trans_object_id/>, <filename/>, and <transname>\File Name</transname>, which may be undesirable (the OP hasn't told us yet) Commented Jun 29, 2024 at 11:58
  • The command can handle this case easily if needed by OP. Commented Jun 29, 2024 at 12:00
  • Even if this OP doesn't need it, this would be a much more useful answer in general if you could show how to do it (I'd definitely be interested at least as I come across this situation occasionally and can never get it right!). Since it can be done easily (and I don't doubt you on that, I just personally couldn't do it and I'd like to learn how), would you mind updating your answer to add a script showing how to test for just the tags+values in the specific block the OP wants to target? Commented Jun 29, 2024 at 12:16
  • No, because this is not required by OP. Feel free to ask a new question with XML/XPath tag Commented Jun 29, 2024 at 13:28

Assuming you want to ONLY change blocks of text that match EXACTLY the block shown in your question, and that your input always looks exactly as shown in your question, then using any POSIX awk you could do:

$ cat tst.sh
#!/usr/bin/env bash

IFS= read -r -d '' old <<-'    !'
	<type>TRANS</type>
	<attributes/>
	<specification_method>rep_name</specification_method>
	<trans_object_id/>
	<filename/>
	<transname>\File Name</transname>
    !

IFS= read -r -d '' new <<-'    !'
	<type>TRANS</type>
	<attributes/>
	<specification_method>rep_name</specification_method>
	<trans_object_id/>
	<filename>FILE_PATH/File Name</filename>
	<transname/>
    !

old="$old" new="$new" \
awk '
    { rec = (NR>1 ? rec ORS : "") $0 }
    END {
        old = ENVIRON["old"]
        new = ENVIRON["new"]

        # Escape any possible regexp metachars in "old" and
        # any possible backreference metachars in "new".
        # Double each \ before escaping ^ so the \ added for ^ is not
        # doubled too; "&&" (the matched \ twice) is portable, whereas
        # what "\\\\" produces in a replacement differs between awks.
        gsub( /[^^\\]/ , "[&]", old)
        gsub( /\\/     , "&&", old)
        gsub( /\^/     , "\\^", old)
        # "\\\\&" turns each & into \& so it stays a literal &:
        gsub( /&/      , "\\\\&", new)

        # Replace every "old" with "new":
        gsub( old, new, rec )

        print rec
    }
' "${@:--}"

For example, given the following input where the 3rd block is the OP's target from the question and there are other similar-looking blocks that should not be changed, as they aren't exactly the OP's target block:

$ cat file
<type>CONST</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>Foo Bar</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>
<type>TRANS</type>
<attributes/>
<specification_method>other rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>final<filename/>
<transname>\File Name</transname>

$ ./tst.sh file
<type>CONST</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>Foo Bar</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/File Name</filename>
<transname/>
<type>TRANS</type>
<attributes/>
<specification_method>other rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name</transname>
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>final<filename/>
<transname>\File Name</transname>

Note that ONLY the 3rd block, the target one, has changed and only the desired lines within that block have changed. That's easier to see using diff between the input file and the command output:

$ diff file <(./tst.sh file)
17,18c17,18
< <filename/>
< <transname>\File Name</transname>
---
> <filename>FILE_PATH/File Name</filename>
> <transname/>

$ diff -y file <(./tst.sh file)
<type>CONST</type>                                               <type>CONST</type>
<attributes/>                                                    <attributes/>
<specification_method>rep_name</specification_method>            <specification_method>rep_name</specification_method>
<trans_object_id/>                                               <trans_object_id/>
<filename/>                                                      <filename/>
<transname>\File Name</transname>                                <transname>\File Name</transname>
<type>TRANS</type>                                               <type>TRANS</type>
<attributes/>                                                    <attributes/>
<specification_method>rep_name</specification_method>            <specification_method>rep_name</specification_method>
<trans_object_id/>                                               <trans_object_id/>
<filename/>                                                      <filename/>
<transname>Foo Bar</transname>                                   <transname>Foo Bar</transname>
<type>TRANS</type>                                               <type>TRANS</type>
<attributes/>                                                    <attributes/>
<specification_method>rep_name</specification_method>            <specification_method>rep_name</specification_method>
<trans_object_id/>                                               <trans_object_id/>
<filename/>                                                    | <filename>FILE_PATH/File Name</filename>
<transname>\File Name</transname>                              | <transname/>
<type>TRANS</type>                                               <type>TRANS</type>
<attributes/>                                                    <attributes/>
<specification_method>other rep_name</specification_method>      <specification_method>other rep_name</specification_method>
<trans_object_id/>                                               <trans_object_id/>
<filename/>                                                      <filename/>
<transname>\File Name</transname>                                <transname>\File Name</transname>
<type>TRANS</type>                                               <type>TRANS</type>
<attributes/>                                                    <attributes/>
<specification_method>rep_name</specification_method>            <specification_method>rep_name</specification_method>
<trans_object_id/>                                               <trans_object_id/>
<filename>final<filename/>                                       <filename>final<filename/>
<transname>\File Name</transname>                                <transname>\File Name</transname>

Regarding the here-documents: the whitespace before the ! in each delimiter is 4 blanks, and the whitespace at the start of each line inside each here-doc is a tab, which <<- strips (it removes leading tabs only, not blanks).
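The escaping lines are the subtle part of tst.sh. To see what they build, the sketch below applies the same bracket-wrapping idea to a single string; escape_ere is a hypothetical name. It doubles backslashes with "&&" before escaping ^, so the backslash added in front of ^ is not doubled as well.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lines on stdin and print each one as an awk
# regexp that matches that line literally.
escape_ere() {
    awk '{
        s = $0
        gsub(/[^^\\]/, "[&]", s)   # wrap every char except ^ and \ in [...]
        gsub(/\\/, "&&", s)        # double each \ ("&" is the matched \)
        gsub(/\^/, "\\^", s)       # then escape each ^ as \^
        print s
    }'
}

re=$(printf '%s\n' 'a.b\c^d' | escape_ere)
printf '%s\n' "$re"    # prints: [a][.][b]\\[c]\^[d]

# Only the first line contains a.b\c^d literally, so this prints 1.
printf '%s\n' 'x a.b\c^d y' 'axb\c^d' |
    re="$re" awk '$0 ~ ENVIRON["re"] { n++ } END { print n + 0 }'
```

Wrapping each character in [...] neutralizes ., *, [, | and the rest; ^ and \ are the two characters that can't simply be bracketed ([^] would start a negated set, and awk treats \ as an escape even inside brackets), so they get a backslash instead.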


With GNU awk you can replace the contents of the input file by using awk -i inplace '...' or, as with any tool, you can just create/use your own temp file:

tmp=$(mktemp) && trap 'rm -f "$tmp"; exit' EXIT && awk '...' "$1" > "$tmp" && mv -- "$tmp" "$1" 
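The temp-file pattern can also be packaged as a small function that takes the filter command as arguments. A sketch; in_place is a hypothetical name, and instead of a trap it removes the temp file itself when the command fails, leaving the original file untouched.

```bash
#!/usr/bin/env bash
# Hypothetical helper: in_place FILE CMD [ARGS...]
# Runs CMD ARGS... with FILE on stdin and replaces FILE with the output,
# but only if CMD succeeded.
in_place() {
    local file="$1" tmp
    shift
    tmp=$(mktemp) || return
    if "$@" < "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

For example, in_place file ./tst.sh works because tst.sh reads stdin when given no arguments. Note that mv gives the file the temp file's permissions (mktemp typically creates it with mode 600); use cat -- "$tmp" > "$file" instead if the original permissions must be kept.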
  • awk is not XML aware. Even for CSV awk is limited. Definitely not the proper way. Commented Jun 29, 2024 at 11:57
  • @GillesQuénot GNU awk has an XML extension (gnu.org/software/gawk/manual/html_node/gawkextlib.html), and GNU and BWK awk have a CSV option (--csv) so it's not accurate to say awk isn't XML/CSV aware, I'm just not using it. While in theory you should use an XML parser, in practice it can be a PITA to do so, the actual file format the OP uses is often a small subset of XML that CAN be parsed with a text processing tool (or isn't even valid XML), and often users don't have+can't install tools that have XML parsers so sometimes in real systems using awk or similar is the proper way. Commented Jun 29, 2024 at 12:06

Using the TXR language, we have the following program in the file data.txr:

@(repeat)
@  (cases)
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\@fname</transname>
@    (output)
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/@fname</filename>
<transname/>
@    (end)
@  (or)
@line
@    (do (put-line line))
@  (end)
@(end)

applied to the following data file:

$ cat data
AAAA
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name AAA</transname>
BBBB
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename/>
<transname>\File Name BBB</transname>
CCCC

Run:

$ txr data.txr data
AAAA
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/File Name AAA</filename>
<transname/>
BBBB
<type>TRANS</type>
<attributes/>
<specification_method>rep_name</specification_method>
<trans_object_id/>
<filename>FILE_PATH/File Name BBB</filename>
<transname/>
CCCC

Edited in Vim with the txr.vim syntax file, it looks like this:

[Image: syntax-colored TXR code]
