I am working on a project that converts a few dozen HTML files into text files, and I have composed the replace-regexp expressions that do the job. The question is how to apply all six of them consecutively, and then how to do so for each of the dozens of files in the directory. I've appended my org notes, which include the regexps, but keep in mind that those aren't the problem; they do their job (after translating the ^J, etc.). The question is just how to programmatically apply all six of them to each HTML file in the directory.

* 1. Delete all until >General Conference<
  \(.*^J\)*.*?General Conference
* 2. Delete all <p class="copyright"> and after
  ^.*<p class="copy\(.*^J\)*
* 3. Strip all tags
  \(<.*?>\)*
* 4. Remove whitespace lines
  ^\s-*^J
* 5. Remove ugly numeric identifier
  ^\s-*[0-9].*^J
* 6. Remove amp
  &amp; -> &

3 Answers

  1. Open the directory with Dired: C-x d directory RET
  2. Mark the files you want to change, either by pressing m (dired-mark) on each one individually, or with one of the other commands in the Mark menu of the menu bar, such as * . html RET (dired-mark-extension), which marks every file with an html extension.
  3. Press Q regex RET RET (dired-do-query-replace-regexp) to replace every match of regex with nothing. You can use Ωmega's regex for this.
  4. At each prompt, press SPC to replace that single match, or ! to replace all remaining matches without further questions.

1 Comment

Precisely what I needed! The only downside was that even with "!" it still asks again when it gets to the next file. So, my one improvement after trying your answer is to use Ibuffer instead of Dired, which lets you run ibuffer-do-replace-regexp and hence bypass the querying (of course, you must be very confident in your regexp). That did it! Thanks!

It wouldn't be hard to do this programmatically. But the idiomatic Emacs solution is to record two keyboard macros.

  1. Perform each of your regexp replacements with replace-regexp in a single buffer.

  2. In a dired buffer,

    1. move to the next HTML file (with C-s)
    2. open it in other window
    3. run (1) in other window and switch back to the dired buffer.

You would then run macro (2) with an absurdly large prefix argument, such as C-u 1000.

4 Comments

Great idea. Unfortunately, the regexps are complex enough that I need to yank them or use referential history commands, which screws up the macro since the history/pool are different each time I come around. I was also getting an error about the length of the macro. Perhaps I need to use a lisp script so I can save commands instead of keystrokes? Or am I missing something about making macros?
Re: "You would then run (2) with an absurd number", Note that an argument of zero means repeat-until-failure.
@WorldsEndless this is not an overly complicated macro scenario, you definitely don't need elisp. Have a look at registers.
A way around the history reference problem would be to save the regular expressions into specific registers, where they can be reliably yanked out by a known command that won't change each time.
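
For anyone trying the register idea, here is a minimal sketch. The register names a and b are arbitrary choices for illustration, and only two of the six regexps from the question are shown. Note that inside a Lisp string every backslash is doubled and ^J is written as \n.

```emacs-lisp
;; Store regexps in registers once, before recording the macro.
;; Register names ?a and ?b are arbitrary (hypothetical) choices.
(set-register ?a "^\\s-*\n")          ; 4. whitespace-only lines
(set-register ?b "^\\s-*[0-9].*\n")   ; 5. lines starting with a number
```

While recording the macro, typing C-x r i a at the replace-regexp prompt should insert the stored regexp, so the keystrokes are the same on every run and no longer depend on the minibuffer history.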

It seems you are just one step away: write a function that performs the replacements, then apply it to a list of files.

Here's a draft starting it:

    (defun my-replacements ()
      (interactive "*")
      (save-restriction
        (widen)
        (save-excursion
          (goto-char (point-min))
          (while (re-search-forward "FIRST-REGEXP" nil t 1)
            (replace-match "FIRST-REPLACEMENT")))))

Repeat the last three forms (goto-char, while, replace-match) inside save-excursion until all the regexps are covered.
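
To make the draft concrete, here is one way the whole job might look. This is a sketch under stated assumptions, not a tested solution for the real files. All `my-` names are hypothetical. The six regexps are my transcription of the ones in the question into Lisp strings (^J becomes \n, backslashes are doubled). Regexp 3 is written without the outer \( \)* group, because a pattern that can match the empty string would make the while loop spin forever. It also assumes the output should be a .txt file next to each .html file.

```emacs-lisp
;; All names prefixed `my-' are hypothetical; adjust to taste.
(defvar my-html-to-text-replacements
  '(("\\(.*\n\\)*.*?General Conference" . "") ; 1. everything up to the heading
    ("^.*<p class=\"copy\\(.*\n\\)*" . "")     ; 2. copyright paragraph and after
    ("<.*?>" . "")                             ; 3. tags (must not match empty)
    ("^\\s-*\n" . "")                          ; 4. whitespace-only lines
    ("^\\s-*[0-9].*\n" . "")                   ; 5. numeric identifier lines
    ("&amp;" . "&"))                           ; 6. &amp; -> &
  "Ordered list of (REGEXP . REPLACEMENT) pairs, transcribed from the question.")

(defun my-apply-replacements (replacements)
  "Apply each (REGEXP . REPLACEMENT) pair in REPLACEMENTS to the whole buffer.
Pairs are applied in order, each one starting again from the top."
  (let ((case-fold-search nil))         ; match \"General Conference\" exactly
    (save-excursion
      (dolist (pair replacements)
        (goto-char (point-min))
        (while (re-search-forward (car pair) nil t)
          ;; FIXEDCASE and LITERAL are both t: insert the text as given.
          (replace-match (cdr pair) t t))))))

(defun my-html-dir-to-text (directory)
  "Write a cleaned .txt file next to every .html file in DIRECTORY."
  (dolist (file (directory-files directory t "\\.html\\'"))
    (with-temp-buffer
      (insert-file-contents file)
      (my-apply-replacements my-html-to-text-replacements)
      (write-region (point-min) (point-max)
                    (concat (file-name-sans-extension file) ".txt")))))
```

Evaluate the definitions and then call something like (my-html-dir-to-text "~/some/dir/") from M-: or the scratch buffer. Because each file is processed in a temporary buffer, nothing is queried and the original .html files are left untouched. On very large files the \(.*\n\)* patterns may be slow or hit the regexp matcher's stack limit; in that case, searching for the literal string and calling delete-region is a possible alternative for steps 1 and 2.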
