I have 200 text files with the following structure:

n01443537_0.JPEG 0 10 63 58
...
n01443537_499.JPEG 0 3 39 42

In every file the first part (before the underscore, i.e. n01443537) is different. However, they all have the same structure, i.e. they start with n, followed by eight digits.

I would like to have all 200 files formatted as follows:

n01443537/n01443537_0.JPEG n01443537
...
n01443537/n01443537_499.JPEG n01443537

I found that the regex n[^_]* captures the required pattern, but I am having a little trouble putting it all together.

  • What are you doing with the rest of the info on each line? (e.g. ' 0 10 63 58')? Commented Mar 20, 2016 at 20:16
  • I just delete that data Commented Mar 20, 2016 at 20:28

2 Answers

Supposing your files are in the current directory, you can use sed from the command line, something like this:

sed --in-place 's|\(^n[0-9]*\)\(_[0-9]*\.[a-zA-Z]*\)\(.*\)|\1/\1\2 \1|' * 

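The replacement part, \1/\1\2 \1, is your target: the first \1 inserts the first part (e.g. n01443537), then comes a /, then \1 again, then \2 (e.g. _499.JPEG), then a space, and finally \1 once more.

Each \[number] refers to the corresponding group enclosed between \( and \) in the pattern \(^n[0-9]*\)\(_[0-9]*\.[a-zA-Z]*\)\(.*\), counted from left to right. The third group, \(.*\), swallows the rest of the line (e.g. ' 0 10 63 58') and is not used in the replacement, so that data is dropped.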
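Before running the in-place edit on all 200 files, it may be worth trying the substitution on a couple of sample lines. The following is a minimal sketch: reformat_labels is a hypothetical helper name, the anchor ^ is moved outside the first group, and the unused third group is dropped; otherwise the expression is the same as the one above.

```shell
# Hypothetical helper: apply the substitution to stdin instead of editing files in place.
reformat_labels() {
  # \1 = n followed by digits, \2 = _<index>.<extension>; the trailing .* drops the rest of the line
  sed 's|^\(n[0-9]*\)\(_[0-9]*\.[a-zA-Z]*\).*|\1/\1\2 \1|'
}

# Feed two sample lines taken from the question through the helper.
printf '%s\n' 'n01443537_0.JPEG 0 10 63 58' 'n01443537_499.JPEG 0 3 39 42' | reformat_labels
```

If the output looks right, the same expression can be passed to sed --in-place as shown above.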


Note: I am not proficient in awk or bash.

A regex suitable for this case is as follows.

Regex: ((n\d{8})_\d+\.JPEG).*

Replacement to do: \2/\1 \2

Regex101 Demo
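The \d in this regex is a Perl-style shorthand that sed generally does not treat as a digit class, so using it from the shell needs a small translation. A minimal sketch, assuming a sed that supports extended regular expressions via -E (GNU and BSD sed both do); reformat_labels_ere is a hypothetical helper name:

```shell
# Hypothetical helper: the regex above, with \d written as [0-9] for sed -E.
reformat_labels_ere() {
  # \1 = whole file name (outer group), \2 = n plus eight digits (inner group)
  sed -E 's|^((n[0-9]{8})_[0-9]+\.JPEG).*|\2/\1 \2|'
}

# Try it on a sample line from the question.
printf '%s\n' 'n01443537_499.JPEG 0 3 39 42' | reformat_labels_ere
```

Unlike the first answer, this version only matches names with exactly eight digits and a .JPEG extension, which is stricter but matches the structure described in the question.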

2 Comments

Thanks man, I just learnt that I can use nested groups, so the expression in my answer can be rewritten as: sed 's|\(\(n[0-9]*\)_[0-9]*\.[a-zA-Z]*\)\(.*\)|\2/\1 \2|' *.
Take care while nesting. Groups are numbered from outside to inside, in the order their opening parentheses appear from left to right. This might be confusing in cases like ( ( ( ) ( ) ) ( ) ). Experiment and find out ;-)
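One way to run that experiment is to print every group next to its number. This is a small sketch using the nesting shape from the comment above, with the letters a, b and c as arbitrary stand-in content and show_group_numbers as a hypothetical helper name:

```shell
# Hypothetical helper: label each capture group of ( ( ( ) ( ) ) ( ) ) with its number.
show_group_numbers() {
  # Groups are counted by the position of their opening parenthesis, left to right.
  sed -E 's/(((a)(b))(c))/1=\1 2=\2 3=\3 4=\4 5=\5/'
}

echo 'abc' | show_group_numbers
# prints: 1=abc 2=ab 3=a 4=b 5=c
```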
