175

I have a text file that denotes remarks with a single '.

Some lines have two quotes, but I need to get everything from the first instance of a ' to the line feed.

I AL01 ' A-LINE '091398 GDK 33394178
402922 0831850 ' '091398 GDK 33394179
I AL02 ' A-LINE '091398 GDK 33394180
400722 0833118 ' '091398 GDK 33394181
I A10A ' A-LINE 102 ' 53198 DJ 33394182
395335 0832203 ' ' 53198 DJ 33394183
I A10B ' A-LINE 102 ' 53198 DJ 3339418

7 Answers

246
'.* 

I believe you need the Multiline option.


2 Comments

This will capture everything from the first instance of the ' character to the end of the last line.
With this you go to the end of the file or the text, not to the end of the line.
133

The appropriate regex would be the ' char followed by any number of any chars [including zero chars] ending with an end of string/line token:

'.*$ 

And if you wanted to capture everything after the ' char but not include it in the output, you would use:

(?<=').*$ 

This basically says give me all characters that follow the ' char until the end of the line.
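
As a rough illustration, here is how the two patterns above might behave in Python. The sample lines are adapted from the question (the exact spacing is an assumption), and `re.MULTILINE` is used so that `$` matches at the end of every line rather than only at the end of the whole string. This is a sketch, not the only way to do it.

```python
import re

# Sample lines adapted from the question; the exact spacing is an assumption.
text = (
    "I AL01 ' A-LINE '091398 GDK 33394178\n"
    "402922 0831850 ' '091398 GDK 33394179\n"
)

# With re.MULTILINE, $ matches at the end of each line,
# so every line yields its own match starting at the first '.
with_quote = re.findall(r"'.*$", text, flags=re.MULTILINE)

# The lookbehind (?<=') requires a preceding ' but keeps it out of the match.
without_quote = re.findall(r"(?<=').*$", text, flags=re.MULTILINE)

print(with_quote)
print(without_quote)
```

Because `.*` is greedy and does not cross line breaks by default, each match runs from the first ' on a line to the end of that line, even when the line contains a second '.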

Edit: It has been noted that $ is implicit when using .* and is not strictly required, so the pattern:

'.* 

is technically correct. However, it is clearer to be specific and avoid confusion during later code maintenance, hence my use of the $. I believe it is always better to declare explicit behaviour than to rely on implicit behaviour in situations where clarity could be questioned.

6 Comments

The $ is unnecessary. The dot will stop at the end of the line under normal circumstances.
unnecessary - but proper for what he wants to do. It serves as a reminder later that it is expecting everything from ' to the end of the line
@balabaster: I did not say that it was wrong. ;-) It was just a footnote.
@Tomalak: Wasn't trying to imply you were wrong by any means, was just clarifying my reasoning for my choice of using $ rather than not. Thank you for pointing it out.
+1 for including how to include everything after the character in question, instead of always including it.
35
'.*$ 

Starting with a single quote ('), match any character (.) zero or more times (*) until the end of the line ($).

1 Comment

This answer is a great example of how to break down the logic behind a command, nice and clear!
20

When I tried '.* on Windows (Notepad++), it matched everything after the first ' until the end of the last line.

To capture everything until the end of that line, I typed the following:

'.*?\n 

This would only capture everything from ' until end of that line.
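
One plausible explanation for the behaviour above is that the dot was allowed to match line breaks (this is an assumption about the editor's settings, not something stated in the answer). The effect can be sketched in Python with `re.DOTALL`, using made-up sample text:

```python
import re

# Made-up two-line sample; each line contains one remark.
text = "a ' first remark\nb ' second remark\n"

# With DOTALL the dot also matches \n, so the greedy .* runs to the end of the text.
greedy = re.search(r"'.*", text, flags=re.DOTALL).group()

# The lazy .*? stops at the first \n it reaches, even with DOTALL enabled.
lazy = re.search(r"'.*?\n", text, flags=re.DOTALL).group()

print(repr(greedy))
print(repr(lazy))
```

Note that the lazy variant includes the trailing linefeed in the match and needs one to be present.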


13

In your example I'd go for the following pattern:

'([^\n]+)$ 

Use the multiline and global options to match all occurrences.

To include the linefeed in the match you could use:

'[^\n]+\n 

But this might miss the last line if it has no linefeed.

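For a single line, if you don't need to match the linefeed, I'd prefer to use the following (note that inside a character class $ is a literal dollar sign, so this only works when the text after the ' contains no $ character):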

'[^$]+$ 
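
The caveat about a missing final linefeed can be checked with a small Python sketch. The sample text is invented for illustration, and its last line deliberately has no trailing linefeed:

```python
import re

# Invented sample: the last line has no trailing linefeed.
text = "x ' one\ny ' two"

# Requiring \n after the remark skips the final line.
with_linefeed = re.findall(r"'[^\n]+\n", text)

# Dropping the \n requirement picks up every line, without the linefeed.
without_linefeed = re.findall(r"'[^\n]+", text)

print(with_linefeed)
print(without_linefeed)
```

Since `[^\n]+` can never cross a line break, the second pattern needs neither a `$` anchor nor a multiline flag here.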

1 Comment

Had trouble with this suggestion with golang's regex. '[^\n]+ was needed instead of '[^\n]+$. See play.golang.org/p/EemihqdIMSl
5

This will capture everything up to the ' in backreference 1 - and everything after the ' in backreference 2. You may need to escape the apostrophes though depending on language (\')

/^([^']*)'?(.*)$/ 

Quick modification: if the line doesn't have an ' - backreference 1 should still catch the whole line.

^ - start of string
([^']*) - capture any number of non-' characters
'? - match the ' 0 or 1 times
(.*) - capture any number of characters
$ - end of string
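
A minimal Python sketch of the two-group approach follows. The helper name `split_at_first_quote` is hypothetical, and the sample line is adapted from the question:

```python
import re

# Group 1: everything before the first '; group 2: everything after it.
pattern = re.compile(r"^([^']*)'?(.*)$")

def split_at_first_quote(line):
    """Return (before, after) around the first ' in a single line of text."""
    m = pattern.match(line)
    return m.group(1), m.group(2)

print(split_at_first_quote("I AL01 ' A-LINE '091398 GDK 33394178"))
print(split_at_first_quote("no remark here"))
```

When the line has no ', the optional `'?` matches nothing, so the first group holds the whole line and the second group is empty.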


0

https://regex101.com/r/Jjc2xR/1

/(\w*\(Hex\): w*)(.*?)(?= |$)/gm 

I'm sure this one works. It will capture the hex serial number in the badly structured multiline text below:

Space Reservation: disabled
Serial Number: wCVt1]IlvQWv
Serial Number (Hex): 77435674315d496c76515776
Comment: new comment

I'm an eternal newbie at regex, but I'll try to explain this one:

(\w*\(Hex\): w*) : Finds the part of the line that contains "(Hex): "

(.*?) : This is the second captured group and means everything after that, taking as few characters as possible

(?= |$) : A lookahead that ends the match at the next space or at the end of the line (the character between = and | is a single space)

So with the second group, you will have the value
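
For reference, the same pattern can be tried in Python as a quick sketch. The `/gm` flags from the regex101 link are approximated here with `re.MULTILINE`, and the line breaks in the sample text are an assumption:

```python
import re

# Sample text from the answer; the line breaks are an assumption.
text = (
    "Space Reservation: disabled\n"
    "Serial Number: wCVt1]IlvQWv\n"
    "Serial Number (Hex): 77435674315d496c76515776\n"
    "Comment: new comment\n"
)

# Same pattern as in the answer; MULTILINE makes $ match at each line end.
m = re.search(r"(\w*\(Hex\): w*)(.*?)(?= |$)", text, flags=re.MULTILINE)

hex_serial = m.group(2)
print(hex_serial)
```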

1 Comment

That's not the question, is it?
