2

I've recently been learning about regular expressions. I'm trying to gather FDF objects into individual strings, which I can then parse. The problem is that my code only matches the first occurrence; all other "objects" in the FDF file are ignored.

Objects begin on their own line with two numbers and the string "obj", followed by a carriage return (not a line feed). They end after a carriage return and the string "endobj".

    // testing parsing into objects...
    List<String> FDFobjects = new List<String>();
    String strRegex = @"^(?<obj>\d+ \d+) obj\r(?<objData>.+?)\rendobj(?=\r)";
    Regex useRegex = new Regex(strRegex, RegexOptions.Multiline | RegexOptions.Singleline);
    StreamReader reader = new StreamReader(FileName);
    String fdfString = reader.ReadToEnd();
    reader.Close();
    foreach (Match useMatch in useRegex.Matches(fdfString))
        FDFobjects.Add(useMatch.Groups["objData"].Value);
    if (FDFobjects.Count > 0)
        Console.WriteLine(FDFobjects[0]);
    Console.WriteLine(FDFobjects.Count);

(I was using $ at the end of the regex string, but that matches 0 times, whereas using (?=\r) matches once.)
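The difference between `$` and `(?=\r)` can be reproduced with a small sketch. The class name `DollarDemo`, the helper `CountMatches`, and the sample string are made up for illustration; the sketch assumes CR-only line endings, as described above. In .NET, `$` with `RegexOptions.Multiline` matches before `\n` or at the very end of the input, so it never matches directly before a lone `\r`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarDemo
{
    // Hypothetical helper: counts matches with RegexOptions.Multiline enabled.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern, RegexOptions.Multiline).Count;

    public static void Main()
    {
        // Made-up sample: two objects, carriage returns only.
        const string crOnly = "1 0 obj\rdata\rendobj\r2 0 obj\rdata\rendobj\r";

        // "$" needs a following \n (or the end of the input), so nothing matches.
        Console.WriteLine(CountMatches(crOnly, @"endobj$"));

        // The lookahead accepts the carriage return explicitly.
        Console.WriteLine(CountMatches(crOnly, @"endobj(?=\r)"));
    }
}
```

With this sample, the first count should be 0 and the second 2.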

Edit: Some line breaks are CR/LF, and some are just CR. I don't know whether that is consistent across the different parts of the file, so I check for all of them (CR/LF, CR, and LF). I've settled on the following, which seems to work perfectly so far (and I'm no longer using the Multiline option). Adding the lookbehind is what made the biggest difference here.

... = new Regex(@"(?<=^|[^\\](\r\n|\r|\n))(?<objName>\d+ \d+) obj(\r\n|\r|\n)(?<objData>.*?)(?<!\\)(\r\n|\r|\n)endobj(?=\r\n|\r|\n|$)", RegexOptions.Singleline); 
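A minimal sketch of how that pattern might be wrapped in a helper. The names `FdfSplitter` and `SplitObjects` are hypothetical, and the sample input is invented to mix CR/LF, CR, and LF endings; only the pattern itself comes from the edit above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FdfSplitter
{
    // The pattern from the edit above, unchanged.
    private static readonly Regex ObjectRegex = new Regex(
        @"(?<=^|[^\\](\r\n|\r|\n))(?<objName>\d+ \d+) obj(\r\n|\r|\n)(?<objData>.*?)(?<!\\)(\r\n|\r|\n)endobj(?=\r\n|\r|\n|$)",
        RegexOptions.Singleline);

    // Hypothetical helper: returns one (name, data) pair per object found.
    public static List<KeyValuePair<string, string>> SplitObjects(string fdfText)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in ObjectRegex.Matches(fdfText))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["objName"].Value,
                m.Groups["objData"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample mixing CR/LF, CR, and LF line endings.
        string sample = "1 0 obj\r\n<< /A 1 >>\r\nendobj\r2 0 obj\r<< /B 2 >>\rendobj\n";
        foreach (var obj in SplitObjects(sample))
            Console.WriteLine(obj.Key + " => " + obj.Value);
    }
}
```

Returning the object name alongside the data keeps the two captured groups together, which may be convenient when the objects are parsed later.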
  • Try @"^(?<obj>\d+ \d+) obj\r?\n(?<objData>.+?)\r?\nendobj(?=\r?\n)". Maybe changing \r to a more flexible \r?\n can help. Without an exact sample string, it is not easy to help you with this pattern. Commented Sep 21, 2016 at 20:04
  • @Wiktor: Thanks. It doesn't work. The FDF is using carriage return only, it appears. Commented Sep 21, 2016 at 20:08
  • Then provide the exact input string with exact expected output. Commented Sep 21, 2016 at 20:08
  • I cannot convince myself that using a regex to parse FDF data is going to be 100% reliable. What if the data contains the string "endobj" at the end of a line? Commented Sep 21, 2016 at 20:13
  • @Andrew: That's why I check that the "endobj" string is on its own line. It's preceded by a \r. Commented Sep 21, 2016 at 20:15

2 Answers

0

The ^ in your pattern is only going to match at the start of the string. Try \b instead.


3 Comments

The first object is not at the start of the string and it matches. The RegexOptions.Multiline option is supposed to change the matching of ^ and $.
Good point... I've never tried mixing Singleline and Multiline before - do you really need both?
I hear you. The unfortunately named "Singleline" and "Multiline" options are unrelated. "Singleline" has to do with whether the dot matches new lines or not.
0

It seems that MSDN Regex Web help is lying about what ^ matches:

^  -   Matches the position at the start of the searched string. If the m (multiline search) character is included with the flags, ^ also matches the position following \n or \r.

In .NET, it only matches the position after \n. For example, the @"(?m)^\d+" pattern matches 1, 2, and 4 in the "1\r\n2\r3\n4" input; 3 is skipped because it is preceded by \r.
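The behaviour can be checked with a short sketch. `CaretDemo` and `FindAll` are hypothetical names used only for this illustration; the second pattern uses a lookbehind that also accepts a lone `\r`, for comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaretDemo
{
    // Hypothetical helper: returns the text of every match, in order.
    public static string[] FindAll(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string input = "1\r\n2\r3\n4";

        // Multiline "^" anchors at the start of the input and after \n only.
        Console.WriteLine(string.Join(", ", FindAll(input, @"(?m)^\d+")));

        // A lookbehind that treats \r and \n alike also finds the 3.
        Console.WriteLine(string.Join(", ", FindAll(input, @"(?<=^|\r|\n)\d+")));
    }
}
```

The first line printed should be `1, 2, 4` and the second `1, 2, 3, 4`.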

Use (?<=\r|^) at the beginning and (?=\r|$) at the end:

    var s = "1 2 obj\rObj1\rendobj\r2 3 obj\rObj2\rendobj\r3 45 obj\rObj3\rendobj";
    var matches = Regex.Matches(s, @"(?<=\r|^)(?<obj>\d+ \d+) obj\r(?<objData>.+?)\rendobj(?=\r|$)", RegexOptions.Multiline | RegexOptions.Singleline);
    foreach (Match m in matches)
    {
        Console.WriteLine("___ MATCH ___");
        Console.WriteLine(m.Value);
    }

Outputs all 3 matches:

    ___ MATCH ___
    1 2 obj
    Obj1
    endobj
    ___ MATCH ___
    2 3 obj
    Obj2
    endobj
    ___ MATCH ___
    3 45 obj
    Obj3
    endobj

See the C# demo online.

2 Comments

Thanks for the input. It seems like the Multiline option was not doing what it should, but I don't remember the details. (I've slept since then.)
That option is rather tricky. I thought I had known it well, but your question was an eye opener :)
