5

I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex but I have a pattern working right now. I need to expand it and I'm having trouble. Here is my code so far:

Sub ImportFromDTD() Dim sDTDFile As Variant Dim ffile As Long Dim sLines() As String Dim i As Long Dim Reg1 As RegExp Dim M1 As MatchCollection Dim M As Match Dim myRange As Range Set Reg1 = New RegExp ffile = FreeFile sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _ "Browse for file to be imported") If sDTDFile = False Then Exit Sub '(user cancelled import file browser) Open sDTDFile For Input Access Read As #ffile Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine) Close #ffile Cells(1, 2) = "From DTD" J = 2 For i = 0 To UBound(Lines) 'Debug.Print "Line"; i; "="; Lines(i) With Reg1 '.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)" .Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)" .Global = True .MultiLine = True .IgnoreCase = False End With If Reg1.Test(Lines(i)) Then Set M1 = Reg1.Execute(Lines(i)) For Each M In M1 sExtract = M.SubMatches(1) sExtract = Replace(sExtract, Chr(13), "") Cells(J, 2) = sExtract J = J + 1 'Debug.Print sExtract Next M End If Next i Set Reg1 = Nothing End Sub 

Currently, I'm matching on a set of data like this:

 <!ELEMENT DealNumber (#PCDATA) > 

and extract Dealnumber but now, I need to add another match on data like this:

<!ELEMENT DealParties (DealParty+) > 

and extract just Dealparty without the Parens and the +

I've been using this as a reference and it's awesome but I'm still a bit confused. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

EDIT

I have come across a few new scenarios that have to be matched on.

 Extract Deal <!ELEMENT Deal (DealNumber,DealType,DealParties) > Extract DealParty the ?,CR are throwing me off <!ELEMENT DealParty (PartyType,CustomerID,CustomerName,CentralCustomerID?, LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?, PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) > Extract Deals <!ELEMENT Deals (Deal*) > 

2 Answers 2

3

Looking at your pattern, you have too many capture groups. You only want to capture the PCDATA and DealParty. Try changing you pattern to this:

 With Reg1 .Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)" .Global = True .MultiLine = True .IgnoreCase = False End With 

Here's the stub: Regex101.

Sign up to request clarification or add additional context in comments.

4 Comments

It didn't work. When I run it, It stops on sExtract = M.SubMatches(1) and if I hold the cursor over it, I get <invalid procedure call or argument> and .pattern is <object variable or with block variable not set> the only thing I changed is the pattern.
I just realized you thought I was tring to extract PCDATA which is not the case. I've updated my question.
Your edited question is still confusing to me. You have lines like <!ELEMENT x (y)>. What do you want to get? x, y or both?
I want to get x if there isn't a + if there is, I want to get y without the +
1

You could use this Regex pattern;

 .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>" 
  1. This portion

(#\w+|(\w+)\+)

says match either

#a-z0-9
a-z0-9+

inside the parentheses.

ie match either

(#PCDATA)
(DealParty+)

to validate the entire string

  1. Then the submatches are used to extract DealNumber for the first valid match, DealParty for the other valid match

edited code below - note submatch is now M.submatches(0)

 Sub ImportFromDTD() Dim sDTDFile As Variant Dim ffile As Long Dim sLines() As String Dim i As Long Dim Reg1 As RegExp Dim M1 As MatchCollection Dim M As Match Dim myRange As Range Set Reg1 = New RegExp J = 1 strIn = "<!ELEMENT Deal12Number (#PCDATA) > <!ELEMENT DealParties (DealParty+) >" With Reg1 .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>" .Global = True .MultiLine = True .IgnoreCase = False End With If Reg1.Test(strIn) Then Set M1 = Reg1.Execute(strIn) For Each M In M1 sExtract = M.SubMatches(2) If Len(sExtract) = 0 Then sExtract = M.SubMatches(0) sExtract = Replace(sExtract, Chr(13), "") Cells(J, 2) = sExtract J = J + 1 Next M End If Set Reg1 = Nothing End Sub 

3 Comments

Thank for your post, it was the exact answer for my question. Since then, I've come across a few more matches that I need. One of them being a multiline one and I'm having trouble getting the pattern to match. I've been on the regex101 site all day working on it. I've edited my original post to include them. I'm thinking I might not be able to do them all in one pattern.
I'm just going to ask a new question for the additional matches. Thanks for your help!
Hi Brett. Can you take a look at my new post for me? It got buried very fast and was downgraded for what I feel were very ludicrous reasons. Second try

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.