0

I'm trying to parse a SQL statement and extract its where clause.

Below is the piece of code I have written:

string input = "select * from table where x = 5 and abc = 'p' or def = 1 order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(.*)(?:order by .*)?", RegexOptions.IgnoreCase);

But the output I get includes the order by clause, which I don't want. I get the expected output if I remove the last '?', but the input statement might or might not contain an order by clause.

Expected Output: " x = 5 and abc = 'p' or def = 1 "

Can you please correct my regex?

4
  • Is the SQL statement restricted or can it reach any level of complexity? Commented Apr 13, 2015 at 8:43
  • It is restricted to only where and order by clauses in this case Commented Apr 13, 2015 at 8:44
  • 3
    Then don't use regex at all. Use a simple combination of Substring and IndexOf. It will save you a lot of time and trouble. Commented Apr 13, 2015 at 8:46
  • 1
    You can also try a SQL parser instead. Here's a thread about that Commented Apr 13, 2015 at 9:16

4 Answers

2

Use an alternation: a first branch whose group stops before the order by clause, then a second branch for statements without one.

string input = "select * from table where x = 5 and abc = 'p' or def = 1 order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(?:(.*)(?:order by .*)|(.*))", RegexOptions.IgnoreCase);

Regex is not a good SQL parser and it will fail in many cases. For instance:

select * from table where x = 'order by col'

Here order by col' is part of a string literal, yet it will be missing from the match.
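As a rough sketch of how the result of that alternation could be read back, assuming the pattern above is used unchanged: the where clause ends up in group 1 when an order by clause is present and in group 2 otherwise. The class and method names (`WhereClauseRegex`, `Extract`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhereClauseRegex
{
    // Same pattern as in the answer; the names around it are hypothetical.
    private static readonly Regex Pattern = new Regex(
        @"select.*from [a-z]+ where(?:(.*)(?:order by .*)|(.*))",
        RegexOptions.IgnoreCase);

    // Returns the text after "where", or null when the statement does not match.
    public static string Extract(string sql)
    {
        Match match = Pattern.Match(sql);
        if (!match.Success)
        {
            return null;
        }

        // Group 1 participates only when the "order by" branch matched;
        // otherwise the second branch filled group 2.
        Group clause = match.Groups[1].Success ? match.Groups[1] : match.Groups[2];
        return clause.Value;
    }
}
```

Calling `WhereClauseRegex.Extract` on the statement from the question should give the where clause with its surrounding spaces. For the string literal case mentioned above, the result is cut off right after the opening quote, which is exactly the failure being described.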


1

Using a regular expression to parse SQL is a recipe for a king-size headache. Try this:

string input = "select * from table where x = 5 and abc = 'p' or def = 1 order by col";
int start = input.IndexOf(" where ", StringComparison.OrdinalIgnoreCase) + 7; // skip past " where "
int end = input.IndexOf(" order by ", StringComparison.OrdinalIgnoreCase);
string output = input.Substring(start, end - start); // the second argument is a length, not an end index

Note: you will need to confirm that your SQL actually contains both a where and an order by clause, but that is fairly simple to do.

Note #2: " where " is seven characters long, so +7 starts right after it; use +6 instead if you want to keep the leading space shown in the expected output.

Edit

It's worth mentioning that my suggested solution suffers from the same drawback that Guillaume pointed out in his answer, i.e. if the where clause contains something like x = ' order by blabla', my suggestion will fail as well. However, this is fairly simple to avoid: change input.IndexOf(" order by ", StringComparison.OrdinalIgnoreCase) to input.LastIndexOf(" order by ", StringComparison.OrdinalIgnoreCase). This way you get the last occurrence, which is the actual order by clause of your SQL statement whenever the statement has one.
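Putting the notes and the edit together, a minimal sketch might look like the following. It assumes the restricted statement shape from the question (at most a where and an order by clause); the `WhereClauseSubstring` class and its `Extract` method are hypothetical names, not part of any library.

```csharp
using System;

public static class WhereClauseSubstring
{
    // Returns the where clause without surrounding spaces, or null when
    // the statement has no where clause at all.
    public static string Extract(string sql)
    {
        const string whereKeyword = " where ";
        const string orderKeyword = " order by ";

        int whereIndex = sql.IndexOf(whereKeyword, StringComparison.OrdinalIgnoreCase);
        if (whereIndex < 0)
        {
            return null;
        }

        int start = whereIndex + whereKeyword.Length;

        // LastIndexOf skips an " order by " that sits inside an earlier string literal.
        int end = sql.LastIndexOf(orderKeyword, StringComparison.OrdinalIgnoreCase);
        if (end < start)
        {
            // No order by clause after the where clause: take the rest of the statement.
            end = sql.Length;
        }

        return sql.Substring(start, end - start);
    }
}
```

This still breaks when the statement has no real order by clause but a string literal contains " order by ", so it is only reasonable for input you control.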

3 Comments

nope, c#. they do look alike and I had some Capitalization mistakes but I've fixed them now.
You should use IndexOf("str", StringComparison.OrdinalIgnoreCase) instead of ToLower msdn.microsoft.com/library/ms224425.aspx
@Guillaume You are correct, thanks. I've edited my answer once again according to your suggestion.
0

You can change the code to

string input = "select * from table where x = 5 and abc = 'p' or def = 1 order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(.*?)(?=\s+and|$)", RegexOptions.IgnoreCase);

and it will only capture the where clause up to the next and or the end of the query.

2 Comments

What if there isn't an and? Shouldn't you capture the and anyway? Also, you already have RegexOptions.IgnoreCase so (?i) is not needed here, I'm not sure why you've added it.
I modified the question to include the expected output; your output is only one condition of the where clause, not all of them.
0

Try this pattern:

(?<where>(?<=where ).*)(?: order by) 

Read the named group "where" from the returned match. Be sure to also stop at other keywords such as having or group by.
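One possible way to cover those extra keywords is sketched below. This is an adaptation rather than the pattern above: it swaps the greedy match for a lazy one and adds a lookahead that stops at group by, having, order by, or the end of the input. The `WhereClauseNamedGroup` class is a hypothetical name, and the same string literal caveat applies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhereClauseNamedGroup
{
    // Lazy match after "where", ending before the next clause keyword or at the end.
    private static readonly Regex Pattern = new Regex(
        @"(?<=\bwhere\s)(?<where>.*?)(?=\s+(?:group\s+by|having|order\s+by)\b|$)",
        RegexOptions.IgnoreCase);

    // Returns the content of the named group "where", or null when there is no match.
    public static string Extract(string sql)
    {
        Match match = Pattern.Match(sql);
        return match.Success ? match.Groups["where"].Value : null;
    }
}
```

Because the order by part is only checked in a lookahead and the pattern may also end at `$`, the same call works whether or not the statement has a trailing clause.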
