
I am quite stuck with a regex I can't get to work. It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.

I have tried something like (?!\d|fiktiv).* on my sample string 123456788daswqrt fiktiv

https://regex101.com/r/kU8mF3/1

However, this matches the fiktiv at the end as well.
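
The behaviour described above can be reproduced outside regex101. The following is only a minimal sketch using Python's re module as a stand-in (the thread itself is about .NET, and both engines treat this pattern the same way as far as I know). It suggests why fiktiv still ends up in the match: the negative lookahead is checked only at the position where the match starts, and after that .* consumes the rest of the line.

```python
import re

sample = "123456788daswqrt fiktiv"

# The lookahead (?!\d|fiktiv) is evaluated once, at the candidate start
# position. It rejects every position that begins with a digit, so the
# first accepted start is the "d" right after the digits.
match = re.search(r"(?!\d|fiktiv).*", sample)

# Once the lookahead has passed, .* is free to consume everything that
# follows, including the trailing word.
matched_text = match.group(0)
start_index = match.start()

print(start_index, repr(matched_text))  # 9 'daswqrt fiktiv'
```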

  • What language are you using? In most programming languages, to get rid of the content you'd match it and then replace it with an empty string. So for example on the command line (assuming Unix): awk '{gsub(/fiktiv/,"");gsub(/[0-9]/,"");print $0}' Commented Aug 16, 2016 at 8:27
  • I guess you just need .replace(/(fiktiv)|\D/g, "$1") (in JS). What is the regex flavor and what is the expected output? Commented Aug 16, 2016 at 8:28
  • that would be \d Commented Aug 16, 2016 at 8:29
  • I am using SQL Server, which uses a .NET assembly under the hood. Commented Aug 16, 2016 at 8:32
  • So, can you use Regex.Replace(input, "(fiktiv)|[^0-9]", "$1")? Or are you limited to the T-SQL toolset? See regex101.com/r/vR4uU0/1 Commented Aug 16, 2016 at 8:34

2 Answers

2

One possibility would be to use a negated character class, which is written by putting a ^ at the start of the [] brackets. So you basically say: don't match digits, and take as many non-digits as you can get until whitespace occurs and the word fiktiv appears.

The captured text is "saved" in capturing group 1 for later use.

([^\d]+)\s+fiktiv 

Testing could be done here:

https://regex101.com/
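
To see what ends up in group 1, here is a small sketch. It uses Python's re module purely for illustration (an assumption on my side, since the question targets .NET); the pattern and the sample string are the ones from this thread.

```python
import re

sample = "123456788daswqrt fiktiv"

# [^\d]+ first grabs every non-digit it can, then backtracks until
# \s+fiktiv can match after it.
match = re.search(r"([^\d]+)\s+fiktiv", sample)

whole_match = match.group(0)  # the full match, including the word fiktiv
group_one = match.group(1)    # only the part wrapped in parentheses

print(repr(whole_match))  # 'daswqrt fiktiv'
print(repr(group_one))    # 'daswqrt'
```

The word fiktiv is part of the overall match, but it is not part of group 1.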


3 Comments

Thanks for your help...but I am still struggling to prevent the word fiktiv getting matched
@MartinGuth: You should not try to avoid matching it, as it will make the regex inefficient and slow, see my approach: regex101.com/r/vR4uU0/1
The concept behind the regex is to strip out the unneeded characters by exposing the output in so-called capturing groups. Even if you explicitly state fiktiv in the regex and it gets matched, capturing group 1 will only contain what the () parentheses are wrapped around, so [^\d]+ in this example. You need to find out how your environment gives access to the capturing groups.
0

It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.

So, you want to remove any character that is not a digit (that is, anything matching the \D or [^0-9] pattern) and is not part of a fiktiv character sequence.

You may use a regex with a capturing group and alternation:

(fiktiv)|[^0-9] 

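and replace with the contents of Group 1 using the $1 backreference, which restores fiktiv in the resulting string whenever it was matched. When the [^0-9] branch matches instead, Group 1 is empty and the character is simply removed.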

See the regex demo

C# implementation:

Regex.Replace(input, "(fiktiv)|[^0-9]", "$1")

Also, see Use RegEx in SQL with CLR Procs.
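
For readers without a .NET environment at hand, the same match-and-restore idea can be tried with the sketch below. It is written in Python rather than C#, so the replacement is expressed as a function instead of "$1"; the helper name keep_digits_and_word is hypothetical and only serves the illustration.

```python
import re


def keep_digits_and_word(text: str, word: str = "fiktiv") -> str:
    """Drop every non-digit character except whole occurrences of `word`.

    Hypothetical helper mirroring the (fiktiv)|[^0-9] approach.
    """
    # The word alternative comes first, so at each position the engine
    # tries the whole word before falling back to a single non-digit.
    pattern = re.compile("(" + re.escape(word) + ")|[^0-9]")
    # Put group 1 back when the word matched; otherwise group 1 is None
    # and the single matched character is replaced with nothing.
    return pattern.sub(lambda m: m.group(1) or "", text)


result = keep_digits_and_word("123456788daswqrt fiktiv")
partial = keep_digits_and_word("fik12tiv")

print(result)   # 123456788fiktiv
print(partial)  # 12
```

The second call shows that single characters of fiktiv are still removed when they do not form the whole word.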
