6

Hi, I have a variable in a Silex route, and we only allow alphanumeric values:

->assert('hash','\w+') 

I would like to also allow a dot in the variable, but my efforts at editing this regex have failed. Help greatly appreciated, thanks!

  • Keep in mind that \w includes the underscore, and that its meaning changes if your regex engine is set for Unicode semantics. Commented Feb 12, 2014 at 18:48
  • @DavidO I updated my answer with your hint. I wrote it initially from my cell phone and skipped the explanations ^^; Commented Feb 12, 2014 at 21:44

3 Answers

13

Try

->assert('hash', '[a-zA-Z0-9.]+') 

Why not [\w.]?

You tagged your question as PHP, so I assume that the PHP manual's PCRE documentation applies. There it reads:

\w any "word" character 

and

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

So, depending on the locale, \w might also match characters such as ä, ö, ü, or ß; you cannot know in advance.

Since the variable is named hash, you may also want to try

->assert('hash', '[a-fA-F0-9.]+') 

which accepts only hex digits and the dot, and rejects letters such as G or Z.
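To see how the two explicit character classes differ, here is a minimal sketch using preg_match directly. The helper matchesRequirement() is hypothetical and not part of Silex; it wraps the requirement in \A...\z on the assumption that the routing layer matches a requirement against the whole path segment.

```php
<?php
// Hypothetical helper (not a Silex API): tests a whole value against a
// requirement pattern. The \A...\z anchors are an assumption that mimics
// a route requirement having to cover the entire segment.
function matchesRequirement(string $requirement, string $value): bool
{
    return preg_match('~\A(?:' . $requirement . ')\z~', $value) === 1;
}

$alnumDot = '[a-zA-Z0-9.]+'; // ASCII letters, digits, and the dot
$hexDot   = '[a-fA-F0-9.]+'; // hex digits and the dot only

var_dump(matchesRequirement($alnumDot, 'abc123.def'));  // bool(true)
var_dump(matchesRequirement($alnumDot, 'abc_123'));     // bool(false): underscore is not in the class
var_dump(matchesRequirement($hexDot, 'deadbeef.01'));   // bool(true)
var_dump(matchesRequirement($hexDot, 'xyz.01'));        // bool(false): x, y, z are not hex digits
```

Because both classes list their characters explicitly, the result should not depend on the locale or on Unicode mode.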


4 Comments

And where the documentation states, "...any character which can be part of a Perl 'word'.", it really should state, "...any character which can be part of a Perl identifier." We have no control over the documentation, but I mention it to offer the rationale.
@DavidO Yeah, Perl "word" sounds strange, but with "identifier" one might think that only the English alphabet and some decoration like '_' is matched, and nothing locale-specific. The whole paragraph could have been phrased simply as: A "word" character is any letter or digit or the underscore character. The definition of letters and digits may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
In Perl, if you are using the "utf8" pragma, identifiers may contain any character that matches \w, even if it's in the >127 range.
@DavidO Nice to know, but I guess that for using regular expressions it is sufficient to know that \w varies with the locale and matches more than [a-zA-Z] and less than [^ ] ;-)

11

Try using a character class ([…]), like this:

[\w.]+ 

For example:

->assert('hash','[\w.]+') 
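As a quick check of what this class accepts, the sketch below runs the pattern through preg_match with explicit \A...\z anchors (an assumption standing in for how a route requirement covers the whole segment). The sample values are made up for illustration. Inside a character class the dot is a literal dot, not "any character", and \w still lets the underscore through.

```php
<?php
// Illustrative check only; the sample values are hypothetical.
$pattern = '/\A[\w.]+\z/';
$samples = ['abc.def', 'abc_def', 'abc-def', 'abc/def'];

$results = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$sample] = preg_match($pattern, $sample) === 1;
}

var_dump($results['abc.def']); // bool(true): the dot is allowed
var_dump($results['abc_def']); // bool(true): \w includes the underscore
var_dump($results['abc-def']); // bool(false): the hyphen is not in the class
var_dump($results['abc/def']); // bool(false): a dot in a class does not mean "any character"
```

If the underscore should be rejected, an explicit class such as [a-zA-Z0-9.]+ is the safer choice.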


2

I don't know the internals of assert(), but use a char class:

->assert('hash','[\w.]+') 
