93

If I pass request fields or cookies with a period/dot in their names, PHP auto-replaces them with underscores. For example, if I put this code at https://example.com/test.php?x.y=a.b:

```php
<?php
echo $_SERVER['REQUEST_URI'];
echo $_GET['x.y'];
echo $_GET['x_y'];
```

the output is:

/test.php?x.y=a.b a.b 
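parse_str() applies the same name rewriting, so the behaviour can be reproduced from the command line without a web server. This is a minimal sketch that uses the query string from the question. The variable name $params is only for illustration.

```php
<?php
// parse_str() rewrites incoming names in the same way as $_GET,
// so "x.y" comes back as "x_y" while the value "a.b" is untouched.
parse_str('x.y=a.b', $params);

var_dump(array_key_exists('x.y', $params)); // bool(false)
var_dump($params['x_y']);                   // string(3) "a.b"
```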

Is there any way I can prevent this behaviour?

6
  • Why don't you just convert all dots to some kind of token, for instance (~#~), and then post it? When receiving the vars you can convert them back. This matters because sometimes we NEED to post underscores, and we would lose them if we converted every "_" back to "." Commented Sep 22, 2011 at 19:20
  • In the retrieving query itself you can concatenate the user_name, like concat(firstname,'_',lastname) as user_name. Commented Jan 21, 2013 at 4:49
  • @Kaspar Mary ... the database is set up to have columns username and status, and the usernames are stored as firstname.lastname, so I can't use any concat in SQL as they are already concatenated with a . Commented Jan 21, 2013 at 5:00
  • @Crisp Thanks for the comment! (at) Rob interesting problem Commented Jan 21, 2013 at 5:01
  • Why isn't there a delete comment? :) Commented Feb 27, 2014 at 2:06

13 Answers

79

Here's PHP.net's explanation of why it does it:

Dots in incoming variable names

Typically, PHP does not alter the names of variables when they are passed into a script. However, it should be noted that the dot (period, full stop) is not a valid character in a PHP variable name. For the reason, look at it:

<?php $varname.ext; /* invalid variable name */ ?> 

Now, what the parser sees is a variable named $varname, followed by the string concatenation operator, followed by the barestring (i.e. unquoted string which doesn't match any known key or reserved words) 'ext'. Obviously, this doesn't have the intended result.

For this reason, it is important to note that PHP will automatically replace any dots in incoming variable names with underscores.

That's from http://ca.php.net/variables.external.

Also, according to this comment these other characters are converted to underscores:

The full list of field-name characters that PHP converts to _ (underscore) is the following (not just dot):

  • chr(32) ( ) (space)
  • chr(46) (.) (dot)
  • chr(91) ([) (open square bracket)
  • chr(128) - chr(159) (various)
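In practice this mapping means that two different incoming names can collide. The following small sketch uses parse_str(), which follows the same rules as $_GET, and shows what happens to a space and a dot. Only those two characters are exercised here, and $collided is an illustrative name.

```php
<?php
// "a b" (sent as a%20b) and "a.b" are both rewritten to "a_b",
// so the second pair silently overwrites the first.
parse_str('a%20b=1&a.b=2', $collided);

// Only one key survives: "a_b", holding the value "2".
print_r($collided);
```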

So it looks like you're stuck with it. You'll have to convert the underscores back to dots in your script using dawnerd's suggestion (I'd just use str_replace, though).
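Here is a sketch of the str_replace route. It also exposes the main weakness of this approach, the same one the question's first comment points out: a field that really was named user_id comes back as user.id. The helper name undoUnderscores() is hypothetical.

```php
<?php
// Naive fix-up: rename every top-level key by turning "_" back into ".".
// This cannot tell a rewritten dot from a genuine underscore.
function undoUnderscores(array $input): array
{
    $result = [];
    foreach ($input as $key => $value) {
        $result[str_replace('_', '.', (string) $key)] = $value;
    }
    return $result;
}

parse_str('x.y=a.b&user_id=7', $get);
print_r(undoUnderscores($get)); // "x.y" is restored, but "user_id" also becomes "user.id"
```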


4 Comments

This is a great explanation of why, but doesn't answer the original question of "is there any way to get it to stop"; other answers below do provide an answer to the original question.
@ElYobo, @JeremyRuten; good explanation of why? I'm using PHP 5.4 and PHP is still doing this. I'd also love to know why it's not deprecated yet. I can only see two reasons for keeping it: register_globals (deprecated since 5.3), and convenience when manually doing what register_globals does (in which case the burden should be on the person doing that to map variable names as they see fit, IMO).
Backwards compatibility I assume? Good point, with register globals going the way of the dodo this strange "functionality" could go likewise.
With php7, register globals already rode off into the sunset but the problem is still present.
64

Long-since answered question, but there is actually a better answer (or work-around). PHP lets you at the raw input stream, so you can do something like this:

$query_string = file_get_contents('php://input'); 

which gives you the raw request body. For an application/x-www-form-urlencoded POST, that is the $_POST data in query-string format, with the periods intact.

You can then parse it if you need (as per POSTer's comment)

```php
<?php
// Function to fix up PHP's messing up input containing dots, etc.
// `$source` can be either 'POST' or 'GET'
function getRealInput($source) {
    $pairs = explode("&", $source == 'POST' ? file_get_contents("php://input") : $_SERVER['QUERY_STRING']);
    $vars = array();
    foreach ($pairs as $pair) {
        if ($pair === '') {
            continue; // skip empty segments, e.g. an empty body or "a=1&&b=2"
        }
        $nv = explode("=", $pair, 2);
        $name = urldecode($nv[0]);
        $value = isset($nv[1]) ? urldecode($nv[1]) : ''; // a bare "flag" gets ''
        $vars[$name] = $value;
    }
    return $vars;
}

// Wrapper functions specifically for GET and POST:
function getRealGET() { return getRealInput('GET'); }
function getRealPOST() { return getRealInput('POST'); }
?>
```

Hugely useful for OpenID parameters, which contain both '.' and '_', each with a certain meaning!

5 Comments

To make this work with GET parameters replace file_get_contents("php://input") with $_SERVER['QUERY_STRING'].
And you can do the same for cookies using $_SERVER['HTTP_COOKIE'] (note that cookie pairs are separated by "; " rather than "&").
This is a good start, but there are a couple of issues with it. It doesn't handle array values (e.g. foo.bar[]=blarg will not end up as an array, it will end up as a scalar variable called foo.bar[]). It also has a lot of overhead as it reprocesses all values, regardless of whether there is a period in them or not.
See my solution below, which fixes up the problems with Rok's implementation.
For some reason $query_string = file_get_contents('php://input'); returns an empty string for me.
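Cookies need a different split than the query string, because the raw Cookie header separates pairs with "; " instead of "&". This is a minimal sketch under two assumptions: values are percent-encoded (setcookie() encodes them automatically), and names should be kept exactly as the client sent them. rawurldecode() is used so that a literal "+" in a value stays a "+". The function name rawCookies() is hypothetical.

```php
<?php
// Split a raw Cookie header ("name=value; name2=value2") without PHP's
// name rewriting. Names are kept as sent; values are percent-decoded.
function rawCookies(string $header): array
{
    $cookies = [];
    foreach (explode(';', $header) as $pair) {
        $pair = trim($pair);
        if ($pair === '') {
            continue; // empty header or stray separator
        }
        $parts = explode('=', $pair, 2);
        $cookies[$parts[0]] = isset($parts[1]) ? rawurldecode($parts[1]) : '';
    }
    return $cookies;
}

$cookies = rawCookies($_SERVER['HTTP_COOKIE'] ?? '');
```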
32

Highlighting an actual answer by Johan in a comment above - I just wrapped my entire post in a top-level array which completely bypasses the problem with no heavy processing required.

In the form you do

<input name="data[database.username]"> <input name="data[database.password]"> <input name="data[something.else.really.deep]"> 

instead of

<input name="database.username"> <input name="database.password"> <input name="something.else.really.deep"> 

and in the post handler, just unwrap it:

$postdata = $_POST['data']; 

For me this was a two-line change, as my views were entirely templated.

FYI, I am using dots in my field names to edit trees of grouped data.
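You can check this without a browser by passing a request body to parse_str(), which applies the same rules as $_POST. The body below is a hand-written guess at what the form above would submit. It assumes the brackets are percent-encoded, which is how browsers send them.

```php
<?php
// Simulated urlencoded body for two of the wrapped field names.
$body = 'data%5Bdatabase.username%5D=root&data%5Bsomething.else.really.deep%5D=42';
parse_str($body, $post);

// Only the top-level name ("data") is subject to rewriting; the keys
// inside the brackets keep their dots.
$postdata = $post['data'];
var_dump($postdata['database.username']); // string(4) "root"
```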

3 Comments

Very elegant and practical solution indeed, with the side benefit of keeping form data nicely namespaced.
This completely solves the problem and should have been the answer accepted.
Works like a charm. Very simple, yet elegant solution.
20

Do you want a solution that is standards-compliant and works with deep arrays (for example: ?param[2][5]=10)?

To fix all possible sources of this problem, you can apply at the very top of your PHP code:

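```php
$_GET = fix( $_SERVER['QUERY_STRING'] );
$_POST = fix( file_get_contents('php://input') );
// Cookie pairs are separated by "; " rather than "&", so normalise them first
$_COOKIE = fix( preg_replace('/;\s*/', '&', $_SERVER['HTTP_COOKIE'] ?? '') );
```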

The idea behind this function is one I came up with during my summer vacation of 2013. Don't be put off by the regex. It simply grabs every top-level query name, hex-encodes it (so dots are preserved), and then uses the normal parse_str() function.

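```php
function fix($source) {
    // Hex-encode every top-level name (from the start or an "&" up to
    // "=", "[" or "&") so parse_str() has nothing to rewrite
    $source = preg_replace_callback(
        '/(^|(?<=&))[^=[&]+/',
        function($key) {
            return bin2hex(urldecode($key[0]));
        },
        $source
    );
    parse_str($source, $post);
    // Decode the names back; nested array keys were never touched
    $result = array();
    foreach ($post as $key => $val) {
        $result[hex2bin($key)] = $val;
    }
    return $result;
}
```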

11 Comments

Thanks for this. Please also update it for deep arrays a[2][5] if you have time.
@Johan, deep arrays do work. a[2][5]=10 produces array(1) { ["a"]=> array(1) { [2]=> array(1) { [5]=> string(2) "10" } } }.
Oh I got it, it indeed does, just tested it. Php does not convert dots etc inside array indexes, only top level of array name is troubled: php_touches_this[nochangeshere][nochangeshere]. Great. Thanks.
I'd love to see your benchmarks, as that does conflict with the testing that I did a few months back. Also, I've just run across the situation where I need to handle periods in posted file fields, which no answers address yet; any ideas?
You'll see them soon enough, don't currently have time, but you can present yours. * File uploads require multipart/form-data type, which doesn't get passed to php://input. Therefore, this is still very hackish to do. See: stackoverflow.com/questions/1361673/get-raw-post-data
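The multipart point also explains the earlier comment about php://input returning an empty string. The PHP manual states that php://input is not available with enctype="multipart/form-data" when enable_post_data_reading is enabled. The following sketch guards against that case by checking the request's content type first. The function name rawUrlencodedBody() is hypothetical.

```php
<?php
// Return the raw body only for urlencoded form posts; for any other
// content type (multipart uploads, JSON, ...) return null instead of
// silently handing back an empty string.
function rawUrlencodedBody(): ?string
{
    $type = $_SERVER['CONTENT_TYPE'] ?? '';
    if (stripos($type, 'application/x-www-form-urlencoded') !== 0) {
        return null;
    }
    $body = file_get_contents('php://input');
    return $body === false ? null : $body;
}
```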
7

This happens because a period is an invalid character in a variable's name, the reason for which lies very deep in the implementation of PHP, so there are no easy fixes (yet).

In the meantime you can work around this issue by:

  1. Accessing the raw query data via either php://input for POST data or $_SERVER['QUERY_STRING'] for GET data
  2. Using a conversion function.

The below conversion function (PHP >= 5.4) encodes the names of each key-value pair into a hexadecimal representation and then performs a regular parse_str(); once done, it reverts the hexadecimal names back into their original form:

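```php
function parse_qs($data) {
    // Hex-encode each top-level name (up to "=", "&" or "[") so parse_str() cannot rewrite it
    $data = preg_replace_callback('/(?:^|(?<=&))[^=&\[]+/', function($match) {
        return bin2hex(urldecode($match[0]));
    }, $data);
    parse_str($data, $values);
    // Restore the original names; nested array keys were never encoded
    return array_combine(array_map('hex2bin', array_keys($values)), $values);
}

// work with the raw query string
$data = parse_qs($_SERVER['QUERY_STRING']);
```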

Or:

```php
// handle posted data (this only works with application/x-www-form-urlencoded)
$data = parse_qs(file_get_contents('php://input'));
```

6 Comments

what would happen though if this needed to be used for something else that was sent through and I actually need the _ in the variable?
@Rob I've added the output based on your question; it works as expected, because I don't touch the underscores.
Note: This is an edited solution which later copied my code and my idea (see change log). It should be removed by the moderators.
Apparently it was good enough for you to take the bin2hex() idea from mine, so can we just drop this pointless feud?
@RokKralj I hope you're not implying that my older answer was perfectly fine; it sure seemed fine until I realised a major oversight which was fixed in a series of edits, the final one sharing the same regular expression to match names in a url encoded string; I honestly can't think of any other way to write that expression, or I would have done so now to stop the accusations.
6

This approach is an altered version of Rok Kralj's, with some tweaks to improve efficiency (it avoids unnecessary callbacks, encoding and decoding on unaffected keys) and to correctly handle array keys.

A gist with tests is available and any feedback or suggestions are welcome here or there.

```php
public function fix(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    $keys = array();
    $source = preg_replace_callback(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        function ($key) use (&$keys) {
            $keys[] = $key = base64_encode(urldecode($key[0]));
            return urlencode($key);
        },
        $source
    );
    if (!$keep) {
        $target = array();
    }
    parse_str($source, $data);
    foreach ($data as $key => $val) {
        // Only unprocess encoded keys
        if (!in_array($key, $keys)) {
            $target[$key] = $val;
            continue;
        }
        $key = base64_decode($key);
        $target[$key] = $val;
        if ($keep) {
            // Keep a copy in the underscore key version
            $key = preg_replace('/(\.| )/', '_', $key);
            $target[$key] = $val;
        }
    }
}
```

10 Comments

Boom this worked perfectly for me, thanks El Yobo/Rok. Using it in a CodeIgniter 2.1.3 project.
I would note if values come in that don't have %20 entities already in place, such as 'Some Key=Some Value' then the output of this function is 'Some_Key=Some Value', maybe the regex could be tweaked?
The regex could be adjusted to catch un-url-encoded spaces... but if your source isn't url encoded already, then there will probably be other problems, as the handling always decodes and encodes strings, then the parse_str call will again urldecode. What are you trying to parse that's not encoded already?
Thanks for the attribution. Though, I might warn that your code might perform worse, because POSTs are usually just a few hundred bytes. I prefer simplicity here.
Did you get those benchmarks up somewhere? I'm interested to see which scenarios it's slower in, as everything that I tested it ranged between the same speed as yours and twice as fast. I suspect the difference is in the type of things that it was tested on :) You can easily add some timing checks to my gist to see how it goes, why not compare yours against the same input and post results and times?
5

This happens because of PHP's old register_globals functionality. The . character is not valid in a variable name, so PHP converts it to an underscore for compatibility.

In short, it's not good practice to use periods in URL variable names.

3 Comments

It is also not a good idea to have register_globals on. In fact, it should be turned off right now if possible.
register_globals is in fact off, as is the default in PHP5. > The . character is not a valid character in a variable name Unfortunately I'm not looking to use this as a variable name (I keep it as a key in the $_GET dictionary), so this 'thoughtfulness' in PHP adds no value :-( Ah well...
It doesn't matter if register_globals is on or off. PHP still performs the replacements.
3

If looking for any way to literally get PHP to stop replacing '.' characters in $_GET or $_POST arrays, then one such way is to modify PHP's source (and in this case it is relatively straightforward).

WARNING: Modifying PHP C source is an advanced option!

Also see this PHP bug report which suggests the same modification.

To explore you'll need to:

  • download PHP's C source code
  • disable the . replacement check
  • ./configure, make and deploy your customized build of PHP

The source change itself is trivial and involves updating just one half of one line in main/php_variables.c:

```c
....
/* ensure that we don't have spaces or dots in the variable name (not binary safe) */
for (p = var; *p; p++) {
    if (*p == ' ' /*|| *p == '.'*/) {
        *p='_';
....
```

Note: compared to the original, || *p == '.' has been commented out.


Example Output:

given a QUERY_STRING of a.a[]=bb&a.a[]=BB&c%20c=dd, running <?php print_r($_GET); now produces:

```
Array
(
    [a.a] => Array
        (
            [0] => bb
            [1] => BB
        )

    [c_c] => dd
)
```

Notes:

  • this patch addresses the original question only (it stops replacement of dots, not spaces).
  • running on this patch will be faster than script-level solutions, but those pure-PHP answers are still generally preferable (because they avoid changing PHP itself).
  • in theory a polyfill approach is possible here and could combine approaches: test for the C-level change using parse_str() and (if unavailable) fall back to slower methods.
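The polyfill test described in the last note could look something like the sketch below. It assumes that parse_str() goes through the same variable-registration code in main/php_variables.c as $_GET and $_POST, so a patched build should keep the dot in the probe key. The function name dotsPreservedNatively() is hypothetical.

```php
<?php
// Probe whether this PHP build keeps dots in top-level request names.
// On a stock build the probe key comes back as "a_b", so this returns false.
function dotsPreservedNatively(): bool
{
    parse_str('a.b=1', $probe);
    return array_key_exists('a.b', $probe);
}

// true: $_GET / $_POST can be used as-is; false: fall back to a script-level parser.
$useNativeArrays = dotsPreservedNatively();
```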

1 Comment

You shouldn't do it like this ever, however, +1 for the effort.
2

My solution to this problem was quick and dirty, but I still like it. I simply wanted to post a list of filenames that were checked on the form. I used base64_encode to encode the filenames within the markup and then just decoded it with base64_decode prior to using them.
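This is a sketch of that trick. Base64 output contains only A-Z, a-z, 0-9, "+", "/" and "=", and PHP rewrites none of these in incoming names. The "file_" prefix and both function names are hypothetical choices for illustration.

```php
<?php
// Build a field name that survives PHP's rewriting of incoming names.
function fileField(string $fileName): string
{
    return 'file_' . base64_encode($fileName);
}

// Server side: collect the decoded names of every submitted "file_..." field.
function checkedFiles(array $post): array
{
    $names = [];
    foreach ($post as $field => $value) {
        $field = (string) $field;
        if (strncmp($field, 'file_', 5) === 0) {
            $names[] = base64_decode(substr($field, 5));
        }
    }
    return $names;
}

// Rendering one checkbox per file in the form:
foreach (['report.v2.pdf', 'my notes.txt'] as $name) {
    echo '<input type="checkbox" name="', htmlspecialchars(fileField($name)), '" value="1">', "\n";
}
```

On submission, checkedFiles($_POST) should return the original file names, dots and spaces included.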


2

After looking at Rok's solution I have come up with a version which addresses the limitations in my answer below, crb's above and Rok's solution as well. See my improved version.


@crb's answer above is a good start, but there are a couple of problems.

  • It reprocesses everything, which is overkill; only those fields that have a "." in the name need to be reprocessed.
  • It fails to handle arrays in the same way that native PHP processing does, e.g. for keys like "foo.bar[]".

The solution below addresses both of these problems now (note that it has been updated since originally posted). This is about 50% faster than my answer above in my testing, but will not handle situations where the data has the same key (or a key which gets extracted the same, e.g. foo.bar and foo_bar are both extracted as foo_bar).

```php
<?php
public function fix2(&$target, $source, $keep = false) {
    if (!$source) {
        return;
    }
    preg_match_all(
        '/
        # Match at start of string or &
        (?:^|(?<=&))
        # Exclude cases where the period is in brackets, e.g. foo[bar.blarg]
        [^=&\[]*
        # Affected cases: periods and spaces
        (?:\.|%20)
        # Keep matching until assignment, next variable, end of string or
        # start of an array
        [^=&\[]*
        /x',
        $source,
        $matches
    );
    foreach (current($matches) as $key) {
        $key = urldecode($key);
        $badKey = preg_replace('/(\.| )/', '_', $key);
        if (isset($target[$badKey])) {
            // Duplicate values may have already unset this
            $target[$key] = $target[$badKey];
            if (!$keep) {
                unset($target[$badKey]);
            }
        }
    }
}
```

5 Comments

-1. Why? 1. Space %20 is also a special character that gets converted to underscore. 2. Your code preprocesses all the data, since preg_match_all has to scan everything, even though you say you don't. 3. Your code fails at examples like this: a.b[10]=11.
You're right about space, thanks. My explanation already points out that my approach doesn't handle arrays, so I'm not quite sure why you're pointing that out. preg_match_all has to "process" one string, not extract and reprocess all the unaffected keys and values, so you're a little off track there as well. That said, your approach with parse_str looks like an interesting approach that, with a bit of tweaking, might be better :)
You say you extract only affected keys, but in terms of computational complexity, you don't. You are saying as you had some kind of random access to fetch only the affected keys, but even if there are no affected keys present, you have to access the whole memory. If you have a post with 100 megs of data, it doesn't matter what you extract, both approaches are linear, O(n). In fact, you are making the complexity worse, using the in_array() function, as pointed above.
I'm looking through the 100megs once, not splitting it apart (which immediately doubles the memory), then splitting it again (doubling again) as in crb's method that I was comparing this to. Big O notation isn't taking into account memory usage at all, and this implementation doesn't use in_array anyway. Also, if you care to run some tests, you'll notice that the above is still significantly faster; not O(n) vs O(n^2), but one linear approach can still be faster than another... and this one is ;)
The other major advantage that this approach has is that the speed advantage is greatest when there's no work to be done at all, i.e. when no keys have been provided with periods or spaces; this means that it has minimal overhead if you drop it in to process all requests, because it does almost no work (one regex), vs extracting and encoding all keys multiple times.
0

Well, the function I include below, "getRealPostArray()", isn't a pretty solution, but it handles arrays and supports both names: "alpha_beta" and "alpha.beta":

```html
<input type='text' value='First-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='Second-.' name='alpha.beta[a.b][]' /><br>
<input type='text' value='First-_' name='alpha_beta[a.b][]' /><br>
<input type='text' value='Second-_' name='alpha_beta[a.b][]' /><br>
```

whereas var_dump($_POST) produces:

```
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=4)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
        2 => string 'First-_' (length=7)
        3 => string 'Second-_' (length=8)
```

var_dump( getRealPostArray()) produces:

```
'alpha.beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-.' (length=7)
        1 => string 'Second-.' (length=8)
'alpha_beta' =>
  array (size=1)
    'a.b' =>
      array (size=2)
        0 => string 'First-_' (length=7)
        1 => string 'Second-_' (length=8)
```

The function, for what it's worth:

```php
function getRealPostArray() {
    if ($_SERVER['REQUEST_METHOD'] !== 'POST') { # Nothing to do
        return null;
    }
    $neverANamePart = '~#~'; # Any arbitrary string never expected in a 'name'
    $postdata = file_get_contents("php://input");
    $post = [];
    $rebuiltpairs = [];
    $postraws = explode('&', $postdata);
    foreach ($postraws as $postraw) { # Each is a string like: 'xxxx=yyyy'
        $keyvalpair = explode('=', $postraw, 2);
        if (!isset($keyvalpair[1])) { # isset() rather than empty(), so a value of "0" survives
            $keyvalpair[1] = '';
        }
        # Only protect dots before the first (encoded) "[", i.e. in the top-level name
        $pos = strpos($keyvalpair[0], '%5B');
        if ($pos !== false) {
            $str1 = substr($keyvalpair[0], 0, $pos);
            $str2 = substr($keyvalpair[0], $pos);
            $str1 = str_replace('.', $neverANamePart, $str1);
            $keyvalpair[0] = $str1 . $str2;
        } else {
            $keyvalpair[0] = str_replace('.', $neverANamePart, $keyvalpair[0]);
        }
        $rebuiltpairs[] = implode('=', $keyvalpair);
    }
    $rebuiltpostdata = implode('&', $rebuiltpairs);
    parse_str($rebuiltpostdata, $post);
    $fixedpost = [];
    foreach ($post as $key => $val) {
        $fixedpost[str_replace($neverANamePart, '.', $key)] = $val;
    }
    return $fixedpost;
}
```


0

Using crb's answer, I wanted to recreate the $_POST array as a whole. Keep in mind that you'll still have to ensure you're encoding and decoding correctly on both the client and the server. It's important to understand when a character is truly invalid and when it is truly valid. Additionally, people should always escape client data before using it with any database command, without exception.

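```php
<?php
unset($_POST);
$_POST = array();
$p0 = explode('&', file_get_contents('php://input'));
foreach ($p0 as $key => $value) {
    if ($value === '') {
        continue; // empty body or stray "&"
    }
    $p1 = explode('=', $value, 2);
    $name = $p1[0];
    $val = isset($p1[1]) ? $p1[1] : ''; // names without "=" get ''
    $_POST[$name] = $val;
    //OR, to also decode the percent-encoding...
    //$_POST[urldecode($name)] = urldecode($val);
}
print_r($_POST);
?>
```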

I recommend using this only for individual cases; offhand, I'm not sure about the downsides of putting this at the top of your primary header file.


0

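My current solution (based on previous answers) hex-encodes each top-level name so parse_str() cannot rewrite it, lets parse_str() do the actual parsing (including nested arrays), and then decodes the names back: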

```php
function parseQueryString($data) {
    // Work on the still-encoded string, so an encoded "&" or "=" inside a
    // value is not mistaken for a separator.
    // Match each top-level name: from the start or an "&" up to "=", "&" or "[".
    $pattern = '/(?:^|(?<=&))[^=&\[]+/';
    $data = preg_replace_callback($pattern, function ($match) {
        return bin2hex(urldecode($match[0]));
    }, $data);
    parse_str($data, $values);
    return array_combine(array_map('hex2bin', array_keys($values)), $values);
}

$_GET = parseQueryString($_SERVER['QUERY_STRING']);
```

1 Comment

please add some explanation that will be helpful to everyone who will read your answer.