As others pointed out, there is no general way to get the original text from a regular expression, since there are usually a lot of possibilities.
However, since you have the digits of the "original text", you can recreate the text if these specific digits are the only information missing from the pattern. For example, in your Polish example \d{2}-\d{3} you can replace \d{2} and \d{3} in the pattern with 2 and 3 digits of the postal code from api A, and the pattern supplies the additional "-".
Examples of cases you can't reconstruct:
- SO: "[A-Z]{2}[ ]?\d{5}" because you don't get letters from api A, so you can't reconstruct them.
- BR: "\d{5}[\-]?\d{3}" because you don't get 8 digits from api A.
- anything with optional parts, because it is not defined which of the options is the right one. There might be several valid solutions that depend on special conditions (e.g. you have to use the additional 3 digits in \d{4}(-\d{3})? for cities with more than 10000 houses or something like that, or you have to use the - in \d{2}[-]?\d{2} only for the capital of the state, or maybe you can just use it as you like). That includes terms like \d{1,4}, since the length might depend on other values. And you might run into problems if leading 0's are allowed in your code: for an input 0000001, each of 1, 01, 001 and 0001 might be the correct solution for \d{1,4} (though I would assume that in practice leading 0's only occur with a fixed length); and for \d{4}(-\d{3})?, 0001001 might mean 0001-001 (large city) or 1001 (small city).
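To make the last case concrete, here is a small check (pattern and input are just example values, not data from the library) that tests two readings of the value 0001001 against \d{4}(-\d{3})?. Both match the pattern and both use exactly the digits from A once zero-padded, so nothing in the data tells you which one is right.

```php
<?php
// Example values only: one pattern, one zero-padded value from api A.
$pattern     = '\d{4}(-\d{3})?';
$postalcodeA = '0001001';

$results = [];
foreach (['0001-001', '1001'] as $candidate) {
    // Does the candidate fit the whole pattern?
    $fits = preg_match('/^(?:' . $pattern . ')$/', $candidate) === 1;
    // Does it use exactly the digits from A (after left-padding with zeros)?
    $digits = str_pad(preg_replace('/\D/', '', $candidate), strlen($postalcodeA), '0', STR_PAD_LEFT);
    $results[$candidate] = $fits && $digits === $postalcodeA;
}
var_dump($results); // both entries are true
```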
The usual way to get the correct postal code in these (and, to be honest, in all) cases is to look it up in a database by city and street name. (You can buy access to such databases from your local postal service, or build one from e.g. OpenStreetMap data.)
Having said that, here is some example code that reconstructs codes that are only missing a fixed number of digits, e.g. PL (\d{2}-\d{3}). It also works for patterns like FK ("FIQQ 1ZZ"), provided the code from A is "0000001". I assume it will work for about 50%-60% of countries.
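```php
<?php
use CommerceGuys\Addressing\Repository\AddressFormatRepository;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader for commerceguys/addressing

$country     = 'PL';
$postalcodeA = '0031401'; // zero-padded 7-digit code as delivered by api A

$repo    = new AddressFormatRepository();
$pattern = $repo->get($country)->getPostalCodePattern();

$ok          = 1;
$pospattern  = 0;  // pattern characters consumed, counted from the end
$posA        = 0;  // digits of $postalcodeA consumed, counted from the end
$postalcodeB = '';

// Walk the pattern from right to left and build the result from right to left.
while (($pospattern < strlen($pattern)) and ($ok == 1)) {
    $pospattern += 1;
    $charact = substr($pattern, -$pospattern, 1);

    if (strcmp($charact, '}') == 0) {
        // A quantifier: only "\d{n}" with a single-digit n is supported.
        if (strcmp(substr($pattern, -$pospattern - 4, 3), '\d{') == 0) {
            $cnt = (int) substr($pattern, -$pospattern - 1, 1);
            // Take the next $cnt digits of A (from the right).
            $postalcodeB = substr($postalcodeA, -$posA - $cnt, $cnt) . $postalcodeB;
            $posA += $cnt;
            $pospattern += 4; // skip the rest of "\d{n}"
        } else {
            $ok = 0;
        }
    } elseif (ctype_digit($charact)) {
        // A literal digit in the pattern must match the digit in A.
        if (strcmp($charact, substr($postalcodeA, -$posA - 1, 1)) !== 0) {
            $ok = 0;
        }
        $postalcodeB = $charact . $postalcodeB;
        $posA += 1;
    } elseif (preg_match('/[\(\)\[\]\{\}\$\?\|\*\+\.\^\\\\]/', $charact)) {
        // Groups, classes, optional parts, alternatives etc. are not supported.
        $ok = 0;
    } else {
        // Any other literal character (e.g. "-" or " ") is copied as-is.
        $postalcodeB = $charact . $postalcodeB;
    }
}

# USE WITH CARE! READ INFO!
# if ($ok == 0) {
#     $postalcodeB = preg_replace(
#         '/^.*(' . $pattern . ')$/',
#         '$1',
#         $postalcodeA
#     );
#     if (strcmp($postalcodeA, $postalcodeB) !== 0) {
#         $ok = 1;
#     }
# }

// Final check: the whole result must match the pattern.
if (!preg_match('/^(?:' . $pattern . ')$/', $postalcodeB)) {
    $ok = 0;
}

if (!$ok) {
    echo "Pattern ", $pattern, " not supported or no match to ", $postalcodeA, "\r\n";
} else {
    echo "Pattern ", $pattern, " ok: ", $postalcodeA, " -> ", $postalcodeB, "\r\n";
}
```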
It replaces every occurrence of \d{n} in the pattern with n digits, starting at the end of the string. If it doesn't understand the pattern (e.g. because it has optional parts), you might want to try preg_replace. I wouldn't use it (and commented it out) because it can give you unpredictable and wrong results (see the example for Boston City Hall below), but I added it in case you want to use it, e.g. because you can make sure the client for api A will never allow a zip+4 code to be entered. As a last step it verifies that the result matches the pattern.
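This is roughly what the commented-out fallback does with the zip+4 code for Boston City Hall, 02201-1020, depending on which 7 digits api A kept. The pattern \d{5}(-\d{4})? is an assumed US pattern written for this sketch, not necessarily the one the library returns.

```php
<?php
// Assumed ZIP / ZIP+4 pattern, written for this example only.
$pattern = '\d{5}(-\d{4})?';

$results = [];
// Two ways api A might have stored 02201-1020 in 7 digits.
foreach (['0220110', '2011020'] as $postalcodeA) {
    // Same regex as the commented-out fallback: keep the longest matching tail.
    $postalcodeB = preg_replace('/^.*(' . $pattern . ')$/', '$1', $postalcodeA);
    $results[] = $postalcodeA . ' -> ' . $postalcodeB;
}
print_r($results); // 0220110 -> 20110, 2011020 -> 11020
```

Both results match the pattern, so the final pattern test alone won't catch them.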
You can easily add support for \d (a single digit).
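A minimal sketch of how that could look, written as a standalone function instead of the loop above: it splits the pattern into \d{n} / \d tokens and literal text, fills the tokens from the right and returns null for anything it doesn't understand. The function name reconstruct is made up for this example.

```php
<?php
// Hypothetical helper: rebuild a postal code from the zero-padded digits in $a.
// Supports literal characters, \d{n} and a single \d; returns null otherwise.
function reconstruct(string $pattern, string $a): ?string
{
    // Split into "\d{n}" / "\d" tokens and the literal text between them.
    $tokens = preg_split('/(\\\\d(?:\{\d+\})?)/', $pattern, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $result = '';
    $posA = strlen($a); // digits of $a still unused, counted from the left
    foreach (array_reverse($tokens) as $token) {
        if (preg_match('/^\\\\d(?:\{(\d+)\})?$/', $token, $m)) {
            // "\d{n}" takes n digits, a bare "\d" takes one.
            $cnt = isset($m[1]) ? (int) $m[1] : 1;
            if ($cnt > $posA) {
                return null;
            }
            $posA -= $cnt;
            $result = substr($a, $posA, $cnt) . $result;
        } elseif (preg_match('/[\\\\()\[\]{}?*+|.^$]/', $token)) {
            return null; // optional parts, classes, alternatives, escapes, ...
        } else {
            // Literal text; a literal digit must match the digit in $a.
            for ($i = strlen($token) - 1; $i >= 0; $i--) {
                $ch = $token[$i];
                if (ctype_digit($ch)) {
                    if ($posA === 0 || $a[$posA - 1] !== $ch) {
                        return null;
                    }
                    $posA--;
                }
                $result = $ch . $result;
            }
        }
    }

    // The result must match the whole pattern.
    return preg_match('/^(?:' . $pattern . ')$/', $result) ? $result : null;
}

var_dump(reconstruct('\d{2}-\d{3}', '0031401')); // string(6) "31-401"
var_dump(reconstruct('\d{3}-\d', '0012345'));    // string(5) "234-5"
```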
You can try to add support for terms like \d{1,4}, e.g. by checking how many digits from api A are not used by other terms, and using the remaining digits (e.g. \d{2}-\d{1,4} with an input 0001245: it has 4 significant digits and uses 2 for the first term \d{2}, so 2 digits are left for \d{1,4}). But keep in mind what I wrote above: you might get wrong results if zero is an allowed leading digit, e.g. 00-1245, 01-245 or 12-45 might all be valid results (in this case, you cannot recover the code without looking up the city name in a database). And you will get into trouble with \d{1,2}-\d{2,3}.
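To see how many readings there are, here is a sketch that enumerates every way to read the zero-padded value as digit groups with variable lengths (in PCRE the quantifier is written {min,max}, e.g. \d{1,4}). The function name candidates is made up, and the group lengths are passed in by hand as [min, max] pairs instead of being parsed from the regex.

```php
<?php
// Hypothetical helper: every way to read the zero-padded value $a as digit
// groups whose lengths lie in the given [min, max] ranges, joined by $sep.
function candidates(string $a, array $terms, string $sep = '-'): array
{
    // All combinations of group lengths, e.g. [[2,1], [2,2], [2,3], [2,4]].
    $combos = [[]];
    foreach ($terms as [$min, $max]) {
        $next = [];
        foreach ($combos as $combo) {
            for ($len = $min; $len <= $max; $len++) {
                $next[] = array_merge($combo, [$len]);
            }
        }
        $combos = $next;
    }

    $result = [];
    foreach ($combos as $lengths) {
        $n = array_sum($lengths);
        if ($n > strlen($a)) {
            continue;
        }
        // The digits dropped from the front must all be padding zeros.
        if (trim(substr($a, 0, strlen($a) - $n), '0') !== '') {
            continue;
        }
        // Cut the last $n digits into groups of the chosen lengths.
        $digits = substr($a, -$n);
        $groups = [];
        $pos = 0;
        foreach ($lengths as $len) {
            $groups[] = substr($digits, $pos, $len);
            $pos += $len;
        }
        $result[] = implode($sep, $groups);
    }
    return $result;
}

// \d{2}-\d{1,4} with the input 0001245
print_r(candidates('0001245', [[2, 2], [1, 4]])); // 12-45, 01-245, 00-1245
```

Nothing in the digits tells these three apart. With two variable terms such as \d{1,2}-\d{2,3} you even get several readings without any leading zeros involved.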
You should add a final check to see if the digits in the result match the digits in A (e.g. concatenate all the digits in the result and check whether this string, left-padded with zeros, equals the code given by A). That protects you from some misinterpretations caused by e.g. preg_replace, \d{1,2} or other optional parts. For example, suppose someone entered the US zip+4 code for Boston City Hall, which is 02201-1020. Your api A will give you 0220110, or, worse, 2011020, and preg_replace will give you 20110 or 11020, both of which are completely wrong (02201 might be an acceptable compromise, but you will have problems generating this result).
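A sketch of that digit check; the function name usesAllDigits is made up for this example.

```php
<?php
// Hypothetical final check: keep only the digits of the result, left-pad them
// with zeros to the length of A and compare with A.
function usesAllDigits(string $a, string $result): bool
{
    $digits = preg_replace('/\D/', '', $result);
    return str_pad($digits, strlen($a), '0', STR_PAD_LEFT) === $a;
}

var_dump(usesAllDigits('0031401', '31-401')); // bool(true)
var_dump(usesAllDigits('0220110', '20110'));  // bool(false): the leading "2" was dropped
var_dump(usesAllDigits('2011020', '11020'));  // bool(false)
```

Note that it would also reject the compromise 02201, so that case still needs separate handling.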
You can then run it once for every country with a random code and check which patterns don't work. Some of these will fail only because the code doesn't fit (e.g. FK only works if the input is 0000001, which is usually not the case for random input).
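As a cheap first pass before running the reconstruction for every country, you could also sort the patterns by whether they only contain literal characters and \d{n} / \d terms. This is a sketch: isSupported is a made-up name, and the patterns are the ones quoted in this answer, hardcoded instead of loaded from the repository.

```php
<?php
// Hypothetical pre-filter: true if, after removing every \d{n} and \d,
// only plain literal characters remain (no optional or alternative parts).
function isSupported(string $pattern): bool
{
    $rest = preg_replace('/\\\\d(?:\{\d+\})?/', '', $pattern);
    return preg_match('/[\\\\()\[\]{}?*+|.^$]/', $rest) === 0;
}

// Patterns quoted in this answer, hardcoded for the example.
$patterns = [
    'PL' => '\d{2}-\d{3}',
    'BR' => '\d{5}[\-]?\d{3}',
    'SO' => '[A-Z]{2}[ ]?\d{5}',
    'FK' => 'FIQQ 1ZZ',
];

$needsWork = [];
foreach ($patterns as $country => $pattern) {
    if (!isSupported($pattern)) {
        $needsWork[] = $country;
    }
}
echo implode(', ', $needsWork), "\n"; // BR, SO
```

This only looks at the pattern: FK passes it but still fails for most inputs, so the run with real codes is still needed.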
If you are lucky, you don't need these countries.
Or, as a last resort, you might be able to rewrite some of the remaining patterns, but it will require some manual work:
Some of the patterns contain optional parts, e.g. \d{2}[-]?\d{2}. For these cases you can check whether the - depends on e.g. some of the digits or the city name, or whether it is really optional. If it is really optional, you have to decide whether you want the - or not and then save that as the new pattern, e.g. \d{2}-\d{2}. But in most cases you can't do a general replacement: e.g. for the US you might decide to leave out the +4, but you still wouldn't get the correct result if the customer entered the (correct) zip+4 code for Boston City Hall (see the example above).
For other patterns there might be several alternatives, e.g. \d{4}|A-\d{3}. For these cases you might be able to create 2 patterns, e.g. \d{4} and A-\d{3}. You can do the same for e.g. \d{2}(-\d{2})? and manually create the two patterns \d{2} and \d{2}-\d{2}. You then have to test all these patterns for a country (put the whole thing in a loop and run it for every sub-pattern) and take the first one that fits. A pattern fits if it uses all given digits from A and passes the final pattern test. Though this will, again, usually fail if leading zeros are allowed: input 0000123 might mean 0123 or A-123, so you might have to check other resources to find out whether leading zeros are allowed (and a problem similar to the Boston City Hall one might still occur). But this way you might be able to reconstruct a few more countries.
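A sketch of that loop over hand-written sub-patterns. The names fill and pick are made up, and to keep it short fill only handles literal characters without digits plus \d{n}: it takes as many trailing digits from A as the pattern needs and hands them out from left to right.

```php
<?php
// Hypothetical helper: fill every \d{n} with digits from the end of $a.
function fill(string $pattern, string $a): ?string
{
    $rest = preg_replace('/\\\\d\{\d+\}/', '', $pattern);
    if (preg_match('/[\\\\()\[\]{}?*+|.^$0-9]/', $rest)) {
        return null; // anything but non-digit literals and \d{n}
    }
    // How many digits the pattern needs in total.
    preg_match_all('/\\\\d\{(\d+)\}/', $pattern, $m);
    $need = array_sum(array_map('intval', $m[1]));
    if ($need > strlen($a)) {
        return null;
    }
    // Take the last $need digits and hand them out left to right.
    $digits = substr($a, strlen($a) - $need);
    return preg_replace_callback('/\\\\d\{(\d+)\}/', function (array $mm) use (&$digits): string {
        $part = substr($digits, 0, (int) $mm[1]);
        $digits = substr($digits, (int) $mm[1]);
        return $part;
    }, $pattern);
}

// Hypothetical selection: first sub-pattern whose result uses all digits
// from A (after zero-padding) and passes the final pattern test.
function pick(array $subPatterns, string $a): ?string
{
    foreach ($subPatterns as $p) {
        $r = fill($p, $a);
        if ($r === null) {
            continue;
        }
        $digits = str_pad(preg_replace('/\D/', '', $r), strlen($a), '0', STR_PAD_LEFT);
        if ($digits === $a && preg_match('/^(?:' . $p . ')$/', $r)) {
            return $r;
        }
    }
    return null;
}

// \d{4}|A-\d{3} split by hand into two sub-patterns
var_dump(pick(['\d{4}', 'A-\d{3}'], '0000123')); // string(4) "0123"
var_dump(pick(['A-\d{3}', '\d{4}'], '0000123')); // string(5) "A-123"
```

The two calls differ only in the order of the sub-patterns, which shows the leading-zero problem: both readings pass every check.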
But in most of these cases it won't be possible to rewrite the patterns, or even to determine a specific postal code manually, without looking it up in a database.