7

I'm looking for a 'smart way' of decoding multiple XML tags inside a string. I have the following function:

    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

And I am trying it out with:

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);

But it outputs:

Service DNS 

I want it to output the content of every matching tag, so the result should be:

Service DNS - DNS Gratuit 

I'm pulling my hair out. Any quick help or directions would be appreciated.


Edit: Refined requirements.

It seems that I wasn't clear enough, so let me show another example.

If I have the following string as input:

The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>. 

If I run the function with 'French', it should return:

The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses. 

And with 'English':

The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers. 

I hope it's clearer now.

1
  • What is your PHP version? Your code outputs every tag for me ($a is a SimpleXMLElement object). Commented Dec 14, 2013 at 13:57

5 Answers

6
+50

Basically, I would first match each run of adjacent language tags, such as:

<French>Chat</French><English>Cat</English> 

with this pattern:

"@(<($defLangs)>.*?</\\2>)+@i" 

Then extract the string for the requested language in a callback.

If you have PHP 5.3+, then:

    function transLang($str, $lang, $defLangs = 'French|English')
    {
        return preg_replace_callback(
            "@(<($defLangs)>.*?</\\2>)+@i",
            function ($matches) use ($lang) {
                preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $longSec);
                return $longSec[1];
            },
            $str
        );
    }

    echo transLang($str, 'French'), "\n", transLang($str, 'English');

If not, it is a little more complicated:

    class LangHelper
    {
        private $lang;

        function __construct($lang)
        {
            $this->lang = $lang;
        }

        public function callback($matches)
        {
            $lang = $this->lang;
            preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $subMatches);
            return $subMatches[1];
        }
    }

    function transLang($str, $lang, $defLangs = 'French|English')
    {
        $langHelper = new LangHelper($lang);
        return preg_replace_callback(
            "@(<($defLangs)>.*?</\\2>)+@i",
            array($langHelper, 'callback'),
            $str
        );
    }

    echo transLang($str, 'French'), "\n", transLang($str, 'English');
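
Both snippets assume that $str is already defined. As a minimal usage sketch, the closure version can be run against a shortened form of the sentence from the question. The name transLangSafe and the choice to return an empty string when a group lacks the requested language are assumptions added for illustration, not part of the answer above.

```php
<?php
// Hypothetical variant of transLang(): same pattern, plus a guard for
// groups that do not contain the requested language.
function transLangSafe($str, $lang, $defLangs = 'French|English')
{
    return preg_replace_callback(
        "@(<($defLangs)>.*?</\\2>)+@i",
        function ($matches) use ($lang) {
            // $matches[0] holds one whole run of adjacent language tags
            if (preg_match("/<$lang>(.*?)<\/$lang>/i", $matches[0], $section)) {
                return $section[1];
            }
            // Assumption: drop the group when the language is missing
            return '';
        },
        $str
    );
}

$str = 'The <French>Chat</French><English>Cat</English> is '
     . '<French>Heureux</French><English>Happy</English>.';

echo transLangSafe($str, 'French'), "\n";  // The Chat is Heureux.
echo transLangSafe($str, 'English'), "\n"; // The Cat is Happy.
```

Without the guard, a group that lacks the requested language leaves index 1 of the match array undefined, which raises a notice or warning depending on the PHP version.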

3

If I understand you correctly, you would like to remove all "language" tags but keep the contents of the provided language.

The DOM is a tree of nodes. Tags are element nodes, and the text is stored in text nodes. XPath lets you select nodes using expressions. So take all the child nodes of the language elements you want to keep and copy them to just before the language node. Then remove all language nodes. This will work even if the language elements contain other element nodes, like an <em>.

    function replaceLanguageTags($fragment, $language)
    {
        $dom = new DOMDocument();
        $dom->loadXml(
            '<?xml version="1.0" encoding="UTF-8" ?><content>'.$fragment.'</content>'
        );
        // get an xpath object
        $xpath = new DOMXpath($dom);
        // fetch all nodes of the language you would like to keep
        $nodes = $xpath->evaluate('//'.$language);
        foreach ($nodes as $node) {
            // copy all child nodes to just before the found node
            foreach ($node->childNodes as $childNode) {
                $node->parentNode->insertBefore($childNode->cloneNode(TRUE), $node);
            }
            // remove the found node
            $node->parentNode->removeChild($node);
        }
        // select all language nodes
        $tags = array('English', 'French');
        $nodes = $xpath->evaluate('//'.implode('|//', $tags));
        foreach ($nodes as $node) {
            // remove them
            $node->parentNode->removeChild($node);
        }
        $result = '';
        // we do not need the root node, so save all its children
        foreach ($dom->documentElement->childNodes as $node) {
            $result .= $dom->saveXml($node);
        }
        return $result;
    }

    $xml = <<<'XML'
    The <French>Chat</French><English>Cat</English> is very happy to stay on stackoverflow because it makes him <French>Heureux</French><English>Happy</English> to know that it is the best <French>Endroit</French><English>Place</English> to find good people with good <French>Réponses</French><English>Answers</English>.
    XML;

    var_dump(replaceLanguageTags($xml, 'English'));
    var_dump(replaceLanguageTags($xml, 'French'));

Output:

    string(146) "The Cat is very happy to stay on stackoverflow because it makes him Happy to know that it is the best Place to find good people with good Answers."
    string(153) "The Chat is very happy to stay on stackoverflow because it makes him Heureux to know that it is the best Endroit to find good people with good Réponses."
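
The claim that nested elements such as <em> survive can be checked with a short sketch. The function name keepLanguage, its third parameter, and the sample fragment are assumptions made for illustration; it moves the child nodes instead of cloning them, which is a small variation on the code above rather than the same implementation.

```php
<?php
// Hypothetical compact variant: keep the children of <$language> and drop
// every other language element together with its content.
function keepLanguage($fragment, $language, array $tags = array('English', 'French'))
{
    $dom = new DOMDocument();
    $dom->loadXML(
        '<?xml version="1.0" encoding="UTF-8" ?><content>' . $fragment . '</content>'
    );
    $xpath = new DOMXPath($dom);

    foreach ($xpath->evaluate('//' . implode('|//', $tags)) as $node) {
        if ($node->nodeName === $language) {
            // Move every child (text or element) in front of the language node
            while ($node->firstChild) {
                $node->parentNode->insertBefore($node->firstChild, $node);
            }
        }
        // Remove the language element itself (now empty if it was kept)
        $node->parentNode->removeChild($node);
    }

    $result = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $result .= $dom->saveXML($child);
    }
    return $result;
}

$fragment = 'This is <French>un <em>bon</em> chat</French>'
          . '<English>a <em>good</em> cat</English>.';

echo keepLanguage($fragment, 'French'), "\n";  // This is un <em>bon</em> chat.
echo keepLanguage($fragment, 'English'), "\n"; // This is a <em>good</em> cat.
```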


2

What version of PHP are you on? I don't know what else could be different, but I copied & pasted your code and got the following output:

    SimpleXMLElement Object
    (
        [0] => Service DNS
        [1] => DNS Gratuit
    )

Just to be sure, this is the code I copied from above:

    <?php
    function b($params)
    {
        $xmldata = '<?xml version="1.0" encoding="UTF-8" ?><root>' . html_entity_decode($params['data']) . '</root>';
        $lang = ucfirst(strtolower($params['lang']));
        if (simplexml_load_string($xmldata) === FALSE) {
            return $params['data'];
        } else {
            $langxmlobj = new SimpleXMLElement($xmldata);
            if ($langxmlobj->$lang) {
                return $langxmlobj->$lang;
            } else {
                return $params['data'];
            }
        }
    }

    $params['data'] = '<French>Service DNS</French><English>DNS Service</English> - <French>DNS Gratuit</French><English>Free DNS</English>';
    $params['lang'] = 'French';
    $a = b($params);
    print_r($a);
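
One possible explanation for the single "Service DNS" in the question, offered as an assumption rather than a confirmed diagnosis: when a SimpleXMLElement that matches several elements is cast to a string (as echo or print does), only the first element's text is used. The sketch below uses the question's data to show the difference between casting and iterating.

```php
<?php
$xml = '<root><French>Service DNS</French><English>DNS Service</English> - '
     . '<French>DNS Gratuit</French><English>Free DNS</English></root>';
$root = new SimpleXMLElement($xml);

// Casting the property to a string yields only the first <French> element
$first = (string) $root->French;

// Iterating visits every <French> element
$all = array();
foreach ($root->French as $node) {
    $all[] = (string) $node;
}

echo $first, "\n";               // Service DNS
echo implode(' | ', $all), "\n"; // Service DNS | DNS Gratuit
```

Note that the " - " separator is a text node of <root>, not part of any <French> element, so it cannot be reached through $root->French. That is why the other answers work on the string or on the DOM instead.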

3 Comments

No, it's not clearer. With your new string, your code produces SimpleXMLElement Object ( [0] => Chat [1] => Heureux [2] => Endroit [3] => Reponses ). Maybe you need a function other than print_r, but it's not clear what you are trying to achieve or what your current result is. If you want to output a paragraph as in your question, don't use print_r; do this instead: $a[0] is very happy to stay on stackoverflow because it makes him $a[1] to know that it is the best $a[2] to find good people with $a[3].
No. I want the function itself to return the text translated into the requested language; I don't want to go through arrays and indexes. Please ignore the print_r call at the end; print $a should print the translated text.
As far as I know, there is no simple way to do what you want with PHP short of going through arrays.
2

Here's my suggestion. It should be fast and it is simple. You just need to strip the tags of the desired language and then remove any other tags along with their content.

The downside is that if you want to use any tags other than the language ones, you have to make sure that the opening tag differs from the closing tag (e.g. <p >Lorem</p> instead of <p>Lorem</p>), because the final step removes every tag pair whose opening and closing names match. On the other hand, this lets you add as many languages as you want without keeping a list of them. You only need to know the default one (or throw and catch an exception) for when the requested language is missing.

    function only_lang($lang, $text)
    {
        static $infinite_loop;

        $result = str_replace("<$lang>", '', $text, $num_matches_open);
        $result = str_replace("</$lang>", '', $result, $num_matches_close);

        // Check if the text is malformed. Good place to throw an error
        if ($num_matches_open != $num_matches_close) {
            //throw new Exception('Opening and closing tags do not match', 1);
            return $text;
        }

        // Check if this language is present at all.
        // Otherwise fall back to the default language or throw an error
        if (!$num_matches_open) {
            //throw new Exception('No such language', 2);

            // Prevent an infinite loop if even the default language is missing
            if ($infinite_loop) {
                return $text;
            }
            $infinite_loop = __FUNCTION__;
            $fallback = $infinite_loop('English', $text);
            $infinite_loop = null; // reset the guard so later calls can fall back too
            return $fallback;
        }

        // Strip any other language and return the result.
        // The quantifier is lazy (.*?) so that each tag pair is removed on its own
        return preg_replace('!<([^>]+)>.*?</\\1>!s', '', $result);
    }
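
The quantifier in the final preg_replace matters as soon as one line contains several tag pairs. The sketch below compares a greedy and a lazy version of the pattern on a hypothetical sample string, which represents the text after the French tags have already been stripped.

```php
<?php
// Hypothetical sample: the French tags are already gone, English pairs remain
$text = 'The Chat<English>Cat</English> is Heureux<English>Happy</English>.';

// Greedy: .* runs from the first <English> to the last </English>
$greedy = preg_replace('!<([^>]+)>.*</\\1>!', '', $text);

// Lazy: .*? stops at the first matching closing tag,
// so each pair is removed separately
$lazy = preg_replace('!<([^>]+)>.*?</\\1>!', '', $text);

echo $greedy, "\n"; // The Chat.
echo $lazy, "\n";   // The Chat is Heureux.
```

The greedy form also swallows the wanted text between two pairs, so the lazy form is the safer choice for input like the sentence in the question.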


1

Here is a simple one using a regex. It is useful if the input only contains <lang>...</lang> tags.

    function to_lang($lang="", $str="")
    {
        return strip_tags(preg_replace('~<(\w+(?<!'.$lang.'))>.*</\1>~Us', "", $str));
    }

    echo to_lang("English", "The happy <French>Chat</French><English>Cat</English>");

This removes each <tag>...</tag> pair that is not the one specified in $lang. If the tag name can contain spaces or special characters (e.g. <French-1>), replace \w with [^/>].


Search pattern explained a bit

1.) <(\w+(?<!'.$lang.'))

< followed by one or more Word characters, not matching $lang (using a negative lookbehind) and capturing the <tag_name>

2.) .* followed by anything (ungreedy: modifier U, dot matches newlines: modifier s)

3.) </\1> until the captured tag is closed
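
The three parts of the pattern can be seen at work by running the two steps separately. This is a minimal sketch on a shortened form of the question's sentence; the variable names are chosen for illustration only.

```php
<?php
$lang = 'French';
$str  = 'The <French>Chat</French><English>Cat</English> is '
      . '<French>Heureux</French><English>Happy</English>.';

// Step 1: remove every tag pair whose name does not end with $lang
$pattern       = '~<(\w+(?<!' . $lang . '))>.*</\1>~Us';
$withoutOthers = preg_replace($pattern, '', $str);

// Step 2: strip the remaining <French> and </French> markers
$plain = strip_tags($withoutOthers);

echo $withoutOthers, "\n"; // The <French>Chat</French> is <French>Heureux</French>.
echo $plain, "\n";         // The Chat is Heureux.
```

Because the lookbehind only checks how the tag name ends, a name such as <OldFrench> would also be kept when $lang is 'French'.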
