I have an HTML string that contains exactly one <a> element. Example:

 <a href="http://www.test.com" rel="nofollow external">test</a> 

In PHP I need to test whether rel contains external and, if it does, modify href and save the string.

I have looked at DOM nodes and objects, but they seem like too much for a single <a> element: I have to iterate to get the HTML nodes, and I am not sure how to test whether rel exists and contains external.

    $html = new DOMDocument();
    $html->loadHtml($txt);
    $a = $html->getElementsByTagName('a');
    // attributes is a property of DOMNode, not a method
    $attr = $a->item(0)->attributes;
    ...

At this point I get a DOMNamedNodeMap, which seems like overkill. Is there a simpler way to do this, or should I do it with DOM?

  • When dealing with DOM you have two options: 1) use the native DOM parser, 2) use a regular expression (which adds overhead). Commented Apr 21, 2013 at 1:47
  • Keep going. Use DOMDocument() for manipulation Commented Apr 21, 2013 at 1:48
  • Nobody should use the raw DOM methods for manipulation. Consider phpQuery or QueryPath etc. to reduce tedious boilerplate. Commented Apr 21, 2013 at 1:48

4 Answers


Is there any simpler way to do this, or should I do it with DOM?

Do it with DOM.

Here's an example:

    <?php
    $html = '<a href="http://example.com" rel="nofollow external">test</a>';

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");

    foreach($nodes as $node) {
        $node->setAttribute('href', 'http://example.org');
    }

    echo $dom->saveHTML();

3 Comments

$dom->saveHTML(); This method, as of 5.2.6, will automatically add <html><body> and <!DOCTYPE> tags to the document if they are missing, without asking whether you want them.
Some query explanation would be beneficial to researchers.
@MarcinJaworski Thanks for the heads up – looks like the fix is to pass some flags to loadHtml: stackoverflow.com/questions/4879946/…
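The comments above ask for an explanation of the XPath query and point to passing flags to loadHTML(). One common way to do that is LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD. A minimal sketch combining both, assuming PHP 7+ (for the type declarations) and libxml 2.7.8+ (needed for those two flags); the helper name rewriteExternalHref is hypothetical:

```php
<?php
// Sketch only: same XPath idea as the answer above, wrapped in a
// hypothetical helper. Requires PHP 7+ (type declarations) and
// libxml 2.7.8+ for the two loadHTML() flags.

function rewriteExternalHref(string $html, string $newHref): string
{
    $dom = new DOMDocument();
    // NOIMPLIED: don't wrap the fragment in <html><body>.
    // NODEFDTD:  don't add a default <!DOCTYPE>.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // normalize-space(@rel)        "nofollow   external" -> "nofollow external"
    // concat(' ', ..., ' ')        -> " nofollow external " (pad both ends)
    // contains(..., ' external ')  true only for the whole token "external",
    //                              so rel="externalish" does not match
    $query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]";

    foreach ($xpath->query($query) as $node) {
        $node->setAttribute('href', $newHref);
    }

    // saveHTML() may end with a newline; trim() drops it.
    return trim($dom->saveHTML());
}

echo rewriteExternalHref(
    '<a href="http://www.test.com" rel="nofollow external">test</a>',
    'http://example.org'
), "\n";
```

The space padding is what makes the check token-based: without it, contains(@rel, 'external') would also match values such as "externalish".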

I kept going and did the modification with DOM. This is what I ended up with:

    $html = new DOMDocument();
    // The XML encoding declaration makes loadHtml() treat $txt as UTF-8
    $html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
    $nodes = $html->getElementsByTagName('a');
    foreach ($nodes as $node) {
        foreach ($node->attributes as $att) {
            if ($att->name == 'rel') {
                // strpos() returns 0 when rel starts with "external", so compare with !== false
                if (strpos($att->value, 'external') !== false) {
                    $node->setAttribute('href','modified_url_goes_here');
                }
            }
        }
    }
    $txt = $html->saveHTML();

I did not want to load any other library for just this one string.
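Since the question guarantees exactly one <a> element, the nested attribute loop can be replaced by direct lookups with hasAttribute() and getAttribute(), which avoids walking the DOMNamedNodeMap. A minimal sketch, assuming PHP 5.4+ with libxml 2.7.8+ for the two optional loadHTML() flags; the sample string and 'modified_url_goes_here' are placeholders taken from this thread:

```php
<?php
// Sketch: read rel directly instead of iterating $node->attributes.
// $txt and 'modified_url_goes_here' are placeholders from the question.

$txt = '<a href="http://www.test.com" rel="nofollow external">test</a>';

$doc = new DOMDocument();
// Optional flags: keep saveHTML() from adding <!DOCTYPE>, <html> and <body>
$doc->loadHTML($txt, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$a = $doc->getElementsByTagName('a')->item(0);

if ($a !== null && $a->hasAttribute('rel')) {
    // rel is a space-separated token list: split it so that only the
    // exact token "external" matches (not e.g. "noexternal").
    $tokens = preg_split('/\s+/', trim($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array('external', $tokens, true)) {
        $a->setAttribute('href', 'modified_url_goes_here');
    }
}

$txt = trim($doc->saveHTML());
echo $txt, "\n";
```

Splitting rel into tokens also sidesteps the strpos() pitfall, because in_array() never returns a position that could be mistaken for false.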


The best way is to use an HTML parser/DOM, but here's a regex solution:

    $html = '<a href="http://www.test.com" rel="nofollow external">test</a><br> <p> Some text</p> <a href="http://test.com">test2</a><br> <a rel="external">test3</a> <-- This won\'t work since there is no href in it. ';

    // [^>]*? (instead of .+?) keeps each match inside a single <a ...> tag,
    // so a link without rel cannot be swallowed into the next link's match
    $new = preg_replace_callback('/<a\b[^>]*?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
        if(strpos($m[1], 'external') !== false){
            $m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
        }
        return $m[0];
    }, $html);

    echo $new;
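The callback above tests the captured rel value with strpos(), which also accepts values like "externalish". A stricter check, sketched as a hypothetical helper relHasToken() that could replace that strpos() call, only accepts "external" as a whole space-separated token:

```php
<?php
// Hypothetical helper: true only if $token appears in $rel as a whole
// whitespace-separated token.
function relHasToken(string $rel, string $token): bool
{
    // (?<!\S) and (?!\S): the token must be bounded by whitespace or string ends
    $pattern = '/(?<!\S)' . preg_quote($token, '/') . '(?!\S)/';
    return preg_match($pattern, $rel) === 1;
}

var_dump(relHasToken('nofollow external', 'external')); // bool(true)
var_dump(relHasToken('externalish', 'external'));       // bool(false)
```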



You could use a regular expression: if the string matches /\s+rel\s*=\s*"[^"]*external[^"]*"/, then do a regex replace of /(<a.*href\s*=\s*")([^"]*)("[^>]*>)/ with \1[your new href here]\3.
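A minimal PHP sketch of that two-step approach, using preg_match() for the test and preg_replace() for the swap. It writes the URL group as ([^"]*) and uses [^>]* instead of .* so each match stays inside the <a> tag. It assumes double-quoted attributes and a single <a> element, as in the question; $newHref is a placeholder:

```php
<?php
// Sketch: two-step regex approach for a single <a> element with
// double-quoted attributes. $newHref is a placeholder URL.

$txt = '<a href="http://www.test.com" rel="nofollow external">test</a>';
$newHref = 'http://example.org';

// Step 1: does rel contain "external"? [^"]* keeps the match inside the quotes.
if (preg_match('/\srel\s*=\s*"[^"]*external[^"]*"/i', $txt)) {
    // Step 2: keep the parts around the URL (groups 1 and 3), swap group 2.
    // ${1} / ${3} avoid ambiguity if $newHref starts with a digit.
    $txt = preg_replace(
        '/(<a\b[^>]*\bhref\s*=\s*")([^"]*)("[^>]*>)/i',
        '${1}' . $newHref . '${3}',
        $txt
    );
}

echo $txt, "\n";
// <a href="http://example.org" rel="nofollow external">test</a>
```

If the replacement URL could contain $ or \, it would need escaping before being used in the replacement string.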

Using a library that can do this kind of thing for you is much easier, though (like jQuery for JavaScript).
