Jens A. Koch

Besides the $rss->save() call at the end of the script below, another way of saving the XML is:

    $xmlString = $rss->saveXML();
    file_put_contents(__DIR__.'/Test.xml', $xmlString);


  • In RSS 1.0, items have no 'date' element; 'dc:date' from the Dublin Core module is used instead. http://web.resource.org/rss/1.0/spec#s5.5

  • In RSS 2.0, items have no 'date' element either; the element is called 'pubDate'. http://cyber.law.harvard.edu/rss/rss.html#hrelementsOfLtitemgt

  • Decide whether you want to look for 'date', 'dc:date', or 'pubDate'. The following code works with 'pubDate'.

  • $limit = 50; was unused, so it does not appear in the code below.

  • Removing nodes from a NodeList while iterating over it will not work, because the list is live and changes under the loop. This is a well-known pitfall; see the comments here: http://php.net/manual/de/domnode.removechild.php The solution is to collect the unwanted nodes in a queue during the iteration and remove them afterwards.

  • I have taken the liberty of reworking the code a bit. I intentionally left the debug output active, mainly for the date comparison and for displaying the reduced list. The code is commented.

  • Please adjust the feed URL and the "-x days" value in the condition. I had to use a public RSS feed for testing.
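
The queue idea from the list above can be tried without a network connection. The following is only a minimal sketch: the inline four-item feed and the $outdated list of titles are made up for the demo and stand in for the real date check.

```php
<?php
// Hypothetical feed, inlined so the sketch runs offline.
$xml = <<<XML
<rss version="2.0"><channel>
<item><title>a</title></item>
<item><title>b</title></item>
<item><title>c</title></item>
<item><title>d</title></item>
</channel></rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Titles we pretend are "outdated" (an assumption for this demo).
$outdated = array('a', 'b', 'c');

// Pass 1: only collect the nodes; the live NodeList is not modified here.
$queue = array();
foreach ($doc->getElementsByTagName('item') as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    if (in_array($title, $outdated, true)) {
        $queue[] = $item;
    }
}

// Pass 2: remove the collected nodes after the iteration has finished.
foreach ($queue as $item) {
    $item->parentNode->removeChild($item);
}

// Collect the titles that are left in the document.
$remaining = array();
foreach ($doc->getElementsByTagName('item') as $item) {
    $remaining[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}
echo implode(',', $remaining), "\n"; // prints "d"
```

Because $queue is a plain PHP array, removing nodes while looping over it does not disturb the iteration.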

--

    <?php

    date_default_timezone_set('America/Los_Angeles');

    $feed = array();               // target array for filtered items
    $nodesToRemoveQueue = array(); // stores all nodes to remove

    $rss = new DOMDocument();
    $url = 'http://rss.nytimes.com/services/xml/rss/nyt/Space.xml';
    $rss->load($url);

    $nodeList = $rss->getElementsByTagName('item');

    foreach ($nodeList as $node) {

        $pubDate = $node->getElementsByTagName('pubDate')->item(0)->nodeValue;

        // if the date in the XML feed is older than the desired number of days,
        // queue the node for removal and proceed with the iteration
        // (do not transfer the data into the $feed array)
        if (isDateOlderThenDays($pubDate, '-5 days')) {
            echo 'Removed ' . $pubDate . '<br>';
            // $node->parentNode->removeChild($node); this won't work!!
            $nodesToRemoveQueue[] = $node; // put node in queue, remove later
            continue;
        }

        echo 'Kept ' . $pubDate . '<br>';

        // build item for $feed array, then add item to $feed array
        $item = array(
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc'  => $node->getElementsByTagName('description')->item(0)->nodeValue,
            'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue,
            'date'  => $pubDate,
        );

        $feed[] = $item;
    }

    // helper to compare dates
    function isDateOlderThenDays($date, $days)
    {
        // true when the pubDate ($date) is lower (older) than the cutoff ($days)
        return strtotime($date) < strtotime($days);
    }

    // feed array contains all the not "outdated" items
    var_dump($feed);

    // finally: remove the "outdated" nodes
    foreach ($nodesToRemoveQueue as $node) {
        $node->parentNode->removeChild($node);
    }

    // node list reduction check: this should display only the dates kept
    $nodeList = $rss->getElementsByTagName('item');
    foreach ($nodeList as $node) {
        echo $node->getElementsByTagName('pubDate')->item(0)->nodeValue . '<br>';
    }

    // write reduced RSS XML to file
    $rss->save(__DIR__.'/Test.xml');
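
The date check can also be tried in isolation. The sketch below is a hypothetical variant of the helper (the name isOlderThan and the $now parameter are not part of the code above): it passes a fixed reference time as the second argument of strtotime(), so the result does not depend on the day you run it. Returning false for unparsable input is an assumption that errs on the side of keeping the item.

```php
<?php
date_default_timezone_set('UTC');

// Hypothetical variant of the helper with an injectable reference time.
function isOlderThan($date, $offset, $now)
{
    $itemTime = strtotime($date);         // e.g. an RSS 2.0 pubDate string
    $cutoff   = strtotime($offset, $now); // e.g. '-5 days' relative to $now

    if ($itemTime === false || $cutoff === false) {
        return false; // unparsable input: keep the item (design assumption)
    }

    return $itemTime < $cutoff;
}

// Fixed "now" for the demo: Monday, 10 March 2014, 12:00 UTC.
$now = strtotime('2014-03-10 12:00:00 UTC');

// Seven days before "now": older than the five-day cutoff.
var_dump(isOlderThan('Mon, 03 Mar 2014 12:00:00 GMT', '-5 days', $now)); // bool(true)

// Two days before "now": still inside the five-day window.
var_dump(isOlderThan('Sat, 08 Mar 2014 12:00:00 GMT', '-5 days', $now)); // bool(false)
```

In the real script you would pass time() as $now.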