

I'm using a simple PHP script on the Altervista hosting provider to extract data from a very big HTML table (more than 6,300 rows) at this link.

The problem is that I hit "Maximum execution time of 30 seconds exceeded" during the row loop.
I'd like to end up with XML data or even plain CSV text; is there a faster way than looping over each row?

<?php
set_time_limit(3000);
ini_set('max_execution_time', 3000); // the hosting plan may ignore these limits

function XML_Append($XML, $Q, $Sex, $TabCnt, $TabName)
{
    $pagecontent = file_get_contents($Q);
    echo "DONE fetch";

    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    libxml_use_internal_errors(true); // the source markup is invalid, silence libxml warnings
    $doc->loadHTML($pagecontent);

    $tables = $doc->getElementsByTagName('table');
    $rows   = $tables->item($TabCnt)->getElementsByTagName('tr');
    $rowLen = $rows->length;
    echo $rowLen;

    // per-row work goes here (see the loop further down); this version only counts rows
    for ($ir = 0; $ir < $rowLen; ++$ir) {
        echo $ir . "\r\n";
        $row = $rows->item($ir);
    }
    unset($doc);
}

$QUERY_SPRINT_FEMMINILE = "http://risultati.fitri.it/rank.asp?Anno=%ANNO%&TRank=S&Ss=F&PunDal=0.00&PunAl=999.99";
$QUERY_SPRINT_MASCHILE  = "http://risultati.fitri.it/rank.asp?Anno=%ANNO%&TRank=S&Ss=M&PunDal=0.00&PunAl=999.99";

$ANNO  = isset($_GET['Anno']) ? $_GET['Anno'] : "2019";
$QUERY = str_replace("%ANNO%", $ANNO, $QUERY_SPRINT_MASCHILE);

$xml = new SimpleXMLElement('<DocumentElement/>');
XML_Append($xml, $QUERY, "M", 1, "SP");
echo "DONE";
?>

The loop code (inside XML_Append) is:

foreach ($rows as $row) {
    $xmlTable = $XML->addChild($TabName);
    $xmlTable->addChild('_S', $Sex);
    $cols = $row->getElementsByTagName('td');
    $colLen = $cols->length;
    for ($i = 0; $i < $colLen; ++$i) {
        $NomeColonna = "C" . $i;
        $value = $cols->item($i)->nodeValue;
        $value = trim(str_replace(PHP_EOL, "", $value));
        $value = str_replace("\xc2\xa0", "", $value);
        $xmlTable->addChild($NomeColonna, $value);
    }
}
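
For reference, here is a rough, untested sketch of the plain-CSV variant (assuming the ranking data is still the second table on the page and reusing the same cell clean-up as above). Each row is written straight out with fputcsv, so no XML tree is kept in memory:

<?php
// Untested sketch: same fetch and parse as above, but stream CSV instead of building XML.
set_time_limit(3000);
ini_set('max_execution_time', 3000);

$url = "http://risultati.fitri.it/rank.asp?Anno=2019&TRank=S&Ss=M&PunDal=0.00&PunAl=999.99";

libxml_use_internal_errors(true);            // the source HTML is invalid, silence the warnings
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML(file_get_contents($url));

header('Content-Type: text/csv; charset=utf-8');
$out = fopen('php://output', 'w');

$rows = $doc->getElementsByTagName('table')->item(1)->getElementsByTagName('tr');
foreach ($rows as $row) {
    $record = ['M'];                         // the _S (sex) column from the XML version
    foreach ($row->getElementsByTagName('td') as $cell) {
        // same clean-up as above: drop newlines and non-breaking spaces
        $record[] = trim(str_replace(["\r", "\n", "\xc2\xa0"], '', $cell->nodeValue));
    }
    fputcsv($out, $record);                  // write the row immediately
}
fclose($out);
?>

Whether this actually beats the 30-second limit is another question, but it at least removes the per-row addChild calls and lets the output start flowing before the whole table has been processed.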
  • What operations do you perform on each row? If none, maybe you can convert the $rows list directly to XML data or a CSV file. Commented Sep 27, 2019 at 0:18
  • It might not be a beautiful solution, but you could parse that file using regular expressions and write the output XML without creating any objects (see the sketch after these comments). Of course, DOMDocument and SimpleXMLElement make the code easier, but with big files they take more time and require more memory. Commented Sep 27, 2019 at 5:18
  • I'm going to try with simple CSV, but the problem seems to be the use of the "complex" objects exposed by DOMDocument... Commented Sep 27, 2019 at 7:26
  • The HTML you're looking to process is awful; it looks like a distracted child wrote it in 1999. Literally every row has multiple instances of invalid HTML. This has a lot to do with how long your script takes to process it, and there's not much you can do about it, unfortunately. I think even cheating and using a regex would be very slow. You could maybe contact the site owners and see if they can output a different format, but I think your best bet is increasing the timeout and waiting for it to finish. Commented Sep 27, 2019 at 16:24
  • @miken32 I agree with you. I've already contacted the owners about an API or web service, but... no way... for now... So I think I have no solution, because I can't change the timeout on my hosting plan. Not even 000webhostapp seems to work. Commented Sep 27, 2019 at 18:48
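
For completeness, a very rough sketch of the regex idea from the comment above (untested, and assuming every data row is a <tr> block whose cells are <td> elements, which this particular invalid markup may not guarantee). It creates no DOM objects at all and writes CSV straight to the output:

<?php
// Untested sketch of the no-DOM approach suggested in the comments:
// match rows and cells with regular expressions and stream CSV out directly.
$url  = "http://risultati.fitri.it/rank.asp?Anno=2019&TRank=S&Ss=M&PunDal=0.00&PunAl=999.99";
$html = file_get_contents($url);

$out = fopen('php://output', 'w');
if (preg_match_all('#<tr[^>]*>(.*?)</tr>#si', $html, $rowMatches)) {
    foreach ($rowMatches[1] as $rowHtml) {
        if (!preg_match_all('#<td[^>]*>(.*?)</td>#si', $rowHtml, $cellMatches)) {
            continue;                        // skip header rows or junk without <td> cells
        }
        $record = array_map(function ($cell) {
            // strip nested tags, decode entities, drop non-breaking spaces
            $cell = html_entity_decode(strip_tags($cell), ENT_QUOTES, 'UTF-8');
            return trim(str_replace("\xc2\xa0", '', $cell));
        }, $cellMatches[1]);
        fputcsv($out, $record);
    }
}
fclose($out);
?>

As the last comment points out, the markup is broken enough that even this may be slow or miss rows, so treat it only as a starting point.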
