4

I am trying to scrape a website with PHP and cURL, using the POST method to submit a form before scraping the resulting page. The problem is with the POST request: no data is submitted to the server, so the scraped page doesn't contain what I am looking for.

I am quite sure the problem is connected with the form's encoding type: enctype="multipart/form-data". How should I build this POST request, given that the form is multipart/form-data? Do I have to encode the post_string in a special way?

Here's the code I'm using:

    function curl($url) {
        // POST string
        $post_string = "XXXX";

        $options = Array(
            CURLOPT_RETURNTRANSFER => TRUE,
            CURLOPT_FOLLOWLOCATION => TRUE,
            CURLOPT_AUTOREFERER => TRUE,
            CURLOPT_CONNECTTIMEOUT => 120,
            CURLOPT_TIMEOUT => 120,
            CURLOPT_MAXREDIRS => 10,
            CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
            CURLOPT_URL => $url,
            CURLOPT_CAINFO => dirname(__FILE__)."/cacert.pem",
            CURLOPT_POSTFIELDS => $post_string,
        );

        $ch = curl_init();
        curl_setopt_array($ch, $options);
        $data = curl_exec($ch);
        curl_error($ch);
        curl_close($ch);
        return $data;
    }

    $scraped_page = curl("XXXURLXXX");
    echo $scraped_page;

Thank you!

2 Answers

6

Set CURLOPT_POST to true in your options array:

    CURLOPT_POST => true,

Then build your post fields as an array, like this, and pass that array as CURLOPT_POSTFIELDS:

    $postfields = array();
    $postfields['field1'] = 'value1';
    $postfields['field2'] = 'value2';

    // in the $options array:
    CURLOPT_POSTFIELDS => $postfields,

The PHP manual entry for curl_setopt says of CURLOPT_POSTFIELDS: "If value is an array, the Content-Type header will be set to multipart/form-data."
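Putting the two changes together, a minimal sketch could look like the following. The helper name build_multipart_post_options, the URL https://example.com/form and the field names are placeholders of mine, not taken from the question, and the request itself is left commented out because I cannot test against the real site.

```php
<?php
// Hypothetical helper (not part of the question's code): builds the cURL
// options for a POST whose body is sent as multipart/form-data.
function build_multipart_post_options(string $url, array $postfields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POST           => true,
        // An array (rather than a string) makes cURL use multipart/form-data.
        CURLOPT_POSTFIELDS     => $postfields,
    ];
}

// Placeholder URL and field names; replace them with the real form's values.
$options = build_multipart_post_options('https://example.com/form', [
    'field1' => 'value1',
    'field2' => 'value2',
]);

// To send the request:
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// $data = curl_exec($ch);
// if ($data === false) {
//     echo curl_error($ch);
// }
```

The field names have to match the name attributes of the inputs in the target form, otherwise the server will still answer with the empty form.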


9 Comments

Well, I've added these lines: CURLOPT_POST => TRUE, CURLOPT_POSTFIELDS => http_build_query($postfields), and filled post fields with your setup, but it still doesn't work: Firebug confirms that no POST is executed...
Firebug will not show the real POST, because it happens inside the PHP execution on the server, not in the page output that the browser sees. Add var_dump(curl_getinfo($ch)); right after $data = curl_exec($ch); and see what it shows.
Ah, ok, thank you, but the page which is "echoed" still shows the empty form and no results..
Hmm. Some people recommend passing the array directly, without the http_build_query function: CURLOPT_POSTFIELDS => $postfields. I found out why at nl1.php.net/curl_setopt: "If value is an array, the Content-Type header will be set to multipart/form-data."
I'm only saying what PHP will do. If this does not work, and since we cannot test this code locally, it's the best we can do :) :|
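To illustrate the difference discussed in these comments, here is a rough sketch. http_build_query turns the array into a string, which cURL sends as application/x-www-form-urlencoded, while the plain array is sent as multipart/form-data. The helper sent_request_headers is a hypothetical name of mine. It relies on the CURLINFO_HEADER_OUT option so that curl_getinfo can report the request headers that were actually sent, and it is only defined here, not called, since it needs a reachable URL.

```php
<?php
$postfields = ['field1' => 'value1', 'field2' => 'value 2'];

// String body: cURL sends it as application/x-www-form-urlencoded.
$urlencoded = http_build_query($postfields);
echo $urlencoded, "\n"; // field1=value1&field2=value+2

/**
 * Hypothetical debugging helper: performs the POST and returns the request
 * headers cURL sent, so the Content-Type can be inspected.
 *
 * @param string       $url  Target URL.
 * @param array|string $body Array (multipart) or string (urlencoded) body.
 */
function sent_request_headers(string $url, $body): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLINFO_HEADER_OUT    => true, // record the outgoing request headers
    ]);
    curl_exec($ch);

    // curl_getinfo returns false here if nothing was recorded; cast to string.
    return (string) curl_getinfo($ch, CURLINFO_HEADER_OUT);
}

// Usage (needs network access, so it is left commented out):
// echo sent_request_headers('https://example.com/form', $postfields);
// echo sent_request_headers('https://example.com/form', $urlencoded);
```

Comparing the Content-Type line of the two outputs should show which encoding the server actually receives.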
2

Yes, $post_string needs to be an array.

Also set CURLOPT_POST to true.
