
Following on from my last question: I sent a request to a website with cURL and it shows me the output. But the output shows the full website. I only want to get some of the data, like a link, from the cURL output.

    $url = 'http://site1.com/index.php';
    $data = ["send" => "Test"];

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    // Return the page as a string instead of printing it directly
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);

    var_dump($response);

This code works, but the output contains the full website. I just want to get some specific data and show only that in the output.

  • What is the response you get? Commented Apr 1, 2020 at 15:40
  • What do you mean by "I just want get some data"? Where do you filter the output by anything? Commented Apr 1, 2020 at 15:41
  • @vivek_23 The response shows me the website with the data. I just want the data, not the full website shown on screen. Commented Apr 1, 2020 at 15:42
  • If the website is yours, you can create a separate endpoint which returns whatever you need (e.g. site1.com/index-curl.php). If the website is not yours, you will have to use a web scraping script to filter the response. The following link might help you write a scraper: stackoverflow.com/questions/9813273/web-scraping-in-php Commented Apr 1, 2020 at 15:42
  • @NicoHaase I don't know how to filter it. The data I want is inside an HTML element: <div><img src="WANT-THIS"></div> Commented Apr 1, 2020 at 15:45

1 Answer


You can use preg_match_all with a carefully constructed pattern. This modified version of your code should give you a list of all the image URLs in the HTML you retrieve:

    $url = 'http://site1.com/index.php';
    $data = ["send" => "Test"];

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    // Return the page as a string so it can be searched
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);

    // Capture the value of the src attribute of every <img> tag
    $matches = [];
    $pattern = '/<img[^>]+src="([^"]+)"[^>]*>/';
    $img_count = preg_match_all($pattern, $response, $matches);

    // $matches[0] holds the whole tags, $matches[1] the captured src values
    var_dump($matches[1]);

If you'd like to fetch all the links instead, you can change $pattern to this:

    $pattern = '/<a[^>]+href="([^"]+)"[^>]*>/';
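The two patterns above only match attributes written in double quotes. A slightly more tolerant variant for links might look like this. It is still a regex rather than a parser, the name extract_links is hypothetical, and unquoted values such as href=page.html are not matched.

```php
<?php
// Hypothetical helper: a more tolerant (but still regex-based) link matcher.
//  - (?<![\w-]) keeps "href" from matching inside names such as data-href
//  - (["']) accepts either quote style; \1 requires the same quote to close it
//  - the i flag also matches upper-case tags such as <A HREF="...">
function extract_links(string $html): array
{
    $pattern = '/<a\b[^>]*?(?<![\w-])href\s*=\s*(["\'])(.*?)\1/i';
    if (preg_match_all($pattern, $html, $matches) === false) {
        return []; // regex engine error (e.g. backtracking limit hit)
    }
    // Group 1 is the quote character, group 2 is the URL itself
    return $matches[2];
}

var_dump(extract_links('<a class="nav" href="one.html">1</a> <a href=\'two.html\'>2</a>'));
```

The lookbehind is used instead of \b because a hyphen counts as a word boundary, so \bhref would still match inside data-href.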

I have tested the image-matching code on an HTML file that looks like this:

    <html>
      <body>
        <div><img src="WANT-THIS"></div>
      </body>
    </html>

And the output is this:

    array(1) {
      [0]=>
      string(9) "WANT-THIS"
    }
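To repeat the same check without a file or a network request, the matching step can be wrapped in a function and given a literal HTML string. This is a minimal sketch: extract_img_srcs is a hypothetical name, and the pattern is the one from the answer's code.

```php
<?php
// Hypothetical helper: returns the src value of every <img> tag in $html,
// using the same double-quote-only pattern as the answer's code.
function extract_img_srcs(string $html): array
{
    $pattern = '/<img[^>]+src="([^"]+)"[^>]*>/';
    // preg_match_all returns false on error; treat that as "no matches"
    if (preg_match_all($pattern, $html, $matches) === false) {
        return [];
    }
    // $matches[1] holds the captured src values
    return $matches[1];
}

$html = '<html><body><div><img src="WANT-THIS"></div></body></html>';
var_dump(extract_img_srcs($html));
```

In the cURL script you would call extract_img_srcs($response) once you have checked that curl_exec() did not return false.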

EDIT 2: In response to additional questions from the OP, I have also tried the script on this HTML file:

    <html>
      <body>
        <div1>CODE</div><div2>CODE</div><div3>CODE</div><div4>CODE</div><div5>CODE</div><div6>CODE</div><img src="IMAGE">
      </body>
    </html>

And it produces this result:

    array(1) {
      [0]=>
      string(5) "IMAGE"
    }

If this doesn't solve your problem, you'll need to provide more detail: an example URL you are fetching, some HTML you want to search, or how you know which image in the HTML you want to grab. Does it have a special id? Is it always the first image? The second? Is there any characteristic that tells us which image to grab?
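Once you know what identifies the image, an XPath query through PHP's DOMDocument and DOMXPath classes can select it directly instead of collecting every image. This is a sketch that assumes the dom extension is enabled (it is in most PHP builds); find_img_src and the id "logo" are hypothetical.

```php
<?php
// Hypothetical helper: returns the src of the first <img> matched by an XPath
// query, or null when nothing matches.
function find_img_src(string $html, string $xpathQuery): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world pages are rarely valid HTML
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathQuery);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // item(0) is the first matching <img> element
    return $nodes->item(0)->getAttribute('src');
}

$html = '<div><img src="first.png"><img id="logo" src="logo.png"></div>';
var_dump(find_img_src($html, '//img[@id="logo"]')); // image with a specific id
var_dump(find_img_src($html, '(//img)[1]'));         // first image in the document
var_dump(find_img_src($html, '(//img)[2]'));         // second image in the document
```

The parentheses in (//img)[1] matter: without them, //img[1] selects every image that is the first img child of its parent, not the first image on the page.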

21 Comments

I wouldn't recommend regex matching on HTML response.
Why not? It gives you a list of the images in the page.
@vivek_23 Not sure that long and sanctimonious post deserves so many upvotes in the first place. Secondly, I'm not writing a 'parser'; the objective is very limited here.
You should not use regex for this, as it's bad practice regardless of how big or small the objective is. Also, your regex expects src to be in double quotes, but it could be in single quotes as well. And, if I am not mistaken, it could also match a data-src attribute.
@vivek_23 So you would prefer no solution than a solution that might be tweaked to work?
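For reference, a parser-based version of the image extraction, which addresses both the quoting and the data-src points above, could look like this. It is a sketch using PHP's built-in DOMDocument (assuming the dom extension is available); extract_img_srcs_dom is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the src of every <img>, read through a real
// HTML parser, so quote style does not matter and data-src is never confused with src.
function extract_img_srcs_dom(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; keep libxml warnings out of the output
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Skip <img> tags without a src attribute (e.g. lazy-loaded ones with only data-src)
        if ($img->hasAttribute('src')) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

// With cURL you would pass $response here instead of a literal string
var_dump(extract_img_srcs_dom("<img data-src='lazy.png' src='real.png'><img src=\"two.png\">"));
```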