
added 1 character in body; edited title

edited Jan 31, 2021 at 23:11


Scraping with curl

I am trying to scrape some info from some websites using PHP cURL. The problem is that it gives me different (wrong) content than I get when opening the page in a normal browser.

The example site is this: http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=2010091905576453

I am trying to get the meta tags. In the browser, the page returns:

    <meta name="title" content="Razmere v Preboldu se umirjajo" />
    <meta name="description" content="Za prebivalci Prebolda je nemirna no&#269;, ki ji je sledilo jutro s &#353;e dodatnimi padavinami..." />
    <link rel="image_src" href="http://web.vecer.com/portali/podatki/2010/09/19/slike/online_Prebold0-100.jpg" />
    <link rel="target_url" href="http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=2010091905576453" />

But my cURL request gets this:

    <title>VECER.COM: </title>
    <meta name="title" content="" />
    <meta name="description" content="" />
    <link rel="image_src" href="-100.jpg" />
    <link rel="target_url" href="http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=1899123000000000">
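To compare the two responses without reading raw HTML, the tags can be pulled out programmatically. The following is a minimal sketch using PHP's built-in `DOMDocument`; the helper name `extractHeadTags` and the `meta:`/`link:` key prefixes are assumptions made for illustration, not part of the original code.

```php
<?php
/**
 * Collect <meta name="..." content="..."> and <link rel="..." href="...">
 * values from an HTML string.
 * extractHeadTags() is a hypothetical helper name used for illustration.
 */
function extractHeadTags(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parse warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->hasAttribute('name')) {
            $tags['meta:' . $meta->getAttribute('name')] = $meta->getAttribute('content');
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->hasAttribute('rel')) {
            $tags['link:' . $link->getAttribute('rel')] = $link->getAttribute('href');
        }
    }
    return $tags;
}
```

Running this on both the browser's page source and the cURL result gives two small arrays that can be compared directly, for example with `print_r()`.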

Here is my code:

    function curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        // Return the response as a string instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
        // Include the response headers in the returned string
        curl_setopt($ch, CURLOPT_HEADER, true);
        // Read cookies from, and save them to, cookie.txt
        curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
        curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
        curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com");
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
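Because `CURLOPT_HEADER` is enabled, the string returned by `curl_exec()` starts with the response headers, followed by the body. When debugging a difference like this one, it may help to look at the two parts separately, since the headers show the status code and any cookies the server set. Below is a minimal sketch, assuming the header size is read with `curl_getinfo($ch, CURLINFO_HEADER_SIZE)` before the handle is closed; `splitResponse` is a hypothetical helper name.

```php
<?php
/**
 * Split a raw cURL response (headers followed by body) at the given
 * header size. splitResponse() is a hypothetical helper name.
 */
function splitResponse(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Usage sketch (needs network access, so it is left commented out):
// $ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_HEADER, true);
// $raw = curl_exec($ch);
// $parts = splitResponse($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// curl_close($ch);
// echo $parts['headers'];
```

This only separates the response for inspection; it does not by itself explain why the server sends different content.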

What am I doing wrong?


fixed grammar and improved formatting


edit approved Jan 28, 2013 at 12:53

Uttara



asked Jan 28, 2013 at 12:34

mire
