
Scraping with curl

I am trying to scrape some info from some websites using PHP cURL. The problem is that it returns different (wrong) content than I get when opening the page in a normal browser.

The example site is this: http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=2010091905576453

I am trying to get the meta tags. In the browser, they come back as:

<meta name="title" content="Razmere v Preboldu se umirjajo" />
<meta name="description" content="Za prebivalci Prebolda je nemirna no&#269;, ki ji je sledilo jutro s &#353;e dodatnimi padavinami..." />
<link rel="image_src" href="http://web.vecer.com/portali/podatki/2010/09/19/slike/online_Prebold0-100.jpg" />
<link rel="target_url" href="http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=2010091905576453" />

But my cURL request gets this:

<title>VECER.COM: </title>
<meta name="title" content="" />
<meta name="description" content="" />
<link rel="image_src" href="-100.jpg" />
<link rel="target_url" href="http://web.vecer.com/portali/vecer/v1/default.asp?kaj=3&id=1899123000000000">

Here is my code:

function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6');
    // Include the response headers in the returned data
    curl_setopt($ch, CURLOPT_HEADER, true);
    // Read cookies from and write cookies to the same file
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
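To compare what the browser shows with what cURL receives, the relevant tags can be pulled out of the returned HTML. The following is only a minimal sketch: `extract_head_tags` is a hypothetical helper name that is not part of the code above, and it assumes it is given the HTML body only (with `CURLOPT_HEADER` enabled, the returned string also starts with the HTTP response headers).

```php
<?php
// Hypothetical helper (not from the original post): collects the values of
// <meta name="..."> and <link rel="..."> tags from an HTML string so that
// two responses can be compared side by side.
function extract_head_tags(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string on PHP 8, so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them;
    // real-world markup is frequently malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->hasAttribute('name')) {
            $tags['meta:' . $meta->getAttribute('name')] = $meta->getAttribute('content');
        }
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->hasAttribute('rel')) {
            $tags['link:' . $link->getAttribute('rel')] = $link->getAttribute('href');
        }
    }
    return $tags;
}
```

Running this on the body returned by `curl($url)` and on the page source saved from the browser, then printing both arrays with `print_r`, should make it easier to see exactly which values differ.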

What am I doing wrong?

mire