Attempt to get HTML from website using PHP cURL does not work

Question

I am trying to write a script that retrieves the HTML from my school's schedule search page. The page loads normally in a browser, but when I request it with cURL I get the HTML of the page it redirects to instead. When I changed the CURLOPT_FOLLOWLOCATION option from true to false, it only output a blank page with the headers.

For reference, my PHP code is

    <?php
    $curl_connection = curl_init('https://www.registrar.usf.edu/ssearch/');
    curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
    curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($curl_connection, CURLOPT_HEADER, true);
    curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
    $result = curl_exec($curl_connection);
    print $result;
    ?>

The page I am trying to fetch with cURL is https://www.registrar.usf.edu/ssearch/ or https://www.registrar.usf.edu/ssearch/search.php

Any ideas?

The page is pushing a couple of cookies: cookie_test=cookie_set; PHPSESSID=nijdlbfqe2dfqqege40eh7lai4
– gview, Commented May 9, 2012 at 6:30

Kishor · Accepted Answer · 2012-05-09 06:41:19Z

I added two lines that save cookies and send them back. The site uses those cookies to decide whether to redirect you when you try to scrape the schedule page.

    $curl_connection = curl_init();
    $url = "https://www.registrar.usf.edu/ssearch/search.php";
    curl_setopt($curl_connection, CURLOPT_URL, $url);
    curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($curl_connection, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
    curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl_connection, CURLOPT_COOKIEJAR, 'cookie.txt');  // cookie jar to dump cookie info into
    curl_setopt($curl_connection, CURLOPT_COOKIEFILE, 'cookie.txt'); // cookie file to read from on later requests to the site
    curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl_connection, CURLOPT_HEADER, true);
    curl_setopt($curl_connection, CURLOPT_REFERER, "https://www.registrar.usf.edu/");
    $result = curl_exec($curl_connection);
    echo $result;

Also, I haven't seen anyone pass the URL to curl_init() before, although it does accept one as an optional argument.

Here is the cookie file:

    # Netscape HTTP Cookie File
    # http://curl.haxx.se/rfc/cookie_spec.html
    # This file was generated by libcurl! Edit at your own risk.

    www.registrar.usf.edu	FALSE	/	FALSE	0	PHPSESSID	eied78t0v1qlqcop0rdk214361
    www.registrar.usf.edu	FALSE	/ssearch/	FALSE	1336718465	cookie_test	cookie_set
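If you want to check from PHP what actually ended up in the jar, something like the sketch below can read a cookie file in the format shown above. The function name parse_cookie_jar is made up for illustration, and it assumes the usual libcurl layout of seven tab-separated fields per cookie line (domain, subdomain flag, path, secure flag, expiry, name, value).

```php
<?php
// Hypothetical helper (not part of the answer): parse the text of a
// Netscape-format cookie jar, as written by libcurl, into an array.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix.
        $httpOnly = strncmp($line, '#HttpOnly_', 10) === 0;
        if ($httpOnly) {
            $line = substr($line, 10);
        }
        // Skip blank lines and ordinary comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line in the assumed layout
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'httponly'   => $httpOnly,
        ];
    }
    return $cookies;
}
```

Calling print_r(parse_cookie_jar(file_get_contents('cookie.txt'))) after the request should list PHPSESSID and cookie_test if the jar was written.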

If you ever need to debug a cURL request that isn't working, start with var_dump(curl_getinfo($curl_connection)); and then check curl_error($curl_connection);
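The two debugging calls above can be wrapped into one helper so every request reports the same things. This is only a sketch: fetch_with_diagnostics and the keys of the array it returns are names invented here, and the timeouts are arbitrary.

```php
<?php
// Hypothetical helper (not part of the answer): run one request and
// collect the body plus cURL's own diagnostics in a single array.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // arbitrary values
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($ch); // false when the transfer failed

    // Read the diagnostics before the handle goes out of scope.
    return [
        'ok'    => $body !== false,
        'body'  => $body === false ? null : $body,
        'errno' => curl_errno($ch),   // 0 when there was no error
        'error' => curl_error($ch),   // empty string when there was no error
        'info'  => curl_getinfo($ch), // http_code, redirect_count, url, ...
    ];
}
```

For the redirect problem in the question, the interesting entries are probably info['http_code'], info['redirect_count'] and info['url'], which show whether a redirect happened and where the request ended up.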

Great, that works, thanks! I looked at my cookie file, but it doesn't look like anything was written to it. I guess the website is checking that I can accept cookies but doesn't need them for anything useful. Do you think it's weird that it redirects to the main page instead of asking me to accept cookies?
It's more like this: if the site can't read the cookies back from us, it redirects to the home page. So even if we save cookies, removing CURLOPT_COOKIEFILE (the file cURL reads the cookies from when sending them back) gets us redirected to home. Maybe your uni needs protection from people who hate cookies :D
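To make the split between the two options concrete, here is a minimal sketch that sets both on one handle. make_cookie_session and the usf_cookies.txt file name are made up for illustration. An absolute path is used on the assumption that a relative 'cookie.txt' may land in a directory that is not writable or not the one you expect, which is one possible reason for an empty-looking cookie file.

```php
<?php
// Hypothetical helper (not part of the answer): build a handle that both
// saves cookies and sends them back, using one absolute jar path.
function make_cookie_session(string $url, string $jarPath)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // COOKIEFILE: read cookies from this file and turn on cookie handling,
    // so cookies set by one response are sent on the following requests.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jarPath);
    // COOKIEJAR: where libcurl writes its cookies once the handle is released.
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jarPath);
    return $ch;
}

// Assumed location; any writable absolute path should do.
$jarPath = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'usf_cookies.txt';
```

Typical use would be $ch = make_cookie_session($url, $jarPath); $html = curl_exec($ch); unset($ch);. The jar is only written when the handle is released, so looking at the file while the handle is still alive can also make it seem empty.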
