⚠️ IMPORTANT: This package is no longer maintained or supported. For the latest updates, please use our new package at crawlbase-php.
A lightweight, dependency-free PHP class that acts as a wrapper for the ProxyCrawl API.
Choose one of the following ways to install:
- Install it from Packagist, the PHP package repository (via Composer).
- Download the project from GitHub and save it into your project so you can require it:
    require_once('proxycrawl-php/src/[class].php');
First initialize the CrawlingAPI class. You can get your free token here.
    $api = new ProxyCrawl\CrawlingAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);

Pass the URL that you want to scrape, plus any options from those available in the API documentation.
    $api->get(string $url, array $options = []);

Example:
    $response = $api->get('https://www.facebook.com/britneyspears');
    if ($response->statusCode === 200) {
        echo $response->body;
    }

You can pass any of the options supported by the ProxyCrawl API.
Example:
    $response = $api->get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', [
        'user_agent' => 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
        'format' => 'json'
    ]);
    if ($response->statusCode === 200) {
        echo $response->body;
    }

Optionally, set the store parameter to true to store a copy of the API response in the ProxyCrawl Cloud Storage.
Example:
    $response = $api->get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', [
        'store' => true
    ]);
    if ($response->statusCode === 200) {
        echo 'storage url: ' . $response->headers->storage_url . PHP_EOL;
    }

Pass the URL that you want to scrape, the data that you want to send (either an array or a string, such as a JSON string), plus any options from those available in the API documentation.
    $api->post(string $url, array|string $data, array $options = []);

Example:
    $response = $api->post('https://producthunt.com/search', ['text' => 'example search']);
    if ($response->statusCode === 200) {
        echo $response->body;
    }

You can send the data as application/json instead of x-www-form-urlencoded by setting the post_content_type option to json.
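As a rough illustration of the difference between the two content types, the sketch below encodes the same payload both ways using only PHP built-ins. It does not call the ProxyCrawl API, and how the library encodes array payloads internally is an assumption here; the $data array and its page field are made up for the example.

```php
<?php
// Illustration only: no request is sent and no ProxyCrawl class is used.
// The payload below is hypothetical.
$data = ['text' => 'example search', 'page' => 2];

// application/x-www-form-urlencoded: key=value pairs joined by "&"
// (assumed to be roughly what an array payload becomes by default).
$formBody = http_build_query($data);

// application/json: encode the payload yourself and pass the resulting
// string together with ['post_content_type' => 'json'].
$jsonBody = json_encode($data);

echo $formBody . PHP_EOL; // text=example+search&page=2
echo $jsonBody . PHP_EOL; // {"text":"example search","page":2}
```

The form-encoded body is flat text, while the JSON body keeps value types such as integers, which is usually the reason to prefer json for structured payloads.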
Example:
    $response = $api->post(
        'https://httpbin.org/post',
        json_encode(['some_json' => 'with some value']),
        ['post_content_type' => 'json']
    );
    if ($response->statusCode === 200) {
        echo $response->body;
    }

Pass the URL that you want to scrape, the data that you want to send (either an array or a string, such as a JSON string), plus any options from those available in the API documentation.
    $api->put(string $url, array|string $data, array $options = []);

Example:
    $response = $api->put('https://producthunt.com/search', ['text' => 'example search']);
    if ($response->statusCode === 200) {
        echo $response->body;
    }

If you need to scrape a website built with JavaScript (React, Angular, Vue, etc.), pass your JavaScript token and use the same calls. Note that only ->get is available for JavaScript requests, not ->post.
    $api = new ProxyCrawl\CrawlingAPI(['token' => 'YOUR_JAVASCRIPT_TOKEN']);
    $response = $api->get('https://www.nfl.com');
    if ($response->statusCode === 200) {
        echo $response->body;
    }

You can pass additional JavaScript options in the same way.
    $response = $api->get('https://www.freelancer.com', ['page_wait' => 5000]);
    if ($response->statusCode === 200) {
        echo $response->body;
    }

You can always get the original status and the ProxyCrawl status from the response. Read the ProxyCrawl documentation to learn more about these statuses.
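When logging or debugging, it can help to collect the three status values in one place. The following is a minimal sketch, not part of the library: summarizeStatus is a hypothetical helper, and the response is a hand-built stand-in object that only mirrors the statusCode and headers fields used in this README, so the code runs without a token.

```php
<?php
// Hypothetical helper (not provided by the library): formats the HTTP status
// of the API call together with the original_status and pc_status headers.
function summarizeStatus(object $response): string
{
    // Fall back to 'n/a' when a header is missing from the response.
    $original = $response->headers->original_status ?? 'n/a';
    $pc = $response->headers->pc_status ?? 'n/a';

    return sprintf('http=%d original=%s pc=%s', $response->statusCode, $original, $pc);
}

// Stand-in for a real response object; the field names follow this README.
$fake = (object) [
    'statusCode' => 200,
    'headers' => (object) ['original_status' => '200', 'pc_status' => '200'],
];

echo summarizeStatus($fake) . PHP_EOL; // http=200 original=200 pc=200
```

With a real response, the same helper could be called as summarizeStatus($api->get($url)), assuming the response exposes the fields shown above.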
Example:
    $response = $api->get('https://craiglist.com');
    echo $response->headers->original_status . PHP_EOL;
    echo $response->headers->pc_status . PHP_EOL;

First initialize the ScraperAPI class. You can get your free token here. Please note that only some websites are supported; check the API documentation for more information.
    $api = new ProxyCrawl\ScraperAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);

Pass the URL that you want to scrape, plus any options from those available in the API documentation.
Example:
    $response = $api->get('https://www.amazon.com/DualSense-Wireless-Controller-PlayStation-5/dp/B08FC6C75Y/');
    echo 'status code: ' . $response->statusCode . PHP_EOL;
    if ($response->statusCode === 200) {
        var_dump($response->json); // Will print scraped Amazon details
    }

First initialize the LeadsAPI class. You can get your free token here.
    $api = new ProxyCrawl\LeadsAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);

Pass the domain where you want to search for leads.
Example:
    $response = $api->getFromDomain('target.com');
    if ($response->statusCode === 200) {
        foreach ($response->json->leads as $key => $lead) {
            echo $lead->email . PHP_EOL;
        }
    }

Initialize with your Screenshots API token and call the get method.
    $api = new ProxyCrawl\ScreenshotsAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);
    $response = $api->get('https://www.apple.com');
    echo 'success: ' . $response->headers->success . PHP_EOL;
    echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;
    file_put_contents('apple.jpg', $response->body);

Or you can specify a callback that automatically saves the file to the temporary folder:
    $api = new ProxyCrawl\ScreenshotsAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);
    $response = $api->get('https://www.apple.com', [
        'callback' => function($filepath) {
            echo 'filepath: ' . $filepath . PHP_EOL;
        }
    ]);
    echo 'success: ' . $response->headers->success . PHP_EOL;
    echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;

Or specify a file path via the saveToPath option:
    $api = new ProxyCrawl\ScreenshotsAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);
    $response = $api->get('https://www.apple.com', [
        'saveToPath' => 'apple.jpg',
        'callback' => function($filepath) {
            echo 'filepath: ' . $filepath . PHP_EOL;
        }
    ]);
    echo 'success: ' . $response->headers->success . PHP_EOL;
    echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;

Note that the $api->get($url, $options) method accepts an options array.
Initialize the Storage API using your private token.
    $api = new ProxyCrawl\StorageAPI(['token' => 'YOUR_PROXYCRAWL_TOKEN']);

Pass the URL that you want to get from ProxyCrawl Storage.
    $response = $api->get('https://www.apple.com');
    echo 'status code: ' . $response->statusCode . PHP_EOL;
    if ($response->statusCode === 200) {
        echo 'body: ' . $response->body . PHP_EOL;
        echo 'original status: ' . $response->headers->original_status . PHP_EOL;
        echo 'proxycrawl status: ' . $response->headers->pc_status . PHP_EOL;
        echo 'rid: ' . $response->headers->rid . PHP_EOL;
        echo 'url: ' . $response->headers->url . PHP_EOL;
        echo 'stored date: ' . $response->headers->stored_at . PHP_EOL;
    }

Or you can use the RID:
    $response = $api->get('RID_REPLACE');
    echo 'status code: ' . $response->statusCode . PHP_EOL;
    if ($response->statusCode === 200) {
        echo 'body: ' . $response->body . PHP_EOL;
        echo 'original status: ' . $response->headers->original_status . PHP_EOL;
        echo 'proxycrawl status: ' . $response->headers->pc_status . PHP_EOL;
        echo 'rid: ' . $response->headers->rid . PHP_EOL;
        echo 'url: ' . $response->headers->url . PHP_EOL;
        echo 'stored date: ' . $response->headers->stored_at . PHP_EOL;
    }

Note: Either the RID or the URL must be sent. Each is optional on its own, but sending one of the two is mandatory.
Delete request
To delete an item from your storage area, pass its RID:
    if ($api->delete('RID_REPLACE')) {
        echo 'delete success' . PHP_EOL;
        echo 'status code: ' . $api->response->statusCode . PHP_EOL;
    } else {
        echo 'delete failed' . PHP_EOL;
        echo 'status code: ' . $api->response->statusCode . PHP_EOL;
    }

Bulk request
To make a bulk request, pass the list of RIDs as an array:
    $items = $api->bulk(['RID1', 'RID2', 'RID3' /* , ... */]);
    foreach ($items as $item) {
        echo 'body: ' . $item->body . PHP_EOL;
        echo 'stored at: ' . $item->stored_at . PHP_EOL;
        echo 'original status: ' . $item->original_status . PHP_EOL;
        echo 'proxycrawl status: ' . $item->pc_status . PHP_EOL;
        echo 'rid: ' . $item->rid . PHP_EOL;
        echo 'url: ' . $item->url . PHP_EOL;
        echo PHP_EOL;
    }

RIDs request
To request a list of the RIDs in your storage area:
    $rids = $api->rids();
    foreach ($rids as $rid) {
        echo $rid . PHP_EOL;
    }

You can also specify a limit as a parameter:
    $rids = $api->rids(10);

To get the total number of documents in your storage area:
    $totalCount = $api->totalCount();
    echo 'total count: ' . $totalCount . PHP_EOL;

If you have questions or need help using the library, please open an issue or contact us.
Copyright 2023 ProxyCrawl