anshu-krishna/HTML-Scraper


HTML Scraper

A set of PHP classes to simplify data extraction from HTML.


Base code for the CSS_to_Xpath method in HTML_Scraper was cloned from https://github.com/zendframework/zend-dom.
Zend Framework : http://framework.zend.com/
Repository : http://github.com/zendframework/zf2
Copyright (c) 2005-2015 Zend Technologies USA Inc. http://www.zend.com
License : New BSD License (https://framework.zend.com/license)


For basic documentation see the DOC file.

Example

<?php
require_once 'HTML_Scraper.php';

$doc = new HTML_Scraper;

// Load the remote page; stop if loading fails.
if(!$doc->load_HTML_file('https://www.royalroad.com/fiction/10073/the-wandering-inn')) {
    echo 'Unable to load data';
    exit(1);
}

$data = [];

// Title: CSS selector with the 'textContentTrim' extractor.
$data['title'] = $doc->querySelector_extract('textContentTrim', 'div.fic-title h1[property="name"]', 0);

// URL: XPath query with a callback that reads the "content" attribute.
$data['url'] = $doc->xpath_extract(function($meta) {
    return $meta->getAttribute('content');
}, '//meta[@property="og:url"]', 0);

// Description: trimmed inner HTML of the matched div.
$data['description'] = $doc->querySelector_extract(function(&$div) {
    return trim(DOMNodeHelper::innerHTML($div));
}, 'div.description div[property="description"]', 0);

// Tags: same extractor as the title, called without the index argument.
$data['tags'] = $doc->querySelector_extract('textContentTrim', 'span.tags span[property="genre"]');

var_dump($data);
?>
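
The example above mixes CSS selectors and XPath queries, and the credits note that HTML_Scraper carries a CSS_to_Xpath method. As a rough illustration of that idea only, and not the library's actual implementation, the sketch below runs comparable queries with PHP's built-in DOM extension, writing the XPath equivalents of the CSS selectors by hand. The sample markup, the example.com URL, and the `has_class` helper are assumptions made up for this sketch.

```php
<?php
// Inline sample markup (hypothetical) so the sketch runs without network access.
$html = <<<'HTML'
<html>
<head><meta property="og:url" content="https://example.com/fiction/1"></head>
<body>
<div class="fic-title"><h1 property="name">  Sample Title </h1></div>
<span class="tags">
<span property="genre">Fantasy</span>
<span property="genre"> Adventure </span>
</span>
</body>
</html>
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hypothetical helper: XPath predicate equivalent to the CSS class selector ".name".
function has_class(string $name): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// CSS: div.fic-title h1[property="name"] (first match only)
$titleNodes = $xpath->query('//div[' . has_class('fic-title') . ']//h1[@property="name"]');
$title = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : null;

// XPath written directly, like the og:url lookup in the example above.
$urlNodes = $xpath->query('//meta[@property="og:url"]');
$url = $urlNodes->length > 0 ? $urlNodes->item(0)->getAttribute('content') : null;

// CSS: span.tags span[property="genre"] (all matches)
$tags = [];
foreach ($xpath->query('//span[' . has_class('tags') . ']//span[@property="genre"]') as $node) {
    $tags[] = trim($node->textContent);
}

var_dump(['title' => $title, 'url' => $url, 'tags' => $tags]);
```

The `concat(' ', normalize-space(@class), ' ')` pattern is a common way to match a single class token in XPath 1.0, so that `tags` does not also match a class such as `tagset`.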
