Take this very simple page as an example: https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm

Even wget, without any custom header information, can fetch the page content successfully.
However, CasperJS does not work:

```javascript
var casper = require("casper").create();
var mouse = require("mouse").create(casper);
var link = "https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm";

casper.start().then(function () {
    // Open the target page, then give it 5 seconds to load
    this.open(link);
    this.wait(5000);
});

casper.run(function () {
    // Dump whatever HTML CasperJS ended up with, then quit
    this.echo(this.getPageContent()).exit();
});
```

It always outputs an empty document:

```html
<html><head></head><body></body></html>
```

Adding header information does not help either, for example:
```javascript
this.open(link, {
    method: 'get',
    // The next three keys mirror the HTTP/2 pseudo-headers
    // (:authority, :path, :scheme) shown in Chrome's DevTools
    authority: 'www.accessdata.fda.gov',
    path: '/scripts/cder/daf/index.cfm',
    scheme: 'https',
    // Request headers copied from a real Chrome request
    headers: {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,zh-TW;q=0.8,zh;q=0.7,zh-CN;q=0.6,ja;q=0.5',
        'cache-control': 'max-age=0',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'none',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1'
    }
});
```

I have tried many combinations of headers, but none of them work.
Note, however, that the same CasperJS code above does work for some other websites, such as http://docs.casperjs.org/en/latest/selectors.html