Scraping information from a webpage that has a table spanning many pages

Question

I'm using the rvest package in R and would like to scrape data from a table whose first page shows only about 40% of the total records. I followed this blog post, but it doesn't explain how to scrape data when the URL stays the same across the different pages. This website is the one I'm trying to obtain job listing data from.

I've successfully retrieved the data on the first page using this code:

library(rvest)

# Read the page, then extract the text of the first table on it
job_page <- read_html('page_address')
data_raw <- job_page %>%
  html_node('table') %>%
  html_text()

Is it possible to scrape the webpage when the URL does NOT change between the pages of data? My hope is to use lapply to iterate over the pages in some way.

Yifu Yan · Accepted Answer · 2018-06-21 23:21:48Z

Try this URL instead; it should return all results on one page:

http://explore.msujobs.msstate.edu/cw/en-us/filter/?search-keyword=&job-mail-subscribe-privacy=agree&location=main%20campus%20-%20starkville%20ms&category=faculty&page=1&page-items=100

To find parameters like these yourself, open the developer tools in Chrome and select the Network tab. There you can inspect the request the page sends and tweak its search parameters.
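As a rough sketch of how the URL above could be used from R, the code below builds the request URL for a given page and parses the first table on it. The helper names build_page_url and scrape_page are hypothetical, and the page count of 3 and the page size of 20 are only illustrative. It also assumes the site honors the page and page-items parameters when they are changed, which should be checked in the Network tab first.

```r
# Base URL of the filter endpoint, taken from the URL in this answer.
base_url <- "http://explore.msujobs.msstate.edu/cw/en-us/filter/"

# Build the URL for one page of results (hypothetical helper).
# `page` and `page-items` are the query parameters visible in the URL above.
build_page_url <- function(page, page_items = 100) {
  paste0(
    base_url,
    "?search-keyword=",
    "&job-mail-subscribe-privacy=agree",
    "&location=main%20campus%20-%20starkville%20ms",
    "&category=faculty",
    "&page=", page,
    "&page-items=", page_items
  )
}

# Download one page and parse its first <table> (hypothetical helper).
# Requires the rvest package to be installed.
scrape_page <- function(url) {
  page <- rvest::read_html(url)
  table_node <- rvest::html_node(page, "table")
  rvest::html_table(table_node)
}

# Assumption: three pages of 20 rows each; adjust to what the site reports.
urls <- vapply(1:3, build_page_url, character(1), page_items = 20)

# Not run here because it needs network access:
# jobs <- do.call(rbind, lapply(urls, scrape_page))
```

With page-items set high enough, a single call such as scrape_page(build_page_url(1)) may already return every row, in which case the lapply loop is unnecessary.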
