I have a dataframe that looks like this:
    country <- c("Canada", "US", "Japan", "China")
    url <- c("http://en.wikipedia.org/wiki/Canada",
             "http://en.wikipedia.org/wiki/United_States",
             "http://en.wikipedia.org/wiki/Japan",
             "http://en.wikipedia.org/wiki/China")
    df <- data.frame(country, url, stringsAsFactors = FALSE)

      country                                        url
    1  Canada        http://en.wikipedia.org/wiki/Canada
    2      US http://en.wikipedia.org/wiki/United_States
    3   Japan         http://en.wikipedia.org/wiki/Japan
    4   China         http://en.wikipedia.org/wiki/China

Using rvest, I'd like to scrape the table of contents from each URL and bind the results into a single output.
This code extracts the table of contents for a single URL:

    library(rvest)
    toc <- read_html(url[1]) %>%
      html_nodes(".toctext") %>%
      html_text()

Desired output:
    country  toc
    US       Etymology
             History
             Native American and European contact
             Settlements
             ...
    Canada   Etymology
             History
             Aboriginal peoples
             European colonization
             ...
    ...etc.
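One possible approach, offered as a sketch rather than a definitive answer: wrap the single-URL code in a function, apply it to every row with `Map()`, and stack the per-country data frames with `do.call(rbind, ...)`. It assumes Wikipedia pages still mark TOC entries with the `.toctext` class used above (the site's markup may have changed since) and that a network connection is available. `scrape_toc()`, `toc_table()` and the `scraper` argument are hypothetical names chosen for this illustration; `scraper` exists only so the download step can be swapped out, for example for offline testing.

```r
# Hypothetical helper: return the TOC heading texts of one page as a
# character vector. Assumes TOC entries carry the ".toctext" class, as in
# the question; Wikipedia's markup may no longer match this selector.
scrape_toc <- function(url) {
  page <- rvest::read_html(url)
  rvest::html_text(rvest::html_nodes(page, ".toctext"))
}

# Hypothetical helper: build one long data frame with one row per
# (country, heading) pair. `scraper` defaults to scrape_toc() but can be
# replaced by any function that maps a URL to a character vector.
toc_table <- function(df, scraper = scrape_toc) {
  pieces <- Map(function(country, url) {
    headings <- scraper(url)
    data.frame(country = rep(country, length(headings)),
               toc = headings,
               stringsAsFactors = FALSE)
  }, as.character(df$country), as.character(df$url))
  # Stack the per-country pieces and reset the row names to 1..n.
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}
```

Usage would be `result <- toc_table(df)`, which gives one row per heading. If you would rather have one row per country with all headings in a single string, something like `aggregate(toc ~ country, data = result, FUN = paste, collapse = "; ")` should collapse them. The `rvest::` prefixes are used instead of `library(rvest)` and `%>%` so the functions can be defined without loading the package; only an actual call to `scrape_toc()` needs rvest installed.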