I have a dataframe that looks like this:

    country <- c("Canada", "US", "Japan", "China")
    url <- c("http://en.wikipedia.org/wiki/Canada",
             "http://en.wikipedia.org/wiki/United_States",
             "http://en.wikipedia.org/wiki/Japan",
             "http://en.wikipedia.org/wiki/China")
    df <- data.frame(country, url)

      country  url
    1 Canada   http://en.wikipedia.org/wiki/Canada
    2 US       http://en.wikipedia.org/wiki/United_States
    3 Japan    http://en.wikipedia.org/wiki/Japan
    4 China    http://en.wikipedia.org/wiki/China

Using rvest, I'd like to scrape the table of contents from each URL and bind the results into a single output.

This code extracts the table of contents for one url:

    library(rvest)

    # read_html() supersedes the older html(); url[1] selects a single page
    toc <- read_html(url[1]) %>%
      html_nodes(".toctext") %>%
      html_text()
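To show what the `.toctext` selector returns without hitting the network, here is a minimal sketch run against a hand-written HTML snippet. The markup in `snippet` is an assumption made for illustration (it is not copied from the live Wikipedia page), so the real pages may be structured differently.

```r
library(rvest)

# Hand-written stand-in for a table of contents; the markup is assumed
# for illustration and is not taken from the live page.
snippet <- '<div id="toc"><ul>
  <li><a href="#Etymology"><span class="toctext">Etymology</span></a></li>
  <li><a href="#History"><span class="toctext">History</span></a></li>
</ul></div>'

# Same pipeline as above, applied to the literal HTML string
toc <- read_html(snippet) %>%
  html_nodes(".toctext") %>%
  html_text()

print(toc)
```

The result is a plain character vector with one element per matched node, which is the shape that has to be combined across all four pages.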

Desired Output:

    country  toc
    US       Etymology
             History
             Native American and European contact
             Settlements
             ...
    Canada   Etymology
             History
             Aboriginal peoples
             European colonization
             ...etc

1 Answer

This will scrape them into a full data frame (one row per TOC entry). Tedious-but-straightforward "print/output" code left to the OP:

    library(rvest)
    library(dplyr)

    country <- c("Canada", "US", "Japan", "China")
    url <- c("http://en.wikipedia.org/wiki/Canada",
             "http://en.wikipedia.org/wiki/United_States",
             "http://en.wikipedia.org/wiki/Japan",
             "http://en.wikipedia.org/wiki/China")
    df <- data.frame(country, url, stringsAsFactors = FALSE)

    # Scrape each page in turn; x is the URL currently being processed
    # (read_html() supersedes the older html())
    toc_entries <- bind_rows(lapply(url, function(x) {
      data.frame(url = x,
                 toc_entry = read_html(x) %>%
                   html_nodes(".toctext") %>%
                   html_text(),
                 stringsAsFactors = FALSE)
    }))

    # Attach the country name to every TOC entry
    df <- toc_entries %>% left_join(df, by = "url")

    # Inspect ten random rows; the columns are url, toc_entry, country
    df[sample(nrow(df), 10), ]
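The "print/output" step could look roughly like the sketch below, which collapses the one-row-per-entry data frame into one row per country. To keep it runnable offline it uses a small stand-in data frame, `toc_demo` (a hypothetical name), filled with a few entries quoted in the question's desired output rather than a fresh scrape. The `"; "` separator is also just an assumption about how the output should read.

```r
library(dplyr)

# Stand-in for the scraped result: these entries come from the question's
# desired output, not from a live scrape.
toc_demo <- data.frame(
  country   = c("US", "US", "Canada", "Canada"),
  toc_entry = c("Etymology", "History", "Etymology", "History"),
  stringsAsFactors = FALSE
)

# One row per country, with its TOC entries pasted into a single string
toc_by_country <- toc_demo %>%
  group_by(country) %>%
  summarise(toc = paste(toc_entry, collapse = "; "))

print(toc_by_country)
```

With the real data, the same `group_by()` / `summarise()` pair should apply to the joined data frame, since it carries the same `country` and `toc_entry` columns.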