Webscraping Table

Question

I am trying to web scrape the table on this page (https://www.coya.com/bike/fahrrad-index-2019), namely the bike index values for 50 German cities (if you click "Alle Ergebnisse +" ("All results +"), you'll see all 50 cities).

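I especially need these columns:

- "Bewertung spezielle Radwege & Qualität der Radwege" (rating of dedicated bike lanes and bike-lane quality)
- "Investitionen & Qualität der Infrastruktur" (investment and infrastructure quality)
- "Bewertung der Infrastruktur" (infrastructure rating)
- "Fahrradsharing-Score" (bike-sharing score)
- "Autofreier Tag" (car-free day)
- "Critical-Mass-Fahrrad-aktionen" (Critical Mass bike rides)
- "Event-Score" (event score)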

This is what I tried:

```r
library(rvest)

num_link = "https://www.coya.com/bike/fahrrad-index-2019"
num_page = read_html(num_link)

xyc = num_page %>%
  html_nodes("._1200:nth-child(2)") %>%
  html_text()
```

I tried SelectorGadget, but unfortunately I get all the values of the table in one long string. Splitting it with str_split is difficult, because the values run together and the decimal commas inside the numbers make it hard to tell where one number ends and the next begins:

```
[1] "Ergebnisse für DeutschlandKriminalitätInfrastrukturFahrrad-SharingEvents#StadtLandSizeTotal Score1OldenburgDeutschlandK57,90,4271,94588,3594,4684,5227,153,0590,3454,1836,4515,0525,75N31,5216,2669,122MünsterDeutschlandK58,740,3910,53445,5883,0488,4328,1551,2388,0453,0535,522630,76N23,8412,4265,933Freiburg i. Breisg.DeutschlandK59,350,"
```

Could someone help me scrape the table, ideally only the values of the specific columns listed above? I'm very thankful for any help or tips.

Thank you in advance. (I am a newbie, please be gentle.)

Sinh Nguyen · Accepted Answer · 2021-04-08 01:48:21Z

Here is one way to solve the puzzle. The header row uses a lot of icons, so I just left the column names empty. You can create a vector of names and assign them manually using

names(table_content) <- names_vector
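A minimal sketch of that renaming step, which also keeps only the columns you asked about. It runs on a small hypothetical stand-in (`demo`) for `table_content`. The entries of `names_vector` are assumptions: the first four follow the header text visible in the question's output ("#", "Stadt", "Land", "Size"), and `col_5` is a placeholder, because which scraped column holds which score still has to be checked against the page. On the real table the vector needs one name per column (the scraped tibble has 21).

```r
# Hypothetical stand-in for the first five columns of `table_content`
demo <- data.frame(
  `...1` = c("1", "3"),
  `...2` = c("Oldenburg", "Freiburg i. Breisg."),
  `...3` = c("Deutschland", "Deutschland"),
  `...4` = c("K", "K"),
  `...5` = c("57,9", "59,35"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Assumed names: the first four follow the page header ("#", "Stadt", "Land", "Size");
# "col_5" is a placeholder until the column is matched to the page
names_vector <- c("rank", "Stadt", "Land", "Size", "col_5")
names(demo) <- names_vector

# Keep only the columns of interest
wanted <- demo[, c("Stadt", "col_5")]
wanted
```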

Here is the code

```r
library(rvest)
#> Loading required package: xml2
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Here I just reuse your code
num_link <- "https://www.coya.com/bike/fahrrad-index-2019"
num_page <- read_html(num_link)

# Extract the items from your code but go further down
table_content <- num_page %>%
  html_nodes("._1200:nth-child(2)") %>%
  # Extract the node that contains the table
  html_nodes(css = ".w-dyn-list") %>%
  # Extract the nodes corresponding to each row
  html_nodes(css = ".bike-collection-item") %>%
  # Then map a function that takes each row and converts it to a table,
  # and bind the rows together into one table
  map_dfr(function(x) {
    # Suppress the message caused by feeding no column names into map_dfc
    suppressMessages(
      x %>%
        html_nodes(".td") %>%
        map_dfc(function(x) {
          x %>% html_text()
        })
    )
  })
```

Here is the extracted content

```
#> # A tibble: 70 x 21
#>    ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10 ...11 ...12 ...13
#>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 1     Olde… Deut… K     57,9  0,427 1,94  588,… 94,46 84,52 27,1  53,05 90,34
#>  2 2     Müns… Deut… K     58,74 0,391 0,53  445,… 83,04 88,43 28,15 51,23 88,04
#>  3 3     Frei… Deut… K     59,35 0,34  2,27  962,… 88,87 77,52 32,57 48,11 93,49
#>  4 4     Bamb… Deut… K     55,59 0,302 0     456,… 89,04 92,66 30,29 47,74 93,75
#>  5 5     Gött… Deut… K     62,66 0,28  3,07  379,… 92,8  80,99 23,03 48,07 89,18
#>  6 6     Heid… Deut… K     63,14 0,22  1,21  394,… 90,39 88,33 29,02 47,88 94,21
#>  7 7     Karl… Deut… K     57,39 0,25  4,23  725,… 90,35 71,62 18,75 46,33 93,93
#>  8 8     Brau… Deut… K     67,36 0,21  0     522,… 85,89 90,97 20,55 49,2  89,78
#>  9 9     Kons… Deut… K     62,77 0,22  4,6   121,… 93,62 76,98 23    48,49 94,09
#> 10 10    Brem… Deut… M     58,86 0,21  1,38  334,… 87,34 87,15 18,64 59,78 94,64
#> # … with 60 more rows, and 8 more variables: ...14 <chr>, ...15 <chr>,
#> #   ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>, ...20 <chr>,
#> #   ...21 <chr>
```

Created on 2021-04-08 by the reprex package (v1.0.0)
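Every column in that tibble is character, and the numbers use a German decimal comma ("57,9"). One way to turn them into numbers is base R's `type.convert()` with `dec = ","`, which guesses a type for each column of a data frame. The sketch below runs on a small hypothetical stand-in (`demo`) for a few columns of `table_content`, and it assumes the values contain no thousands separators (none appear in the rows shown).

```r
# Hypothetical stand-in for a few columns of `table_content`
demo <- data.frame(
  `...1` = c("1", "3"),
  `...2` = c("Oldenburg", "Freiburg i. Breisg."),
  `...5` = c("57,9", "59,35"),
  `...6` = c("0,427", "0,34"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# dec = "," reads the comma as the decimal mark;
# as.is = TRUE keeps non-numeric text as character instead of factors
converted <- type.convert(demo, dec = ",", as.is = TRUE)

str(converted)
```

On the scraped data the same call should work as `type.convert(table_content, dec = ",", as.is = TRUE)`; columns such as the city name stay character because they do not parse as numbers.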

Elin · Accepted Answer · 2021-04-08 02:43:25Z

You probably want to scrape one table at a time and, within it, one column at a time, and then combine the columns into a data frame. For example, try this:

```r
col1 <- num_page %>%
  html_nodes(paste0(".w-dyn-item :nth-child(2) div")) %>%
  html_text()
```

SelectorGadget is nifty, but I usually need to experiment a lot to find the right selectors.
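A minimal sketch of that column-at-a-time idea: a small helper builds the selector for the i-th child, each wanted column is scraped separately, and the pieces are combined only once their lengths agree. To keep it runnable offline it parses a hypothetical HTML snippet (`mock_html`) that merely imitates the nesting the selector expects. The helper name `scrape_column`, the column positions and the output column names are also assumptions, so check them against the real page (`num_page`).

```r
library(rvest)

# Hypothetical snippet imitating the nesting the selector expects;
# the real page may be structured differently
mock_html <- '
<div class="w-dyn-list">
  <div class="w-dyn-item">
    <div><div>1</div></div>
    <div><div>Oldenburg</div></div>
    <div><div>57,9</div></div>
  </div>
  <div class="w-dyn-item">
    <div><div>3</div></div>
    <div><div>Freiburg i. Breisg.</div></div>
    <div><div>59,35</div></div>
  </div>
</div>'
page <- read_html(mock_html)

# Hypothetical helper: text of the divs inside the i-th child of each .w-dyn-item
scrape_column <- function(page, i) {
  page %>%
    html_nodes(paste0(".w-dyn-item :nth-child(", i, ") div")) %>%
    html_text(trim = TRUE)
}

# Scrape the wanted column positions one at a time
cols <- lapply(c(2, 3), function(i) scrape_column(page, i))

# Only combine when every column returned the same number of values
stopifnot(length(unique(lengths(cols))) == 1)
result <- data.frame(Stadt = cols[[1]], score = cols[[2]], stringsAsFactors = FALSE)
result
```

The explicit length check matters because `data.frame()` silently recycles shorter vectors when one length divides the other, which would hide a selector that matched too few nodes.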
