I need to plot a time series graph but the data that I'm using is proving to be quite challenging.

Ideally, I'd like a graph that looks something like this: [image]

But mine looks like this: [image]

I have tried a series of different things but none of them have worked.

The dataset can be found here and I'll attach a picture of what the dataset itself looks like: [image]

Some code I have tried:

    ggplot( aes(x=date, y=northEast)) +
      geom_area(fill="#69b3a2", alpha=0.5) +
      geom_line(color="#69b3a2") +
      ylab("test") +
      theme_ipsum()

    ggplot(covidData2) +
      geom_line( mapping = aes(x = weekBeginning, y=northEast, group=northEast) )

Any help would be greatly appreciated!

  • Can you dput() a sample of the data? It's not trivial to find the data and import it with reasonable column names. Commented Mar 8, 2022 at 20:50
  • structure(list(weekBeginning = c(NA, "Period", "01/09/2020 - 07/09/2020", continues with dates northEast = c("North East", "Number of cases", "953", "1052", "2344", "3532", "5215", "5562", continues with numbers and It continues with different regions Commented Mar 8, 2022 at 20:57

1 Answer

You need to tidy your data before plotting it. If you look at your data frame, all of the "numeric" columns have been read in as character vectors, because the column headers are nested and so end up in the first couple of rows of the data rather than in the column names. You need to consolidate these rows and turn them into column names. Then you need to convert the numeric columns to numbers. Finally, you need to parse the dates, since otherwise ggplot will simply treat the periods as character strings:

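    library(readxl)
    library(lubridate)
    library(ggplot2)
    library(hrbrthemes)

    wb <- read_xlsx(path.expand("~/covid.xlsx"), sheet = "Table 9")
    df <- as.data.frame(wb)

    # Fill the gaps in the first header row by carrying each label to the right
    df[1, 1] <- ""
    for(i in 2:length(df)) {
      if(is.na(df[1, i])) df[1, i] <- df[1, i - 1]
    }

    # Combine the two header rows into column names, then drop those rows
    nms <- trimws(paste(df[1,], df[2,]))
    df <- df[-c(1:2),]
    names(df) <- nms

    # Drop columns that are entirely NA, then convert all but the first to numeric
    df <- df[sapply(df, function(x) !all(is.na(x)))]
    df[-1] <- lapply(df[-1], as.numeric)

    # Drop the last three rows
    df <- head(df, -3)

    # Parse the start date of each period (the first 10 characters, dd/mm/yyyy)
    df$Period <- dmy(substr(df$Period, 1, 10))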

Now we can plot:

    ggplot(df, aes(x = Period, y = `North East Rate`)) +
      geom_area(fill = "#69b3a2", alpha=0.5) +
      geom_line(color = "#69b3a2") +
      ylab("Rate per 100,000") +
      xlab("") +
      theme_ipsum()

Created on 2022-03-08 by the reprex package (v2.0.1)
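
If you want to try the header-consolidation steps without the spreadsheet, here is a minimal sketch on a toy data frame. The `raw` and `tidy` names, the `northEastRate` column, all dates after the first, and every rate value are made up for illustration; only the layout (two header rows sitting inside the data) and the first few case counts follow the `dput()` fragment in the comments. It also swaps `lubridate::dmy()` for base R's `as.Date()` so that it runs with no packages installed.

```r
# Toy stand-in for the imported sheet: two header rows sit inside the data,
# so every column is character. Values are illustrative, not the real dataset.
raw <- data.frame(
  weekBeginning = c(NA, "Period", "01/09/2020 - 07/09/2020",
                    "08/09/2020 - 14/09/2020", "15/09/2020 - 21/09/2020"),
  northEast     = c("North East", "Number of cases", "953", "1052", "2344"),
  northEastRate = c(NA, "Rate", "35.7", "39.4", "87.8"),  # hypothetical column
  stringsAsFactors = FALSE
)

tidy <- raw

# Carry each region label in header row 1 to the right over the NA gaps
tidy[1, 1] <- ""
for (i in 2:length(tidy)) {
  if (is.na(tidy[1, i])) tidy[1, i] <- tidy[1, i - 1]
}

# Paste the two header rows together to form the column names
names(tidy) <- trimws(paste(unlist(tidy[1, ]), unlist(tidy[2, ])))

# Drop the header rows and convert every column except the first to numeric
tidy <- tidy[-(1:2), ]
tidy[-1] <- lapply(tidy[-1], as.numeric)

# Keep the start date of each period (first 10 characters, dd/mm/yyyy)
tidy$Period <- as.Date(substr(tidy$Period, 1, 10), format = "%d/%m/%Y")
rownames(tidy) <- NULL

str(tidy)
```

After this, `tidy` has a `Date` column called `Period` and numeric columns such as `North East Rate`, which is the shape the `ggplot()` call above expects.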
