
I am using an online ONS inflation dataset and trying to chart it, but when I plot it with ggplot the x-axis is not in chronological order (the order looks random).

Here is my code, with the link to the dataset:

```r
install.packages("tidyverse")
library("tidyverse")
install.packages("lubridate")
library("lubridate")

#webscraping the ONS inflation csv file
cpi <- read.csv(url("https://www.ons.gov.uk/generator?format=csv&uri=/economy/inflationandpriceindices/timeseries/d7g7/mm23"))

#removing rows 1 to 7 which contain descriptors, keeping this as a dataframe
cpi <- cpi[-c(1, 2, 3, 4, 5, 6, 7), , drop = FALSE]

#renaming columns as date and inflation
cpi <- cpi %>% rename(date = Title)
cpi <- cpi %>% rename(inflation = CPI.ANNUAL.RATE.00..ALL.ITEMS.2015.100)

#proper title characters for date
cut_cpi$date <- str_to_title(cut_cpi$date)

#subsetting cpi dataset in order to have only the data from the row of 2020 JAN to the last row
cut_cpi <- cpi[(which(cpi$date == "2020 JAN")):nrow(cpi), ]

#plotting inflation in a line chart
ggplot(cut_cpi, aes(x = date, y = inflation, group = 1, )) +
  geom_line(colour = "black") +
  labs(title = "CPI inflation from January 2020") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```

I think the problem may be that the date column is stored as character rather than as a date, but I cannot convert it to the Date class.

I tried this:

```r
cut_cpi$date <- as_factor(cut_cpi$date)
cut_cpi$date <- as_date(cut_cpi$date, format = '%Y %b')
```
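A likely reason the second line gives NA is that a year and a month alone are not a complete date. R's date parser returns NA when a month is given without a day. The base-R sketch below uses `as.Date()` rather than lubridate's `as_date()`, and its example labels only imitate the "2020 JAN" style of the ONS file. It shows the failure and then one workaround: append a day before parsing. It assumes an English or C time locale, because `%b` is matched against month abbreviations in the current locale.

```r
# Example year-month labels in the style of the ONS file (not real data)
labels <- c("2020 JAN", "2020 FEB", "2021 DEC")

# Without a day component the date is incomplete, so every element is NA
as.Date(labels, format = "%Y %b")

# Appending a day ("01") gives a complete date that can be parsed;
# %b is matched against month abbreviations in the current time locale
dates <- as.Date(paste(labels, "01"), format = "%Y %b %d")
dates
```

Each parsed value is the first day of its month, which works well as an x position for monthly data.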

I checked the locale, and it does not seem to be the problem:

```r
> Sys.setlocale("LC_TIME")
[1] "English_United Kingdom.1252"
```
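To take the locale out of the picture entirely, one option is to look up the month in base R's built-in `month.abb` vector instead of relying on `%b`. That vector always contains the English abbreviations. This is a minimal sketch. `parse_year_month()` is a hypothetical helper name, and it assumes every label is a four-digit year, a single space, and a three-letter English month abbreviation in any letter case.

```r
# Hypothetical helper: parse "YYYY MON" labels without depending on LC_TIME
parse_year_month <- function(x) {
  parts <- strsplit(trimws(x), " ", fixed = TRUE)
  year  <- as.integer(vapply(parts, `[`, character(1), 1))
  # match() returns NA for an unrecognised month name
  month <- match(toupper(vapply(parts, `[`, character(1), 2)), toupper(month.abb))
  # An explicit format makes unparseable inputs become NA instead of an error
  as.Date(paste(year, month, 1, sep = "-"), format = "%Y-%m-%d")
}

parse_year_month(c("2020 JAN", "2021 Dec"))
```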

2 Answers


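You had two issues:

1. `inflation` was stored as character, not as a number, so ggplot treated it as a discrete variable instead of plotting it on a numeric scale. The descriptor rows at the top of the CSV are text, so `read.csv()` reads the whole column as character, and it stays character after those rows are removed.

2. `date` was stored as character, not as a date, so it was plotted in alphabetical order. It has to be a date so that it sorts chronologically. You can then format the scale so that it prints the date in the format you want.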
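You can see both points without ggplot2. The base-R sketch below uses made-up example values, not ONS figures. Character month labels sort alphabetically, but once converted to Date they sort chronologically. Numbers stored as character sort as text until `as.numeric()` is applied.

```r
# Month labels stored as character (example values)
labels <- c("2020 JAN", "2020 FEB", "2020 MAR", "2020 APR")

# Alphabetical order: "2020 APR" "2020 FEB" "2020 JAN" "2020 MAR"
sort(labels)

# Converted to Date (a day is appended so the date is complete),
# the same labels sort chronologically
dates <- as.Date(paste(labels, "01"), format = "%Y %b %d")
labels[order(dates)]

# Numbers read in as character sort as text: "10.1" "2.5" "9.0"
values_chr <- c("9.0", "10.1", "2.5")
sort(values_chr)

# After as.numeric() they sort by value: 2.5 9.0 10.1
sort(as.numeric(values_chr))
```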

```r
library("tidyverse")
library("lubridate")

#webscraping the ONS inflation csv file
cpi <- read.csv(url("https://www.ons.gov.uk/generator?format=csv&uri=/economy/inflationandpriceindices/timeseries/d7g7/mm23"))

#removing rows 1 to 7 which contain descriptors, keeping this as a dataframe
cpi <- cpi[-c(1, 2, 3, 4, 5, 6, 7), , drop = FALSE]

#renaming columns as date and inflation
cpi <- cpi %>% rename(date = Title)
cpi <- cpi %>% rename(inflation = CPI.ANNUAL.RATE.00..ALL.ITEMS.2015.100)

#proper title characters for date
#THIS FAILS. The cut_cpi data.frame hasn't been created yet, so this doesn't work.
#It is unnecessary, so just remove it.
#cut_cpi$date <- str_to_title(cut_cpi$date)

#subsetting cpi dataset in order to have only the data from the row of 2020 JAN to the last row
cut_cpi <- cpi[(which(cpi$date == "2020 JAN")):nrow(cpi), ]

#NEW: parse the "2020 JAN" labels into a real date-time column, then sort
cut_cpi <- cut_cpi %>%
  mutate(real_date_format = parse_date_time(date, orders = "%Y %b")) %>%
  arrange(desc(real_date_format))

#plotting inflation in a line chart
#NEW
# removed the extra comma in aes()
# converted inflation to numeric (was character)
# converted real_date_format to a Date (was a datetime); scale_x_date() breaks with a datetime
ggplot(cut_cpi, aes(x = as_date(real_date_format), y = as.numeric(inflation), group = 1)) +
  geom_line(colour = "black") +
  labs(title = "CPI inflation from January 2020") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  #NEW
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
```
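Here is a variation on the same idea, sketched in base R. A small made-up data frame stands in for the renamed ONS columns, with placeholder values rather than real inflation figures. Both columns are converted first, and the rows are then filtered by date instead of by row position. This avoids depending on the exact spelling of "2020 JAN" and on the file's row order. The column name `real_date` is a hypothetical choice. The ggplot2 part only runs if that package is installed.

```r
# Stand-in for cpi after the renames: both columns are character
# (placeholder values, not real ONS figures)
cpi <- data.frame(
  date      = c("2019 NOV", "2019 DEC", "2020 JAN", "2020 FEB", "2020 MAR"),
  inflation = c("1.0", "2.0", "3.0", "4.0", "5.0"),
  stringsAsFactors = FALSE
)

# Convert the labels to Date (appending a day) and the values to numeric
cpi$real_date <- as.Date(paste(cpi$date, "01"), format = "%Y %b %d")
cpi$inflation <- as.numeric(cpi$inflation)

# Keep January 2020 onwards by comparing dates, then sort chronologically
cut_cpi <- cpi[!is.na(cpi$real_date) & cpi$real_date >= as.Date("2020-01-01"), ]
cut_cpi <- cut_cpi[order(cut_cpi$real_date), ]

# Build (but do not print) the chart if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(cut_cpi, ggplot2::aes(x = real_date, y = inflation)) +
    ggplot2::geom_line(colour = "black") +
    ggplot2::scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
}
```

Because `real_date` is already a Date, no `as_date()` call is needed inside `aes()`, and `scale_x_date()` can be used directly.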

You can try this:

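```r
# ym() (lubridate) parses labels such as "2020 JAN" into a Date;
# inflation is read in as character, so convert it to numeric
ggplot(cut_cpi, aes(x = ym(date), y = as.numeric(inflation), group = 1)) +
  geom_line(colour = "black") +
  labs(title = "CPI inflation from January 2020") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_x_date(date_breaks = "3 month")
```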

You can replace "3 month" with whatever interval you want, for example "1 month" or "6 months".
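The break interval is a string made of a number and a time unit. As a rough base-R illustration, not ggplot2's own code, `seq()` accepts a similar interval string for dates. It shows what "3 month" spacing starting from January 2020 looks like.

```r
# Five break positions spaced three months apart, starting January 2020
breaks <- seq(as.Date("2020-01-01"), by = "3 month", length.out = 5)

# Labelled with a strftime-style format string, as date_labels uses
format(breaks, "%b %Y")
```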

Claire
