I am trying to be as specific as possible. The data I am working with looks like:

          dates bsheet      mro      ciss
    1  2008 Oct 490509 3.751000 0.8579982
    2  2008 Nov 513787 3.434333 0.9153926
    3  2008 Dec 570591 2.718742 0.9145012
    4  2009 Jan 534985 2.323581 0.8811410
    5  2009 Feb 528390 2.001000 0.8551557
    6  2009 Mar 551730 1.662290 0.8286146
    7  2009 Apr 514041 1.309333 0.7460113
    8  2009 May 486151 1.097774 0.5925725
    9  2009 Jun 484629 1.001000 0.5412631
    10 2009 Jul 454379 1.001000 0.5398128
    11 2009 Aug 458111 1.001000 0.3946989
    12 2009 Sep 479956 1.001000 0.2232348
    13 2009 Oct 448080 1.001000 0.2961637
    14 2009 Nov 427756 1.001000 0.3871220
    15 2009 Dec 448548 1.001000 0.3209175

and can be produced via

    example <- structure(list(
      dates = c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan", "2009 Feb",
                "2009 Mar", "2009 Apr", "2009 May", "2009 Jun", "2009 Jul",
                "2009 Aug", "2009 Sep", "2009 Oct", "2009 Nov", "2009 Dec"),
      bsheet = c(490509, 513787, 570591, 534985, 528390, 551730, 514041,
                 486151, 484629, 454379, 458111, 479956, 448080, 427756, 448548),
      mro = c(3.751, 3.43433333333333, 2.71874193548387, 2.32358064516129,
              2.001, 1.66229032258065, 1.30933333333333, 1.09777419354839,
              1.001, 1.001, 1.001, 1.001, 1.001, 1.001, 1.001),
      ciss = c(0.857998173913043, 0.9153926, 0.914501173913044, 0.881140954545454,
               0.85515565, 0.828614636363636, 0.746011318181818, 0.592572476190476,
               0.541263136363636, 0.539812782608696, 0.394698857142857,
               0.223234772727273, 0.296163727272727, 0.387122047619048,
               0.32091752173913)
    ), row.names = c(NA, 15L), class = "data.frame")

The line chart I created using the following code

    ciss_plot = ggplot(data = example) +
      geom_line(aes(x = dates, y = ciss, group = 1)) +
      labs(x = 'Time', y = 'CISS') +
      scale_x_discrete(breaks = dates_breaks, labels = dates_labels) +
      scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
      theme_bw() +
      theme(axis.text.x = element_text(hjust = c(rep(0.5, 11), 0.8, 0.2)))
    ciss_plot

for ggplot2 looks like:

Plot created using ggplot2

whereas plotting the same data with R's standard built-in plot() function, using

 plot(example$ciss, type = 'l') 

results in

Plot created using default R function

which obviously is NOT identical!

Could someone please help me out? These plots have already taken me forever, and I cannot figure out where the problem is. I suspect something is wrong either with "group = 1" or with the data type of the example$dates column!

I am thankful for any constructive input!!

Thank you all in advance!

Manuel

1 Answer

Your date column is in character format. This means that ggplot will by default convert it to a factor and arrange it in alphabetical order, which is why the plot appears in a different shape. One way to fix this is to ensure you have the levels in the correct order before plotting, like this:

    library(dplyr)
    library(ggplot2)

    dates_breaks <- as.character(example$dates)

    # Fix the factor levels to the row order so the x axis stays chronological
    ggplot(data = example %>% mutate(dates = factor(dates, levels = dates))) +
      geom_line(aes(x = dates, y = ciss, group = 1)) +
      labs(x = 'Time', y = 'CISS') +
      scale_x_discrete(breaks = dates_breaks, labels = dates_breaks,
                       guide = guide_axis(n.dodge = 2)) +
      scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
      theme_bw()
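To see the ordering issue on its own, here is a minimal base R sketch. The `dates` vector is an illustrative subset of the question's column, not the full data. It compares default factor levels, which are the sorted unique values, with levels supplied explicitly in row order:

```r
# Illustrative subset of the question's `dates` column (assumption: same format)
dates <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan", "2009 Feb")

# Default factor levels are the sorted unique values, so the months
# within each year end up in alphabetical rather than calendar order
default_levels <- levels(factor(dates))

# Supplying the levels explicitly keeps the order in which the rows appear
ordered_levels <- levels(factor(dates, levels = dates))

print(default_levels)
print(ordered_levels)
```

If this reading is right, `plot(example$ciss[order(example$dates)], type = 'l')` in base R should give roughly the same scrambled shape as the original ggplot, because it visits the rows in that same alphabetical order.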

A better approach is to convert the date column to actual date-times, which gives you more freedom when plotting and removes the need for a grouping variable:

    # "%b" is parsed using the current locale's month abbreviations
    example <- example %>%
      mutate(dates = as.POSIXct(strptime(paste(dates, "01"), "%Y %b %d")))

    # x is now continuous, so `group = 1` is no longer needed
    ggplot(example) +
      geom_line(aes(x = dates, y = ciss)) +
      labs(x = 'Time', y = 'CISS') +
      scale_y_continuous(limits = c(0, 1), breaks = c(seq(0, 0.8, by = 0.2)), expand = c(0, 0)) +
      scale_x_datetime(breaks = seq(min(example$dates), max(example$dates), by = "year"),
                       labels = function(x) strftime(x, "%Y\n%b")) +
      theme_bw() +
      theme(panel.grid.minor.x = element_blank())
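The parsing step can be checked in isolation before plotting. The sketch below uses base R only and an illustrative subset of the `dates` column. It sets `LC_TIME` to `"C"` for the session so that English month abbreviations are recognised; that is an assumption about what is acceptable in your session, not something the original code does. It also uses `as.Date()` instead of `as.POSIXct()`, which should be enough for monthly data:

```r
# Illustrative subset of the question's `dates` column (assumption: same format)
dates <- c("2008 Oct", "2008 Nov", "2008 Dec", "2009 Jan")

# Assumption: switching LC_TIME to "C" is fine here; "%b" matches month
# abbreviations in the current locale, and "C" uses the English ones
invisible(Sys.setlocale("LC_TIME", "C"))

# Append a day of month so each string is a complete date, then parse it
parsed <- as.Date(paste(dates, "01"), format = "%Y %b %d")

# An NA would mean a month name was not recognised in this locale
print(parsed)

# Real dates are in chronological order, unlike the original strings
print(!is.unsorted(parsed))
```

With a `Date` column like this, `scale_x_date()` would take the place of `scale_x_datetime()` in the plot.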

2 Comments

I would like the x scale to be identical to the one in the ggplot I created, that is, one break per year, starting in October 2009 and ending with Aug 2020. Furthermore, I do not understand why the plots I created are not identical. Why does ggplot show the data differently, or incorrectly?
@mzerobin as I explained in my answer, ggplot is arranging the dates in alphabetical order, so "2018 Dec" appears before "2018 Oct" etc. The base R plot is just plotting in the order the values appear in the data frame. The plots look different because the x axis is arranged in a different order. It is better to convert to date format or date-time format rather than using character strings. I'll update my answer to show how to get the breaks right for you.
