I want to draw a line plot from the data frame below, grouping the rows so that I get one line for GDP, one line for agriculture and one line for services (ignoring countries for now). Does anyone know if this is possible using ggplot?

My final plot would have years on the x axis and the value (for example GDP) on the y axis.

economics_df

    Series Name             Country  1997         1998         1999         2000
    GDP (current US$)       Spain    5.90077E+11  6.19215E+11  6.34908E+11  5.98363E+11
    GDP (current US$)       France   1.45288E+12  1.50311E+12  1.49315E+12  1.36564E+12
    GDP (current US$)       Monaco   2840175545   2934498443   2906093757   2647885849
    GDP (current US$)       Italy    1.24188E+12  1.27005E+12  1.25245E+12  1.14668E+12
    GDP (current US$)       Croatia  24091170703  25792876644  23677307509  21839780971
    Agriculture (% of GDP)  Spain    4.302210034  4.150411966  3.817378211  3.745305634
    Agriculture (% of GDP)  France   2.344255815  2.362459834  2.236261411  2.098357551
    Agriculture (% of GDP)  Monaco   2.544255815  2.342459834  2.234261411  2.108357551
    Agriculture (% of GDP)  Italy    2.861911574  2.768857277  2.722232363  2.56361412
    Agriculture (% of GDP)  Croatia  5.228986538  5.306173593  5.393085168  4.961600952
    Services (% of GDP)     Syria    45.65197856  44.15290647  45.68986146  41.94697681
    Services(% of GDP)      Lebanon  60.61030928  58.32727829  59.05884148  61.52190623
    Services (% of GDP      Israel   62.02333939  63.02788655  63.92563162  64.72521236
    Services (% of GDP)     Egypt    48.15193682  48.28789144  47.55581925  46.52599236
    Services (% of GDP)     Libya    44.15193682  44.28789144  45.55581925  45.55581445
  • Before creating the right dataset for your ggplot, you may need to clean up the categories of the Series Name variable (homogenization). Commented Oct 25, 2022 at 9:16
  • Sure, this is possible. The first step would be to reshape your data using e.g. tidyr::pivot_longer. However, as already noted by Yacine, your data is a mix of percentages and dollar values, and putting both in one plot will most likely not be meaningful because of the different ranges and units. Commented Oct 25, 2022 at 9:30
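The homogenization mentioned in the first comment could look roughly like the sketch below. It is only one possible approach, using base R: it assumes the unit in parentheses can be dropped, and the helper name clean_series_name is hypothetical. The labels are a subset of those in the question, including the two malformed "Services" variants.

```r
# Subset of the Series Name labels from the question, including the
# two malformed "Services" variants.
series_name <- c(
  "GDP (current US$)",
  "Agriculture (% of GDP)",
  "Services (% of GDP)",
  "Services(% of GDP)",
  "Services (% of GDP"
)

# Hypothetical helper: drop everything from the first "(" onwards,
# then trim whitespace, so each variant collapses to its leading keyword.
clean_series_name <- function(x) {
  trimws(sub("\\(.*$", "", x))
}

clean_series_name(series_name)
```

After this step all three "Services" spellings fall into a single group. Dropping the unit also hides the fact that the series are measured differently, so it may be worth keeping the unit in a separate column.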

1 Answer

You need to get the data into the right shape. ggplot makes plotting very easy once the data is in long form, which is easy to do with dplyr and tidyr:

    library(dplyr)
    library(ggplot2)
    library(tidyr)

    econ_for_plot <- economics_df |>
      pivot_longer(-c(`Series Name`, Country), names_to = "year") |>
      group_by(`Series Name`, year) |>
      summarise(value = sum(value))

    econ_for_plot
    # # A tibble: 12 x 3
    # # Groups:   Series Name [3]
    #    `Series Name` year    value
    #    <chr>         <chr>   <dbl>
    #  1 Agriculture   1997  1.73e 1
    #  2 Agriculture   1998  1.69e 1
    #  3 Agriculture   1999  1.64e 1
    #  4 Agriculture   2000  1.55e 1
    #  5 GDP           1997  3.31e12
    #  6 GDP           1998  3.42e12
    #  7 GDP           1999  3.41e12
    #  8 GDP           2000  3.14e12
    #  9 Services      1997  2.61e 2
    # 10 Services      1998  2.58e 2
    # 11 Services      1999  2.62e 2
    # 12 Services      2000  2.60e 2

I have used sum() in the summarise() call, but you could replace it with mean() or any other function to aggregate the data. Once it is in this form you can plot it:

    ggplot(econ_for_plot, aes(
      x = year,
      y = value,
      color = `Series Name`,
      group = `Series Name`
    )) +
      geom_point() +
      geom_line() +
      scale_y_log10() +
      labs(
        title = "Sum of spending",
        y = "Sum of category (log scale)"
      ) +
      theme_bw()

Input data

    economics_df <- structure(list(
      `Series Name` = c(
        "GDP", "GDP", "GDP", "GDP", "GDP",
        "Agriculture", "Agriculture", "Agriculture", "Agriculture", "Agriculture",
        "Services", "Services", "Services", "Services", "Services"
      ),
      Country = c(
        "Spain", "France", "Monaco", "Italy", "Croatia",
        "Spain", "France", "Monaco", "Italy", "Croatia",
        "Syria", "Lebanon", "Israel", "Egypt", "Libya"
      ),
      `1997` = c(
        5.90077e+11, 1.45288e+12, 2840175545, 1.24188e+12, 24091170703,
        4.302210034, 2.344255815, 2.544255815, 2.861911574, 5.228986538,
        45.65197856, 60.61030928, 62.02333939, 48.15193682, 44.15193682
      ),
      `1998` = c(
        6.19215e+11, 1.50311e+12, 2934498443, 1.27005e+12, 25792876644,
        4.150411966, 2.362459834, 2.342459834, 2.768857277, 5.306173593,
        44.15290647, 58.32727829, 63.02788655, 48.28789144, 44.28789144
      ),
      `1999` = c(
        6.34908e+11, 1.49315e+12, 2906093757, 1.25245e+12, 23677307509,
        3.817378211, 2.236261411, 2.234261411, 2.722232363, 5.393085168,
        45.68986146, 59.05884148, 63.92563162, 47.55581925, 45.55581925
      ),
      `2000` = c(
        5.98363e+11, 1.36564e+12, 2647885849, 1.14668e+12, 21839780971,
        3.745305634, 2.098357551, 2.108357551, 2.56361412, 4.961600952,
        41.94697681, 61.52190623, 64.72521236, 46.52599236, 45.55581445
      )
    ), class = "data.frame", row.names = c(NA, -15L))

Edit: I made the y axis log-scale because the range of values was large. But now that I have read the comments and looked at the data more closely, I realise that this plots absolute dollars and relative percentages on the same scale. So this post shows you how to construct such a plot, although it does not really make sense to do so in this case.
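One way around the mixed-units problem, offered only as a sketch and not as part of the original answer, is to give each series its own panel and y axis with facet_wrap(scales = "free_y") instead of a shared log scale. The data frame below is rebuilt from the rounded sums printed earlier so that the block runs on its own, and year is stored as an integer here, which is an assumption of this sketch.

```r
library(ggplot2)

# Rounded sums copied from the printed econ_for_plot output; rebuilt
# here only so that the sketch is self-contained.
econ_for_plot <- data.frame(
  `Series Name` = rep(c("Agriculture", "GDP", "Services"), each = 4),
  year = rep(1997:2000, times = 3),
  value = c(
    17.3, 16.9, 16.4, 15.5,
    3.31e12, 3.42e12, 3.41e12, 3.14e12,
    261, 258, 262, 260
  ),
  check.names = FALSE  # keep the space in "Series Name"
)

# One panel per series, each with its own y axis, so dollars and
# percentages are never read off the same scale.
p <- ggplot(econ_for_plot, aes(x = year, y = value)) +
  geom_point() +
  geom_line() +
  facet_wrap(vars(`Series Name`), ncol = 1, scales = "free_y") +
  labs(x = "Year", y = "Sum of category") +
  theme_bw()

p
```

Each panel can then be read in its own unit. Whether a sum across countries is the right aggregate for the percentage series is a separate question; mean() may be the more natural choice there.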

3 Comments

What is the error message you get? That's just the code.
Yep, apologies. This should be it: Error in h(): ! Problem with summarise() column value. ℹ value = sum(value). ✖ invalid 'type' (character) of argument. ℹ The error occurred in group 1: Series Name = "Agriculture (% of GDP)", year = "1997".
Right, it's a character rather than a numeric column, so R doesn't know how to add it up. Try changing the line summarise(value = sum(value)) to summarise(value = sum(as.numeric(value), na.rm = TRUE)). A word of caution, though: if you get a lot of warnings saying that NAs were introduced by coercion, make sure everything in that column is actually a number stored as text rather than some other string of characters.
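The check suggested in the last comment could be done roughly as follows. This is a minimal base R sketch on a hypothetical character vector: the ".." entry is made up to stand in for a non-numeric placeholder and does not come from the asker's data.

```r
# Hypothetical character column: numbers stored as text, one stray
# placeholder ("..") that cannot be parsed, and one genuine NA.
value_chr <- c("4.302210034", "2.344255815", "..", NA)

# Coerce to numeric, silencing the expected
# "NAs introduced by coercion" warning.
value_num <- suppressWarnings(as.numeric(value_chr))

# Entries that were present as text but turned into NA are the ones
# that need a closer look before aggregating.
failed <- is.na(value_num) & !is.na(value_chr)
value_chr[failed]

# Sum of the entries that did parse.
sum(value_num, na.rm = TRUE)
```

If value_chr[failed] comes back empty, the column only contained numbers stored as text and the as.numeric() fix is safe to use as is.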
