0

I got following problem: I got a Database and have to evaluate some Deltas e.g. I already computed this with SQL and it works well but now I need to try it with R and I didnt work much with R so I'm not quite sure if my problems are solvable with R. The best case would be if i can transform my sql code to a R code. Otherwise i will present my problem to you:

+----+------+------------+ | a | b | c | +----+------+------------+ | 10 | 2 | 31.01.2020 | | 11 | 5 | 31.01.2020 | | 10 | 3 | 31.01.2020 | | 10 | 2 | 31.12.2019 | | 11 | 4 | 31.12.2019 | | 10 | 2 | 31.12.2019 | +----+------+------------+ 

dput:

structure(list(a = c(10L, 11L, 10L, 10L, 11L, 10L), b = c(2L, 5L, 3L, 2L, 4L, 2L), c = c("31.01.2020", "31.01.2020", "31.01.2020", "31.12.2019", "31.12.2019", "31.12.2019")), row.names = c(NA, -6L), class = "data.frame") 

delta

+----+---+ | 10 | 1 | | 11 | 1 | +----+---+ 

It's my first time in this community so I hope you can see my picture. It's an example for my problem:

In the first step i need to summarize column b for the same date (c) and the same number in a). So the result for 31.12.2020 would be : 10= 5 and 11= 5 and the same for the 31.12.2019.

Until this point it works but in the next step I need to calculate an the delta of b) s.t. column a) and c)

In other words: For the number 10 in a) i calculated 5 in 31.01.2020 in 31.12.2019 i calculated 4 for 10. Now i have to calculate 5-4, but it's not possible for me to program it.

So i hope some of you understand my problem and can help me to find a smart way i could solve this problem.

I would be very thankful if you can help me.

1
  • 2
    Welcome! The package sqldf allows you to use SQL code to operate on R data frames. This may be helpful if you have working SQL. Also, the data manipulation package dplyr is something that may be conceptually similar to how you think about it in SQL and therefore not hard to translate the code into R. Maybe those can be a starting place to explore. Commented Apr 9, 2020 at 12:57

3 Answers 3

1

Using the package dplyr, try something like this:

library(dplyr) df %>% mutate(c_date = as.Date(c, format = "%d.%m.%Y")) %>% # convert to a Date in R group_by(a, c_date) %>% summarize(b = sum(b)) %>% # sum grouped by a and c group_by(a) %>% arrange(c_date) %>% # sort on the date summarize(delta = diff(b)) # take the difference, grouped on a 

Which will give:

# A tibble: 2 x 2 a delta <int> <int> 1 10 1 2 11 1 
Sign up to request clarification or add additional context in comments.

Comments

1

You can do this with data.table. First convert c to a date column.

Then take the sum of b by each (a, c) group, then within the a groups of the resulting table, take the diff of this sum. The diff will be in the correct direction because keyby orders the results by the grouping columns.

library(data.table) setDT(df) df[, c := as.Date(c, format = '%d.%m.%Y')] df[, .(bsum = sum(b)), keyby = .(a, c) ][, .(bsum_diff = diff(bsum)), by = a] # a bsum_diff # 1: 10 1 # 2: 11 1 

Comments

0

I don't know if is possible translate SQL to R code, but working with databases via R is not so hard. In a big picture: You need to set a SQL connection: DBI::dbconnect(). Next, with the function tbl() and the package dplyr you start to replicate a sql select statement. Here you can find more information: https://db.rstudio.com

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.