Execute kdb+ queries from R for advanced high-performance analytics.
A KX Fusion interface.
```r
# remove old package
if ('rkdb' %in% rownames(installed.packages())) remove.packages('rkdb')
# install devtools
if (! 'devtools' %in% rownames(installed.packages())) install.packages('devtools')
library(devtools)
# install rkdb
devtools::install_github('kxsystems/rkdb', quiet=TRUE, INSTALL_opts=c("--no-multiarch"))
# to install rkdb of particular release
# devtools::install_github('kxsystems/rkdb@v0.10.0', quiet=TRUE)
library(rkdb)
```

Start a kdb+ process to test the installation.
```bash
q -p 5000
```

Open a connection to it.
```r
h <- open_connection('localhost', 5000)
```

You can evaluate any kdb+ expression and its result will come back to R:
```r
execute(h, '1+1')
## [1] 2
```

Assigning a variable in the q workspace also works:
```r
execute(h, 'x:1+1') # assign 2 to x
## NULL
execute(h, 'x') # get back the value
## [1] 2
```

Some basic kdb+ types have no direct equivalent in R. The conversion is also not bijective: several kdb+ types map to the same R type, so the original kdb+ type cannot always be recovered from the R value. The conversions (from kdb+ to R) are:

| kdb+/q | R |
|---|---|
| boolean | logical |
| byte | raw |
| short | integer |
| int | integer |
| long | integer64 |
| real | numeric |
| float | numeric |
| char | character |
| symbol | character |
| timestamp | nanotime |
| month | integer |
| date | Date |
| datetime | POSIXct |
| timespan | integer64 |
| minute | difftime |
| second | difftime |
| time | integer |
| enumeration | character |
| table | data.frame |
| keyed table | data.frame |
| dictionary (mixed types) | named list |
| dictionary (same types) | named vector |
| function | character |
| list (same types) | vector |
| list (same ‘complex’ types) | list |
| list (different types) | list |
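
The many-to-one nature of the conversion can be read straight off the table. The following is a minimal base-R sketch and not part of rkdb: `kdb_to_r` is a hypothetical name for a vector that transcribes a few rows of the table, and grouping it by R type shows which kdb+ types become indistinguishable once converted.

```r
# Hypothetical lookup transcribing some rows of the conversion table
# (names are kdb+ types, values are the R types they convert to).
kdb_to_r <- c(
  short       = "integer",
  int         = "integer",
  month       = "integer",
  time        = "integer",
  long        = "integer64",
  timespan    = "integer64",
  real        = "numeric",
  float       = "numeric",
  char        = "character",
  symbol      = "character",
  enumeration = "character",
  date        = "Date"
)

# Group the kdb+ type names by the R type they map to.
by_r_type <- split(names(kdb_to_r), unname(kdb_to_r))

# Keep only R types that are the target of more than one kdb+ type:
# for these, the original kdb+ type cannot be recovered from the R value.
ambiguous <- by_r_type[lengths(by_r_type) > 1]
print(ambiguous)
```

Because of this, a value sent from kdb+ to R and back may not arrive with its original kdb+ type.
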

rkdb provides a convenient way to retrieve the results of computations done on the kdb+ side, so you can have the best of both worlds:
```r
kdb <- 't: ([] x:1000#`a`b`c;y:1000#1f*til 10;z:1000#1f*til 4); select sum y, dev z by x from t'
execute(h, kdb)
##   x    y        z
## 1 a 1503 1.120709
## 2 b 1497 1.116689
## 3 c 1500 1.116689
```

For instance, you can use R's graphical capabilities:
```r
kdb <- 't: ([] x:1000#`a`b`c;y:1000#1f*til 10;z:1000#1f*til 4); select y,z from t where x=`a'
DF <- execute(h, kdb)
plot(DF$y, DF$z, main='scatter plot', xlab='y values', ylab='z values')
```

You can also call kdb+ functions with R objects as arguments. The arguments are converted to native kdb+ data types and passed to kdb+, where the expression is evaluated:
```r
execute(h, "raze", list(c(1,2,3), c(4,5,6)))
## [1] 1 2 3 4 5 6
execute(h, "+", 2, 5)
## [1] 7
execute(h, '{`tmp set x}', data.frame(a=c(1,2,3), b=c("a","b","b")))
## [1] "tmp"
```

For example, here is how you can use the left-join function on two data frames:
```r
DF1 <- data.frame(x=c('x','x','y','y'), y=1:4)
DF2 <- data.frame(x=c('x','y','z'), z=seq(10,30,10))
execute(h, "{[x;y] x lj `x xkey y}", DF1, DF2)
##   x y  z
## 1 x 1 10
## 2 x 2 10
## 3 y 3 20
## 4 y 4 20
```

See code.kx.com/q/interfaces for other interfaces to and from kdb+.
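
As a cross-check on the left-join example, the same join can be reproduced locally in base R without a kdb+ connection. This is only a sketch for comparison: `merge` with `all.x=TRUE` is base R's left join, not something rkdb calls, and `local_join` is a name chosen here for illustration.

```r
# Same data frames as in the left-join example.
DF1 <- data.frame(x=c('x','x','y','y'), y=1:4)
DF2 <- data.frame(x=c('x','y','z'), z=seq(10,30,10))

# all.x=TRUE keeps every row of DF1 and fills z from the matching DF2 row;
# the DF2 key 'z' has no match in DF1 and is dropped, as with lj.
local_join <- merge(DF1, DF2, by='x', all.x=TRUE)
print(local_join)
```

Note that `merge` sorts on the key by default, so on other inputs its row order can differ from `lj`, which preserves the order of the left table.
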
