1

I'm trying to find the overall distance moved by a worker and my df looks something like

    Name  x   y
    John  12  34
    John  15  31
    John   8  38
    John  20  14

I've tried using the dist(rbind()) function, but the result given is not correct. It just gives the result of sqrt((row1)^2+(row2)^2+(row3)^2+(row4)^2), which I don't think is correct.

So I'm trying to use a for loop to do this, so that the distance between rows 1 and 2, rows 2 and 3, and so on is calculated separately and summed up later. How would I do this?

My code currently looks like:

    for(i in nrow(df)){
      n <- dist(rbind(df$x,df$y))
    }

and this just gives me the wrong single result mentioned above, not a list of individual distances for each consecutive pair of rows.
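For context, a minimal sketch of why that call returns a single number, using the data from the question. `dist()` measures distances between the rows of its input, and `rbind(df$x, df$y)` has only two rows (all the x values, then all the y values), so the result is one distance between two 4-dimensional points. The names `m`, `wrong` and `d` are only illustrative.

```r
# Data from the question
df <- data.frame(x = c(12, 15, 8, 20), y = c(34, 31, 38, 14))

# rbind(df$x, df$y) is a 2 x 4 matrix: row 1 holds all x values, row 2 all y values
m <- rbind(df$x, df$y)
dim(m)

# dist() works between ROWS, so this is a single distance between two 4-D points
wrong <- as.numeric(dist(m))
all.equal(wrong, sqrt(sum((df$x - df$y)^2)))

# Keeping one observation per row gives distances between the actual positions
d <- as.matrix(dist(df[, c("x", "y")]))
d[2, 1]  # distance between row 1 and row 2
```

With one position per row, the consecutive distances sit just below the diagonal of `d`.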

My expected output would be like:

4.2426 9.8995 26.8328 

and I guess I can sum them up later by running:

sum(n) 

right?

10
  • pls share your expected output! Commented Jul 10, 2018 at 6:03
  • @ChirayuChamoli added my expected outcome. Thank you for the feedback. Commented Jul 10, 2018 at 6:12
  • You are not using i anywhere in the loop. You are probably trying to do n[i] <- dist(rbind(df$x[i], df$y[i])). Currently, n holds only one value at any given time, and the entire thing can probably be replaced without a loop. I am not sure though, as it is not clear to me. Commented Jul 10, 2018 at 6:18
  • what distance calculation are you performing? Commented Jul 10, 2018 at 6:26
  • @Wimpel Euclidean distance. I should have mentioned it earlier sorry. So I could just use sqrt((x[1]-x[2])^2+(y[1]-y[2])^2) kind of formula, but I don't know how to do a loop for that. Any help for this please? Commented Jul 10, 2018 at 6:45
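
The formula in the last comment can be applied to every consecutive pair of rows without writing a loop. This is a minimal sketch using the question's data; `step` and `total` are illustrative names, and it assumes the rows are already in the order the worker moved.

```r
df <- data.frame(Name = "John", x = c(12, 15, 8, 20), y = c(34, 31, 38, 14))

# diff() returns the change from each row to the next, so n rows give n - 1 values
step <- sqrt(diff(df$x)^2 + diff(df$y)^2)
step

# Total distance moved is the sum of the individual steps
total <- sum(step)
total
```

For the four rows above, `step` has three entries, one per move between consecutive positions.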

3 Answers

1

Using base R, you can call dist on each consecutive pair of rows, then apply cumsum to the adjacent distances to get the cumulative distance by Name.

    df <- read.table(text="Name x y
    John 12 34
    John 15 31
    John 8 38
    John 20 14
    Mark 11 13
    Mark 16 18", header=TRUE)

    by(df, df$Name, function(mat) {
        idx <- seq_len(nrow(mat))
        cumsum(mapply(function(i,j) dist(mat[c(i,j), c("x","y")]),
            head(idx, -1), tail(idx, -1)))
    })

Alternatively, the code below calculates the whole distance matrix and extracts the first off-diagonal:

    by(df, df$Name, function(mat) {
        idx <- seq_len(nrow(mat))
        cumsum(
            as.matrix(dist(mat[,c("x","y")]))[cbind(head(idx, -1), tail(idx, -1))])
    })

1 Comment

I've never seen most of the functions you've just suggested here. Thank you for the new information! I'll give it a try soon
1

no loops required

dplyr

A dplyr/tidyverse approach that can also cover multiple names (since the existence of a 'Name' column suggests multiple workers).

    df <- data.frame( Name = c("John","John","John","John"),
                      x = c(12,15,8,20),
                      y = c(34,31,38,14),
                      stringsAsFactors = FALSE )

    library(tidyverse)

    df %>%
      #group by name (just in case there are multiple workers in the DF)
      #you can remove this line if there is only 1 worker
      group_by( Name ) %>%
      #get the previous x and y value
      mutate( x_prev = lag( x ),
              y_prev = lag( y ) ) %>%
      #filter out rows without previous x value
      filter( !is.na( x_prev ) ) %>%
      #calculate the distance
      mutate( distance = sqrt( abs (x - x_prev )^2 + abs( y - y_prev )^2 ) ) %>%
      #summarise to get the total distance
      summarise( total_distance = sum( distance ) )

    # # A tibble: 1 x 2
    #   Name  total_distance
    #   <chr>          <dbl>
    # 1 John            41.0

base R

    #create a matrix of x and y, calculate the distance and create a matrix from the results
    M <- as.matrix( dist( matrix( c( df$x, df$y ), ncol = 2 ) ) )
    M
    #           1         2         3        4
    # 1  0.000000  4.242641  5.656854 21.54066
    # 2  4.242641  0.000000  9.899495 17.72005
    # 3  5.656854  9.899495  0.000000 26.83282
    # 4 21.540659 17.720045 26.832816  0.00000

    #get the first off diagonal of the matrix (row = column+1)
    M[row(M) == col(M) + 1]
    #[1]  4.242641  9.899495 26.832816

    #sum the first off diagonal
    sum( M[row(M) == col(M) + 1] )
    #[1] 40.97495

3 Comments

Yes, there are multiple workers, but I have already created a subset for each worker, so each worker has their own data frame. So I guess I can start from the mutate() section? Also, knowing that there aren't any NA values, do I still have to use the filter code? Thank you for the reply and for any future response.
@Robo the NA filtering is needed because you cannot calculate a distance for the first row: the distance travelled is based on the current and the previous values of x and y, and the first row has no previous values. For n rows, you will always get n-1 calculated distances. Try commenting out lines in the code to see the results up to that point. In this solution, you do not have to create an individual subset for each worker; you can work on the combined df and group per worker.
@Robo added a base solution
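
To illustrate the point about NA above, here is a small sketch in base R that mimics the lagging step by hand. The shifted vectors `x_prev` and `y_prev` are built with `c(NA, head(x, -1))`, which is an assumption standing in for what `dplyr::lag()` returns with its default settings.

```r
x <- c(12, 15, 8, 20)
y <- c(34, 31, 38, 14)

# Shift each vector down by one position; the first element has no predecessor
x_prev <- c(NA, head(x, -1))
y_prev <- c(NA, head(y, -1))

# The first row has no previous position, so its distance is NA
distance <- sqrt((x - x_prev)^2 + (y - y_prev)^2)
distance

# Dropping the NA leaves n - 1 distances for n rows
sum(distance, na.rm = TRUE)
```

So the NA comes from the shift itself, not from missing values in the original data.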
0
    df<-data.frame("Name" = rep(x = "John",times = 4),"x" = c(12,15,8,20),"y" = c(34,31,38,14))

    #> df
    #  Name  x  y
    #1 John 12 34
    #2 John 15 31
    #3 John  8 38
    #4 John 20 14

    n<-numeric()

    for(i in 1:(nrow(df) - 1)){
      n[i] <- dist(rbind(df[i,-1],df[(i + 1),-1]))
    }

    print(n)
    #[1]  4.242641  9.899495 26.832816

    sum(n)
    #[1] 40.97495

3 Comments

This code says that there's no object called 'i'. What have I done wrong? Also, how does R know if x or y column is used? Would you be able to explain the code please? I'm quite new to coding & R.
Actually, it does give me the same result when I build the data frame following your code. Before, I just used another given dataset. So, how can I mimic this if I use an existing dataset with 8000+ rows? I can't write c(8000 numbers) for both x and y. It gives me the same error as above if I use the existing dataset.
I removed the first column, the one with the name, by using -1 here: df[i,-1],df[(i + 1),-1]. You can similarly remove any extra columns and retain just the x and y columns.
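
A possible way to apply the same loop to an existing, larger data frame is to pick the coordinate columns by name rather than dropping columns by position. The sketch below is only an illustration: `big` is simulated stand-in data, and the `shift` column is a hypothetical extra column; a real dataset would be loaded from a file instead.

```r
# Stand-in for an existing dataset with 8000 rows and an extra column
set.seed(1)
big <- data.frame(
  Name  = "John",
  shift = "day",                # hypothetical extra column
  x     = runif(8000, 0, 50),
  y     = runif(8000, 0, 50)
)

# Select the coordinate columns by name, so other columns do not matter
xy <- as.matrix(big[, c("x", "y")])

# Preallocate the result, then fill in one step at a time
n <- numeric(nrow(xy) - 1)
for (i in seq_len(nrow(xy) - 1)) {
  n[i] <- sqrt(sum((xy[i + 1, ] - xy[i, ])^2))
}

length(n)
sum(n)
```

Selecting by name keeps the code working no matter how many other columns the data frame has or where they sit.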
