I'm working with the Salaries dataset from the carData package. The question asks me to examine the yrs.since.phd variable: what is the difference in salary between the professor with the highest yrs.since.phd and the professor with the lowest? x <- salary_data[c("yrs.since.phd", "salary")] gives me the two variables, but how do I sort them from lowest to highest so that I can compare them, or should I use a different function?
1 Answer
There are several ways to sort a data frame; one of them is dplyr::arrange, which orders the rows by the columns you give it, e.g.

    library(dplyr)

    carData::Salaries %>% arrange(rank, yrs.since.phd, salary)

This sorts the data by rank, then by yrs.since.phd, then by salary. However, you also seem interested in comparing these. We can summarise the mean salary for professors at the maximum and minimum values of yrs.since.phd:
    carData::Salaries %>%
      filter(rank == "Prof") %>%                      # Get data for professors only
      filter(yrs.since.phd == max(yrs.since.phd) |
               yrs.since.phd == min(yrs.since.phd)) %>% # Get only max and min values
      group_by(yrs.since.phd) %>%                     # For each value of yrs.since.phd
      summarise(mean_salary = mean(salary),           # Calculate mean salary
                n = n())                              # See how many are included in the calculation

    # A tibble: 2 x 3
      yrs.since.phd mean_salary     n
    *         <int>       <dbl> <int>
    1            11      142467     1
    2            56      131900     2

That is the general gist of it, but you might want to build on this for further analysis.
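If you would rather stay in base R, as in the question's own subsetting attempt, the same comparison can be sketched with order(). This is only an illustration: the data frame below is made-up toy data standing in for carData::Salaries (it reuses the question's salary_data name), and the choice of "highest minus lowest" for the difference is an assumption about what the exercise wants.

```r
# Toy data standing in for carData::Salaries (all values are made up)
salary_data <- data.frame(
  rank          = c("Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  yrs.since.phd = c(30, 4, 11, 56, 12),
  salary        = c(120000, 80000, 142000, 131000, 95000)
)

# Keep professors only, then sort by yrs.since.phd with base R's order()
profs  <- salary_data[salary_data$rank == "Prof", c("yrs.since.phd", "salary")]
sorted <- profs[order(profs$yrs.since.phd), ]

# After sorting, the first row has the fewest years since PhD, the last row the most
lowest  <- sorted[1, ]
highest <- sorted[nrow(sorted), ]

# Salary at the highest yrs.since.phd minus salary at the lowest
salary_diff <- highest$salary - lowest$salary

print(sorted)
print(salary_diff)
```

Note that picking a single first and last row chooses arbitrarily among ties. In the real data two professors share the maximum of 56 years (n = 2 in the output above), which is why the dplyr version averages within each value. Taking that output at face value, the difference in mean salary is 131900 - 142467 = -10567.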
1 Comment
mhovd
@user15675493, if this answered your question please consider accepting it to mark your question as resolved.