@@ -7,4 +7,35 @@ This is a guide for anyone who needs to share data with a statistician. The targ
 * Students or postdocs in scientific disciplines looking for consulting advice
 * Junior statistics students whose job it is to collate/clean data sets
 
- The goal of this guide is to ensure the most reproducible and the most
+ The goal of this guide is to explain how best to share data so as to avoid the most common pitfalls
+ and sources of delay in the transition from data collection to data analysis. The Leek group works with a large
+ number of collaborators, and the number one source of variation in the speed to results is the state of the data
+ when they arrive at the Leek group. Based on my conversations with other statisticians, this is true nearly universally.
+
+ My strong feeling is that statisticians should be able to handle the data in whatever state they arrive. It is important
+ to see the raw data, understand the steps in the processing pipeline, and be able to incorporate hidden sources of
+ variability in one's data analysis. On the other hand, for many data types, the processing steps are well documented
+ and standardized, so the work of converting the data from raw form to directly analyzable form can be performed
+ before calling on a statistician. This can dramatically speed up the turnaround time, since the statistician doesn't
+ have to work through all the pre-processing steps first.
+
+
+ What you should deliver to the statistician
+ ====================
+
+ For maximum speed in the analysis, this is the information you should pass to a statistician:
+
+ 1. The raw data.
+ 2. A [tidy data set](http://vita.had.co.nz/papers/tidy-data.pdf).
+ 3. An explicit and exact recipe you used to go from 1 -> 2.
+
+ Let's look at each part of the data package you will transfer.
+
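As a rough illustration of how the three pieces fit together, the sketch below treats a small wide-format table as the raw data, a one-observation-per-row table as the tidy data set, and the script itself as the recipe. Everything in it is hypothetical and not taken from the guide: the column names, the patient records, the helper names `tidy_rows` and `write_tidy`, and the choice to write missing measurements as `NA`.

```python
import csv
import io

# Hypothetical raw data (item 1): one row per patient, one column per visit.
RAW_CSV = """patient_id,visit1_weight_kg,visit2_weight_kg
p01,70.5,69.8
p02,82.0,
"""


def tidy_rows(raw_text):
    """Reshape the wide raw table into tidy rows, one observation per row."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        for column, value in record.items():
            if column == "patient_id":
                continue
            # Column names such as "visit1_weight_kg" encode the visit number.
            visit = int(column.split("_")[0][len("visit"):])
            rows.append({
                "patient_id": record["patient_id"],
                "visit": visit,
                # Assumption of this sketch: keep missing values visible as "NA".
                "weight_kg": value if value != "" else "NA",
            })
    return rows


def write_tidy(rows):
    """Serialize the tidy rows (item 2) as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["patient_id", "visit", "weight_kg"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Running the script end to end is the recipe (item 3).
tidy = tidy_rows(RAW_CSV)
tidy_csv = write_tidy(tidy)
print(tidy_csv)
```

Sending the raw file, the tidy file, and a script like this together lets the statistician rerun the conversion and check every step, rather than reconstructing it from a description.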
+
+
+ What you should expect from a statistician
+ ====================
+
+
+
+