Commit 8231e2b

Update README.md
1 parent 6b59cf0 commit 8231e2b

1 file changed: 32 additions, 1 deletion

README.md

Lines changed: 32 additions & 1 deletion
@@ -7,4 +7,35 @@ This is a guide for anyone who needs to share data with a statistician. The targ
 * Students or postdocs in scientific disciplines looking for consulting advice
 * Junior statistics students whose job it is to collate/clean data sets
 
-The goal of this guide is to ensure the most reproducible and the most
+The goals of this guide are to provide some instruction on the best way to share data to avoid the most common pitfalls
+and sources of delay in the transition from data collection to data analysis. The Leek group works with a large
+number of collaborators and the number one source of variation in the speed to results is the status of the data
+when they arrive at the Leek group. Based on my conversations with other statisticians this is true nearly universally.
+
+My strong feeling is that statisticians should be able to handle the data in whatever state they arrive. It is important
+to see the raw data, understand the steps in the processing pipeline, and be able to incorporate hidden sources of
+variability in one's data analysis. On the other hand, for many data types, the processing steps are well documented
+and standardized. So the work of converting the data from raw form to directly analyzable form can be performed
+before calling on a statistician. This can dramatically speed the turnaround time, since the statistician doesn't
+have to work through all the pre-processing steps first.
+
+
+What you should deliver to the statistician
+====================
+
+For maximum speed in the analysis this is the information you should pass to a statistician:
+
+1. The raw data.
+2. A [tidy data set](http://vita.had.co.nz/papers/tidy-data.pdf)
+3. An explicit and exact recipe you used to go from 1 -> 2
+
+Let's look at each part of the data package you will transfer.
+
+
+
+What you should expect from a statistician
+====================
+
+
+
+
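
The three deliverables named in the added README text (raw data, a tidy data set, and the exact recipe connecting them) can be illustrated with a small sketch. This is not part of the commit: the column names, the sample values, and the `tidy` function are all hypothetical, and the choice to keep missing measurements as explicit `None` values is an assumption rather than something the README prescribes.

```python
import csv
import io

# Hypothetical raw data: one row per subject, one column per visit.
RAW_CSV = """subject,visit1_weight_kg,visit2_weight_kg
A,70.5,71.0
B,82.1,
"""


def tidy(raw_text):
    """Recipe from raw to tidy: one row per (subject, visit) observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        for column, value in record.items():
            if column == "subject":
                continue
            # Column names encode the visit number, e.g. "visit2_weight_kg".
            visit = int(column.split("_")[0][len("visit"):])
            rows.append({
                "subject": record["subject"],
                "visit": visit,
                # Assumption: keep missing measurements explicit as None.
                "weight_kg": float(value) if value else None,
            })
    return rows


tidy_rows = tidy(RAW_CSV)
```

Here `RAW_CSV` plays the role of item 1, `tidy_rows` of item 2, and the `tidy` function itself is the recipe of item 3: a script that anyone can rerun on the raw data to reproduce the tidy table.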
