Assuming you have a single predictor variable that represents frequency of behaviour, I would make the following points.
Should you split a numeric variable into high-low groups?
I quote the following from one of my blog posts on creating clusters, where I use the term "median split" as a prototypical example of converting a numeric variable into a binary high-low variable.
Many researchers have heard the advice not to form median splits (see Howell for a discussion), or other kinds of binary splits for that matter. The same arguments also tend to apply to other forms of abrupt grouping into a small number of categories.
Some arguments FOR running median splits are: 1) it allows you to do an ANOVA or t-test and compare group means; 2) group differences are easier to communicate to a lay audience; 3) it reflects the important distinction in the underlying continuous variable.
Some arguments AGAINST running median splits are: 1) you can always find an equivalent analysis that respects the continuous nature of the variable (e.g., regression); 2) when creating median splits, you lose a lot of information; 3) the cut-off tends to be relatively arbitrary and it varies between samples; 4) the resulting model based on a median split does not reflect the underlying nature of the variable; 5) in most cases a binary split will have less statistical power; 6) if the purpose is to communicate to a scientific audience, respecting the continuous nature of the variable is a necessary complexity.
From the above you can see that there are generally more reasons in favour of maintaining the continuous version of the variable. The two occasions where splits are tolerable are where they make it easier to communicate findings to a lay audience and where the underlying effect of interest occurs in a stepwise fashion. In the latter case, the presence of a stepwise effect can be tested empirically; a quick look at a scatter plot should give some sense of whether there is a point where the effect changes dramatically. Likewise, decisions based on test scores often rest on pass-fail kinds of categories, and there is often a concrete desire to draw inferences about these specific groups.
Also, check out page 128 of Making Friends with Your Data for further discussion.
In summary, my advice would be to run a correlation or a regression predicting your outcome variable from the continuous version of your predictor. Depending on the distribution of your predictor, you may or may not want to apply an order-preserving (monotonic) transformation to it first.
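To make the information-loss argument concrete, here is a minimal sketch that compares the correlation obtained from a continuous predictor with the one obtained after a median split. Everything in it is an assumption for illustration: the data are simulated with a linear effect and normal noise, and the names `frequency`, `outcome`, `pearson_r`, and `median_split` are hypothetical, not taken from the question.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def median_split(x):
    """Code each value as 1 (above the sample median) or 0 (at or below it)."""
    med = statistics.median(x)
    return [1 if v > med else 0 for v in x]


# Simulated data (an assumption, not real results): linear effect plus noise.
rng = random.Random(42)
n = 2000
frequency = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.5 * f + rng.gauss(0, 1) for f in frequency]

r_continuous = pearson_r(frequency, outcome)
r_split = pearson_r(median_split(frequency), outcome)

print(f"r with continuous predictor:   {r_continuous:.3f}")
print(f"r with median-split predictor: {r_split:.3f}")
```

Under these assumptions the split version should show the weaker correlation. For a normally distributed predictor, splitting at the median attenuates the correlation by a factor of roughly 0.8, which is where the loss of statistical power comes from.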
Creating two groups based on a numeric variable
Putting aside the issues raised above, if you decide that you still want to split your predictor variable into high-low groups, the following are some options:
- Use statistical properties of your sample
  - Median split
  - Above or below the mean
  - Take the bottom 25% and top 25% and throw out the middle
  - Take the bottom third and top third and throw out the middle third
- Use accepted or externally validated cut-offs
  - e.g., medical diagnoses are often based on certain cut-offs on a continuous scale
- Use your own understanding of the phenomenon to define a cut-off
- Examine a histogram or density plot and look for a natural split in the data (as mentioned by @rolando2)
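The sample-based and cut-off-based options above could be sketched as follows. This is only an illustration under stated assumptions: `split_at` and `extreme_groups` are hypothetical helper names, `scores` is made-up data, and the cut-off of 8 merely stands in for whatever externally validated value applies in your field.

```python
import statistics


def split_at(values, cutoff):
    """Label each value 'high' if it is above the cutoff, otherwise 'low'."""
    return ["high" if v > cutoff else "low" for v in values]


def extreme_groups(values, parts=4):
    """Keep the bottom and top 1/parts of cases; None marks the discarded middle.

    parts=4 keeps the bottom and top 25%, parts=3 the bottom and top third.
    Ties at the boundary are broken by original order, which is arbitrary.
    """
    if parts < 2:
        raise ValueError("parts must be at least 2")
    n = len(values)
    k = n // parts  # number of cases kept at each end
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = "low"
    for i in order[n - k:]:
        labels[i] = "high"
    return labels


# Made-up scores, purely for illustration.
scores = [2, 5, 1, 9, 7, 3, 8, 4, 6, 10, 12, 11]

by_median = split_at(scores, statistics.median(scores))
by_mean = split_at(scores, statistics.fmean(scores))
by_quartiles = extreme_groups(scores, parts=4)
by_thirds = extreme_groups(scores, parts=3)
by_external = split_at(scores, 8)  # 8 is a placeholder cut-off, not a real standard

print(by_median)
print(by_quartiles)
```

Note that the extreme-groups variants discard cases, so any analysis that follows runs on a smaller sample than you collected.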