Questions tagged [binning]
Binning means grouping a continuous variable into discrete categories. The term is used mostly in reference to histograms, but it can also refer more generally to coarsening.
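As a rough illustration of the definition above, the following minimal sketch bins a handful of values with the Python standard library. The sample values, the bin edges, and the helper name `bin_index` are all made up for this example; the half-open bin convention (with the top edge included in the last bin) is one common choice, not the only one.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical sample values and bin edges, chosen only for illustration.
values = [0.5, 1.2, 2.7, 3.1, 4.8, 5.0, 7.3, 9.9]
edges = [0.0, 2.5, 5.0, 7.5, 10.0]


def bin_index(x, edges):
    """Return i such that edges[i] <= x < edges[i + 1].

    The last bin is closed on the right, so x == edges[-1] is kept.
    """
    if not edges[0] <= x <= edges[-1]:
        raise ValueError(f"{x} lies outside the bin range")
    # bisect_right counts the edges that are <= x; subtracting 1 gives the
    # bin, and the clamp puts x == edges[-1] into the last bin.
    return min(bisect_right(edges, x) - 1, len(edges) - 2)


# Coarsening: each continuous value is replaced by a discrete bin label.
indices = [bin_index(x, edges) for x in values]

# Histogram-style summary: how many values fall into each bin.
tally = Counter(indices)
counts = [tally[i] for i in range(len(edges) - 1)]

print(indices)  # [0, 0, 1, 1, 1, 2, 2, 3]
print(counts)   # [2, 3, 2, 1]
```

The same values give different counts under different edges, which is why the choice of bins comes up in so many of the questions under this tag.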
268 questions
4 votes
1 answer
265 views
Maximum Likelihood estimation for heavy tailed and binned data
I have binned loss data where each bin is defined by: a minimum loss and maximum loss (the bin boundaries), and a probability of occurrence for that bin. The probabilities across all bins sum to 1. ...
2 votes
1 answer
76 views
Regression adjustment when stratifying on discretized continuous covariate
Assume a randomized experiment in which the randomization is stratified on a discretized version of a continuous baseline covariate (e.g. age groups, cutoff of a clinical score). We know that ...
2 votes
0 answers
53 views
Issue with Count Scaling in Logarithmic Binning Compared to Linear Binning [closed]
I am trying to compare the counts obtained from linear binning and logarithmic binning for a given dataset X, but I am observing unexpected results in the logarithmic binning case. Linear Binning: if ...
0 votes
0 answers
39 views
Is binning discrete variables a good idea? [duplicate]
This is a follow-up question to "What is the benefit of breaking up a continuous predictor variable?" and "Is binning of continuous data always bad for statistical tests? [duplicate]". From the above ...
11 votes
3 answers
654 views
Can I retain the ordinal nature of a predictor while answering a question about it that is inherently binary?
As part of a collaboration, I've been asked to fit a model with a continuous response $Y$ and an ordinal predictor $X$ (levels 1 to 5). The dataset owner is after an answer that is inherently binary: ...
9 votes
1 answer
172 views
Show that "dichotomizing" a continuous outcome variable reduces the standardized effect size
We have been teaching sample size calculation for the comparison of two groups A and B. I was asked to provide a mathematical explanation of why it reduces power to "dichotomize" a ...
2 votes
1 answer
218 views
How to model pixel shot noise
I'm interested in modelling the effect of shot noise on images. When taking a picture with a camera, the number of photons incident upon each pixel during the exposure time is (I believe) a ...
1 vote
1 answer
146 views
I would like to perform a Kruskal-Wallis test in Jamovi [closed]
My problem is that I have data on sexual assertiveness (7-point Likert scale, SAQ) and relationship satisfaction (5-point Likert scale, RAS), but I would like to divide the sexual assertiveness ...
1 vote
0 answers
55 views
Test score equivalence - preparing data for Spearman correlation? [closed]
I have a somewhat theoretical question. I am trying to establish how closely scores across different language tests (IELTS, TOEFL, C1A, OET, DET) used in public domains match each other, given that ...
0 votes
0 answers
151 views
Does replacing binned variables with Weight of Evidence values introduce data leakage?
In my company I've been noticing some binary classification modeling code that replaces bins of a continuous variable with the corresponding Weight of Evidence (WoE) of the given bin. As far as I ...
1 vote
1 answer
152 views
What distribution fit test to use for binned data?
I'm trying to compare two populations with 40 samples each. For each sample, I have two measurements of angle, measured in bins of 30 degrees (1-12), and I calculated the difference between the two (e....
2 votes
1 answer
546 views
Methods to derive cut-offs for continuous variables
I am working on a project to determine the variables that better predict the binary outcome. I am using conditional random forest and permimp::permimp for ...
0 votes
0 answers
58 views
Data driven approach to binning conditions based on a histogram
(Please note that this is all hypothetical at this point and the data specifics should not matter that much). Let's say I have a dataset where participants took a certain amount of time to complete a ...
1 vote
0 answers
36 views
Discretization in regression, experimentation, and causal inference as default [duplicate]
It crossed my mind that when you are designing an experiment and are interested not in NHST but in a full regression model, where coefficients for treatment exposure and relevant covariates are desired, perhaps ...
1 vote
1 answer
182 views
Does taking the ratio of Empirical Distributions (histogram bins) show their differences?
Background: I have two empirical distributions, both derived from social media data. The first represents a broad sample of ~4.8 million posts and the number of followers each post author has. The ...