Description
When using score_nuisances with a discrete treatment, the function does not return the correct score.
The issue comes from the inverse_onehot function in econml/utilities.py. When it receives a DataFrame generated by pandas.get_dummies(), it decodes the treatment incorrectly.
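The root cause is visible in the encoding itself. The quick check below (assuming only pandas is installed) shows that pandas.get_dummies() keeps one indicator column per level unless drop_first=True is passed, so every row contains exactly one 1. The quoted inverse_onehot docstring, by contrast, assumes the first column has been removed.

```python
import pandas as pd

# A binary treatment coded as 0/1
t = pd.Series([0, 1, 1, 0])

# Default (drop_first=False): one indicator column per level
full = pd.get_dummies(t)
# drop_first=True removes the baseline level's column
reduced = pd.get_dummies(t, drop_first=True)

print(full.shape)     # (4, 2)
print(reduced.shape)  # (4, 1)
# With all columns kept, every row has exactly one active indicator
print((full.sum(axis=1) == 1).all())  # True
```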
For example, with a binary treatment, labels originally coded as 0 and 1 are shifted and decoded as 1 and 2, because of the following implementation:
```python
def inverse_onehot(T):
    """
    Given a one-hot encoding of a value, return a vector reversing the encoding to get numeric treatment indices.

    Note that we assume that the first column has been removed from the input.
    """
    assert ndim(T) == 2
    # note that by default OneHotEncoder returns float64s, so need to convert to int
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)
```

Because pandas.get_dummies() keeps the first column by default (drop_first=False), the assumption stated in the docstring does not hold, and this logic introduces an off-by-one error when decoding treatments.
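A minimal, self-contained reproduction is sketched below. It copies the decoding expression from the quoted implementation but substitutes np.ndim for econml's own ndim helper (an assumption made so the snippet runs without econml installed). With the default get_dummies() output the labels come back shifted by one; with drop_first=True, which matches the documented assumption, they round-trip correctly.

```python
import numpy as np
import pandas as pd


def inverse_onehot_original(T):
    """Decoding logic copied from the quoted implementation.

    np.ndim stands in for econml's own ndim helper (assumption).
    """
    assert np.ndim(T) == 2
    # Column j (0-based) maps to treatment index j + 1,
    # so an all-zero row decodes to 0 (the dropped baseline).
    return (T @ np.arange(1, T.shape[1] + 1)).astype(int)


t = pd.Series([0, 1, 1, 0])

# Default get_dummies keeps the baseline column: labels shift to 1/2
shifted = inverse_onehot_original(pd.get_dummies(t))
# drop_first=True matches the documented assumption: labels round-trip
restored = inverse_onehot_original(pd.get_dummies(t, drop_first=True))

print(shifted.tolist())   # [1, 2, 2, 1]
print(restored.tolist())  # [0, 1, 1, 0]
```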
Expected behavior
The function should return zero-based indices, so that discrete treatments (e.g. 0/1) keep their original labels after decoding. A possible corrected implementation:
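```python
import numpy as np
import pandas as pd
import econml.utilities


def inverse_onehot(T):
    assert econml.utilities.ndim(T) == 2
    # DataFrames from pandas.get_dummies() keep every level's column, so indices start at 0;
    # other inputs are still assumed to have the first column removed.
    indices = (
        np.arange(0, T.shape[1])
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)
    )
    return (T @ indices).astype(int)
```

With this change, score_nuisances computes the correct score for discrete treatments encoded with pandas.get_dummies().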
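The sketch below checks the proposed fix on a three-level treatment. As before, np.ndim replaces econml's ndim (an assumption) so the snippet runs standalone. It also illustrates a caveat of the isinstance check: the DataFrame branch assumes every level has its own column (the get_dummies() default), so a DataFrame built with drop_first=True would now decode one step too low, while NumPy arrays keep the original behavior.

```python
import numpy as np
import pandas as pd


def inverse_onehot_proposed(T):
    """Proposed fix, with np.ndim standing in for econml's ndim (assumption)."""
    assert np.ndim(T) == 2
    indices = (
        np.arange(0, T.shape[1])  # DataFrame: assume all levels are present
        if isinstance(T, pd.DataFrame)
        else np.arange(1, T.shape[1] + 1)  # other inputs: first column dropped
    )
    return (T @ indices).astype(int)


t = pd.Series([0, 2, 1, 0, 2])

# Full dummies from pandas now decode back to the original labels
from_frame = inverse_onehot_proposed(pd.get_dummies(t))
# A NumPy array with the first column dropped still decodes as before
arr = pd.get_dummies(t, drop_first=True).to_numpy(dtype=float)
from_array = inverse_onehot_proposed(arr)
# Caveat: a DataFrame with the first column dropped is now mis-decoded
pitfall = inverse_onehot_proposed(pd.get_dummies(t, drop_first=True))

print(from_frame.tolist())  # [0, 2, 1, 0, 2]
print(from_array.tolist())  # [0, 2, 1, 0, 2]
print(pitfall.tolist())     # [0, 1, 0, 0, 1]
```

Keying the decision on the input type rather than on how the encoding was produced is what makes the last case fail; callers passing reduced DataFrames would need to convert them to arrays first.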
Contributed by @Cantal00p, @f5ilverio