5
$\begingroup$

I am going through a TensorFlow tutorial and noticed that it uses one-hot encoding for regression. I don't fully understand how it works. Take the oversimplified case of ordinary least squares regression. Assume we have y = [1,2,3] and x = [cat, dog, mouse]. Converting to one-hot vectors, we get

cat = [0,0,1], dog = [0,1,0], mouse = [1,0,0]

What does the regression equation look like now? Is it a multivariate regression?

y = alpha + beta_1*x_1 + beta_2*x_2 + beta_3*x_3,

where x_1, x_2, x_3 are the coordinates of the one-hot vector?

P.S. I am more interested in the mechanics of this setup than in its meaning.

$\endgroup$

2 Answers

3
$\begingroup$

Yes, you turn it into three different variables. However, this is not called multivariate regression: that term indicates multiple output variables, not multiple inputs. (Thanks to Simon for correcting me.)
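
To make the mechanics concrete, here is a minimal sketch in plain Python using the question's data and its one-hot codes. It assumes the intercept is dropped: with an intercept plus all three indicators the columns are linearly dependent (the three indicators always sum to 1), so the coefficients would not be uniquely determined. The function names `fit_no_intercept` and `predict` are made up for this illustration. Because every row contains exactly one 1, the least-squares coefficient for each column is just the mean of y over the rows in that category.

```python
# One-hot codes exactly as given in the question.
ONE_HOT = {"cat": [0, 0, 1], "dog": [0, 1, 0], "mouse": [1, 0, 0]}

x = ["cat", "dog", "mouse"]
y = [1.0, 2.0, 3.0]

# Design matrix: one row per observation, one column per one-hot coordinate.
X = [ONE_HOT[label] for label in x]


def fit_no_intercept(X, y):
    """Least-squares fit of y = b1*x_1 + b2*x_2 + b3*x_3 for one-hot rows.

    Each row has a single 1, so X^T X is diagonal and each coefficient
    is the mean of y over the rows where that column equals 1.
    Assumes every column is 1 in at least one row.
    """
    n_cols = len(X[0])
    betas = []
    for j in range(n_cols):
        ys = [yi for row, yi in zip(X, y) if row[j] == 1]
        betas.append(sum(ys) / len(ys))
    return betas


def predict(betas, row):
    """Evaluate the linear model for one one-hot row."""
    return sum(b * v for b, v in zip(betas, row))


betas = fit_no_intercept(X, y)
predictions = [predict(betas, row) for row in X]
print(betas)
print(predictions)
```

So the one categorical input becomes three numeric inputs, each with its own coefficient, and a prediction simply picks out the coefficient of the active category.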

$\endgroup$
4
  • $\begingroup$ Multivariate regression typically refers to a regression model with more than 1 outcome, which is not true in this case $\endgroup$ Commented Aug 5, 2016 at 23:16
  • $\begingroup$ Oops you are correct, I will change my answer. $\endgroup$ Commented Aug 6, 2016 at 22:22
  • $\begingroup$ @Simon, thank you for the comment. Could you give any references for your statement? $\endgroup$ Commented Apr 14, 2017 at 13:30
  • $\begingroup$ In most statistics textbooks you'll find that multiple regression means more than 1 predictor, while more than 1 outcome is multivariate regression/analysis. For example, see Wikipedia: en.wikipedia.org/wiki/Multivariate_statistics $\endgroup$ Commented Apr 14, 2017 at 19:27
3
$\begingroup$

Yes, that is the standard approach for converting categorical variables before fitting a model. In this case it would be used to train a neural network. Each category of a categorical variable is represented by its own one-hot vector, that is, by a separate 0/1 indicator variable.
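
As a rough illustration of that conversion, the sketch below builds the indicator columns from a list of labels in plain Python. The helper name `one_hot_encode` and the alphabetical column order are assumptions made for this example; the question uses a different column order, and libraries choose their own.

```python
def one_hot_encode(labels):
    """Return (categories, rows), where each row is a 0/1 indicator list.

    Column order is the sorted order of the distinct labels, which is
    an arbitrary choice for this sketch.
    """
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # exactly one position is switched on
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["cat", "dog", "mouse", "dog"])
print(categories)
for row in rows:
    print(row)
```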

Note that you do not need to do this for binary variables such as Male/Female, because the presence of one category implies the absence of the other. Instead of a variable such as Gender = Male/Female, you can use a single variable such as is_female = 0/1.
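
A small sketch of the binary case, with made-up sample values: a single 0/1 column carries all the information, and a second column for the other category would be fully determined by the first.

```python
# Hypothetical sample values, for illustration only.
genders = ["Male", "Female", "Female", "Male"]

# One indicator column is enough for a two-category variable.
is_female = [1 if g == "Female" else 0 for g in genders]

# A second column would be redundant: it is always 1 - is_female.
is_male = [1 - v for v in is_female]

print(is_female)
print(is_male)
```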

If this dataset is used to fit a regression model, the proper name for the model is multiple linear regression.

$\endgroup$
