Optimal construction of day feature in neural networks

Question

While working on a regression problem, I started to think about how to represent a "day of the week" feature. I wonder which approach would perform better:

one feature; value 1/7 for Monday; 2/7 for Tuesday...
7 features: (1, 0, 0, 0, 0, 0, 0) for Monday; (0, 1, 0, 0, 0, 0, 0) for Tuesday...

It's hard to compare the two empirically because the network configurations differ. (I believe the six additional features should be reflected in the number of hidden nodes.)

There are about 20 features in total. I use simple backpropagation to train an ordinary feed-forward neural network.

What about using binary encoding for day of week? Three features, where (0, 0, 0) is Sunday, (0, 0, 1) is Monday, and so on?
– Shamoon, Commented Mar 3, 2015 at 18:10
This has the added benefit of reducing the number of features, which cuts computation time.
– Shamoon, Commented Mar 3, 2015 at 18:11
There are many similar questions on this site. A few: stats.stackexchange.com/questions/65900/… stats.stackexchange.com/questions/332688/…
– kjetil b halvorsen ♦, Commented Oct 27, 2020 at 0:51

Zach · Accepted Answer · 2014-12-02 14:55:08Z

Your second representation is more traditional for categorical variables like day of week.

This is also known as creating dummy variables, and it is a widely used method for encoding categorical variables. If you use a 1-7 encoding, you're telling the model that days 4 and 5 are very similar, while days 1 and 7 are very dissimilar. In fact, days 1 and 7 are just as similar as days 4 and 5. The same logic applies to a 0-30 encoding for days of the month.
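To make the similarity argument concrete, here is a minimal R sketch. The vector `day` and the other variable names are illustrative assumptions, not part of the original question. It compares distances between days under an integer encoding and under dummy (one-hot) encoding built with `model.matrix`.

```r
# Hypothetical data: one observation per day of the week (1 = Monday, ..., 7 = Sunday)
day <- 1:7

# Integer encoding: distance depends on the arbitrary numbering
abs(day[1] - day[7])  # Monday vs Sunday: 6
abs(day[4] - day[5])  # Thursday vs Friday: 1

# Dummy (one-hot) encoding: one indicator column per day, no intercept column
day_factor <- factor(day, levels = 1:7)
one_hot <- model.matrix(~ day_factor - 1)

# Euclidean distances between the encoded rows
d <- as.matrix(dist(one_hot))
d[1, 7]  # Monday vs Sunday
d[4, 5]  # Thursday vs Friday
```

Under the dummy encoding, every pair of distinct days is the same distance apart (the square root of 2), so the model is not told that any two days are closer than any others.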

Day of the month is a little trickier, because while every week has the same 7 days, not every month has the same 30 days: some months have 31 days, and February has 28 (or 29 in a leap year). Since both weeks and months are cyclical, you could use Fourier terms (sine and cosine transformations) to convert them into smooth, continuous variables.

For example (using R, my programming language of choice):

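    # Three illustrative months of 31, 28, and 30 days
    day_of_month = c(1:31, 1:28, 1:30)
    day_of_year <- 1:length(day_of_month)

    # Sine and cosine terms with a fixed period of 30 days
    s = sin((2*pi)/30*day_of_month)
    c = cos((2*pi)/30*day_of_month)

    # Plot the raw variable, then overlay the terms rescaled from [-1, 1] to [0, 30]
    plot(day_of_month ~ day_of_year)
    lines(15*s+15 ~ day_of_year, col='blue')
    lines(15*c+15 ~ day_of_year, col='red')
    legend(10, 30, c('raw', 'sin', 'cos'), c('black', 'blue', 'red'))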

[Figure: raw vs sin vs cosine]

(I scaled the sine/cosine variables to range from 0 to 30, rather than -1 to 1, so the graph looks better.)

As you can see, while the raw "day of month" variable drops back to 1 at the start of each month, the sine and cosine transformations make a smooth transition that lets the model know that days at the end of one month are similar to days at the beginning of the next month.
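The smooth transition can also be checked numerically. The sketch below reuses the answer's illustrative calendar and 30-day period; the helper `cyc_dist` and the other names are hypothetical, introduced only for this illustration. It measures how far apart two observations are in (sin, cos) space.

```r
# Same illustrative calendar as in the answer: months of 31, 28, and 30 days
day_of_month <- c(1:31, 1:28, 1:30)

# Sine/cosine features with the fixed 30-day period used in the answer
sin_term <- sin(2 * pi / 30 * day_of_month)
cos_term <- cos(2 * pi / 30 * day_of_month)

# Hypothetical helper: Euclidean distance between observations i and j in (sin, cos) space
cyc_dist <- function(i, j) {
  sqrt((sin_term[i] - sin_term[j])^2 + (cos_term[i] - cos_term[j])^2)
}

# Positions 59 and 60 are day 28 of the second month and day 1 of the third
raw_gap <- abs(day_of_month[59] - day_of_month[60])  # 27 in the raw encoding
month_end_gap <- cyc_dist(59, 60)                    # short chord on the unit circle
mid_month_gap <- cyc_dist(15, 16)                    # two adjacent mid-month days

# Caveat of the fixed period: day 31 lands on the same point as day 1
day31_vs_day1 <- cyc_dist(31, 32)

c(raw = raw_gap, month_end = month_end_gap,
  mid_month = mid_month_gap, day31_vs_day1 = day31_vs_day1)
```

With a fixed period of 30, the last day of a 28-day month and the first day of the next month sit three steps apart on the circle rather than one, and day 31 coincides with day 1. Whether that approximation is acceptable depends on the problem; dividing by the actual length of each month would be one alternative.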

You can add the rest of the Fourier terms as follows:

    # Overlay three more sine/cosine pairs, each with a phase offset of 30 * i/4
    for(i in 1:3){
      s = sin((2*pi)/30*day_of_month + 30 * i/4)
      c = cos((2*pi)/30*day_of_month + 30 * i/4)
      lines(15*s+15 ~ day_of_year, col='blue')
      lines(15*c+15 ~ day_of_year, col='red')
    }
    legend(10, 30, c('raw', 'sin', 'cos'), c('black', 'blue', 'red'))

[Figure: Complete transforms]
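The loop above adds phase-shifted copies of the base wave. In time-series practice, "Fourier terms" more often means sine/cosine pairs at integer multiples of the base frequency (harmonics). The following is a minimal sketch under that assumption; `fourier_terms` is a hypothetical helper written for this illustration, not a function from any package and not the code used to produce the figures here.

```r
# Hypothetical helper: build K sine/cosine pairs (harmonics) for a cyclical variable x
fourier_terms <- function(x, period, K) {
  terms <- lapply(seq_len(K), function(k) {
    angle <- 2 * pi * k * x / period           # k-th harmonic of the base frequency
    out <- cbind(sin(angle), cos(angle))
    colnames(out) <- paste0(c("sin", "cos"), k)
    out
  })
  do.call(cbind, terms)
}

# Same illustrative calendar as in the answer: months of 31, 28, and 30 days
day_of_month <- c(1:31, 1:28, 1:30)
X <- fourier_terms(day_of_month, period = 30, K = 2)

dim(X)       # 89 rows, 4 columns
colnames(X)  # "sin1" "cos1" "sin2" "cos2"
```

Higher values of `K` let a model fit sharper within-month patterns at the cost of more input features, so a small `K` is a reasonable starting point.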

Each pair of sine/cosine waves makes a circle:

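    # Build four phase-shifted sine/cosine pairs and bind them into one matrix
    m <- lapply(1:4, function(i){
      as.matrix(
        data.frame(
          s = sin((2*pi)/30*day_of_month + 30 * i/4),
          c = cos((2*pi)/30*day_of_month + 30 * i/4)
        )
      )
    })
    m <- do.call(cbind, m)

    # Scatterplot matrix of every column against every other
    pairs(m)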

[Figure: circle]

This page has a really handy explanation of how to manipulate sine and cosine waves.

Is there any specific reason to do so? I wonder how it could affect convergence. My second doubt concerns variables that are still categorical: what about day of month (0 - 30)?
– Oepas Dost, Commented Dec 2, 2014 at 0:41
Same thing; use indicator variables. The first encoding induces a similarity measure that may not be appropriate; e.g., is Sunday really the most dissimilar day from Monday? That's what the encoding implies...
– Emre, Commented Dec 2, 2014 at 0:53
@OepasDost If my post answers your question, feel free to up-vote it and/or accept it by clicking the checkmark.
– Zach, Commented Dec 2, 2014 at 13:44
@Zach Why would you consider day of week (which can be encoded from 0 to 6) as categorical, but day of month as ordinal cyclical (and therefore use the Fourier transform)? Why not treat both as ordinal cyclical and apply a Fourier transform to day of week as well?
– zipp, Commented Nov 29, 2017 at 20:59
@zipp You could use Fourier terms for day of the week too. In my experience, the primary value of day of week is the difference between weekdays and weekends, which is very simple to capture with dummy variables (or a single indicator variable).
– Zach, Commented Nov 30, 2017 at 18:23
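
The two options mentioned in the last comment can be sketched side by side. This is an illustration only: the 0 = Monday numbering and all variable names are assumptions, not taken from the thread.

```r
# Hypothetical day-of-week values: 0 = Monday, ..., 6 = Sunday
day_of_week <- 0:6

# Single indicator variable capturing the weekday/weekend split
is_weekend <- as.integer(day_of_week >= 5)

# Cyclical alternative: one sine/cosine pair with a period of 7
dow_sin <- sin(2 * pi * day_of_week / 7)
dow_cos <- cos(2 * pi * day_of_week / 7)

# Inspect the resulting features
data.frame(day_of_week, is_weekend,
           dow_sin = round(dow_sin, 3),
           dow_cos = round(dow_cos, 3))
```

The indicator costs one input and encodes the weekday/weekend difference directly, while the sine/cosine pair costs two inputs and places Sunday next to Monday. Both can be supplied together if the network has capacity for the extra inputs.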
