Let $X = (X_1, \dots, X_n)$ be a univariate time series. I would like to know how to standardize my data when I split it into train and test sets. Let me explain how I transform $X$ so that I can fit an LSTM neural net. From $X$ I build a new input array and its corresponding output array, so we have: $X = ( (X_1, \dots, X_m), \dots , (X_{n-m}, \dots, X_{n-1}) )$
$Y = (X_{m+1}, ..., X_n)$
$\text{Card}X = \text{Card}Y$
Let $p$ be the size of my test set. Using Python's slicing notation, we have:
$X_{train} = X[:-p]$
$X_{test} = X[-p:]$
The same goes for $Y$. Now, I am wondering how to standardize my data. I think that standardizing $X$ before splitting it into train and test sets could lead to over-fitting, since we would apply a transformation that involves all the $X_i$, including the test values. Basically, I am not sure whether statistics (mean, standard deviation) computed over the full series would leak information from the test set. So I think it would be better to compute the mean and the standard deviation on the training set only, and use them to standardize both the train and the test sets. It makes no sense to me to standardize them separately, since $\text{Card}\,X_{test} \ll \text{Card}\,X_{train}$. But maybe I am wrong. I would also like to know whether I have to standardize both $X$ and $Y$, or just $X$. When working with an MLP neural net, I used to normalize only the input data.
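To make my question concrete, here is a minimal sketch of what I have in mind, using NumPy (the synthetic series, the variable names, and the values of $m$ and $p$ are just for illustration):

```python
import numpy as np

def make_windows(x, m):
    """Turn a 1-D series into (window, next-value) pairs."""
    X = np.array([x[i:i + m] for i in range(len(x) - m)])
    Y = x[m:]
    return X, Y

rng = np.random.default_rng(0)
series = rng.normal(size=100).cumsum()  # synthetic series for illustration

m, p = 5, 20  # window size and test-set size (illustrative values)
X, Y = make_windows(series, m)
X_train, X_test = X[:-p], X[-p:]
Y_train, Y_test = Y[:-p], Y[-p:]

# Fit the scaling statistics on the training inputs only...
mu, sigma = X_train.mean(), X_train.std()

# ...and apply the same transform to both splits (and to the targets,
# so that predictions can later be inverted with the same mu/sigma).
X_train_s = (X_train - mu) / sigma
X_test_s = (X_test - mu) / sigma
Y_train_s = (Y_train - mu) / sigma
Y_test_s = (Y_test - mu) / sigma
```

This way the test set never contributes to the statistics, and the targets live on the same scale as the inputs.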
So, thank you for reading; if you have any ideas, remarks, or questions, please let me know. I can explain more if needed :)
P.S. I couldn't find a 'standardization' tag, so I used the tag named normalization instead.