I have a dataset from a production line, which is formatted as time series data. There is a batch column, which indicates the name of the batch (str), and there is a phase column which indicates the phase of the production (str). I am working with the datetime as the index of the pandas DataFrame.
I want to plot this data on timeseries graph, overlaying the data from each phase and distinguishing each batch (i.e. different colour), with each process variable (i.e. temp1, temp2, press1, press2) on a different axis (as per the diagram) How can this be done?
EDIT: for clarity, I need the trends to be plotted against a datetime baseline, otherwise they will not overlay.
Example of the dataset: | datetime | temp1 | temp 2|press1|press2|batch | phase | |:---- |:--: | :--: | :--: | :--: |:--: |:--: | | 2023-02-03 15:45:34| 34.45 | 23.34 | 13.23| 45.5 | 'D' | '10-Wait' | | ... | ... | ... | ... | ... | 'D' | ... | | 2023-02-03 15:55:34| 36.55 | 22.14 | 18.23| 78.5 | 'D' | '20-Initialise'|
To create a similar dataset -to mine- you can use the following code:
import numpy as np import pandas as pd import datetime date = pd.date_range(start='1/1/2023', end='10/06/2023', freq=datetime.timedelta(seconds=30)) tags = ['temp1','temp2','press1','press2'] data=np.random.rand(len(date),len(tags)) df=pd.DataFrame(data,columns=tags).set_index(date) batches = ['A','B','C','D','E','F','G'] n=len(batches) period_start = pd.to_datetime('1/1/2023') period_end = pd.to_datetime('10/06/2023') batch_start = (pd.to_timedelta(np.random.rand(n) * ((period_end - period_start).days + 1), unit='D') + period_start) batch_end = (batch_start + pd.to_timedelta(8,unit='H')) df_batches = pd.DataFrame(data=[batch_start,batch_end],columns=[batches],index=['start','end']).T for item in batches: start_time = df_batches['start'][item] end_time = df_batches['end'][item] df.loc[((df.index>=start_time)&(df.index<=end_time)), 'batch'] = item df.dropna(subset=['batch'],inplace=True) df['phase']='' phases = ['10-Wait','20-Initialise','30-Warm','40-Running'] for batch in batches: wait_len = int(len(df[df['batch']==batch].index)*0.2) init_len = int(len(df[df['batch']==batch].index)*0.4) warm_len = int(len(df[df['batch']==batch].index)*0.6) run_len = int(len(df[df['batch']==batch].index)) wait_start = df[df['batch']==batch].index[0] wait_end = df[df['batch']==batch].index[wait_len] init_end = df[df['batch']==batch].index[init_len] warm_end = df[df['batch']==batch].index[warm_len] run_end = df[df['batch']==batch].index[-1] df['phase'].loc[wait_start:wait_end] = phases[0] df['phase'].loc[wait_end:init_end] = phases[1] df['phase'].loc[init_end:warm_end] = phases[2] df['phase'].loc[warm_end:run_end] = phases[3] df.to_csv('stackoverflowqn.csv') 





