I have 1000 industrial sensors that send data once a day. Each daily transmission contains 10 parameters plus a pass/fail status, all of which are stored; the data format is shown below. Once a sensor fails, it stops communicating entirely. Some sensors also stop communicating even though their last data point was not a failure, and most of these never communicate again. The remaining sensors keep functioning until the end of the study.
Almost 40% of the sensors have only 6 to 10 days of data, mostly because they stopped communicating (occasionally because they failed). The rest have anywhere from 10 to 300 days of data, again ending in either loss of communication or failure, with 300 days being the end of the study. Time to failure ranges from 6 to 300 days, and only about 5% of the sensors fail during the study. I am familiar with survival analysis, but I don't know whether it can be used here, since some sensors stop communicating even though their last status was not a failure.
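(For anyone with the same doubt: sensors that go silent without failing can be treated as right-censored observations, which is precisely the case survival analysis is built for, provided the censoring is non-informative, i.e. going silent is not itself a symptom of imminent failure. A minimal numpy sketch of the Kaplan-Meier estimator with an explicit censoring indicator, using made-up toy durations:)

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve from possibly censored lifetimes.
    durations : days of data per sensor (time to failure, or time to
                last communication / end of study if no failure).
    events    : 1 if the sensor actually failed, 0 if censored.
    Returns (failure_times, survival_probability_after_each_time)."""
    durations = np.asarray(durations)
    events = np.asarray(events)
    times, surv = [], []
    s = 1.0
    for t in np.unique(durations):
        at_risk = np.sum(durations >= t)               # still under observation at t
        deaths = np.sum((durations == t) & (events == 1))
        if deaths:
            s *= 1.0 - deaths / at_risk
            times.append(t)
            surv.append(s)
    return np.array(times), np.array(surv)

# Toy cohort of 5 sensors: failures at day 6 and day 30; the other three
# are censored (went silent at day 30/120, or reached end of study at 300).
t, s = kaplan_meier([6, 30, 30, 120, 300], [1, 0, 1, 0, 0])
# t -> [6, 30], s -> [0.8, 0.6]
```

In practice a library such as lifelines (`KaplanMeierFitter`, `CoxPHFitter` for covariates) handles this; the key point is the event indicator, which is 0 both for "stopped communicating" and for "survived to day 300".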
I want to explore whether a machine-learning approach can be trained on this data to predict the probability of failure for a new sensor. The goal is a predictive-maintenance model that flags indications of failure before it happens. How should I approach this problem?
I don't know which features are associated with failure. For feature engineering I plan to compute days of operation, the daily rate of change of each sensor parameter, 7-day moving averages, and so on, then train a logistic regression on the transformed data as a starting point. But honestly, I can't figure out how to detect an indication of an upcoming failure early enough to actually do something about it.
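(One common way to get "early enough" into the model is to label each day's row as 1 if the sensor fails within the next h days, turning the problem into ordinary supervised classification with a fixed warning horizon. A pandas sketch of that labeling plus the proposed features, using hypothetical column names `device`/`date`/`p1`/`status` and a tiny synthetic frame:)

```python
import pandas as pd

# Tiny synthetic example standing in for the real feed; "p1" stands in
# for one of the 10 parameters.
df = pd.DataFrame({
    "device": ["Device1"] * 5 + ["Device2"] * 5,
    "date": pd.to_datetime(
        ["2019-01-01", "2019-01-02", "2019-01-03",
         "2019-01-04", "2019-01-05"] * 2),
    "p1": [60, 60, 61, 63, 70, 45, 45, 44, 45, 45],
    "status": ["Pass"] * 4 + ["Fail"] + ["Pass"] * 5,
})
df = df.sort_values(["device", "date"])
g = df.groupby("device")

# Days of operation since each device's first reading.
df["days_op"] = (df["date"] - g["date"].transform("min")).dt.days
# Daily rate of change and 7-day moving average, computed per device.
df["p1_delta"] = g["p1"].diff()
df["p1_ma7"] = g["p1"].transform(lambda x: x.rolling(7, min_periods=1).mean())

# Horizon label: 1 if this device fails within the next h days.
h = 3
fail_date = df.loc[df["status"] == "Fail"].groupby("device")["date"].min()
df["fail_date"] = df["device"].map(fail_date)  # NaT for devices that never fail
df["label"] = (df["fail_date"] - df["date"]).dt.days.between(1, h).astype(int)
```

Rows on or after the failure day should be dropped before training (they leak the outcome), and with ~5% failures the classes will be heavily imbalanced; `class_weight="balanced"` in scikit-learn's `LogisticRegression` is one standard mitigation.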
```
2019-01-01  Device1  60  10000  30  0  2  ...  Pass
2019-01-02  Device1  60  10002  30  0  2  ...  Pass
...
2019-04-04  Device1  60  13002  29  0  2  ...  Pass
2019-04-05  Device1  60  13039  29  0  2  ...  Fail
2019-01-01  Device2  45   9876   0  0  2  ...  Pass
2019-01-02  Device2  45   9876   0  0  2  ...  Pass
...
2019-08-30  Device2  62  12321  18  2  2  ...  Pass
```
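(Assuming the rows really are whitespace-delimited as shown, loading them into a long-format table is straightforward; the column names here are hypothetical and only a subset of the 10 parameters is shown:)

```python
import io
import pandas as pd

# Stand-in for reading the real file; p1..p3 are a subset of the 10 parameters.
raw = """\
2019-01-01 Device1 60 10000 30 Pass
2019-01-02 Device1 60 10002 30 Pass
2019-01-01 Device2 45 9876 0 Pass
"""
cols = ["date", "device", "p1", "p2", "p3", "status"]
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=cols, parse_dates=["date"])
```

A long format like this (one row per device-day) is what the per-device `groupby` feature engineering above operates on directly.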