I have historical data from around 20 machine sensors, with a time resolution of one second, stored in CSV files that need to be imported into a SQL database before further data handling and analysis.
A representative mockup of the data to import looks like this:
    +---------------------+------------------+------------------+------------------+-----+-------------------+
    | timestamp           | name_of_sensor_0 | name_of_sensor_1 | name_of_sensor_2 | ... | name_of_sensor_19 |
    +---------------------+------------------+------------------+------------------+-----+-------------------+
    | 2019-12-25 05:35:20 | 10               | 11               | 12               | ... | 19                |
    | 2019-12-25 05:35:21 | 20               | 21               | 22               | ... | 29                |
    | 2019-12-25 05:35:22 | 30               | 31               | 32               | ... | 39                |
    | 2019-12-25 05:35:23 | 40               | 41               | 42               | ... | 49                |
    | 2019-12-25 05:35:24 | 50               | 51               | 52               | ... | 59                |
    | 2019-12-25 05:35:25 | 60               | 61               | 62               | ... | 69                |
    +---------------------+------------------+------------------+------------------+-----+-------------------+

For each sensor there is some descriptive metadata that should also be available in the database, since it might contain important information needed to gain insights during further data analysis. Each sensor has the following metadata:
- designator
- acquisition_channel
- system_code
- unit_code
- sub_system_code
- function_code
- major_counting_number
- minor_counting_number
- measurement_unit
- description

In order to combine the readings and the metadata, I thought about using two SQL tables. One contains all the metadata as given above:
SensorTable

- id
- designator
- acquisition_channel
- unit_code
- system_code
- sub_system_code
- function_code
- major_counting_number
- minor_counting_number
- measurement_unit
- description

And another table contains the readings of each sensor at a given timestamp. To be able to combine the data from both tables, I could perform a JOIN on the sensor_id, which is a foreign key (a query sketch is included at the end of this post):
ReadingsTable

- id
- timestamp
- sensor_id
- value

Both table definitions are implemented using `sqlalchemy` like this:

    from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()


    class Sensor(Base):
        __tablename__ = 'sensors'

        id = Column(Integer, primary_key=True, autoincrement=True)
        designator = Column(String, unique=True, nullable=False)
        acquisition_channel = Column(String)
        unit_code = Column(String)
        system_code = Column(String)
        sub_system_code = Column(String)
        function_code = Column(String)
        major_counting_number = Column(String)
        minor_counting_number = Column(String)
        measurement_unit = Column(String, nullable=False)
        description = Column(String)

        readings = relationship('Reading')


    class Reading(Base):
        __tablename__ = 'readings'

        id = Column(Integer, primary_key=True, autoincrement=True)
        timestamp = Column(DateTime, nullable=False)
        sensor_id = Column(Integer, ForeignKey('sensors.id'), nullable=False)
        value = Column(Float, nullable=False)
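To get the CSV data into this schema, I plan to do something along these lines (a simplified sketch that continues from the class definitions above; pandas, the file name `machine_data.csv`, the SQLite URL and the `'unknown'` placeholder for the measurement unit are only assumptions for illustration):

    # Simplified import sketch -- file name, database URL and the metadata
    # placeholders are assumptions, not the real values.
    import pandas as pd
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine('sqlite:///sensors.db')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # One Sensor row per sensor column of the CSV (metadata shortened here).
    df = pd.read_csv('machine_data.csv', parse_dates=['timestamp'])
    sensors = {}
    for name in df.columns.drop('timestamp'):
        sensors[name] = Sensor(designator=name, measurement_unit='unknown')
        session.add(sensors[name])
    session.commit()

    # Reshape the wide CSV (one column per sensor) into the long format of the
    # readings table and insert all rows.
    long_df = df.melt(id_vars='timestamp', var_name='designator', value_name='value')
    session.bulk_save_objects([
        Reading(timestamp=row.timestamp.to_pydatetime(),
                sensor_id=sensors[row.designator].id,
                value=float(row.value))
        for row in long_df.itertuples(index=False)
    ])
    session.commit()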
This table design looked pretty obvious to me and should fulfill the very basic principles of normalization. However, after having a look at the resulting table rows, I am wondering whether I need to (or can) normalize the timestamp column any further. Every row in the ReadingsTable contains the reading of one sensor at a given timestamp. Since all sensors measure at exactly the same time, I get a lot of duplicate timestamps in that column. Recalling my data mockup from above, an excerpt of the ReadingsTable would look as follows:

    +-----+---------------------+-----------+-------+
    | id  | timestamp           | sensor_id | value |
    +-----+---------------------+-----------+-------+
    | 60  | 2019-12-25 05:35:22 | 1         | 30    |
    | 61  | 2019-12-25 05:35:22 | 2         | 31    |
    | 62  | 2019-12-25 05:35:22 | 3         | 32    |
    | ... | ...                 | ...       | ...   |
    +-----+---------------------+-----------+-------+

Do I need to normalize the timestamp column any further because of the duplicate entries for each timestamp? How could I do this? Which adjustments should I make to my database / table design?
I had a look at this answer, which suggests a very similar approach but still does not address the duplicates in the timestamp column.
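For completeness, this is roughly the kind of JOIN I have in mind for combining both tables (a sketch; it assumes the `session` from the import snippet above and uses the sensor name `name_of_sensor_0` from the mockup):

    # Fetch all readings of one sensor together with its metadata via the
    # foreign key -- the sensor name and session are taken from the sketches above.
    readings_of_sensor_0 = (
        session.query(Reading.timestamp, Reading.value, Sensor.measurement_unit)
        .join(Sensor, Reading.sensor_id == Sensor.id)
        .filter(Sensor.designator == 'name_of_sensor_0')
        .order_by(Reading.timestamp)
        .all()
    )
    for timestamp, value, unit in readings_of_sensor_0:
        print(timestamp, value, unit)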