I am dealing with a tab-separated file with nearly 200 million rows on Linux. In one of the columns, which contains binary values, I noticed that the data type is not consistent and that there is a large number of missing values. Here is an example:
input:
timestamp   val
1589205592  0
1589205593  0.0
1589205594
1589205595  1
1589205595  1.0

I tried what was suggested here using awk, but it seems to be really slow because the file is so large. I am trying to fill the missing values with 0, make the data type consistent (i.e., convert all floats to ints), and overwrite the current file.
output:
timestamp   val
1589205592  0
1589205593  0
1589205594  0
1589205595  1
1589205595  1
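To pin down the transformation being asked for, here is a minimal Python sketch of it, not a tuned solution for 200 million rows. It assumes the file has a single header line, that the column only ever holds an empty string, `0`, `0.0`, `1` or `1.0`, and that a row with a missing value may lack the column altogether. The names `normalize_val`, `normalize_file` and the `col` parameter are hypothetical, introduced only for illustration.

```python
import os
import tempfile


def normalize_val(raw: str) -> str:
    """Map '', '0', '0.0', '1', '1.0' to '0' or '1' (assumes no other values occur)."""
    raw = raw.strip()
    if not raw:
        return "0"  # a missing value is filled with 0
    # '1.0' -> '1'. Note: int() truncates, it does not round.
    return str(int(float(raw)))


def normalize_file(path: str, col: int = 1) -> None:
    """Rewrite `path` in place, normalizing column `col` (0-based, assumed default)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in,
    # so the original is never left half-written.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            dst.write(next(src, ""))  # copy the header line unchanged
            for line in src:
                fields = line.rstrip("\n").split("\t")
                # Pad rows where the value column is absent entirely.
                while len(fields) <= col:
                    fields.append("")
                fields[col] = normalize_val(fields[col])
                dst.write("\t".join(fields) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Calling `normalize_file("data.tsv")` on the input above (with a hypothetical file name) should produce the output shown. The replaced file takes on the temporary file's permissions, so the original mode would need to be restored if that matters.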
.0? What if you have 1.00? Should that also become 1? I assume you don't want to change the value if it is 1.06, right? Or do you want to round the numbers to the closest integer value? What is the "data type", and how is it not consistent? Please edit your question and clarify.