Timeline for "Fail fast is brittle"
Current License: CC BY-SA 4.0
17 events
| when | type | what | by | license | comment |
|---|---|---|---|---|---|
| Aug 5, 2024 at 15:11 | comment | added | JacquesB | | @DeanMacGregor The bank transfer was just an example to show the potential problems with processing non-conformant data. Your CSV might not be as critical as financial transactions, but presumably the data matters to someone who would prefer to avoid data loss or corruption. But if there is no specification at all (and you can't directly speak with the producer of the data to agree on a spec), then you will have to reverse-engineer the data format by investigating the messages. |
| Aug 5, 2024 at 10:44 | comment | added | Dean MacGregor | | @JacquesB The question is about a CSV consumer, not an interbank communication protocol. I scrape a lot of CSV data and literally none of it comes from a source that is going to mutually agree with me on a rigid specification. The best I can ever hope for is that it stays relatively consistent over time. That said, I recognize that not everyone has the same experience and some people might be ingesting CSV files where there is a mutually agreed specification, which is why I prefaced my comment with "It just depends". |
| Aug 4, 2024 at 12:06 | comment | added | JacquesB | | @DeanMacGregor It is very risky to make guesses when the input data does not conform to the agreed specification. Imagine, for example, that a money transfer request is sent to two banks from an ATM. The message key is corrupted because of a bug or glitch. One bank rejects the transfer because it doesn't conform to the specification, but your bank accepts the message despite it being malformed. Who is to blame if funds disappear? |
| Aug 1, 2024 at 3:07 | comment | added | Dean MacGregor | | It just depends. Is the program we're talking about ingesting data from a source over which we have no control, where we know that sometimes a column is capitalized and sometimes not? If so, I want my program to be able to not be strictly case-sensitive on the way in, but I do want it to be on the way out (see the first code sketch after the timeline). |
| Jul 17, 2024 at 18:05 | comment | added | Basilevs | | In this case, we accept two distinct formats, Move and MOVE, but never mix them, and we report any inconsistencies. |
| Jul 17, 2024 at 18:03 | comment | added | Basilevs | | @JonRaynor, those are implementation details, not really relevant to the discussion. However, I notice that you use operations that take a wide space of inputs and accept them all, shrinking the space. This is unacceptable, as you would not be able to report redundant spaces, invalid cases, etc. Instead, your adapter layer could transform one valid format into another valid format without losing information. Example: MOVE => Move; Move => Move; mOve => error (see the second code sketch after the timeline). |
| Jul 17, 2024 at 17:56 | comment | added | Jon Raynor | | @Basilevs - I was thinking one CSV parsing protocol, and then, after the CSV has been parsed, client-specific post-processing (trimming, upper-casing, etc.) on fields as described by a client configuration (see the third code sketch after the timeline). That way you could always log what was originally parsed and then log the client-specific transformation. This also separates CSV protocol errors from post-processing errors. If a client follows CSV correctly, there is no need for further post-processing (the ideal case). |
| Jul 17, 2024 at 9:00 | comment | added | Basilevs | | @JonRaynor there is no need to relax a protocol in the scenario you describe. Instead, create two strict protocols, one for client A and one for client B. |
| Jul 16, 2024 at 21:14 | comment | added | Jon Raynor | | In the real world, things are not always cut and dried. I agree that things should be as tight as possible, but many times clients send less than ideal data and they are not willing to change it. One can default to strict adherence to start with but build in exceptions/cleansing as needed. For example, client A may be strict, but client B may need trimming of whitespace. So you can add in those directives over time via a per-client processing configuration. These additional directives tell your parser how to handle each field. |
| Jul 15, 2024 at 23:55 | comment | added | Basilevs | | @DocBrown people are capable of interpreting good error messages, so the problem you describe can/should be handled by a validation application/macro. |
| Jul 15, 2024 at 12:55 | comment | added | Doc Brown | | I agree with all of what you wrote, except one thing: the assumption that CSV is always a machine-to-machine format to which everyone who uses it should apply that RFC. Often enough, people use spreadsheets like Excel to produce such data, and one should investigate where the CSV comes from before jumping to conclusions. |
| Jul 14, 2024 at 13:12 | comment | added | Basilevs | | Please add more on protocol maintenance and versioning. Those comments are gold. |
| Jul 13, 2024 at 21:03 | vote | accept | NimChimpsky | ||
| Jul 13, 2024 at 14:28 | history | edited | JacquesB | CC BY-SA 4.0 | added 336 characters in body |
| Jul 13, 2024 at 12:53 | history | edited | JacquesB | CC BY-SA 4.0 | added 19 characters in body |
| Jul 13, 2024 at 12:46 | history | edited | JacquesB | CC BY-SA 4.0 | added 19 characters in body |
| Jul 13, 2024 at 12:34 | history | answered | JacquesB | CC BY-SA 4.0 | |
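
The three code sketches below illustrate ideas from the comments above; none of them come from the answer itself, and all names in them are hypothetical.

First, Dean MacGregor's "lenient on the way in, strict on the way out" approach to column casing could look roughly like this minimal sketch, assuming hypothetical canonical column names (Date, Amount, Description):

```python
# Sketch of "lenient on the way in, strict on the way out" for column names:
# headers are matched case-insensitively when reading, but the output always
# uses the canonical spelling. The column names are hypothetical.
import csv
import io

CANONICAL_COLUMNS = ["Date", "Amount", "Description"]


def canonicalize_header(header: list[str]) -> list[str]:
    """Map incoming header names to canonical names, case-insensitively."""
    lookup = {name.lower(): name for name in CANONICAL_COLUMNS}
    try:
        return [lookup[name.strip().lower()] for name in header]
    except KeyError as exc:
        raise ValueError(f"Unknown column {exc.args[0]!r}") from None


def reexport(csv_text: str) -> str:
    """Read leniently-cased CSV and re-emit it with canonical column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(canonicalize_header(rows[0]))  # strict canonical header out
    writer.writerows(rows[1:])                     # data rows pass through unchanged
    return out.getvalue()


if __name__ == "__main__":
    print(reexport("DATE,amount,Description\n2024-07-01,10,coffee\n"))
    # Date,Amount,Description
    # 2024-07-01,10,coffee
```

The lenient matching is confined to the incoming header; the re-exported header is always emitted in exactly the canonical form, and unknown columns fail fast.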
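
Second, Basilevs's adapter rule (transform one valid format into another valid format, never silently widen the accepted input space) could be sketched as follows; the MOVE/Move command field and the `normalize_command` helper are assumptions for illustration:

```python
# Minimal sketch of a strict adapter: it maps each accepted spelling of a
# command to a canonical spelling and rejects everything else, so no
# information is silently discarded (MOVE -> Move, Move -> Move, mOve -> error).

# Hypothetical closed set of accepted spellings per canonical command.
CANONICAL_FORMS = {
    "MOVE": "Move",   # legacy upper-case format, rewritten to the canonical one
    "Move": "Move",   # canonical format, passed through unchanged
}


class FormatError(ValueError):
    """Raised when a value is not one of the accepted formats."""


def normalize_command(raw: str) -> str:
    """Return the canonical spelling, or fail fast on anything unexpected."""
    try:
        return CANONICAL_FORMS[raw]
    except KeyError:
        raise FormatError(
            f"Unsupported command spelling {raw!r}; "
            f"expected one of {sorted(CANONICAL_FORMS)}"
        ) from None


if __name__ == "__main__":
    for value in ("MOVE", "Move", "mOve"):
        try:
            print(value, "->", normalize_command(value))
        except FormatError as exc:
            print(value, "-> error:", exc)
```

Because the mapping is a closed set, accepting a third spelling later becomes an explicit, reviewable change rather than a silent relaxation of the protocol.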
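
Third, Jon Raynor's per-client post-processing configuration (parse the CSV strictly once, then apply client-specific cleansing directives and log every transformation) might look roughly like this sketch; the client names, directive names, and fields are hypothetical:

```python
# Rough sketch of per-client post-processing: parse the CSV once, then apply
# client-specific cleansing directives from a configuration, logging the
# original and cleaned values so protocol errors and post-processing errors
# stay separate.
import csv
import io
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("csv-post-processing")

# Per-client directives: which transformations to apply to which field.
CLIENT_DIRECTIVES = {
    "client_a": {},                                   # strict client: no cleansing
    "client_b": {"command": ["strip", "title"]},      # lenient client: trim + normalize case
}

TRANSFORMS = {
    "strip": str.strip,
    "title": str.title,
}


def post_process(row: dict, client: str) -> dict:
    """Apply the client's directives to a parsed row, logging every change."""
    cleaned = dict(row)
    for field, directives in CLIENT_DIRECTIVES.get(client, {}).items():
        original = cleaned[field]
        value = original
        for name in directives:
            value = TRANSFORMS[name](value)
        if value != original:
            log.info("client=%s field=%s original=%r cleaned=%r",
                     client, field, original, value)
        cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    raw = "command,amount\n  MOVE ,10\n"
    for row in csv.DictReader(io.StringIO(raw)):
        print(post_process(row, "client_b"))  # {'command': 'Move', 'amount': '10'}
```

Keeping the directives in data rather than code means a new lenient client needs only a configuration entry, and the log preserves exactly what was received before cleansing.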