Timeline for "Fail fast is brittle"
Current License: CC BY-SA 4.0
17 events
| when | type | what | by | license | comment |
|---|---|---|---|---|---|
| Aug 5, 2024 at 15:11 | comment | added | JacquesB | | @DeanMacGregor The bank transfer was just an example to show the potential problems with processing non-conformant data. Your CSV might not be as critical as financial transactions, but presumably the data matters to someone who would prefer to avoid data loss or corruption. But if there is no specification at all (and you can't directly speak with the producer of the data to agree on a spec), then you will have to reverse-engineer the data format by investigating the messages. |
| Aug 5, 2024 at 10:44 | comment | added | Dean MacGregor | | @JacquesB The question is about a CSV consumer, not an interbank communication protocol. I scrape a lot of CSV data and literally none of it comes from a source that is going to mutually agree with me on a rigid specification. The best I can ever hope for is that it stays relatively consistent over time. That said, I recognize that not everyone has the same experience and some people might be ingesting CSV files where there is a mutually agreed specification, which is why I prefaced my comment with "It just depends". |
| Aug 4, 2024 at 12:06 | comment | added | JacquesB | | @DeanMacGregor It is very risky to make guesses when the input data does not conform to the agreed specification. Imagine, for example, that a money transfer request is sent to two banks from an ATM. The message key is corrupted because of a bug or glitch. One bank rejects the transfer because it doesn't conform to the specification, but your bank accepts the message despite it being malformed. Who is to blame if funds disappear? |
| Aug 1, 2024 at 3:07 | comment | added | Dean MacGregor | | It just depends. Is the program we're talking about ingesting data from a source over which we have no control, where we know that sometimes a column is capitalized and sometimes not? If so, I want my program to be able to not be strictly case-sensitive on the way in, but I do want it to be on the way out (see the first code sketch after the timeline). |
| Jul 17, 2024 at 18:05 | comment | added | Basilevs | | In this case, we accept two distinct formats, Move and MOVE, but never mix them, and we report any inconsistencies. |
| Jul 17, 2024 at 18:03 | comment | added | Basilevs | | @JonRaynor, those are implementation details, not really relevant to the discussion. However, I notice that you use operations that take a wide space of inputs and accept them all, shrinking the space. This is unacceptable, as you would not be able to report redundant spaces, invalid cases, etc. Instead, your adapter layer could transform one valid format into another valid format without losing information. Example: MOVE => Move; Move => Move; mOve => error (see the second code sketch after the timeline). |
| Jul 17, 2024 at 17:56 | comment | added | Jon Raynor | | @Basilevs - I was thinking one CSV parsing protocol, and then, after the CSV has been parsed, client-specific post-processing (trimming, upper-casing, etc.) on fields as described by a client configuration (see the third code sketch after the timeline). That way you could always log what was originally parsed and then log the client-specific transformation. This also separates CSV protocol errors from post-processing errors. If a client follows CSV correctly, there is no need for further post-processing (the ideal case). |
| Jul 17, 2024 at 9:00 | comment | added | Basilevs | | @JonRaynor there is no need to relax a protocol in the scenario you describe. Instead, create two strict protocols, one for client A and one for client B. |
| Jul 16, 2024 at 21:14 | comment | added | Jon Raynor | | In the real world, things are not always cut and dried. I agree that things should be as tight as possible, but many times clients send less than ideal data and they are not willing to change it. One can default to strict adherence to start with but build in exceptions/cleansing as needed. For example, client A may be strict, but client B may need trimming of whitespace. So you can add in those directives over time via a per-client processing configuration. These additional directives tell your parser how to handle each field. |
| Jul 15, 2024 at 23:55 | comment | added | Basilevs | | @DocBrown people are capable of interpreting good error messages, so the problem you describe can/should be handled by a validation application/macro. |
| Jul 15, 2024 at 12:55 | comment | added | Doc Brown | | I agree with all of what you wrote, except one thing: the assumption that CSV is always a machine-to-machine format to which everyone who uses it should apply that RFC. Often enough, people use spreadsheets like Excel to produce such data, and one should investigate where the CSV comes from before jumping to conclusions. |
| Jul 14, 2024 at 13:12 | comment | added | Basilevs | | Please add more on protocol maintenance and versioning. Those comments are gold. |
| Jul 13, 2024 at 21:03 | vote | accept | NimChimpsky | ||
| Jul 13, 2024 at 14:28 | history | edited | JacquesB | CC BY-SA 4.0 | added 336 characters in body |
| Jul 13, 2024 at 12:53 | history | edited | JacquesB | CC BY-SA 4.0 | added 19 characters in body |
| Jul 13, 2024 at 12:46 | history | edited | JacquesB | CC BY-SA 4.0 | added 19 characters in body |
| Jul 13, 2024 at 12:34 | history | answered | JacquesB | CC BY-SA 4.0 | |
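
The three code sketches below illustrate ideas from the comments above; none of them come from the answer itself, and all names in them are hypothetical.

First, Dean MacGregor's "lenient on the way in, strict on the way out" approach to column casing could look roughly like this minimal sketch, assuming hypothetical canonical column names (Date, Amount, Description):

```python
# Sketch of "lenient on the way in, strict on the way out" for column names:
# headers are matched case-insensitively when reading, but the output always
# uses the canonical spelling. The column names are hypothetical.
import csv
import io

CANONICAL_COLUMNS = ["Date", "Amount", "Description"]


def canonicalize_header(header: list[str]) -> list[str]:
    """Map incoming header names to canonical names, case-insensitively."""
    lookup = {name.lower(): name for name in CANONICAL_COLUMNS}
    try:
        return [lookup[name.strip().lower()] for name in header]
    except KeyError as exc:
        raise ValueError(f"Unknown column {exc.args[0]!r}") from None


def reexport(csv_text: str) -> str:
    """Read leniently-cased CSV and re-emit it with canonical column names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(canonicalize_header(rows[0]))  # strict canonical header out
    writer.writerows(rows[1:])                     # data rows pass through unchanged
    return out.getvalue()


if __name__ == "__main__":
    print(reexport("DATE,amount,Description\n2024-07-01,10,coffee\n"))
    # Date,Amount,Description
    # 2024-07-01,10,coffee
```

The lenient matching is confined to the incoming header; the re-exported header is always emitted in exactly the canonical form, and unknown columns fail fast.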
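
Second, Basilevs's adapter rule (transform one valid format into another valid format, never silently widen the accepted input space) could be sketched as follows; the MOVE/Move command field and the `normalize_command` helper are assumptions for illustration:

```python
# Minimal sketch of a strict adapter: it maps each accepted spelling of a
# command to a canonical spelling and rejects everything else, so no
# information is silently discarded (MOVE -> Move, Move -> Move, mOve -> error).

# Hypothetical closed set of accepted spellings per canonical command.
CANONICAL_FORMS = {
    "MOVE": "Move",   # legacy upper-case format, rewritten to the canonical one
    "Move": "Move",   # canonical format, passed through unchanged
}


class FormatError(ValueError):
    """Raised when a value is not one of the accepted formats."""


def normalize_command(raw: str) -> str:
    """Return the canonical spelling, or fail fast on anything unexpected."""
    try:
        return CANONICAL_FORMS[raw]
    except KeyError:
        raise FormatError(
            f"Unsupported command spelling {raw!r}; "
            f"expected one of {sorted(CANONICAL_FORMS)}"
        ) from None


if __name__ == "__main__":
    for value in ("MOVE", "Move", "mOve"):
        try:
            print(value, "->", normalize_command(value))
        except FormatError as exc:
            print(value, "-> error:", exc)
```

Because the mapping is a closed set, accepting a third spelling later becomes an explicit, reviewable change rather than a silent relaxation of the protocol.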
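
Third, Jon Raynor's per-client post-processing configuration (parse the CSV strictly once, then apply client-specific cleansing directives and log every transformation) might look roughly like this sketch; the client names, directive names, and fields are hypothetical:

```python
# Rough sketch of per-client post-processing: parse the CSV once, then apply
# client-specific cleansing directives from a configuration, logging the
# original and cleaned values so protocol errors and post-processing errors
# stay separate.
import csv
import io
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("csv-post-processing")

# Per-client directives: which transformations to apply to which field.
CLIENT_DIRECTIVES = {
    "client_a": {},                                   # strict client: no cleansing
    "client_b": {"command": ["strip", "title"]},      # lenient client: trim + normalize case
}

TRANSFORMS = {
    "strip": str.strip,
    "title": str.title,
}


def post_process(row: dict, client: str) -> dict:
    """Apply the client's directives to a parsed row, logging every change."""
    cleaned = dict(row)
    for field, directives in CLIENT_DIRECTIVES.get(client, {}).items():
        original = cleaned[field]
        value = original
        for name in directives:
            value = TRANSFORMS[name](value)
        if value != original:
            log.info("client=%s field=%s original=%r cleaned=%r",
                     client, field, original, value)
        cleaned[field] = value
    return cleaned


if __name__ == "__main__":
    raw = "command,amount\n  MOVE ,10\n"
    for row in csv.DictReader(io.StringIO(raw)):
        print(post_process(row, "client_b"))  # {'command': 'Move', 'amount': '10'}
```

Keeping the directives in data rather than code means a new lenient client needs only a configuration entry, and the log preserves exactly what was received before cleansing.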