Can someone please review this test case code? The tests are passing and fully cover the code.
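(The test code itself did not come through with the post; judging by the description, the two tests presumably looked something like this sketch, in which every name apart from read_from_s3 is an assumption:)

# Hypothetical reconstruction -- the original test code was not shown.
# Two tests of this shape would pass and cover every line of
# read_from_s3, yet verify little beyond "the code ran top to bottom".
import unittest
from unittest import mock

from spark.s3.reader import read_from_s3  # module path as named in the post


class TestReadFromS3(unittest.TestCase):

    def test_read_from_s3_runs(self):
        spark = mock.MagicMock()  # stands in for a SparkSession
        read_from_s3(spark, "s3://some-bucket/data.csv")  # no data assertions

    def test_read_from_s3_returns_something(self):
        spark = mock.MagicMock()
        result = read_from_s3(spark, "s3://some-bucket/data.csv")
        self.assertIsNotNone(result)  # a MagicMock result is never None


if __name__ == "__main__":
    unittest.main()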
Below is the actual code being tested, in the file spark.s3.reader.py:
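(The reader code also did not come through; a minimal sketch of a reader of the shape under discussion, where only the read_from_s3 name comes from the post and the rest is assumed:)

# Hypothetical sketch of spark.s3.reader.py -- the original was not shown.
from pyspark.sql import DataFrame, SparkSession


def read_from_s3(spark: SparkSession, path: str) -> DataFrame:
    """Read a CSV file from an s3 path into a DataFrame."""
    return spark.read.csv(path)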
Below is the output from running the test cases:
Ran 2 tests in 5.772s
OK
Also, I checked that it covers all lines of the method read_from_s3.
Can someone please review whether my unit test code is correct?
Thanks
To explain it in different words: what do they test? Pretty much nothing, or nothing very useful. Only "whether the code executed from top to bottom".
You may want to mock a DataFrame as the result of reading from the s3 bucket, then test some transforms on that data and compare the expected outcome against the actual one. That would be valuable.
Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.
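A sketch of that idea, assuming PySpark, with the transform, columns, and data all invented for illustration:

# Build a small DataFrame locally with a real SparkSession, run the
# transform under test, and compare expected vs. actual results.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def filter_adults(df):
    """Example business logic under test: keep rows with age >= 18."""
    return df.filter(F.col("age") >= 18)


def test_filter_adults():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("alice", 17), ("bob", 30)], ["name", "age"])
    result = filter_adults(df).collect()
    assert {row["name"] for row in result} == {"bob"}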
You can, however, have some asserts in addition to what you are mainly testing, to check whether there was an expected interaction with a mock.
For example, pulled from thin air, so ignore the exact syntax:
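(The original snippet was lost; a reconstruction in the same spirit, with the path and test name as placeholders:)

# Illustrative only -- "pulled from thin air"; names are placeholders.
from unittest import mock

from spark.s3.reader import read_from_s3  # module path as named in the post


def test_reader_requests_csv_from_expected_path():
    spark = mock.MagicMock()
    read_from_s3(spark, "s3://my-bucket/input.csv")
    # The extra assert: verify the expected interaction with the mock.
    spark.read.csv.assert_called_once_with("s3://my-bucket/input.csv")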
Liutauras Vilda wrote:
You may want to mock a DataFrame as the result of reading from the s3 bucket, then test some transforms on that data and compare the expected outcome against the actual one. That would be valuable.
Yes, I had initially wanted to do that, but mocking S3 didn't work for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.
Liutauras Vilda wrote:
Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.
If I correlate this with the statement above, does it mean that a unit test should not be written for it, and rather an integration test should be written?
Monica Shiralkar wrote: Yes, I had initially wanted to do that, but mocking S3 didn't work for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.
Well, you were not really testing anything related to s3. All you verified (assuming that test ran) is that the spark mock was called, but that's not the full story. I don't know all the subtleties there, but what you would potentially have wanted, at the very least, is to test whether spark.read.csv was called, assuming you want to ensure the reader is fixed to the csv file type.
But again, do you see? This type of discussion is not only confusing but pretty much useless for the application you are building. Hence I said that such a test is just extra maintenance without much value.
Monica Shiralkar wrote: If I correlate this with the statement above, does it mean that a unit test should not be written for it, and rather an integration test should be written?
Well, correct. Unit tests shouldn't communicate with external blob storage. That's why I said that, for what you need to test, you can mock the read blob (which would be a DataFrame, wouldn't it) and test your application's business logic, assuming you successfully read a blob from the s3 bucket and it got loaded into a DataFrame.
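One way to do that, sketched under the assumption of a read_from_s3(spark, path) shape, is to patch DataFrameReader.csv at the class level so Spark never tries to resolve the s3:// scheme, and hand back a locally built DataFrame:

# Sketch: stub the csv read at the class level; the "blob" is a local frame.
from unittest import mock
from pyspark.sql import DataFrameReader, SparkSession

from spark.s3.reader import read_from_s3  # module path as named in the post


def test_business_logic_with_stubbed_read():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    fake_df = spark.createDataFrame([("bob", 30)], ["name", "age"])

    with mock.patch.object(DataFrameReader, "csv", return_value=fake_df):
        df = read_from_s3(spark, "s3://my-bucket/input.csv")

    # From here on, assert on real business logic against the known frame.
    assert df.count() == 1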
The integration test, meanwhile, would be one that reads an actual file from the s3 bucket (and not just that!) and tests some more elaborate behaviour, which by proxy would also test whether reading from the s3 bucket succeeds, meaning the application has access to it, and so on. I'm assuming access to the s3 bucket wouldn't be needed from where the application gets built, but rather from where it runs; that's another reason why this should be part of a bigger integration suite, perhaps running once a day or so, which submits the Spark job to a cluster (e.g. AWS EMR or GCP Dataproc), depending on where you'd run it in practice.
I would perhaps leave integration tests for later, until your application matures a bit more (along with your CI/CD pipeline).
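For when that time comes, such a test might look roughly like this sketch, where the pytest marker, the bucket, and the spark_session fixture (one configured with the hadoop-aws jars) are all assumptions:

# Talks to a real bucket, so it is marked and kept out of the unit run.
import pytest

from spark.s3.reader import read_from_s3  # module path as named in the post


@pytest.mark.integration  # select with: pytest -m integration
def test_reads_real_file_from_s3(spark_session):
    df = read_from_s3(spark_session, "s3://real-bucket/real-input.csv")
    assert df.count() > 0  # by proxy proves s3 access and the read both work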
Liutauras Vilda wrote: you can mock the read blob (which would be a DataFrame, wouldn't it) and test your application's business logic, assuming you successfully read a blob from the s3 bucket and it got loaded into a DataFrame.
Yes, I had initially tried to do that, as below:
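(The snippet did not survive; a guess at its shape, based on the error described:)

# A guess at the failing attempt -- patch.object here targets ONE
# DataFrameReader instance, but spark.read returns a NEW reader on every
# access, so the real csv() still runs and Spark tries to resolve the
# s3:// scheme without the hadoop-aws jars on the classpath.
from unittest import mock
from pyspark.sql import SparkSession

from spark.s3.reader import read_from_s3  # module path as named in the post

spark = SparkSession.builder.master("local[1]").getOrCreate()
fake_df = spark.createDataFrame([("bob", 30)], ["name", "age"])

with mock.patch.object(spark.read, "csv", return_value=fake_df):  # wrong target
    df = read_from_s3(spark, "s3://my-bucket/input.csv")  # real read still runs

# Patching DataFrameReader.csv at the class level (as sketched earlier in
# the thread) avoids the real read entirely.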
However, it had given the error 'No FileSystem for scheme "s3"'.