Can someone please review this test case code? The tests are passing and fully cover the code.
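(The test code itself did not come through with the post; judging by the description, the two tests presumably looked something like this sketch, in which every name apart from read_from_s3 is an assumption:)

# Hypothetical reconstruction -- the original test code was not shown.
# Two tests of this shape would pass and cover every line of
# read_from_s3, yet verify little beyond "the code ran top to bottom".
import unittest
from unittest import mock

from spark.s3.reader import read_from_s3  # module path as named in the post


class TestReadFromS3(unittest.TestCase):

    def test_read_from_s3_runs(self):
        spark = mock.MagicMock()  # stands in for a SparkSession
        read_from_s3(spark, "s3://some-bucket/data.csv")  # no data assertions

    def test_read_from_s3_returns_something(self):
        spark = mock.MagicMock()
        result = read_from_s3(spark, "s3://some-bucket/data.csv")
        self.assertIsNotNone(result)  # a MagicMock result is never None


if __name__ == "__main__":
    unittest.main()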
Below is the actual code being tested, in the file spark.s3.reader.py:
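(The reader code also did not come through; a minimal sketch of a reader of the shape under discussion, where only the read_from_s3 name comes from the post and the rest is assumed:)

# Hypothetical sketch of spark.s3.reader.py -- the original was not shown.
from pyspark.sql import DataFrame, SparkSession


def read_from_s3(spark: SparkSession, path: str) -> DataFrame:
    """Read a CSV file from an s3 path into a DataFrame."""
    return spark.read.csv(path)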
Below is the output from running the test cases:
Ran 2 tests in 5.772s
OK
Also, I checked that it covers all lines of the method read_from_s3.
Can someone please review whether my unit test code is correct?
Thanks
To explain it in different words: what do they test? Pretty much nothing, or nothing very useful. Only "whether the code executed from top to bottom".
You may want to mock a DataFrame as the result of reading from the s3 bucket, then test some transforms on that data and compare the expected outcome against the actual one. That would be valuable.
Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.
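A sketch of that idea, assuming PySpark, with the transform, columns, and data all invented for illustration:

# Build a small DataFrame locally with a real SparkSession, run the
# transform under test, and compare expected vs. actual results.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def filter_adults(df):
    """Example business logic under test: keep rows with age >= 18."""
    return df.filter(F.col("age") >= 18)


def test_filter_adults():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("alice", 17), ("bob", 30)], ["name", "age"])
    result = filter_adults(df).collect()
    assert {row["name"] for row in result} == {"bob"}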
You can, however, have some asserts in addition to what you are mainly testing, to check whether there was an expected interaction with a mock.
For example, pulled from thin air, so ignore the exact syntax:
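(The original snippet was lost; a reconstruction in the same spirit, with the path and test name as placeholders:)

# Illustrative only -- "pulled from thin air"; names are placeholders.
from unittest import mock

from spark.s3.reader import read_from_s3  # module path as named in the post


def test_reader_requests_csv_from_expected_path():
    spark = mock.MagicMock()
    read_from_s3(spark, "s3://my-bucket/input.csv")
    # The extra assert: verify the expected interaction with the mock.
    spark.read.csv.assert_called_once_with("s3://my-bucket/input.csv")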
Liutauras Vilda wrote:
You may want to mock a DataFrame as the result of reading from the s3 bucket, then test some transforms on that data and compare the expected outcome against the actual one. That would be valuable.
Yes, I had initially wanted to do that, but mocking S3 didn't work for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.
Liutauras Vilda wrote:
Now, when it comes down to testing whether your application/function can read from an s3 bucket, that is an integration test, which wouldn't be part of your usual unit test suite.
If I correlate this with the statement above, does it mean that a unit test should not be written for it, and rather an integration test should be written?
Monica Shiralkar wrote: Yes, I had initially wanted to do that, but mocking S3 didn't work for me. I got the error 'No FileSystem for scheme "s3"' while using unittest.mock to mock the spark.read.csv call to S3.
Well, you were not really testing anything related to s3. All you verified (assuming that test ran) is that the spark mock was called, but that's not the full story. I don't know all the subtleties there, but what you would potentially have wanted, at the very least, is to test whether spark.read.csv was called, assuming you want to ensure the reader is fixed to the csv file type.
But again, do you see? This type of discussion is not only confusing but pretty much useless for the application you are building. Hence I said that such a test is just extra maintenance without much value.
Monica Shiralkar wrote: If I correlate this with the statement above, does it mean that a unit test should not be written for it, and rather an integration test should be written?
Well, correct. Unit tests shouldn't communicate with external blob storage. That's why I said that, for what you need to test, you can mock the read blob (which would be a DataFrame, wouldn't it) and test your application's business logic, assuming you successfully read a blob from the s3 bucket and it got loaded into a DataFrame.
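One way to do that, sketched under the assumption of a read_from_s3(spark, path) shape, is to patch DataFrameReader.csv at the class level so Spark never tries to resolve the s3:// scheme, and hand back a locally built DataFrame:

# Sketch: stub the csv read at the class level; the "blob" is a local frame.
from unittest import mock
from pyspark.sql import DataFrameReader, SparkSession

from spark.s3.reader import read_from_s3  # module path as named in the post


def test_business_logic_with_stubbed_read():
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    fake_df = spark.createDataFrame([("bob", 30)], ["name", "age"])

    with mock.patch.object(DataFrameReader, "csv", return_value=fake_df):
        df = read_from_s3(spark, "s3://my-bucket/input.csv")

    # From here on, assert on real business logic against the known frame.
    assert df.count() == 1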
The integration test, meanwhile, would be one that reads an actual file from the s3 bucket (and not just that!) and tests some more elaborate behaviour, which by proxy would also test whether reading from the s3 bucket succeeds, meaning the application has access to it, and so on. I'm assuming access to the s3 bucket wouldn't be needed from where the application gets built, but rather from where it runs; that's another reason why this should be part of a bigger integration suite, perhaps running once a day or so, which submits the Spark job to a cluster (e.g. AWS EMR or GCP Dataproc), depending on where you'd run it in practice.
I would perhaps leave integration tests for later, until your application matures a bit more (along with your CI/CD pipeline).
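For when that time comes, such a test might look roughly like this sketch, where the pytest marker, the bucket, and the spark_session fixture (one configured with the hadoop-aws jars) are all assumptions:

# Talks to a real bucket, so it is marked and kept out of the unit run.
import pytest

from spark.s3.reader import read_from_s3  # module path as named in the post


@pytest.mark.integration  # select with: pytest -m integration
def test_reads_real_file_from_s3(spark_session):
    df = read_from_s3(spark_session, "s3://real-bucket/real-input.csv")
    assert df.count() > 0  # by proxy proves s3 access and the read both work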
Liutauras Vilda wrote: you can mock the read blob (which would be a DataFrame, wouldn't it) and test your application's business logic, assuming you successfully read a blob from the s3 bucket and it got loaded into a DataFrame.
Yes, I had initially tried to do that, as below:
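(The snippet did not survive; a guess at its shape, based on the error described:)

# A guess at the failing attempt -- patch.object here targets ONE
# DataFrameReader instance, but spark.read returns a NEW reader on every
# access, so the real csv() still runs and Spark tries to resolve the
# s3:// scheme without the hadoop-aws jars on the classpath.
from unittest import mock
from pyspark.sql import SparkSession

from spark.s3.reader import read_from_s3  # module path as named in the post

spark = SparkSession.builder.master("local[1]").getOrCreate()
fake_df = spark.createDataFrame([("bob", 30)], ["name", "age"])

with mock.patch.object(spark.read, "csv", return_value=fake_df):  # wrong target
    df = read_from_s3(spark, "s3://my-bucket/input.csv")  # real read still runs

# Patching DataFrameReader.csv at the class level (as sketched earlier in
# the thread) avoids the real read entirely.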
However, it had given the error 'No FileSystem for scheme "s3"'.