
I'm a novice Python hobbyist and have started experimenting with multithreading using concurrent.futures.

Each individual thread is supposed to analyse an HTML file and then append certain items to a list. Once all threads have finished, the resulting list is then written to a CSV file.

The surprising result is that certain parts of a row seem to be offset by 1 row in the list, e.g.:

Expected result:

caseList = [ [a1, a2, a3], [b1, b2, b3], [c1, c2, c3], [d1, d2, d3], ] 

Actual result:

caseList = [ [a1, a2, a3], [b1, a2, a3], [c1, b2, b3], [d1, c2, c3] ] 

Each letter represents one HTML file, which is supposed to be analysed by one thread. I can't pinpoint exactly where it changes: the list starts off correct, but then certain rows partly contain items that belong to the previous row.

I have read about race conditions and locking, but I have also read comments saying that list.append is thread-safe, so I'm not entirely sure what's going on here.

Here's my code:

```python
import concurrent.futures

caseList = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    # One task per HTML file; searchCase returns one row.
    results = [executor.submit(searchCase, filename, pattern) for filename in logContents]
    for f in concurrent.futures.as_completed(results):
        caseList.append(f.result())
        print(f.result())
```

Is there anything that I am obviously doing wrong here?
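One way to make the loop above easier to debug is to keep each result paired with the file it came from, following the future-to-input dictionary pattern used in the concurrent.futures documentation. This is a minimal sketch, not the asker's actual code: `search_case`, `collect_rows` and the file names are hypothetical stand-ins, and the stub ignores `pattern`.

```python
import concurrent.futures

# Hypothetical stand-in for searchCase(): it builds its row only from
# local variables, so concurrent calls cannot see each other's data.
def search_case(filename, pattern):
    stem = filename.split(".")[0]  # e.g. "a.html" -> "a"
    return [stem + "1", stem + "2", stem + "3"]

def collect_rows(filenames, pattern):
    rows = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Map each future back to the file it is processing.
        future_to_name = {
            executor.submit(search_case, name, pattern): name
            for name in filenames
        }
        # as_completed() yields futures in completion order, so store
        # each row under its filename instead of relying on position.
        for future in concurrent.futures.as_completed(future_to_name):
            rows[future_to_name[future]] = future.result()
    # Rebuild the list in the original (submission) order.
    return [rows[name] for name in filenames]

if __name__ == "__main__":
    print(collect_rows(["a.html", "b.html", "c.html", "d.html"], "pattern"))
```

If a row still mixes values from two files with this structure, the problem lies inside the per-file function rather than in how the results are collected.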

  • The thread-safety of list.append() isn't an issue here, since you are doing that entirely in the main thread. This looks like your threads are somehow sharing working variables. Commented Jan 28, 2020 at 5:46
  • Thanks, @jasonharper. That was my initial suspicion as well, but searchCase only calls other functions, all of which use only local variables, so I'm unsure how this could happen. I will go back and double-check! Commented Jan 28, 2020 at 5:52
  • There should not be a race condition in your code. However, you should not expect the futures yielded by concurrent.futures.as_completed(results) to come back in the order you submitted them. This is also explained in the SO question Avoiding race condition while using ThreadPoolExecutor. Commented Feb 8, 2020 at 14:17
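A row that holds `b1` but `a2` and `a3` is the kind of symptom the first comment describes: working state shared between threads. The sketch below is a hypothetical illustration (the functions and the module-level `_scratch` dict are invented, not taken from the question) of how a shared variable can leak one file's values into another file's row, next to a version that keeps everything local. A `threading.Barrier` forces both threads to write before either reads, so the mixing is reproducible here; in real code it would depend on timing.

```python
import threading
import concurrent.futures

# BUG (hypothetical): module-level scratch space shared by every thread.
_scratch = {}

def build_row_shared(name, barrier):
    first = name + "1"                           # local: private to this call
    _scratch["rest"] = [name + "2", name + "3"]  # shared: other threads overwrite it
    barrier.wait(timeout=5)                      # both threads write before either reads
    return [first] + _scratch["rest"]            # may return the other file's values

def build_row_local(name, barrier):
    first = name + "1"
    rest = [name + "2", name + "3"]              # local: each call gets its own list
    barrier.wait(timeout=5)                      # same interleaving, nothing shared
    return [first] + rest

def run(func, names):
    barrier = threading.Barrier(len(names))
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(names)) as executor:
        futures = [executor.submit(func, name, barrier) for name in names]
        return [f.result() for f in futures]     # results in submission order

if __name__ == "__main__":
    print(run(build_row_shared, ["a", "b"]))  # one row ends up with the other file's tail
    print(run(build_row_local, ["a", "b"]))   # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

Which row gets corrupted in the shared version depends on which thread writes last, which matches the "starts off correct, then goes wrong" pattern described in the question.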

1 Answer


The answer to this question is in Avoiding race condition while using ThreadPoolExecutor.

You should not expect the generator to return results in order. Consider the for loop:

```python
for f in concurrent.futures.as_completed(results):
```

This loop iterates over the futures yielded by concurrent.futures.as_completed(results). Each future is yielded as soon as it completes, and because the tasks run concurrently, the futures come back in completion order rather than in the order they were submitted.
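A small sketch of the difference, using a hypothetical `work` function whose later submissions sleep less: `as_completed()` hands back futures as they finish, while `executor.map()` returns results in input order (iterating over the original list of futures and calling `.result()` on each would preserve order as well).

```python
import time
import concurrent.futures

def work(n):
    # Later submissions sleep less, so they tend to finish first.
    time.sleep(0.1 * (3 - n))
    return n

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(work, n) for n in range(3)]
    # Completion order: depends on timing, typically [2, 1, 0] here.
    completion_order = [f.result() for f in concurrent.futures.as_completed(futures)]
    # executor.map() yields results in the order the inputs were given.
    map_order = list(executor.map(work, range(3)))

print(completion_order)
print(map_order)  # [0, 1, 2]
```

Reordering like this shuffles whole rows; on its own it would not split one file's values across two rows.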

This behaviour is described in the concurrent.futures documentation:

concurrent.futures.as_completed(fs, timeout=None)

Returns an iterator over the Future instances (possibly created by different Executor instances) given by fs that yields futures as they complete (finished or canceled futures). Any futures given by fs that are duplicated will be returned once. Any futures that completed before as_completed() is called will be yielded first. The returned iterator raises a concurrent.futures.TimeoutError if next() is called and the result isn’t available after timeout seconds from the original call to as_completed(). timeout can be an int or float. If timeout is not specified or None, there is no limit to the wait time.
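The `timeout` behaviour quoted above can be sketched as follows; `slow` and the durations are hypothetical. One task finishes quickly and the other outlasts the timeout, so the loop collects the first result and then `concurrent.futures.TimeoutError` is raised.

```python
import time
import concurrent.futures

def slow(seconds):
    time.sleep(seconds)
    return seconds

finished = []
timed_out = False
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(slow, s) for s in (0.05, 2.0)]
    try:
        # Raises TimeoutError if not every future is done 0.5 s after this call.
        for f in concurrent.futures.as_completed(futures, timeout=0.5):
            finished.append(f.result())
    except concurrent.futures.TimeoutError:
        timed_out = True
# Leaving the with block still waits for the 2-second task to finish.

print(timed_out, finished)  # on a normally loaded machine: True [0.05]
```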

Hope this helps.
