I will try to keep what I am trying to do as simple as possible.

I have two classes, ClassA and ClassB.

ClassA has an instance method that contains a while loop that runs "infinitely" and collects data. ClassA is also passed an instance of ClassB. While ClassA collects this data, it is also checking the data that comes in to see if a certain signal has been received. If the signal has been received, an instance method in ClassB is called upon.

Consider the following main program driver:

    from class_a import ClassA
    from class_b import ClassB

    database_connection = MongoDB  # purely example
    class_b = ClassB(database_connection)
    class_a = ClassA(class_b)

And then the classes:

    class ClassA:
        def __init__(self, class_b):
            self.class_b = class_b

        def collect_data(self):
            # receiver() and signal are defined elsewhere (not shown here)
            while True:
                data = receiver()
                if signal in data:
                    self.class_b.send_data_to_database(data)


    class ClassB:
        def __init__(self, database):
            self.database = database

        def convert_data(self, data):
            return data + 1

        def send_data_to_database(self, data):
            converted_data = self.convert_data(data)
            self.database.send(converted_data)

Now here is my question: should I have a thread for the send_data_to_database() instance method in ClassB? My thinking is that spawning a thread just to send data to the database might be faster than calling the method without a thread. Is my thinking wrong here? My knowledge of threading is limited. Ultimately, I am just trying to find the fastest way to send data to the database once ClassA recognizes that there is a signal in the data. Thanks in advance to all who reply.

  • Threads imply concurrency - i.e. multiple actions at once. Your code is purely sequential, with one action after another: ... -> receive -> check -> send -> receive -> .... Offloading a single action to a thread, e.g. send, is generally not worth it - starting the thread takes longer than just doing the action directly. Commented Jun 19, 2019 at 14:39
  • What becomes of collected data where the signal is not in the data? Does class A sleep between data collection runs, or does he just crank as fast as he can? Is there a realistic risk that he falls behind, or can he just take his own sweet time collecting data? What is the rest of the app doing besides this data collection piece? Or is this it? Commented Jun 19, 2019 at 14:41
  • @bigh_29 Data that does not have the signal in it is omitted. ClassA does not sleep between data collection runs. To keep things simple, this is pretty much the app (besides the data being processed). There is no significant risk of data collection falling behind; my main concern is being able to send the data as fast as possible upon receiving that signal. Commented Jun 19, 2019 at 14:46
  • If there is no risk of data collection falling behind, there is no need for threading here. Certainly not opening a thread and closing it every time you want to write to the database, which would be slower. If the worry were that data collection could fall behind and you want the while loop to continue even when writes are occurring, then I would permanently open a thread with a second while loop monitoring a queue (from the standard Python library). Send DB write requests to the queue as they come in and have the second thread handle them while the first thread continues. Commented Jun 19, 2019 at 14:56
  • @KyleDeGennaro Processes are even costlier than threads. If you do not have anything to do concurrently, doing things concurrently makes no sense. If you don't know whether you have anything to do concurrently, we cannot tell you either. Ultimately, concurrency is about weighing costs against benefits, and you have defined neither. How long does conversion take? How long does sending take? How long does receiving take? How long can receiving be delayed by sending before it is a problem? Are you CPU or I/O bound? And so on... Commented Jun 19, 2019 at 15:03
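
The point made in the comments above, that starting a thread for each send costs more than a cheap direct call, can be checked with a rough timing sketch. This is only an illustration under stated assumptions: `send` is a hypothetical stand-in that appends to a list instead of talking to a database, and the numbers will differ from machine to machine.

```python
import threading
import time


def send(record, sink):
    """Hypothetical stand-in for a fast database write (no real I/O)."""
    sink.append(record + 1)


def run_direct(n):
    """Call send() inline n times; return (elapsed_seconds, results)."""
    sink = []
    start = time.perf_counter()
    for i in range(n):
        send(i, sink)
    return time.perf_counter() - start, sink


def run_thread_per_call(n):
    """Start and join a fresh thread for every send(), n times."""
    sink = []
    start = time.perf_counter()
    for i in range(n):
        worker = threading.Thread(target=send, args=(i, sink))
        worker.start()
        worker.join()  # wait, so both variants complete the same work
    return time.perf_counter() - start, sink


if __name__ == "__main__":
    direct_time, _ = run_direct(1000)
    threaded_time, _ = run_thread_per_call(1000)
    print(f"direct:          {direct_time:.4f} s")
    print(f"thread per call: {threaded_time:.4f} s")
```

With a real database the write itself would take much longer than this stand-in, so it is worth measuring with the actual client before drawing conclusions.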

1 Answer

I would use threads if either of these is true:

  • The blocking I/O database calls in B can negatively impact A's ability to collect data in a timely manner.
  • These two data collection pieces together can negatively impact the responsiveness of other parts of the app (think unresponsive GUI).

If neither condition is true, then a single threaded app is a lot less hassle.

Consider using a Queue for concurrency if you do use threads. Class A can post data to a Queue that class B is waiting on. Here is a bare bones code example of what I mean:

    from queue import Queue
    from threading import Thread


    class class_a:
        def __init__(self, queue):
            self.queue = queue
            self.thread = Thread(target=self.collect_data)
            self.thread.start()

        def collect_data(self):
            # Producer: put every third value on the queue, discard the rest.
            for data in range(1000):
                if data % 3 == 0:
                    print(f'Thread A sending {data} to queue')
                    self.queue.put(data)
                else:
                    print(f'Thread A discarding {data}')


    class class_b:
        def __init__(self):
            self.queue = Queue()
            self.thread = Thread(target=self.process_data)
            self.thread.daemon = True  # do not keep the app alive for this thread
            self.thread.start()

        def process_data(self):
            # Consumer: block until the producer puts something on the queue.
            while True:
                data = self.queue.get()
                print(f'Thread B received {data} from queue')


    b = class_b()
    a = class_a(b.queue)

Lastly, anytime you think about using parallelism in Python, you have to ask whether multiprocessing makes more sense than multithreading. Multiprocessing is a better choice when CPU computation, rather than file or network I/O, becomes the limiting factor in the performance of the app. I don't think multiprocessing is a good fit for your project based on the information you provided.
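
To illustrate why threads are usually enough for I/O-bound work, here is a minimal sketch, assuming that `time.sleep` is a fair stand-in for a blocking network or disk call. The function names are hypothetical. In standard CPython a sleeping (or I/O-waiting) thread releases the interpreter lock, so the waits overlap; CPU-heavy pure-Python work would not overlap this way, which is where multiprocessing comes in.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Simulate a blocking I/O call (assumption: sleep approximates the wait)."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Run the simulated I/O calls one after another."""
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return time.perf_counter() - start, results


def run_threaded(delays):
    """Run the simulated I/O calls in a thread pool so the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    delays = [0.05] * 4
    seq_time, _ = run_sequential(delays)
    thr_time, _ = run_threaded(delays)
    print(f"sequential: {seq_time:.3f} s")
    print(f"threaded:   {thr_time:.3f} s")
```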

5 Comments

I see. But isn't some time lost if a queue is used? Now ClassA sends data to the queue while ClassB listens. Compared with my original example, doesn't that add an extra step in getting the data from ClassA to ClassB?
I absolutely agree with the recommendation to use a queue. That way, the overhead of starting a thread for every occurrence of the signal in the data is removed.
Maybe my explanation was unclear; I don't plan to start a thread upon every occurrence of the signal in the data. I would have just one thread that listens for that signal and then sends a request to the database via HTTP. Would threading the method that sends the HTTP request be faster than not threading it? Or is there no significant difference (and perhaps a waste of memory) since everything MUST happen sequentially?
@KyleDeGennaro, pasted a code example to better explain. Class B spawns a single thread for the life of the app. A Queue is a great way for one thread to send data to another. You might want to encapsulate the queue behind a helper method on B. It depends on whether you want class A to know about an instance of class B, or whether it should know about a shared Queue instance. Either way, the queue is handling the communication between the two.
Thank you for the detailed response. Very helpful!
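
The suggestion in the comments above to hide the queue behind a helper method on B could look roughly like the sketch below. It keeps the question's `send_data_to_database` name, so class A does not need to know a queue exists. Everything else is an assumption made for illustration: `FakeDatabase`, the `close()` method, the `_STOP` sentinel, the `source` argument, and the `data % 3 == 0` check standing in for "signal in data".

```python
from queue import Queue
from threading import Thread


class FakeDatabase:
    """Hypothetical stand-in for a real database client."""

    def __init__(self):
        self.rows = []

    def send(self, row):
        self.rows.append(row)


class ClassB:
    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, database):
        self.database = database
        self._queue = Queue()
        self._thread = Thread(target=self._worker, daemon=True)
        self._thread.start()

    def send_data_to_database(self, data):
        """Called by ClassA; returns as soon as the data is queued."""
        self._queue.put(data)

    def close(self):
        """Let the worker finish everything already queued, then stop it."""
        self._queue.put(self._STOP)
        self._thread.join()

    def _worker(self):
        while True:
            data = self._queue.get()
            if data is self._STOP:
                break
            self.database.send(data + 1)  # same conversion as convert_data()


class ClassA:
    def __init__(self, class_b):
        self.class_b = class_b

    def collect_data(self, source):
        # `source` is any iterable of readings (assumption, replaces receiver()).
        for data in source:
            if data % 3 == 0:  # stand-in for "signal in data"
                self.class_b.send_data_to_database(data)


if __name__ == "__main__":
    db = FakeDatabase()
    b = ClassB(db)
    ClassA(b).collect_data(range(10))
    b.close()
    print(db.rows)
```

Calling `close()` before exit matters here because a daemon thread is otherwise abandoned when the main thread ends, and anything still in the queue would be lost.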
