I have a main class that takes a list of sources and returns two objects for each source: one with the required data and one analytics tool.
The Analytics class has different methods depending on the source. The Data class extracts data from different paths and cleans it in different ways depending on the source. Importing/exporting is done through pandas (read_excel() for the imports). The analytics tool outputs calculations that depend on which source the data comes from.

    class Main_class():
        def __init__(self, sources=['a', 'b', 'c']):
            self.data_sources = {}
            self.analytics = {}
            for s in sources:
                self.data_sources[s] = Data(s)
                self.analytics[s] = Analytics(s, self.data_sources[s])

Right now, my solution is to have one Data class and one Analytics class, each with if-statements that adapt its functionality to the source. This is not a scalable or otherwise good solution. I basically have checks in both classes where I say

    acceptable_sources = ['a', 'b', 'c']
    if source not in acceptable_sources:
        raise ValueError(f"Only acceptable sources are: {acceptable_sources}")

Then I need more checks to set the self.variables correctly. Here's an example from the Data class:

    self.data = {}  # one entry per file
    if source == 'a':
        # if it's a, then there are 3 sources
        self.data['a_1'] = pd.read_excel('a_1.xlsx')
        self.data['a_2'] = pd.read_excel('a_2.xlsx')
        self.data['a_3'] = pd.read_excel('a_3.xlsx')
    elif source == 'b':
        # if it's b, then there are 2 sources
        self.data['b_1'] = pd.read_excel('b_1.xlsx')
        self.data['b_2'] = pd.read_excel('b_2.xlsx')

This is problematic, since there will be a lot of if-statements as the number of sources increases, but it might be the best solution; I'm not sure. Using the same idea in my Analytics class, there will be a lot of unused functions for each source. Let's say there are 30 functions for source a, 25 functions for source b and 40 functions for source c. Some of these functions might be shared across sources, and some will be unique. So whichever source I use, there will be a lot of unused methods, which seems like a waste.
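For illustration, the file-loading branches above could also be written as a lookup table instead of an if-chain. This is only a rough sketch under assumptions: `SOURCE_FILES`, `files_for` and `load_source` are hypothetical names, `a_3.xlsx` is assumed to be the third file for source a, and the reader is passed in as an argument so the sketch does not depend on the Excel files existing.

```python
# Hypothetical lookup table: source name -> Excel files that belong to it.
# The file names follow the snippet above; 'a_3.xlsx' is an assumption.
SOURCE_FILES = {
    "a": ["a_1.xlsx", "a_2.xlsx", "a_3.xlsx"],
    "b": ["b_1.xlsx", "b_2.xlsx"],
}


def files_for(source):
    """Return the file list for a source, or raise if the source is unknown."""
    try:
        return SOURCE_FILES[source]
    except KeyError:
        raise ValueError(
            f"Only acceptable sources are: {sorted(SOURCE_FILES)}"
        ) from None


def load_source(source, reader):
    """Read every file of a source with `reader` (e.g. pandas.read_excel)."""
    return {path: reader(path) for path in files_for(source)}
```

With pandas imported as `pd`, the call would be something like `load_source("a", pd.read_excel)`. Adding a source then means adding one dictionary entry, but the source-specific cleaning would still need a home somewhere.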
My first thought was to make Analytics and Data into abstract classes and create unique classes for each compatible source, but then I wouldn't be able to instantiate them in the for-loop in my main class. Then I thought that I could include them in a Class_holder, which basically checks which class I want to instantiate and, if it exists, returns an object of that class. So for example, if I have X possible sources, the Class_holder would be able to handle and return X different classes, and raise an error if the class doesn't exist. It would look something like

    from analytics import A, B, C  # classes I should create with correct methods

    class Class_holder:
        def __init__(self, source, data):
            self.acceptable_sources = ['a', 'b', 'c']
            if source not in self.acceptable_sources:
                raise ValueError(f"Only acceptable sources are: {self.acceptable_sources}")
            self.source = source
            self.data = data

        def return_analytics_class(self):
            if self.source == 'a':
                return A(self.data)
            elif self.source == 'b':
                return B(self.data)
            elif self.source == 'c':
                return C(self.data)

The classes I have called A, B and C could either be a combination of Data and Analytics, or I could separate them by having one Data class and one Analytics class for each source, in which case Class_holder.return_class() would return a tuple with two objects. For the Class_holder solution I would have to change my Main to something like

    from a_file import Data
    from another_file import Class_holder

    class Main_class():
        def __init__(self, sources=['a', 'b', 'c']):
            self.data_sources = {}
            for s in sources:
                self.data_sources[s] = Data(s)
            self.analytics = {}
            for s in sources:
                self.analytics[s] = Class_holder(s, self.data_sources[s]).return_analytics_class()

But then I'm back to my original problem, where I need checks in both Data and Class_holder to see if the sources are compatible. However, this might solve the problem of only instantiating the correct analytics functions for each source.
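To make the abstract-class idea a bit more concrete, here is a rough sketch of how the per-source classes and the source check could live in one place. It is an illustration under assumptions, not a finished design: `ANALYTICS_REGISTRY`, `register`, `make_analytics`, `AnalyticsA`, `AnalyticsB`, `row_count` and `summary` are all hypothetical names, and plain lists stand in for the DataFrames.

```python
from abc import ABC, abstractmethod


class Analytics(ABC):
    """Shared analytics interface; methods common to all sources live here."""

    def __init__(self, data):
        self.data = data

    def row_count(self):
        # Hypothetical example of a method shared by every source.
        return len(self.data)

    @abstractmethod
    def summary(self):
        """Source-specific calculation, implemented by each subclass."""


# Hypothetical registry: source name -> Analytics subclass.
ANALYTICS_REGISTRY = {}


def register(source):
    """Class decorator that records a subclass under its source name."""
    def decorator(cls):
        ANALYTICS_REGISTRY[source] = cls
        return cls
    return decorator


@register("a")
class AnalyticsA(Analytics):
    def summary(self):
        return f"a: {self.row_count()} rows"


@register("b")
class AnalyticsB(Analytics):
    def summary(self):
        return f"b: total {sum(self.data)}"


def make_analytics(source, data):
    """The only place where the source name is validated."""
    try:
        cls = ANALYTICS_REGISTRY[source]
    except KeyError:
        raise ValueError(
            f"Only acceptable sources are: {sorted(ANALYTICS_REGISTRY)}"
        ) from None
    return cls(data)
```

In this sketch the for-loop in the main class would call `make_analytics(s, self.data_sources[s])`, and the list of acceptable sources is derived from the registry rather than repeated in each class. The same pattern could presumably be applied to the Data classes.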
It just doesn't feel like an optimal way of doing this kind of task, so I'm turning to Code Review to ask for a bit of guidance. If you know any design pattern or other solution for this kind of problem, I would greatly appreciate it.