I've come back to this question about 4 or 5 times since it was originally posted, started writing up an answer, started doubting my alternative implementation, scrapped it, and then come back about a month later. This tells me a few things:
- You have a tricky problem, where the "best" solution is not necessarily intuitive, and how you measure "best" can vary wildly
- Overall, the code is pretty good, so I can't just write a more nit-picky answer and ignore the larger question of "is this structured the right way"
Because I had such a hard time with this, I'm not going to completely rewrite this, or provide an alternate implementation, or anything like that. Instead, I'm going to call out some overall architecture and design choices that I think have made this tricky, some potential solutions with pros/cons, and then leave it as an exercise for the reader to decide what they want to do.
I'll also include a few nitpicky things about the code, because that's just who I am as a person and I'll feel bad if I don't review your code at all.
Lastly, I've never used Django before, so its very possible a much better solution exists within the framework, or that someone else has solved the problem in another 3rd party library, or whatever. I encourage you to do your own research on if Django can make this easier.
The problem
How can I refactor this function to make it easier to test?
You're right that this is hard to test as-is. The most significant reason for this, IMO, is that you're mixing up a lot of magic, a lot of state that your function doesn't own, and the business logic that your function does own.
Magic
Your function(s) are littered with magic numbers, strings, etc. These are all things that apparently have significant meaning, and are very tightly coupled with your implementation. If you ever decide that you want your colors to change, or you want to change somethings title, or the default resource type, or whatever, then you have a lot of logic that assumes the magic.
A good way to make this less magic, easier to test, and easier to update is to at least make all magic values constants at the beginning of the file. For example, if you do this msg_state.get('resource_type', 'all') then it seems a little unclear what we're doing; just some place-holder, or something more meaningful? But if you do this msg_state.get('resource_type', DEFAULT_RESOURCE_TYPE) then it becomes immediately clear.
Note that you wouldn't want to do this msg_state.get('resource_type', ALL_RESOURCE_TYPE) because that isn't really any better than just putting 'all' there, unless you think that the resource type meaning all would change values.
This is also a good time to propose using enums instead of strings, as you get runtime validation of correct values (e.g. ResourceTypeEnum.lAl will give you an error that lAl is not a valid enum, while 'lAl' won't warn you that you typoed your string.
State
Your function to build a message depends a lot on the state of your application, of third party libraries, the framework, etc. In just this one function, you explicitly query the following kinds of state:
- Django state (pages and tasks)
- Slack state (users)
- Application state (message, channels, teams)
This couples you really tightly with these things that your function can't control, and is the death of any truly repeatable unit testing. There are a lot of ways you could address them; for example you could make a service per state and use dependency injection or some other inversion of control methodology. This is what I was playing around with for a while, and eventually came up with a 500 line, OOPomination that was super generic and super unreadable. I don't know what support Django has for this, but I would guess that at some point someone has made dependency injection easier.
Ultimately, I think there is room for some level of encapsulation here. It is likely worthwhile to create separate methods for the different types of work, and filtering, etc. Could you subclass Page or AsyncTask to provide a slack_item property? Is there a better spot in your application to filter on resource type, or team, or availability, or channel, etc?
Business Logic
Once you peel everything back, your actual logic is pretty straightforward:
- Get the list of resources that meet some filtering criteria
- Transform them into json that meets some Slack specification
- Send it back
This part is pretty straightforward, and I don't have a lot to say about it. These each sound like good candidates for unit testing though, and might be worth splitting up that way.
A potential solution
I think whatever you do to restructure this is going to add more code and complexity (by becoming more generic/loosely coupled), and how much of that makes sense for your use-case depends. If your tool is meant to just be a quick-and-dirty tool for your team, it might not make sense to split up the code. If you want this to be around longer, or there are plans for future enhancements, this might be a good time to break things up. I think I'd roughly recommend the following:
- For different kinds of data (e.g.
Page vs AsyncTask vs other stuff), create different methods/handlers/classes/whatever for each one that is responsible for transforming it into the Slack format json. A perfect solution would be able to take advantage of duck-typing, where you could just do input_object.to_slack_json() - Play around with the filtering - can this be done by the caller instead of your application? Can you encapsulate how to do it in the same per-type method/handler/class/whatever as previously suggested?
- Instead of getting your state within the function, pass it in instead. Then it can be truly independent of how it is being run, and the unit test can construct the input as desired.
As promised, here are some
Little nitpicky things
When doing this:
if not msg_state: msg_state = {}
It may be easier to write it this way: msg_state = msg_state or {}. or will pick the first truthy-element, or the last non-truthy element (e.g. the empty dictionary).
Instead of using this (and needlessly building lists just to throw them away)
attachments = [ _build_filters(resource_type, availability), *[_build_page_item(p, user) for p in pages], *[_build_async_task_item(at, user) for at in async_tasks] ]
Consider using itertools.chain:
attachments = list( itertools.chain( (_build_filters(resource_type, availability), ), (_build_page_item(p, user) for p in pages), (_build_async_task_item(at, user) for at in async_tasks) ) )
Instead of using list + filter, and lambda functions that add just a bit of extra mental overhead
selected_resource_types = list(filter( lambda t: t['value'] == resource_type, resource_types)) selected_availability_choices = list(filter( lambda a: a['value'] == availability, availability_choices))
Use a list comprehension
selected_resource_types = [ rt for rt in resource_types if resource_type["value"] == rt ] selected_availability_choices = [ ac for ac in availability_choices if availability["value"] == ac ]
Or, define a class that represents a Slack option, or a selection of Slack options, and encapsulate this logic in that class. Then you don't have to repeat yourself between resource_types and availability_choices.
Rough outline:
class SlackOption: def __init__(self, text, value): self.text = text self.value = value class SlackOptions: def __init__(self, choices): self.choices = choices def get_selection(self, choice): return [option for option in self.choices if option.value == choice]
Page,AsyncTaskandSlackUserare undefined. Are they from some module or die you write them yourself? \$\endgroup\$Page,AsyncTaskandSlackUserare simply Django models. \$\endgroup\$