Skip to content

No global data dir#1160

Merged
VakarisZ merged 19 commits intodevelopfrom
no-global-data-dir
May 13, 2021
Merged

No global data dir#1160
VakarisZ merged 19 commits intodevelopfrom
no-global-data-dir

Conversation

@mssalvatore
Copy link
Collaborator

@mssalvatore mssalvatore commented May 11, 2021

What does this PR do?

Issue #1146

Refactor usage of data_dir:
Decouples the environment config from components that require the data_dir
Adds tests
Removes circular dependency between ConfigService and PostBreachFilesService
Consolidates some duplicate functionality

PR Checklist

  • Have you added an explanation of what your changes do and why you'd like to include them?
  • Is the TravisCI build passing?
  • Was the CHANGELOG.md updated to reflect the changes?
  • Was the documentation framework updated to reflect the changes?

Testing Checklist

  • Added relevant unit tests?
  • Have you successfully tested your changes locally? Elaborate:

    Tested by uploading/deleting post-breach actions and running the monkey locally

  • If applicable, add screenshots or log transcripts of the feature working
@mssalvatore mssalvatore requested review from VakarisZ and shreyamalviya and removed request for shreyamalviya May 11, 2021 17:52
@codecov
Copy link

codecov bot commented May 11, 2021

Codecov Report

Merging #1160 (b8d4452) into develop (0b21dac) will increase coverage by 0.13%.
The diff coverage is 28.86%.

Impacted file tree graph

@@ Coverage Diff @@ ## develop #1160 +/- ## =========================================== + Coverage 28.67% 28.81% +0.13%  =========================================== Files 412 414 +2 Lines 12755 12760 +5 =========================================== + Hits 3658 3677 +19  + Misses 9097 9083 -14 
Impacted Files Coverage Δ
monkey/monkey_island/cc/app.py 0.00% <0.00%> (ø)
monkey/monkey_island/cc/main.py 0.00% <0.00%> (ø)
monkey/monkey_island/cc/resources/local_run.py 0.00% <0.00%> (ø)
...ey/monkey_island/cc/resources/pba_file_download.py 0.00% <0.00%> (ø)
...nkey/monkey_island/cc/resources/pba_file_upload.py 0.00% <0.00%> (ø)
monkey/monkey_island/cc/services/database.py 0.00% <ø> (ø)
monkey/monkey_island/cc/services/initialize.py 0.00% <0.00%> (ø)
...nkey/monkey_island/cc/services/run_local_monkey.py 0.00% <0.00%> (ø)
monkey/monkey_island/cc/services/config.py 54.10% <22.22%> (-1.12%) ⬇️
monkey/common/config_value_paths.py 100.00% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b21dac...b8d4452. Read the comment docs.

Base automatically changed from untangle-logger-config to develop May 11, 2021 18:43
@mssalvatore mssalvatore force-pushed the no-global-data-dir branch from 7cb9bcf to 9476441 Compare May 11, 2021 18:44
@mssalvatore mssalvatore marked this pull request as ready for review May 11, 2021 18:45
Copy link
Contributor

@shreyamalviya shreyamalviya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

patch_new_user_classes # unused variable (monkey/tests/infection_monkey/utils/test_auto_new_user_factory.py:25)
patch_new_user_classes # unused variable (monkey/tests/infection_monkey/utils/test_auto_new_user_factory.py:31)
mock_home_env # unused variable (monkey/tests/monkey_island/cc/server_utils/test_island_logger.py:20)
custom_pba_directory # unused variable (monkey/tests/monkey_island/cc/services/test_post_breach_files.py:20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for adding this here exactly? If not, maybe we should make it a rule to add these at the end of the file along with the PR/commit it was introduced in so it's easy to track, if ever needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it here because it's among the other monkey/tests/monkey_island files. But we can just make a rule of adding whitelist lines at the end in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. I don't see any custom_pba_directory in monkey/tests/monkey_island/cc/services/test_post_breach_files.py on line 20. I presume it's talking about the fixture, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a lot of the line numbers in this file probably don't match. Maybe we should remove them, the filename should be enough.



def run_local_monkey():
def run_local_monkey(dest_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole method should end up in a service. Not sure if this is "in scope" of this PR though

file_path = (
Path(env_singleton.env.get_config().data_dir_abs_path).joinpath(filename).absolute()
Path(PostBreachFilesService.get_custom_pba_directory()).joinpath(filename).absolute()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these parentheses?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line is longer that 100 characters, so this is how black formatted it.

windows_filename = ConfigService.get_config_value(PBA_WINDOWS_FILENAME_PATH)

config_json["monkey"]["post_breach"]["PBA_linux_filename"] = linux_filename
config_json["monkey"]["post_breach"]["PBA_windows_filename"] = windows_filename
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be added to common.config_value_paths and used from there?

create_custom_pba_file("linux_file")
create_custom_pba_file("windows_file")

custom_pda_dir_contents = os.listdir(PostBreachFilesService.get_custom_pba_directory())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: custom_pda_dir_contents -> custom_pba_dir_contents

create_custom_pba_file("windows_file")

custom_pda_dir_contents = os.listdir(PostBreachFilesService.get_custom_pba_directory())
assert len(custom_pda_dir_contents) == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests create_custom_pba_file which is defined in the unit test. Maybe it would be better to do assertions of file created in the create_custom_pba_file method itself? Not a strict requirement though, you can leave as is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that assert is there to make sure the test doesn't fail silently. From the test's perspective, it says:

  • make sure some files are in the dir
  • call remove_PBA_files()
  • make sure the dir is emtpy

There may be a better implementation, but I don't think that moving the assert to create_custom_pba_file() is it. I think I can add some utility methods to make the intent clearer.

monkey/test.py Outdated
@@ -0,0 +1,4 @@
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what is this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops.

Copy link
Contributor

@VakarisZ VakarisZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some improvements worth considering

@ghost
Copy link

ghost commented May 12, 2021

DeepCode's analysis on #b8d445 found:

  • ℹ️ 2 minor issues. 👇

Top issues

Description Example fixes
Assert statement contains an expression which may have side effects. Occurrences: 🔧 Example fixes
Missing close for open, add close or use a with block. Occurrences: 🔧 Example fixes

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@@ -0,0 +1,4 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GAH!

logger = logging.getLogger(__name__)


class RunLocalMonkeyService:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a noun like LocalMonkeyRunService, now it sounds like it's a method that runs the service.

from monkey_island.cc.services.run_local_monkey import RunLocalMonkeyService


def initialize_services(data_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to add more configurable values we'll just keep adding parameters to this method like def initialize_services(data_dir, island_port, bootloader_port):? . PostBreachFilesService and RunLocalMonkeyService doesn't need any special initialization, they just need to access configuration parameters.

Copy link
Collaborator Author

@mssalvatore mssalvatore May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to add more configurable values we'll just keep adding parameters to this method like def initialize_services(data_dir, island_port, bootloader_port):?

Yes. If the configuration list gets too long, we can create a config object and pass in the config object, rather than individual parameters. The initialization function can then pass only those things that each service needs to that service.

PostBreachFilesService and RunLocalMonkeyService doesn't need any special initialization, they just need to access configuration parameters.

Right, but they only need access to one specific configuration parameter. By passing them exactly what the need, they become decoupled from global state, easier to test, and their dependencies are clearer.

If we decide that either service should write its data somewhere else, we simply pass it a different directory. No code changes to the services are necessary. If they rely on a globally accessible config singleton, we need to modify the services themselves. Giving each service exactly what it needs to get its job done keeps it decoupled and flexible.

At the moment, we're hamstrung by the architecture, which is a bunch of classes that pretend to be static and contain no state, but actually do contain state in the form of an environment singleton. These services should be instance objects that are passed to the resources via dependency injection or accessed via a service locator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If the configuration list gets too long, we can create a config object and pass in the config object, rather than individual parameters. The initialization function can then pass only those things that each service needs to that service.

So what I'm suggesting is decoupling environment config files from the singleton. And that singleton would be the config object.

Right, but they only need access to one specific configuration parameter. By passing them exactly what the need, they become decoupled from global state, easier to test, and their dependencies are clearer.

So why not do:

class PublicVarReader(): PUBLIC_VAR = env_singleton.public_var 

This also passes exactly one configuration parameter and when testing you'd mock PUBLIC_VAR just the same. If public_var is indeed a global config parameter, like island port, it should stay in environment singleton no matter what. What could change is that one day we decide to pull the island port from cmd instead of a file, but that's the responsibility of a singleton builder. Imagine we added the ability to customize the port of an island by specifying it in config file. If you want to create a class that needs to know the island port, you'd first need to create a "class builder" somewhere else in the codebase to create a class itself. That's less intuitive than reading a global value from a singleton and adds unnecessary complexity IMO.
Nevertheless, if you thought about it and made this design choice with care, I trust that you have a good long term vision for the design and don't require a change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that you have a good long term vision for the design and don't require a change.
I have a long term plan :)

So what I'm suggesting is decoupling environment config files from the singleton.

Ultimately, environment config will very likely go away. The deployment type shouldn't really be configurable, at least not in the sense that the log level is configurable. The user password and hash shouldn't be written into a config file, but some other file specifically for tracking that that is read-only. We should, at some point, replace our authentication system with something 3rd party (it would be more secure) that would support multiple users (our users have asked for multi-user support).

This also passes exactly one configuration parameter and when testing you'd mock PUBLIC_VAR just the same.

Making this global means your class is coupled to the specific details of how configuration is managed. It also means you need to mock it; in other words, your tests are also coupled to the specific details of your configuration. Passing a class the values it needs decreases coupling, simplifies testing, and makes the interface explicit. Things that are decoupled are easier to change (see The Pragmatic Programmer, The Essence of Good Design and Decoupling).

Anyway, we can discuss this more on a call (better communication faster).

Copy link
Contributor

@VakarisZ VakarisZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things, but the biggest question that I spent a lot of time thinking about is if this we need to initialize **classes** whenever we want to read a global parameter is a good design choice.

@mssalvatore mssalvatore force-pushed the no-global-data-dir branch from c8cc968 to 79eb744 Compare May 12, 2021 15:53
@mssalvatore
Copy link
Collaborator Author

mssalvatore commented May 12, 2021

Some minor things, but the biggest question that I spent a lot of time thinking about is if this we need to initialize **classes** whenever we want to read a global parameter is a good design choice.

It's not a good design choice. We should be using dependency injection or the service locator pattern. Hence this TODO:
image

Resolving this ATM is outside the scope of our current goal, which is to write all the data to a configurable, writable location on all platforms. This solution is a hack that is reasonably easy to fix and adapt to whatever architectural design decisions we make regarding how resources access services.

@VakarisZ VakarisZ merged commit c40465d into develop May 13, 2021
@VakarisZ VakarisZ deleted the no-global-data-dir branch May 13, 2021 10:03
@mssalvatore mssalvatore mentioned this pull request May 22, 2021
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants