clean up logging#1190

Merged
jeffra merged 6 commits into deepspeedai:master from stas00:log-fixes
Jun 28, 2021

Conversation

@stas00
Collaborator

@stas00 stas00 commented Jun 26, 2021

This PR attempts to clean up debug prints that are uncontrollable by the user. It:

  • replaces some prints with logs
  • disables some debug prints

When running with log level ERROR, it's now almost clean.

The only 2 log lines I couldn't figure out are:

```
Adam Optimizer #0 is created with AVX2 arithmetic capability. Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability. Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
```

One comes from cpp; I have no idea how to tell a cpp file to respect a log level.

The other one I couldn't find at all - any idea where it comes from? It looks like some kind of print created on the fly.

@stas00
Collaborator Author

stas00 commented Jun 28, 2021

@jeffra, how do we handle these 2:

```
Adam Optimizer #0 is created with AVX2 arithmetic capability. Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX2 arithmetic capability. Config: alpha=0.000030, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1
```

Maybe we should ask @RezaYazdaniAminabadi, since the first one is cpp. As for the 2nd one, I can't find where it's coming from.

@jeffra
Collaborator

jeffra commented Jun 28, 2021

They both seem to be coming from here. The first is printed via std::cout and the other via printf: https://github.com/microsoft/DeepSpeed/blob/bc019a533971e1dcfde93972ae892b1a4f425041/csrc/adam/cpu_adam.cpp#L342-L344

I think we can transfer the current logging state via this method that's exposed to python side: https://github.com/microsoft/DeepSpeed/blob/bc019a533971e1dcfde93972ae892b1a4f425041/csrc/adam/cpu_adam.cpp#L319-L325

Here's where it's called on python side: https://github.com/microsoft/DeepSpeed/blob/c79184ebcc501ad64a8628afe18c0659d1bc4963/deepspeed/ops/adam/cpu_adam.py#L80-L86

@stas00
Collaborator Author

stas00 commented Jun 28, 2021

Ah, I was looking under deepspeed - that's why I missed it! Thank you!

OK, so let's make the cpp side log only for log_level <= logger.INFO?

```python
create_adam(..., should_log=log_level <= logger.INFO)
```

where we get the current log level from:

```
$ python -c "import deepspeed; from deepspeed.utils import logger; print(logger.getEffectiveLevel())"
20
```

We could perhaps add a wrapper to utils.logging:

```python
def get_current_level(logger):
    return logger.getEffectiveLevel()
```
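As a sketch of what such wrappers could look like (the module-level `logger` here is a stand-in for `deepspeed.utils.logger`, and the exact signatures are assumptions, not the merged implementation):

```python
import logging

# stand-in for deepspeed.utils.logger
logger = logging.getLogger("DeepSpeed")

def get_current_level():
    """Return the logger's effective level as an int (e.g. 20 for INFO)."""
    return logger.getEffectiveLevel()

def should_log_le(max_log_level_str):
    """True if messages at the named level would currently be emitted,
    i.e. the effective level is <= that level's numeric value."""
    level = getattr(logging, max_log_level_str.upper())
    return get_current_level() <= level
```

With this, `should_log_le("info")` evaluates to False once the level is raised to ERROR, which is the flag the cpp extension would receive to silence its prints.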
@jeffra
Collaborator

jeffra commented Jun 28, 2021

> Ah, I was looking under deepspeed - that's why I missed it! Thank you!
>
> OK, so let's make the cpp side log only for log_level <= logger.INFO?
>
> `create_adam(..., should_log=log_level <= logger.INFO)`
>
> where we get the current log level from:
>
> `python -c "import deepspeed; from deepspeed.utils import logger; print(logger.getEffectiveLevel())"` prints `20`
>
> We could perhaps add a wrapper to utils.logging:
>
> `def get_current_level(logger): return logger.getEffectiveLevel()`

Yep, this looks good to me. get_current_level seems useful too. Thanks @stas00!

@stas00
Collaborator Author

stas00 commented Jun 28, 2021

Added:

  1. to utils.logging: added should_log_le + get_current_level - the former is a convenience wrapper
  2. changed create_adam_optimizer to make the logging configurable + refactored the code to remove the triplicated code

Now, if the current log_level > logging.INFO, the cpp side is quiet.
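A minimal sketch of how the Python call site could derive that flag before invoking the extension (the commented-out `ds_opt_adam.create_adam(...)` binding call and its argument list are hypothetical; only the derivation of `should_log` is the point):

```python
import logging

# stand-in for deepspeed.utils.logger
logger = logging.getLogger("DeepSpeed")

def derive_should_log():
    """Compute the should_log flag handed to the cpp extension's
    create_adam(); when False, the cpp side skips its std::cout/printf
    banner entirely."""
    should_log = logger.getEffectiveLevel() <= logging.INFO
    # hypothetical binding call, argument names for illustration only:
    # ds_opt_adam.create_adam(opt_id, alpha, beta1, beta2, eps,
    #                         weight_decay, adamw_mode, should_log)
    return should_log
```

The flag is computed once at optimizer-creation time, so the cpp side never needs to consult Python's logging state itself.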

@stas00
Collaborator Author

stas00 commented Jun 28, 2021

@jeffra, this PR appears to be complete.

@jeffra jeffra merged commit a029239 into deepspeedai:master Jun 28, 2021
@stas00 stas00 deleted the log-fixes branch June 29, 2021 05:10
@stas00
Collaborator Author

stas00 commented Jun 29, 2021

@RezaYazdaniAminabadi, I have just noticed pytorch has a cpp setup that interfaces with the python logging system, e.g.:
https://github.com/pytorch/pytorch/blob/65d527232769ac12e575fda09dd8b6883633260f/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L281
so this requires no passing of arguments.

Perhaps deepspeed could adopt the same mechanics, and then clean logging in cpp extensions will be a breeze.
