I am trying to set up a decent logging configuration in PySpark. I have a YAML configuration file that sets up several log handlers: the console, a file, and a SQLite DB, all using the format "%(asctime)s - %(name)s - %(levelname)s - %(message)s".
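For reference, a minimal sketch of what such a YAML config might look like; the handler names, the SQLite handler class path, and the file names below are assumptions, not my actual config:

```yaml
version: 1
formatters:
  standard:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
handlers:
  console:
    class: logging.StreamHandler
    formatter: standard
  file:
    class: logging.FileHandler
    formatter: standard
    filename: app.log
  sqlite:
    class: myapp.handlers.SQLiteHandler  # hypothetical custom handler
    formatter: standard
    db: app.db
loggers:
  mylog:
    level: DEBUG
    handlers: [console, file, sqlite]
```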

```python
# SETUP LOGGING
import logging
import logging.config
import yaml

with open(cfile, 'rt') as f:
    config = yaml.safe_load(f.read())
logging.config.dictConfig(config)
lg = logging.getLogger("mylog." + self.__class__.__name__)
```

Each time I call lg.xxxx('message'), everything gets handled quite nicely.

I have found quite a few posts on how to get at log4j from PySpark via log_handler = sc._jvm.org.apache.log4j. But I'm lost on how to hook this into my existing setup and capture all the messages that appear on the PySpark console so they are also written to the file and the SQLite DB.
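The pattern those posts describe looks roughly like this (a sketch; sc is an active SparkContext):

```python
# Access the JVM's log4j through py4j; `sc` is an existing SparkContext.
log4j = sc._jvm.org.apache.log4j
jlogger = log4j.LogManager.getLogger("mylog")
jlogger.info("this goes through the JVM's log4j, not through Python logging")
```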

1 Answer

It is not possible to catch Spark JVM logs with Python handlers; the JVM and the Python interpreter are separate environments. You have two options: either log your Python messages through log4j via sc._jvm and configure log4j appenders to write where you want, or keep the handlers separate and merge the logs (for example in SQLite) after the job finishes.
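A minimal sketch of the first option, assuming an active SparkContext sc; the handler class and logger name are illustrative, not part of any Spark API:

```python
import logging

class Log4jHandler(logging.Handler):
    """Forward Python log records to the Spark JVM's log4j logger (a sketch)."""

    def __init__(self, spark_context, logger_name="py4j-bridge"):
        super().__init__()
        log4j = spark_context._jvm.org.apache.log4j
        self._jlogger = log4j.LogManager.getLogger(logger_name)

    def emit(self, record):
        msg = self.format(record)
        # Map Python levels onto the closest log4j methods.
        if record.levelno >= logging.ERROR:
            self._jlogger.error(msg)
        elif record.levelno >= logging.WARNING:
            self._jlogger.warn(msg)
        elif record.levelno >= logging.INFO:
            self._jlogger.info(msg)
        else:
            self._jlogger.debug(msg)

lg = logging.getLogger("mylog")
lg.addHandler(Log4jHandler(sc))
lg.warning("now visible to Spark's log4j appenders as well")
```

And a sketch of the second option: after the job finishes, parse the file written by a log4j file appender and insert its lines into the same SQLite table your Python handler writes to. The line pattern (here assuming a log4j layout like "%d %p %c: %m%n") and the table schema are assumptions you would adapt:

```python
import re
import sqlite3

# Assumed log4j line format: "2016-01-01 12:00:00,000 INFO mylog: message"
LINE_RE = re.compile(
    r"^(?P<asctime>\S+ \S+) (?P<levelname>\w+) (?P<name>\S+): (?P<message>.*)$"
)

def merge_spark_log(db_path, spark_log_path):
    """Insert parsed Spark log lines into an assumed
    log(asctime, name, levelname, message) table."""
    conn = sqlite3.connect(db_path)
    with open(spark_log_path) as f:
        for line in f:
            m = LINE_RE.match(line)
            if m:
                conn.execute(
                    "INSERT INTO log (asctime, name, levelname, message) "
                    "VALUES (?, ?, ?, ?)",
                    (m.group("asctime"), m.group("name"),
                     m.group("levelname"), m.group("message")),
                )
    conn.commit()
    conn.close()
```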
