Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocrd_utils.initLogging: also add handler to root logger #1288

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Oct 9, 2024

As a follow-up to ccb416b, but more general:

initLogging should always affect the root logger '', too. (Not just when disabling logging.)

Reasons:

  • to be consistent with file config (which by default also configures the unqualified root logger for our default handlers
  • to prevent imported libraries from initializing logging on their own (like import tensorflow) – the place to check whether logging has been set up yet is always getLogger('').hasHandlers(), also in Python's logging.basicConfig BTW

I know we deliberately changed this to exclude the root logger earlier. But I cannot find the reasoning behind that anymore. Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging. (Perhaps at that time we got distracted by the fact that we still did call initLogging unconditionally when importing ocrd_utils.logging.)

BTW, IMO most places (core and modules code) can get their loggers from Python stdlib's logging.getLogger instead of our ocrd_utils.logging.getLogger, because we don't add anything to that nowadays.

…tent with file config and prevent imported libraries from initing logging first), but disable propagation for ocrd loggers (to avoid duplication)
@bertsky bertsky requested a review from kba October 9, 2024 16:53
@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 10, 2024

Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging.

But if initLogging is not invoked then the logging is (maybe it's a was now, not sure) not working at all when core code is imported in another project say Operandi. Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).

This said, the ocrd_network logging acts weirdly and I am not able to find out how to fix it. I did some experiments and observations that I am going to share in next Monday's meeting.

@kba
Copy link
Member

kba commented Oct 10, 2024

I know we deliberately changed this to exclude the root logger earlier. But I cannot find the reasoning behind that anymore. Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging. (Perhaps at that time we got distracted by the fact that we still call initLogging unconditionally when importing ocrd_utils.logging.)

Yes, the idea was to behave unintrusively as a library but as you point out, in practice we don't

BTW, IMO most places (core and modules code) can get their loggers from Python stdlib's logging.getLogger instead of our ocrd_utils.logging.getLogger, because we don't add anything to that nowadays.

Agreed.

This said, the ocrd_network logging acts weirdly and I am not able to find out how to fix it. I did some experiments and observations that I am going to share in next Monday's meeting.

🙏

Let's discuss on Monday, I'm all for pushing for a clean logging because that is high on the list of most painful aspects of the OCR-D software at the moment.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

@MehmedGIT

But if initLogging is not invoked then the logging is (maybe it's a was now, not sure) not working at all when core code is imported in another project say Operandi.

What do you mean not working at all? Operandi can have its own logging.basicConfig or logging.config.fileConfig or whatever. All OCR-D components simply provide their messages on their respective loggers, the application decides what to do with them (creating handlers and setting levels).

If the application is a standalone OCR-D CLI (→ocrd_utils.initLogging) though, then no other components (think Tensorflow) should interfere with logging setup (i.e. handlers+levels). Hence this PR.

Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).

Sry, I don't understand that at all.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

@kba

(Perhaps at that time we got distracted by the fact that we still call initLogging unconditionally when importing ocrd_utils.logging.)

Yes, the idea was to behave unintrusively as a library but as you point out, in practice we don't

Actually, I was trying to point out that we do.

We do not call initLogging at module-import time anymore. Which is correct IMO. But then we can add a handler to the root logger when we do call it.

@MehmedGIT
Copy link
Contributor

What do you mean not working at all?

Not logging anything, and if initLogging is not called, the initialization phase of the Operandi Server just freezes.

Operandi can have its own logging.basicConfig or logging.config.fileConfig or whatever.

Yes, but there is some kind of interference because of the ocrd_logging.conf coming from the core. I have to manually provide ocrd_logging.conf to avoid some side effects but still not completely.

I am talking about the experience I had in the past. Maybe these issues are no longer there. I am not sure, because I have my hack to reconfigure the loggers at run-time after the initLogging call to prevent any undesired side effects.

Sry, I don't understand that at all.

I was talking about the ocrd_logging.conf file inside ocrd_utils.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

Not logging anything, and if initLogging is not called, the initialization phase of the Operandi Server just freezes.

I will try to unwrap: Operandi on startup calls ocrd_utils.initLogging. So you are saying that

  1. without ocrd_utils.initLogging, nothing gets logged at all.
    I would assume that Python will automatically do a logging.basicConfig – unless perhaps some rogue library you are using already inits logging (like our Tensorflow example). So you should normally at least get warnings. Have you tried instrumenting to see whether getLogger('').handlers shows anything?
  2. without ocrd_utils.initLogging, the server (i.e. probably on_startup_event) blocks.
    That's odd. What does the stack trace show in that state?

Also, have you tried calling logging.basicConfig or logging.config.fileConfig instead of the ocrd_utils version?

Yes, but there is some kind of interference because of the ocrd_logging.conf coming from the core. I have to manually provide ocrd_logging.conf to avoid some side effects but still not completely.

I don't understand how our package data log config can interfere at all, unless ocrd_utils.initLogging is called.

It sounds like what you describe has been observed when core would still do initLogging at import time (which was clearly an antipattern and it does not anymore).

I am talking about the experience I had in the past. Maybe these issues are no longer there.

I would hope so. Could you please try it out – perhaps even with and without this change here?

I am not sure, because I have my hack to reconfigure the loggers at run-time after the initLogging call to prevent any undesired side effects.

Ok, that's quite involved (I don't fully understand the InterceptHandler).

Side effects being handlers and levels you don't want? (So it could also be the same issue as above?)

Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).
(I was talking about the ocrd_logging.conf file inside ocrd_utils.)

So you were merely saying that an optional file config should not interfere with programmatic setup?

I agree – and that's what initLogging in its current form seems to achieve. (Just that the programmatic setup is missing the root logger, hence this PR.)

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 10, 2024

I don't understand how our package data log config can interfere at all, unless ocrd_utils.initLogging is called.

Because it has to be called when using core methods, and there is no workaround to avoid it (may no longer be valid).

It sounds like what you describe has been observed when core would still do initLogging at import time (which was clearly an antipattern and it does not anymore).

That may be the case, yes. I will have to verify again.

Side effects being handlers and levels you don't want? (So it could also be the same issue as above?)

Yes, absolutely. Take for instance just core (the latest version) - the ocrd_network.ProcessingWorker. From observations - the log level in the constructor seems to be ERROR or WARNING (not INFO) since I can only see messages with that log level forwarded to the logging file of the specific worker. However, the log level of ocrd_network.ProcessingWorker.publish_result_to_all method is INFO. Because these are normally forwarded to the log file. So when the default ocrd_logging.conf is used, although it sets ocrd_network logger level to INFO, it is not INFO everywhere I would expect it to be. The worst part is that I am not sure why.

The only place where everything is logged as expected to a file is the ocrd_network.process_helpers.invoke_processor() because of a new context manager, check here.

So you were merely saying that an optional file config should not interfere with programmatic setup?

Yes. It is not that optional in the end since core provides a default for you and the only way to avoid that is to provide a manual logging configuration file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants