Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instantiation of custom class leads to wrong defaults #460

Open
awaelchli opened this issue Feb 28, 2024 · 13 comments
Open

Instantiation of custom class leads to wrong defaults #460

awaelchli opened this issue Feb 28, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@awaelchli
Copy link

🐛 Bug report

When instantiating objects from a custom class where a default is specified in the signature, jsonargparse infers the defaults from the the wrong class:

To reproduce

from jsonargparse import CLI


class Super:
    pass


class Default(Super):
    def __init__(self, foo: str = "foo", bar: str = "bar"):
        self.foo = foo
        self.bar = bar


class Other(Super):
    def __init__(self, foo: str = "foo", bar: str = "baaaaar"):
        self.foo = foo
        self.bar = bar


def main(data: Super = Default()):
    print(data.foo)
    print(data.bar)


if __name__ == "__main__":
    CLI(main)
python repro.py --data Other --data.foo test

Output:

test
bar

Expected behavior

I expect the output to be

test
baaaaar

because I selected --data Other, so it should use the defaults from the Other class.

Environment

  • jsonargparse version (e.g., 4.8.0): 4.27.5
  • Python version (e.g., 3.9): 3.10
  • How jsonargparse was installed (e.g. pip install jsonargparse[all]): pip install -U jsonargparse[signatures]
  • OS (e.g., Linux): MacOS and Linux
@awaelchli awaelchli added the bug Something isn't working label Feb 28, 2024
@awaelchli
Copy link
Author

awaelchli commented Feb 28, 2024

A workaround for this is to define main like this:

def main(data: Optional[Super] = None):
    if data is None:
        data = Default()

This might be better practice in general.

@mauvilsa
Copy link
Member

This is not a bug. The current behavior is the result of many discussions about use cases for overriding arguments from command line. Currently I don't have at hand links to the most relevant issues. But the rationale is the following.

Base classes and subclasses can have many parameters/settings, and several subclasses can share parameters. If every time a user specifies a different class all the defaults are reset, then every time the user is forced to give all parameters they want. So, the behavior was decided to be that only what gets specified is changed. If a new class_path is specified, then only the class_path is changed, not the init_args that were set at that moment. If the new class does not accept some parameters that are in the current init_args, then these get dropped. But the shared parameters are kept, and they retain the current value. No reset to default.

Maybe it is easier to understand with actual use cases where it can be more obvious why this behavior make sense. Say someone implements multiple models with a common base class. All models have optimizer and scheduler as parameters, that means a complex nested subclass in each of them. A user might want to try out all models but keeping fixed optimizer and scheduler. This can be done with a base config, say base.yaml that includes common settings for optimizer and scheduler and possibly other parameters to keep fixed for all models. Then do:

cli.py fit --config=base.yaml --model=Model1
cli.py fit --config=base.yaml --model=Model2
...
cli.py fit --config=base.yaml --model=ModelN

If the argument --model=ModelX does a reset to defaults, then what is in base.yaml is lost and needs to be specified again. This is just one use case, but there are several.

Independent of motivation, the current behavior has been requested in previous issues, and changing it would break the flow of previous projects. So, this shouldn't be changed. However, there could be a new feature. Right now an argument like --class=ClassName means change the class_path and keep common existing init_args. A new syntax could be added to signify change class_path and reset init_args to defaults. But note, even this is not clear. What are the defaults? Is it only what comes from the source code? Or also take into account changes done with set_defaults? Or also consider what is given in default_config_files? Whatever is chosen, it is not immediately obvious to the cli user what the reset means.

@awaelchli
Copy link
Author

awaelchli commented Feb 29, 2024

Thanks for the detailed response. If it has already been discussed many times in the past, I don't want to add to the pile. I just don't know how to implement the datasets via the CLI now. This really surprised me. Like having Dataset1 as the default in a config file and then training with --data Dataset2 will literally now silently use the wrong data because the input data directory is taken from Dataset1 rather than Dataset2.

@mauvilsa
Copy link
Member

I do see a need to reset to defaults in some cases, so it can be worth to discuss further. Just that we need both, the current behavior and a way to reset.

For the case of a dataset, I would probably not add a default to make it required, and add the option as_positional=False. So the user always must provide a dataset. Though, I do see why someone would want to have a default. But in your case, why is it important to have a default?

@awaelchli
Copy link
Author

Thanks for considering it and the suggestion to make it required. While making it required in the Python signature could make sense, we also want to provide config files in a folder in Lit-GPT where users can find predefined experiments to reproduce numbers. If someone were to make an ablation from these configs and use a different dataset via command line, they will run into this issue anyway (we will likely have docs examples based on these configs).

@mauvilsa
Copy link
Member

For a new reset feature two things need to be decided:

  • When a reset is done, what does it reset to?
    One option could be that it resets to the defaults of the class being reset to, which is actually the behavior @awaelchli was expecting. So the parser could have as default some DefaultClass with possibly custom default init_args. A reset to some other class would ignore everything set in the parser. That is, if reset to the same DefaultClass then it would get the defaults of that class and not what is defined in the parser. This decision is kind of tricky since whatever is decided would most likely not be obvious to users unless looked up in some docs.

  • What would be the syntax to trigger a reset for command line and in config files?
    From command line there is already special syntax to append to a list, like --key+=value. So something similar would make sense. Could be for example --key~=NewClass or --key^=NewClass, and the same replacing the = with a space. This would actually be a shorthand for --key={"class_path": "NewClass", "init_args": "~"}. To reset in a config file it could be with a special key as key~: NewClass or providing class_path and init_args~: ... to reset and modify some args.

This are some initial ideas that would be best to think through more than once. How does it sound?

@mauvilsa mauvilsa added enhancement New feature or request and removed bug Something isn't working labels Mar 10, 2024
@awaelchli
Copy link
Author

Hey @mauvilsa

what does it reset to?

I can't think of a good use case where it wouldn't reset to the class I'm setting. The confusing I was facing was related to setting the class --data.MyClass and then expecting init args to be defaulting to this class's defaults. If this was resetting to something else, that would be even more confusing IMO.

What would be the syntax to trigger a reset for command line and in config files?

A syntax for this would be nice for advanced users. Maybe even == to indicate a "strong equal" as in make it actually "equal with defaults". Tilde or hat might be too similar logical negation.

Besides this more elaborate approach, would a simple flag on the CLI to switch the behavior also be an option? For simple CLI use cases where this is universally the expected behavior (i.e. we don't expect to switch the optimizer class like described), it would be nice if we didn't have to remember to use special syntax in the CLI arguments. Similar to the as_positional flag:

from jsonargparse import CLI

CLI(main, here=True)

But I'm struggling to find an intuitive name so it might be a sign that it's not a good idea afterall.

@mauvilsa
Copy link
Member

A way to change the behavior could be an option. But I would go for a global setting instead of a parameter to jsonargparse.CLI. This is because as a parameter the implementation becomes way more complex. And it is unlikely that someone wants to have in the same python process, two parsers that behave differently.

@awaelchli
Copy link
Author

@carmocca If there was such an option in jsonargparse, would we be able to set it in litgpt's __init__ or the entry points? Do you have any concerns/comments?

@carmocca
Copy link
Contributor

No concerns. We already set some globals: https://github.com/Lightning-AI/litgpt/blob/wip/litgpt%2F__main__.py#L83-L86

@mauvilsa
Copy link
Member

Actually there is a problem with using a global. Users of pytorch-lightning expect the current behavior. If it were a global set by litgpt, it could mean that just by installing litgpt, the behavior of LightningCLI could unexpectedly change if litgpt is imported.

@awaelchli
Copy link
Author

What if we set the global flag only in our entry point scripts and not at the import level? That would be the main use case. That might be a good practice for libraries in general, as you don't want to impose custom jsonargparse behavior on others but only on your own CLI within the package.

@carmocca
Copy link
Contributor

What if we set the global flag only in our entry point scripts and not at the import level?

Unless I'm missing something, this is what we do already in the link at #460 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants