Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reports #314

Merged
merged 22 commits into from
Oct 20, 2022
Merged

Reports #314

merged 22 commits into from
Oct 20, 2022

Conversation

Atticus1806
Copy link
Contributor

WIP PR to enable (mailed) reports for experiments similar to what MT already has/had.

Even though this is still in development feel free to comment on stuff.

@Atticus1806
Copy link
Contributor Author

So I had some more looks at the old code and I think with proper input and some typing this looks fine.
Also added an example report, but this will most likely be in some private user folder but for now if someone wants to test it.

I will most likely do some more changes on monday and then I will mark this as ready I think, at least for a 1.0.

@Atticus1806
Copy link
Contributor Author

Okay lets for now start discussing details on this via review.
Open question:
Should the example report format stay there? Should it be moved somewhere? Is this entirely user dependent (so in i6_experiments/user/...)?

@Atticus1806 Atticus1806 marked this pull request as ready for review September 26, 2022 07:29
@Atticus1806 Atticus1806 changed the title WIP: Reports Reports Sep 26, 2022
report/report.py Outdated
self,
name: str,
report: _Report_Type,
report_dir: tk.Path = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[tk.Path]?

report/report.py Outdated
name: str,
report: _Report_Type,
report_dir: tk.Path = None,
report_format: Callable[[_Report_Type], str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[tk.Path]?

report/report.py Outdated
import time
import json

_Report_Type = Dict[str, Union[tk.Path, any]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any?

and import from typing? same as Dict?!

report/report.py Outdated
if self.report_format is None:
self.report_format = pprint.pformat

with gzip.open(str(self.out_report), "w") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.out_report.get_path()

report/report.py Outdated
self.report_format = report_format
self.recipe_path = recipe_path

self.out_report = self.output_path("report.gz")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea: maybe the job could have a flag to either compress the report file or not..

report/report.py Outdated Show resolved Hide resolved
Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit unsure about the tradeoff generalizability vs. nice reports you are doing here.

In my expectation, a Job like this should have two purposes
a) notify me via mail that a specific output is finished so that I can take further action.
b) create a nice table that collects the results

We could maybe even split the functionality: one job that creates a Report and a second job that sends mail notifications (and optionally attaches the output)

To create the report we should discuss the interface.. I don't like the current one with a dict that contains many mandatory keys that are ignored later for printing.
Do we want to hardcode a 2D table? Or a nD table with 0<=n<=2?
Then have 0, 1, or 2 vectors with table headings and an Array with values?
Maybe even support for automatic ascii table formatting using |---+---| etc..?

report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated
import time
import json

_Report_Type = Dict[str, Union[tk.Path, any]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union[tk.Path, any] really coveres anything, but did you mean tk.Variable instead of tk.Path? or maybe both, e.g. tk.AbstractPath?

Then if Variable is used, I don't see any more where get() is called. Should this be handeled in the report_format function? is pprint.pformat then a save default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is where things (might) get ugly. In the end the way the job was structured from where I copied it it really depends on what your report_format function is doing. The usual input would maybe be something like Union[tk.Variable, tk.Path, str] or similar, but you could think of arbitrary things here.
Actually this is where also other information like the keys come in, if the report makes use of them they are required.
The old jobs had a bunch of different formats and even more than one ReportJob (thats why it was named simple beforehand). The more I think about it maybe I will just remove a lot of stuff I deem irrelevant and then make more restrictions on the format of the report_format.

Then if Variable is used, I don't see any more where get() is called.

This cost me like 2 hours and its hidden in the call of str() and repr() that are done within the reports (or by pprint).
rwth-i6/sisyphus#104 this might even break things. But this just as a sidenote.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think this is something that the report_format should handle, since this might be really user specific what the input is. So we could completely split it and say the ReportJob only gets a string it mails (then its just MailJob?) but I like that global information like the config_path and sis_command_line are actually set by the job since otherwise thats something the user would always have to deal with. And I am not sure if passing a string would require a job to produce this list since otherwise I think we would just end up with a lot of "UnifinishedVariable" in the report.

@Atticus1806
Copy link
Contributor Author

If I didn't miss anything all comments should be addressed in one way or the other. I again changed the structure of the job a bit more liberal this time to make it more simple.
The main idea now is:
There are a few "basic" informations like date, user, name etc. that are set by the job in addition to what the user gives into the job. The report format is now mandatory and should be able to handle str and tk.AbstractPath inputs. Any kwargs can be passed by job kwargs.
Again this is a more radical change so feel free to discuss changes.
The example will be moved/removed to i6_experiments once this PR is "done" as a last step, this is just to keep an example (and potential changes to it after changes in the job) easily accessible for now.

Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example will be moved/removed to i6_experiments once this PR is "done" as a last step, this is just to keep an example (and potential changes to it after changes in the job) easily accessible for now.

agreed.

The report format is now mandatory and should be able to handle str and tk.AbstractPath inputs.

I am not overly happy with the current interface between the report input and the report_format function. Maybe we can discuss this once other people have voiced their opinion.

report/report.py Outdated
self.compress = compress

self.sis_command_line = sys.argv
self.mail_address = getattr(gs, "MAIL_ADDRESS", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any MAIL_ADDRESS in sisyphus/global_settings.py, is this defined somewhere else or did you make it up?

We want to get away from defining recipe-specific settings in the sisyphus settings. especially if they change how a job behaves. So maybe make the mail address an input to the job?

Actually, I grow more and more fond of the Idea of a dedicated MailJob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me this is part of my settings.py ever since I got my first "copy" of a sis setup, I thought this would be standard 😅
But this can be solved without problems I think. It just calls getpass.getuser() so this could be a default in a parameter.
I am still not sure how a MailJob would interact with just sisyphus variables etc. but I will write a quick draft of a job as proof of concept and push it here so we have something to compare to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just having a MailJob would not hurt. So even if we later decide to keep this interface like it is right now, we can just also add a MailJob.

report/report.py Outdated
Comment on lines 59 to 67
if self.compress:
with gzip.open(self.out_report.get_path(), "w") as f:
f.write(
self.report_format(report, **self.report_format_kwargs).encode()
+ b"\n"
)
else:
with open(self.out_report.get_path(), "w") as f:
f.write(self.report_format(report, **self.report_format_kwargs) + "\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intentional that one branch encodes the string and the other doesn't? Is it required by gzip.open vs open?
Or is it a problem of specifying mode "wt" vs "wb" explicitly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh. I dont know. The binary variant is copied from the old code and the non compressed version was added by me. But from my understanding (which might be off by a longshot, I dont have much experience with this) I think "wb" expects some kind of binary input and does not convert "internally".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see https://docs.python.org/3/library/functions.html#open

[mode] defaults to 'r' which means open for reading in text mode.

and https://docs.python.org/3/library/gzip.html

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb'.

so 'w' for builtin open defaults to text mode and 'w' for gzip.open defaults to binary. I suggest to decide which one you want and then use 'wb' or 'wt' consistently.

@Atticus1806
Copy link
Contributor Author

Atticus1806 commented Sep 27, 2022

So I added a MailJob @michelwi. Is this how you intended it? If the "wait_on" is not used it produces "Unfinished Variables" and I think this is integral to the job so I think its fine to set inside the job (and not have the user have to think of setting it).
My code right now would now look like this (examplatory):

waits = list(scores.values())
content = gmm_example_report_format(scores)
report = MailJob(subject=alias_prefix, content=content, mail_address=getpass.getuser(), wait_on=waits)
tk.register_output(f"reports/{alias_prefix}", report.out_content)

Tested it with both compress on and off.
This would work for me, so the other Job could be removed (or kept if people prefer to have another option, both fine with me)

Edit I think there are cases where this does not work.

elif callable(self.report_values):
report = str(self.report_values())
else:
report = pprint.pformat(self.report_values, width=140)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this if...elif...else clause.

why should the report_values be callable?? it is only of type Dict?!

the check on report_template could be restricted to None or even a callable..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typing was wrong. So this can be callable (similar to how sisyphus register report has it as a possibilty) , for example some function that returns some string (interpretable) output. You cannot pass the string directly because you dont want to generate the string at graph construction different reasons.
This is quite close to just passing a format function but since sisyphus supported it we decided to also do this here.

report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated
def run(self):

if self.report_template:
report = self.report_template(self.report_values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if report_values and report_template follows the same interface as register_report in sisyphus. This allows to reuse existing "sisyphus reports" with these jobs to send mails.
In sisyphus report_template is not a callable but a format string (see https://github.com/rwth-i6/sisyphus/blob/032239c4f473742b96178f9614521324ac0472f7/sisyphus/graph.py#L187).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to support both, but added the option if the type of the report_template is str to be used as format string.

report/report.py Outdated

if self.send_contents:
value = self.sh(
f"zcat -f {self.result.get_path()} | mail -s '{subject}' {self.mail_address}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid using self.sh here in case this ever gets part of something automatic, as it allows for random shell execution.

Some like (this is random code):

p1 = subprocess.Popen(["zcat", "-f", self.result.get_path()], stdout=subprocess.PIPE)
value = subprocess.check_output(["mail", "-s", subject, self.mail_address], stdin=p1.stdout)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it, but didnt test it yet. Does this also work like this when send_contents is False? Haven't used much of subprocess yet, so just tried to "copy" it from the other case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly, but with the newer subprocess.run method you can directly provide a string for stdin. It will internally use Popen.communicate.

report/report.py Show resolved Hide resolved
report/report.py Outdated

if self.send_contents:
value = self.sh(
f"zcat -f {self.result.get_path()} | mail -s '{subject}' {self.mail_address}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly, but with the newer subprocess.run method you can directly provide a string for stdin. It will internally use Popen.communicate.

@Atticus1806
Copy link
Contributor Author

Okay applied the comments and changes. So again requesting for feedback or approval.

Copy link
Contributor

@JackTemaki JackTemaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly about added explanation/documentation, otherwise only two minor remarks.

report/report.py Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
report/report.py Outdated Show resolved Hide resolved
@Atticus1806
Copy link
Contributor Author

Comments are addressed now

Copy link
Contributor

@JackTemaki JackTemaki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything works as expected it looks good to me

@JackTemaki JackTemaki merged commit e904b98 into main Oct 20, 2022
@JackTemaki JackTemaki deleted the bene_report branch October 26, 2022 10:19
christophmluscher added a commit to christophmluscher/i6_core that referenced this pull request Nov 1, 2022
Co-authored-by: Christoph M. Lüscher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants