-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reports #314
Reports #314
Conversation
So I had some more looks at the old code and I think with proper input and some typing this looks fine. I will most likely do some more changes on monday and then I will mark this as ready I think, at least for a 1.0. |
Okay lets for now start discussing details on this via review. |
report/report.py
Outdated
self, | ||
name: str, | ||
report: _Report_Type, | ||
report_dir: tk.Path = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[tk.Path]
?
report/report.py
Outdated
name: str, | ||
report: _Report_Type, | ||
report_dir: tk.Path = None, | ||
report_format: Callable[[_Report_Type], str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[tk.Path]
?
report/report.py
Outdated
import time | ||
import json | ||
|
||
_Report_Type = Dict[str, Union[tk.Path, any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any
?
and import from typing? same as Dict
?!
report/report.py
Outdated
if self.report_format is None: | ||
self.report_format = pprint.pformat | ||
|
||
with gzip.open(str(self.out_report), "w") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.out_report.get_path()
report/report.py
Outdated
self.report_format = report_format | ||
self.recipe_path = recipe_path | ||
|
||
self.out_report = self.output_path("report.gz") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idea: maybe the job could have a flag to either compress the report file or not..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit unsure about the tradeoff generalizability vs. nice reports you are doing here.
In my expectation, a Job like this should have two purposes
a) notify me via mail that a specific output is finished so that I can take further action.
b) create a nice table that collects the results
We could maybe even split the functionality: one job that creates a Report and a second job that sends mail notifications (and optionally attaches the output)
To create the report we should discuss the interface.. I don't like the current one with a dict that contains many mandatory keys that are ignored later for printing.
Do we want to hardcode a 2D table? Or a nD table with 0<=n<=2?
Then have 0, 1, or 2 vectors with table headings and an Array with values?
Maybe even support for automatic ascii table formatting using |---+---| etc..?
report/report.py
Outdated
import time | ||
import json | ||
|
||
_Report_Type = Dict[str, Union[tk.Path, any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union[tk.Path, any]
really coveres anything, but did you mean tk.Variable
instead of tk.Path
? or maybe both, e.g. tk.AbstractPath
?
Then if Variable is used, I don't see any more where get()
is called. Should this be handeled in the report_format
function? is pprint.pformat
then a save default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is where things (might) get ugly. In the end the way the job was structured from where I copied it it really depends on what your report_format
function is doing. The usual input would maybe be something like Union[tk.Variable, tk.Path, str]
or similar, but you could think of arbitrary things here.
Actually this is where also other information like the keys come in, if the report makes use of them they are required.
The old jobs had a bunch of different formats and even more than one ReportJob (thats why it was named simple beforehand). The more I think about it maybe I will just remove a lot of stuff I deem irrelevant and then make more restrictions on the format of the report_format.
Then if Variable is used, I don't see any more where get() is called.
This cost me like 2 hours and its hidden in the call of str()
and repr()
that are done within the reports (or by pprint).
rwth-i6/sisyphus#104 this might even break things. But this just as a sidenote.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think this is something that the report_format should handle, since this might be really user specific what the input is. So we could completely split it and say the ReportJob
only gets a string it mails (then its just MailJob
?) but I like that global information like the config_path
and sis_command_line
are actually set by the job since otherwise thats something the user would always have to deal with. And I am not sure if passing a string would require a job to produce this list since otherwise I think we would just end up with a lot of "UnifinishedVariable" in the report.
If I didn't miss anything all comments should be addressed in one way or the other. I again changed the structure of the job a bit more liberal this time to make it more simple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example will be moved/removed to i6_experiments once this PR is "done" as a last step, this is just to keep an example (and potential changes to it after changes in the job) easily accessible for now.
agreed.
The report format is now mandatory and should be able to handle str and tk.AbstractPath inputs.
I am not overly happy with the current interface between the report
input and the report_format
function. Maybe we can discuss this once other people have voiced their opinion.
report/report.py
Outdated
self.compress = compress | ||
|
||
self.sis_command_line = sys.argv | ||
self.mail_address = getattr(gs, "MAIL_ADDRESS", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any MAIL_ADDRESS
in sisyphus/global_settings.py
, is this defined somewhere else or did you make it up?
We want to get away from defining recipe-specific settings in the sisyphus settings. especially if they change how a job behaves. So maybe make the mail address an input to the job?
Actually, I grow more and more fond of the Idea of a dedicated MailJob
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me this is part of my settings.py
ever since I got my first "copy" of a sis setup, I thought this would be standard 😅
But this can be solved without problems I think. It just calls getpass.getuser()
so this could be a default in a parameter.
I am still not sure how a MailJob
would interact with just sisyphus variables etc. but I will write a quick draft of a job as proof of concept and push it here so we have something to compare to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also just having a MailJob
would not hurt. So even if we later decide to keep this interface like it is right now, we can just also add a MailJob
.
report/report.py
Outdated
if self.compress: | ||
with gzip.open(self.out_report.get_path(), "w") as f: | ||
f.write( | ||
self.report_format(report, **self.report_format_kwargs).encode() | ||
+ b"\n" | ||
) | ||
else: | ||
with open(self.out_report.get_path(), "w") as f: | ||
f.write(self.report_format(report, **self.report_format_kwargs) + "\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional that one branch encodes the string and the other doesn't? Is it required by gzip.open
vs open
?
Or is it a problem of specifying mode "wt" vs "wb" explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh. I dont know. The binary variant is copied from the old code and the non compressed version was added by me. But from my understanding (which might be off by a longshot, I dont have much experience with this) I think "wb" expects some kind of binary input and does not convert "internally".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see https://docs.python.org/3/library/functions.html#open
[mode] defaults to 'r' which means open for reading in text mode.
and https://docs.python.org/3/library/gzip.html
The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb'.
so 'w' for builtin open
defaults to text mode and 'w' for gzip.open
defaults to binary. I suggest to decide which one you want and then use 'wb' or 'wt' consistently.
So I added a MailJob @michelwi. Is this how you intended it? If the "wait_on" is not used it produces "Unfinished Variables" and I think this is integral to the job so I think its fine to set inside the job (and not have the user have to think of setting it). waits = list(scores.values())
content = gmm_example_report_format(scores)
report = MailJob(subject=alias_prefix, content=content, mail_address=getpass.getuser(), wait_on=waits)
tk.register_output(f"reports/{alias_prefix}", report.out_content) Tested it with both compress on and off. Edit I think there are cases where this does not work. |
elif callable(self.report_values): | ||
report = str(self.report_values()) | ||
else: | ||
report = pprint.pformat(self.report_values, width=140) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this if...elif...else clause.
why should the report_values
be callable?? it is only of type Dict
?!
the check on report_template
could be restricted to None or even a callable..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The typing was wrong. So this can be callable (similar to how sisyphus register report has it as a possibilty) , for example some function that returns some string (interpretable) output. You cannot pass the string directly because you dont want to generate the string at graph construction different reasons.
This is quite close to just passing a format function but since sisyphus supported it we decided to also do this here.
report/report.py
Outdated
def run(self): | ||
|
||
if self.report_template: | ||
report = self.report_template(self.report_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if report_values
and report_template
follows the same interface as register_report
in sisyphus. This allows to reuse existing "sisyphus reports" with these jobs to send mails.
In sisyphus report_template is not a callable but a format string (see https://github.com/rwth-i6/sisyphus/blob/032239c4f473742b96178f9614521324ac0472f7/sisyphus/graph.py#L187).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to support both, but added the option if the type of the report_template
is str to be used as format string.
report/report.py
Outdated
|
||
if self.send_contents: | ||
value = self.sh( | ||
f"zcat -f {self.result.get_path()} | mail -s '{subject}' {self.mail_address}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid using self.sh
here in case this ever gets part of something automatic, as it allows for random shell execution.
Some like (this is random code):
p1 = subprocess.Popen(["zcat", "-f", self.result.get_path()], stdout=subprocess.PIPE)
value = subprocess.check_output(["mail", "-s", subject, self.mail_address], stdin=p1.stdout)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it, but didnt test it yet. Does this also work like this when send_contents is False? Haven't used much of subprocess yet, so just tried to "copy" it from the other case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not directly, but with the newer subprocess.run
method you can directly provide a string for stdin. It will internally use Popen.communicate
.
report/report.py
Outdated
|
||
if self.send_contents: | ||
value = self.sh( | ||
f"zcat -f {self.result.get_path()} | mail -s '{subject}' {self.mail_address}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not directly, but with the newer subprocess.run
method you can directly provide a string for stdin. It will internally use Popen.communicate
.
Okay applied the comments and changes. So again requesting for feedback or approval. |
Co-authored-by: Christoph M. Lüscher <[email protected]>
Co-authored-by: Christoph M. Lüscher <[email protected]>
0ef026d
to
3b0e7cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly about added explanation/documentation, otherwise only two minor remarks.
Comments are addressed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If everything works as expected it looks good to me
Co-authored-by: Christoph M. Lüscher <[email protected]>
WIP PR to enable (mailed) reports for experiments similar to what MT already has/had.
Even though this is still in development feel free to comment on stuff.