CI: Better error control in the validation of docstrings #57879

Merged

datapythonista
Member

This makes the validation of docstrings more robust. Main changes:

  • If we ignore an error that is no longer raised, the CI will report it and fail (30 ignored errors had already been fixed, and they are removed from the list of ignores here)
  • Instead of specifying all the errors to validate (now all but one), we can specify the errors to skip (we skip only one: the lack of an extended summary in a docstring). This simplifies the script a bit
  • If an unknown error code is used when ignoring errors, the message is now more descriptive
  • I created an alias for --ignore_errors as -i, so the list of errors to ignore is a bit easier to read
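
As a rough illustration of the first and third points, here is a minimal sketch. It is not the code in scripts/validate_docstrings.py: the names `KNOWN_CODES`, `check_unknown_codes` and `find_stale_ignores`, and the dictionary shapes, are assumptions made only for this example.

```python
# Hypothetical sketch; names and data shapes are assumptions, not the pandas script.
KNOWN_CODES = {"ES01", "GL08", "PR01", "PR02", "PR07", "RT03", "SA01"}


def check_unknown_codes(ignore_errors):
    """Raise a descriptive error if an ignored code is not a known code."""
    for obj_name, codes in ignore_errors.items():
        unknown = sorted(set(codes) - KNOWN_CODES)
        if unknown:
            raise ValueError(
                f"Unknown error code(s) {', '.join(unknown)} ignored for "
                f"{obj_name}; known codes: {', '.join(sorted(KNOWN_CODES))}"
            )


def find_stale_ignores(ignore_errors, found_errors):
    """Return (obj_name, code) pairs that are ignored but no longer raised."""
    stale = []
    for obj_name, codes in ignore_errors.items():
        raised = found_errors.get(obj_name, set())
        for code in sorted(codes):
            if code not in raised:
                stale.append((obj_name, code))
    return stale


ignores = {"pandas.Interval": {"PR02"}, "pandas.Interval.mid": {"SA01"}}
found = {"pandas.Interval": {"PR02"}}  # assume pandas.Interval.mid was already fixed
stale = find_stale_ignores(ignores, found)
```

In a setup like this, a non-empty `stale` list would be reported and would make the CI job fail, so the list of ignores cannot silently drift out of date.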

CC: @dontgoto @jordan-d-murphy

@datapythonista datapythonista added Docs CI Continuous Integration labels Mar 18, 2024
@jordan-d-murphy
Contributor

Nice work! Thank you, @datapythonista

@jordan-d-murphy
Contributor

jordan-d-murphy commented Mar 18, 2024

With this ticket (#57879), and now that CI: speedup docstring check consecutive runs #57826 is merged, I'm closing the following issues and opening a new issue to address them based on the new approach we've implemented.

DOC: fix GL08 errors in docstrings
DOC: fix PR01 errors in docstrings
DOC: fix PR07 errors in docstrings
DOC: fix SA01 errors in docstrings
DOC: fix RT03 errors in docstrings
DOC: fix PR02 errors in docstrings

Thanks for all the work that's gone into this! This is a much cleaner approach, and fixing these will now be more straightforward. Big win in my opinion!

@datapythonista
Member Author

And opening a new issue to address these based on the new approach we've implemented.

What I'd do is create a master issue to fix the docstrings, if there isn't one already, and then create smaller issues labelled as "good first issue". For example:

Issue 1 to address:

        -i pandas.Categorical.__array__ SA01\
        -i pandas.Categorical.codes SA01\
        -i pandas.Categorical.dtype SA01\
        -i pandas.Categorical.from_codes SA01\
        -i pandas.Categorical.ordered SA01\
        -i pandas.CategoricalDtype.categories SA01\
        -i pandas.CategoricalDtype.ordered SA01\
        -i pandas.CategoricalIndex.codes SA01\
        -i pandas.CategoricalIndex.ordered SA01\

Issue 2 to address:

        -i pandas.HDFStore.append PR01,SA01\
        -i pandas.HDFStore.get SA01\
        -i pandas.HDFStore.groups SA01\
        -i pandas.HDFStore.info RT03,SA01\
        -i pandas.HDFStore.keys SA01\
        -i pandas.HDFStore.put PR01,SA01\
        -i pandas.HDFStore.select SA01\
        -i pandas.HDFStore.walk SA01\

Issue 3 to address:

        -i pandas.Int16Dtype SA01\
        -i pandas.Int32Dtype SA01\
        -i pandas.Int64Dtype SA01\
        -i pandas.Int8Dtype SA01\

Issue 4 to address:

        -i pandas.Interval PR02\
        -i pandas.Interval.closed SA01\
        -i pandas.Interval.left SA01\
        -i pandas.Interval.mid SA01\
        -i pandas.Interval.right SA01\

...

I think it'll make the work of contributors easier to address those in groups. In particular, the "See Also" section of many of those would be quite easy to write, since the docstrings will be cross-referencing each other in many cases.
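
One possible way to produce such groups automatically is sketched below. It groups the `-i` entries by the class part of the dotted name, which is only an approximation of the manual grouping above (for example, it would keep `pandas.Categorical` and `pandas.CategoricalDtype` in separate groups). The helper `group_by_class` is hypothetical and not part of the pandas scripts.

```python
from collections import defaultdict

# A few ignore entries copied from the examples above: (object name, error codes).
entries = [
    ("pandas.Categorical.codes", "SA01"),
    ("pandas.CategoricalDtype.ordered", "SA01"),
    ("pandas.HDFStore.append", "PR01,SA01"),
    ("pandas.HDFStore.info", "RT03,SA01"),
    ("pandas.Int16Dtype", "SA01"),
    ("pandas.Interval", "PR02"),
    ("pandas.Interval.mid", "SA01"),
]


def group_by_class(entries):
    """Group entries by the first two components of the dotted name."""
    groups = defaultdict(list)
    for obj_name, codes in entries:
        key = ".".join(obj_name.split(".")[:2])
        groups[key].append((obj_name, codes))
    return dict(groups)


groups = group_by_class(entries)
for key, members in groups.items():
    print(f"Issue for {key}:")
    for obj_name, codes in members:
        print(f"    -i {obj_name} {codes}")
```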

If you don't have triage permissions in this repo, please let me know and I'll give them to you, so you can label the issues as "good first issue" and anything else needed.

@jordan-d-murphy
Contributor

Thanks for the guidance, @datapythonista !

I agree, that sounds like a great approach. I'll set it up once this gets merged in so I can grab the updated code snippets from main.

I don't have those permissions; it would be helpful if you could grant them to me. Thank you!

@dontgoto
Contributor

Those are some great simplifications of the error handling logic!

Member Author

@datapythonista datapythonista left a comment

@mroeschke if you have time, do you mind having a look at this? We changed how we ignore the pending docstring errors, both in #57826 and here again. PRs fixing docstrings are conflicting, and they'll conflict again after this one, so it'd be good to merge this as soon as reasonable so contributors need to fix the conflicts only once.

      ignore_deprecated=False,
      ignore_errors=None,
  )
- assert exit_status == 0
+ assert exit_status == 3
Member Author

When calling the script for a single function, until now it always returned an exit status of 0 even when there were errors. We don't really check this status anywhere right now, but I think it makes more sense that it also returns the number of errors, as we do when we call the script for all functions.

This is why the exit status needs to be changed here.
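
To make the idea concrete, here is a small sketch of an exit status that equals the number of reported errors. The function `count_errors` and the placeholder messages are assumptions for illustration; they are not taken from the script.

```python
def count_errors(errors, ignored_codes=frozenset()):
    """Return how many (code, message) pairs are not ignored."""
    return sum(1 for code, _message in errors if code not in ignored_codes)


# Hypothetical validation result for a single function.
errors = [
    ("PR01", "placeholder message"),
    ("RT03", "placeholder message"),
    ("SA01", "placeholder message"),
]

exit_status = count_errors(errors)
exit_status_ignoring_sa01 = count_errors(errors, ignored_codes={"SA01"})
# A script could then finish with sys.exit(exit_status), so that any
# remaining error produces a non-zero status.
```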

assert exit_status == 2*2
assert exit_status_ignore_func == exit_status - 1
# two functions * two not global ignored errors - one function ignored error
assert exit_status == 2 * 2 - 1
Member Author

The diff of this part looks a bit complex, but I just reordered the two calls, since the test was a bit difficult to read before: it called the two functions first and then asserted the exit codes in the reverse order of the calls. There is no change in logic other than replacing the error parameter with ignore_errors, as in the rest.

if raw_ignore_errors:
    for obj_name, error_codes in raw_ignore_errors:
        # function errors "pandas.Series PR01,SA01"
        if obj_name != "*":
Member

Could we just use a separate flag for ignoring all errors for a specific code?

Member Author

I think it's simpler this way. In a follow-up PR I'll try to remove the star, so we'll be able to simply use --ignore-errors PR01. I guess what you don't like is the star?

And I may use None for the key when the error should always be ignored. But since this PR had already become too big, I preferred not to also edit the argparse here.

What do you think?
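
For readers following along, this is a minimal sketch of how pairs passed with -i could be collected into a dictionary, with `None` as the key for errors ignored everywhere. It is one possible reading of the idea discussed here, not the script's actual code; `parse_ignore_errors` and the argparse options chosen are assumptions.

```python
import argparse
import collections

parser = argparse.ArgumentParser()
# Each use of -i takes two values: an object name (or "*") and a
# comma-separated list of error codes.
parser.add_argument(
    "--ignore_errors",
    "-i",
    default=None,
    action="append",
    nargs=2,
    metavar=("function", "error_codes"),
)


def parse_ignore_errors(raw_ignore_errors):
    """Map object names to sets of codes; "*" becomes the key None."""
    ignore_errors = collections.defaultdict(set)
    for obj_name, error_codes in raw_ignore_errors or []:
        key = None if obj_name == "*" else obj_name
        ignore_errors[key].update(error_codes.split(","))
    return dict(ignore_errors)


args = parser.parse_args(
    ["-i", "*", "ES01", "-i", "pandas.HDFStore.append", "PR01,SA01"]
)
parsed = parse_ignore_errors(args.ignore_errors)
```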

Member

Yeah, the *, while commonplace in other tools, is still a little more opaque to me than --ignore-all CODE or similar.

Not a blocker to me, but it would be nice to consider in a follow-up.

Member

@mroeschke mroeschke left a comment

Generally I like this approach as well. Just a few comments

@mroeschke mroeschke added this to the 3.0 milestone Mar 19, 2024
@mroeschke mroeschke merged commit 37b9303 into pandas-dev:main Mar 19, 2024
46 of 47 checks passed
@mroeschke
Member

Thanks @datapythonista

@jordan-d-murphy
Contributor

Opened DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063 as a parent issue for fixing docstrings based on the refactoring in code_checks.sh

Feel free to swing by and help out! 🙂

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
…57879)

* CI: Better error control in the validation of docstrings

* Fix CI errors

* Fixing tests

* Update scripts/validate_docstrings.py

Co-authored-by: Matthew Roeschke <[email protected]>

---------

Co-authored-by: Matthew Roeschke <[email protected]>