Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Humanize and describe_multi Bug Fix #997

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

anishnya
Copy link
Member

Pull Request Checklist

Thank you for taking the time to improve Arrow! Before submitting your pull request, please check all appropriate boxes:

  • 🧪 Added tests for changed code.
  • 🛠️ All tests pass when run locally (run tox or make test to find out!).
  • 🧹 All linting checks pass when run locally (run tox -e lint or make lint to find out!).
  • 📚 Updated documentation for changed code.
  • ⏩ Code is up-to-date with the master branch.

If you have any questions about your code changes or any of the points above, please submit your questions along with the pull request and we will try our best to help!

Description of Changes

Closes #996 and #973. I'll add the documentation for dynamic humanize after we make sure we like the approach we have here. Also, I'm linking #983 since I had to make some change to describe_multi that will be useful info for the locale implementation guide.

@anishnya anishnya changed the title Added dynamic humanize support and fix a big related to humanize Dynamic Humanize and describe_multi Bug Fix Jun 27, 2021
@codecov
Copy link

codecov bot commented Jun 27, 2021

Codecov Report

Merging #997 (995ee01) into master (baebfff) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 995ee01 differs from pull request most recent head e3a7f93. Consider uploading reports for the commit e3a7f93 to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##            master      #997   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines         2223      2164   -59     
  Branches       350       345    -5     
=========================================
- Hits          2223      2164   -59     
Impacted Files Coverage Δ
arrow/arrow.py 100.00% <100.00%> (ø)
arrow/locales.py 100.00% <100.00%> (ø)
arrow/factory.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update baebfff...e3a7f93. Read the comment docs.

@@ -1122,6 +1122,7 @@ def humanize(
locale: str = DEFAULT_LOCALE,
only_distance: bool = False,
granularity: Union[_GRANULARITY, List[_GRANULARITY]] = "auto",
dynamic: bool = False,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this False by default to avoid a breaking change? That's understandable, but also disappointing since to me, the dynamic behaviour seems much more useful than outputting a bunch of zeros for units.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarkKoz, yes we left as False by default in order to avoid changing the behaviour of humanize drastically. I agree that it would make more sense to leave it as True by default however. @jadchaar @krisfremen @systemcatch what are your thoughts?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the least, I hope changing this can be considered for the next major release (assuming you're following SemVer).

Unrelated: dynamic isn't a good name — it's vague and non-self-descriptive. omit_zeros or something similar would be clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave it as False for the time being, do a warning for changing behavior and change it to default True after a few versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable. Any thoughts on my name suggestion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errr throwing out a few ideas for the name, only_natural, minimal, drop_zeros. omit_zeros is fine as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think omit_zeros is probably the best name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think the dynamic naming is fine, as IMO it indicates it will use dynamically any of the granularity fields that are specified, as the time progresses or is shifted.

Although, omit_zeroes is a nice alternative name, I would prefer dynamic.

MarkKoz added a commit to MarkKoz/arrow that referenced this pull request Aug 8, 2021
It's more convenient to pass a simple short string than to pass a list
of 7 strings.

Resolve arrow-py#997
arrow/arrow.py Outdated
Comment on lines 1301 to 1305
if not timeframes and dynamic:
raise ValueError(
"All provided granulairty values produced an output of zero. "
"Consider using smaller granularities, or set the dynamic flag to False. "
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about defaulting to "just now" rather than raising an exception? If you imagine this being used with user input, it would pretty much be a requirement to always wrap it in a try-except due to this being raised.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you misspelled granularity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user could have a delta of let's say 2 days but only have the granularity of ["year, "month", "week"]. If they had dynamic on, it would output "just now." I think it would be a better idea to error out, then to give an inaccurate answer in that scenario.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it doesn't convert in that case e.g. 2 days = 2/7ths of a week? That's a good point then.

Will this still raise the exception if all granularities are provided, but all values are 0, or will it actually display "just now" in that case? I think it should be able to do that.

Copy link

@MarkKoz MarkKoz Aug 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it could show "just now" if all units evaluate to zero, regardless of which granularities are provided, not just if all are provided.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though that would be inconsistent with the behaviour of e.g. granularity="year" returning '0 years' rather than 'just now'. On the other hand, it seems like it raising an error in those cases should be avoidable somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our main goal is that if you provide a granularity, you expect the output to contain said granularity (with the exception of the omit zeros/dynamic functionality). Trying to figure out whether we should or shouldn't adapt the output to include some other unit seems unnecessary when we already have the auto function in humanize.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an error absolves arrow of having to make the tough decision of how to handle this. Letting the user handle the error gives them flexibility, but not all users might prefer that at the cost of having to practically always handle this error if they're dealing with unknown inputs.

It's a matter of which use case is more common: wanting custom behaviour to handle this edge case, or wanting to not have to think about it. Either way, the user can anticipate what the result will be by subtracting the times and manually inspecting the delta before calling humanize. If they see the delta will result in all zeros, they can handle it instead of relying on the default behaviour proposed below. Of course, that's not as convenient as just catching an exception, but I don't see a way to make both sides happy.


The most consistent solution may be to return zero in the smallest unit of the given granularity. This would ensure that while dynamic=True may omit some units in the given granularity, it will never introduce new units. Consider

>>> a = arrow.get(2021, 8, 8)
>>> b = arrow.get(2021, 8, 10)
>>> a.humanize(b, granularity=["year", "month", "week"])
'0 years 0 months and 0 weeks ago'

It has no problem omitting the "2 days" even though it's the only non-zero unit. This is arguably not very useful, but it's what the current behaviour is. There are probably use cases that need to strictly follow the granularity, and those users appreciate this behaviour. Anyway, following from this behaviour, it should then also be acceptable for this to happen

>>> a = arrow.get(2021, 8, 8)
>>> b = arrow.get(2021, 8, 10)
>>> a.humanize(b, granularity=["year", "month", "week"], dynamic=True)
'0 weeks ago'

If the user has dynamic on, that is an expression of an intent to cut down on the zeros in the output. I'd say it's more practical to make a compromise to return 1 zero than to take a strict stance of "must have no zeros" and be forced to raise an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only do so much, arrow in the end is a library meant to work together with the dev, not think or do things the dev might not be aware of and does behind the scenes without awareness and not raise an exception that the dev might even be expecting to see raised.

Comment on lines +1270 to +1272
if dynamic and trunc(abs(value)) == 0:
pass
elif trunc(abs(value)) != 1:
Copy link

@MarkKoz MarkKoz Aug 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make any significant difference to save the value of trunc(abs(value)) rather than calculating it twice? This could also be said for the other parts of the diff that use trunc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decent catch.

Profiling the code, you'd need to run about 10k of the trunc(abs()) calls to even come close to seeing a 1ms difference.

Copy link
Member

@krisfremen krisfremen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far, I want to have another pass at it as I ran upon #1019 while testing.

Cheers!

arrow/arrow.py Outdated Show resolved Hide resolved
Comment on lines +1270 to +1272
if dynamic and trunc(abs(value)) == 0:
pass
elif trunc(abs(value)) != 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decent catch.

Profiling the code, you'd need to run about 10k of the trunc(abs()) calls to even come close to seeing a 1ms difference.

arrow/arrow.py Outdated
Comment on lines 1301 to 1305
if not timeframes and dynamic:
raise ValueError(
"All provided granulairty values produced an output of zero. "
"Consider using smaller granularities, or set the dynamic flag to False. "
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only do so much, arrow in the end is a library meant to work together with the dev, not think or do things the dev might not be aware of and does behind the scenes without awareness and not raise an exception that the dev might even be expecting to see raised.

@@ -1122,6 +1122,7 @@ def humanize(
locale: str = DEFAULT_LOCALE,
only_distance: bool = False,
granularity: Union[_GRANULARITY, List[_GRANULARITY]] = "auto",
dynamic: bool = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think the dynamic naming is fine, as IMO it indicates it will use dynamically any of the granularity fields that are specified, as the time progresses or is shifted.

Although, omit_zeroes is a nice alternative name, I would prefer dynamic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Humanize Format Relative Error For Multiple Granularities
4 participants