Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Lint/Casecmp Cop #9753

Open
parkerfinch opened this issue Apr 29, 2021 · 0 comments
Open

Add Lint/Casecmp Cop #9753

parkerfinch opened this issue Apr 29, 2021 · 0 comments

Comments

@parkerfinch
Copy link
Contributor

Background

There has been continuous confusion around the Performance/Casecmp cop.

Right now the Performance/Casecmp cop suggests changing case-insensitive string comparisons from e.g. str.downcase == 'foo' to str.casecmp('foo').zero?.

A problem is that these are not necessarily equivalent, since #casecmp works only on ASCII characters.

# Note that this has a "LATIN SMALL LETTER SHARP S"
str1 = 'straße'

# And this one has a "LATIN CAPITAL LETTER SHARP S"
str2 = 'STRAẞE'

# String#casecmp fails to compare them correctly
str1.casecmp(str2).zero?
=> false

# String#downcase does compare them correctly
str1.downcase == str2.downcase
=> true

This is a known issue, which is why the Performance/Casecmp cop was marked as unsafe.

But there's another issue at play here, with String#downcase and String#upcase not round-tripping successfully and giving incorrect results for some unicode characters.

str1.downcase == str1.upcase.downcase
=> false

The String#casecmp? method performs case folding and compares the strings "correctly":

str1.casecmp?(str1.upcase.downcase)
=> true

A downside of the #casecmp? method is that it is slower than the #lower or the #casecmp approaches, which is why it hasn't been adopted in the performance cop. However, it does seem to be more correct, which is why I agree with @zverok that it should be a Linting cop.

Describe the solution you'd like

I would like to adopt the approach that @zverok suggested here and, instead of having a performance cop, make this a linting cop.

We can separately remove the performance cop (since it changes a program and would conflict with this one). (I'm happy to do that over in rubocop-performance if it sounds good).

We can create a new cop (called Lint/Casecmp, or maybe Lint/CasecmpP to prevent name clashes with the Performance cop) to suggest using casecmp? instead of the #downcase or #upcase approaches.

Let me know if this sounds like a good approach! I am happy to make a PR for this if it will move forward.

Describe alternatives you've considered

One alternative would be to consider casecmp(...).zero? a violation and suggest that it is changed to casecmp?(...). This could lead to confusing conflicts if people have a version of rubocop-performance with the Performance/Casecmp cop and the new Lint/Casecmp cop, so I think it makes sense to consider casecmp(...).zero? to be acceptable. (It's also the fastest approach when dealing with ASCII-only text, so there are many valid use cases for it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant