Add Lint/Casecmp Cop #9753

parkerfinch · 2021-04-29T23:00:36Z

Background

There has been ongoing confusion around the Performance/Casecmp cop.

Right now the Performance/Casecmp cop suggests changing case-insensitive string comparisons from e.g. str.downcase == 'foo' to str.casecmp('foo').zero?.

A problem is that these are not necessarily equivalent, since #casecmp only treats the ASCII letters A-Z and a-z as case-insensitive; all other characters are compared as-is.

# Note that this has a "LATIN SMALL LETTER SHARP S"
str1 = 'straße'

# And this one has a "LATIN CAPITAL LETTER SHARP S"
str2 = 'STRAẞE'

# String#casecmp fails to compare them correctly
str1.casecmp(str2).zero?
=> false

# String#downcase does compare them correctly
str1.downcase == str2.downcase
=> true

This is a known issue, which is why the Performance/Casecmp cop was marked as unsafe.

But there's another issue at play here: String#downcase and String#upcase do not round-trip for some Unicode characters (for example, 'ß' upcases to 'SS', which downcases to 'ss'), so comparisons built on them can give incorrect results.

str1.downcase == str1.upcase.downcase
=> false

The String#casecmp? method performs case folding and compares the strings "correctly":

str1.casecmp?(str1.upcase.downcase)
=> true
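As a rough illustration of what "case folding" means here, the sketch below compares String#casecmp? with an explicit fold via String#downcase(:fold). The helper name `fold_equal?` is hypothetical and only for illustration; it is not how Ruby implements #casecmp? internally, just a way to see a comparable result.

```ruby
# Hypothetical helper: compare two strings after Unicode case folding.
# This only approximates what String#casecmp? does for compatible encodings.
def fold_equal?(a, b)
  a.downcase(:fold) == b.downcase(:fold)
end

str1 = 'straße'
round_tripped = str1.upcase.downcase # 'ß' has become 'ss' by this point

plain_downcase = str1.downcase == round_tripped.downcase
folded         = fold_equal?(str1, round_tripped)
builtin        = str1.casecmp?(round_tripped)
```

Under these assumptions, `folded` and `builtin` should agree with each other, while `plain_downcase` does not.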

A downside of the #casecmp? method is that it is slower than the #downcase or #casecmp approaches, which is why it hasn't been adopted in the performance cop. However, it does seem to be more correct, which is why I agree with @zverok that it should be a Lint cop.
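For anyone who wants to check the speed difference on their own setup, here is a minimal measurement sketch using the standard library's Benchmark module. The input strings and iteration count are arbitrary assumptions, and the numbers will vary with Ruby version and input, so this is a way to measure rather than a result.

```ruby
require 'benchmark'

# Assumed inputs: short ASCII-only strings, where all three approaches agree.
a = 'Hello World'
b = 'hello world'
iterations = 100_000

# Wall-clock seconds for each approach, keyed by a descriptive label.
timings = {
  'downcase ==' => Benchmark.realtime { iterations.times { a.downcase == b.downcase } },
  'casecmp.zero?' => Benchmark.realtime { iterations.times { a.casecmp(b).zero? } },
  'casecmp?' => Benchmark.realtime { iterations.times { a.casecmp?(b) } }
}

timings.each { |label, seconds| puts format('%-14s %.4fs', label, seconds) }
```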

Describe the solution you'd like

I would like to adopt the approach that @zverok suggested here and, instead of having a performance cop, make this a linting cop.

We can separately remove the performance cop, since its suggested rewrite can change a program's behavior and would conflict with this one. (I'm happy to do that over in rubocop-performance if it sounds good.)

We can create a new cop (called Lint/Casecmp, or maybe Lint/CasecmpP to prevent name clashes with the Performance cop) to suggest using casecmp? instead of the #downcase or #upcase approaches.
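To make the proposal concrete, here is a sketch of the kind of code the new cop might flag and what it would suggest instead. The cop does not exist yet, so the "bad"/"good" labels below only reflect the proposal in this issue, and the variable names are made up for illustration.

```ruby
str = 'Straße'

# bad (under this proposal): the result depends on which direction you convert
via_downcase = str.downcase == 'strasse'
via_upcase   = str.upcase == 'STRASSE'

# good (under this proposal): compares after Unicode case folding
via_casecmp_p = str.casecmp?('strasse')
```

Note that the two "bad" forms disagree with each other for this input, which is part of the motivation for treating this as a correctness concern rather than a performance one.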

Let me know if this sounds like a good approach! I am happy to make a PR for this if it will move forward.

Describe alternatives you've considered

One alternative would be to consider casecmp(...).zero? a violation and suggest changing it to casecmp?(...). This could lead to confusing conflicts if people have a version of rubocop-performance with the Performance/Casecmp cop enabled alongside the new Lint/Casecmp cop, so I think it makes sense to consider casecmp(...).zero? acceptable. (It's also the fastest approach when dealing with ASCII-only text, so there are many valid use cases for it.)
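A small sketch of why leaving casecmp(...).zero? alone seems reasonable: on ASCII-only input it gives the same answer as casecmp?, and the two only diverge once non-ASCII characters are involved. The sample pairs are arbitrary assumptions chosen for illustration.

```ruby
# ASCII-only pairs: both forms should give the same answer for each pair.
ascii_pairs = [%w[foo FOO], %w[Foo bar], %w[HTTP http]]
ascii_agree = ascii_pairs.all? do |a, b|
  a.casecmp(b).zero? == a.casecmp?(b)
end

# Non-ASCII pair: #casecmp does not fold these letters, #casecmp? does.
non_ascii_zero = 'äöü'.casecmp('ÄÖÜ').zero?
non_ascii_p    = 'äöü'.casecmp?('ÄÖÜ')
```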

parkerfinch mentioned this issue May 5, 2021

Retire Performance/Casecmp cop rubocop/rubocop-performance#240

Closed
