You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Right now the Performance/Casecmp cop suggests changing case-insensitive string comparisons from e.g. str.downcase == 'foo' to str.casecmp('foo').zero?.
A problem is that these are not necessarily equivalent, since #casecmp works only on ASCII characters.
# Note that this has a "LATIN SMALL LETTER SHARP S"str1='straße'# And this one has a "LATIN CAPITAL LETTER SHARP S"str2='STRAẞE'# String#casecmp fails to compare them correctlystr1.casecmp(str2).zero?=>false# String#downcase does compare them correctlystr1.downcase == str2.downcase=>true
This is a known issue, which is why the Performance/Casecmp cop was marked as unsafe.
But there's another issue at play here, with String#downcase and String#upcase not round-tripping successfully and giving incorrect results for some unicode characters.
str1.downcase == str1.upcase.downcase=>false
The String#casecmp? method performs case folding and compares the strings "correctly":
str1.casecmp?(str1.upcase.downcase)=>true
A downside of the #casecmp? method is that it is slower than the #lower or the #casecmp approaches, which is why it hasn't been adopted in the performance cop. However, it does seem to be more correct, which is why I agree with @zverok that it should be a Linting cop.
Describe the solution you'd like
I would like to adopt the approach that @zverok suggested here and, instead of having a performance cop, make this a linting cop.
We can separately remove the performance cop (since it changes a program and would conflict with this one). (I'm happy to do that over in rubocop-performance if it sounds good).
We can create a new cop (called Lint/Casecmp, or maybe Lint/CasecmpP to prevent name clashes with the Performance cop) to suggest using casecmp? instead of the #downcase or #upcase approaches.
Let me know if this sounds like a good approach! I am happy to make a PR for this if it will move forward.
Describe alternatives you've considered
One alternative would be to consider casecmp(...).zero? a violation and suggest that it is changed to casecmp?(...). This could lead to confusing conflicts if people have a version of rubocop-performance with the Performance/Casecmp cop and the new Lint/Casecmp cop, so I think it makes sense to consider casecmp(...).zero? to be acceptable. (It's also the fastest approach when dealing with ASCII-only text, so there are many valid use cases for it.)
The text was updated successfully, but these errors were encountered:
Background
There has been continuous confusion around the Performance/Casecmp cop.
Right now the
Performance/Casecmp
cop suggests changing case-insensitive string comparisons from e.g.str.downcase == 'foo'
tostr.casecmp('foo').zero?
.A problem is that these are not necessarily equivalent, since
#casecmp
works only on ASCII characters.This is a known issue, which is why the
Performance/Casecmp
cop was marked as unsafe.But there's another issue at play here, with
String#downcase
andString#upcase
not round-tripping successfully and giving incorrect results for some unicode characters.The
String#casecmp?
method performs case folding and compares the strings "correctly":A downside of the
#casecmp?
method is that it is slower than the#lower
or the#casecmp
approaches, which is why it hasn't been adopted in the performance cop. However, it does seem to be more correct, which is why I agree with @zverok that it should be a Linting cop.Describe the solution you'd like
I would like to adopt the approach that @zverok suggested here and, instead of having a performance cop, make this a linting cop.
We can separately remove the performance cop (since it changes a program and would conflict with this one). (I'm happy to do that over in rubocop-performance if it sounds good).
We can create a new cop (called
Lint/Casecmp
, or maybeLint/CasecmpP
to prevent name clashes with the Performance cop) to suggest usingcasecmp?
instead of the#downcase
or#upcase
approaches.Let me know if this sounds like a good approach! I am happy to make a PR for this if it will move forward.
Describe alternatives you've considered
One alternative would be to consider
casecmp(...).zero?
a violation and suggest that it is changed tocasecmp?(...)
. This could lead to confusing conflicts if people have a version ofrubocop-performance
with thePerformance/Casecmp
cop and the newLint/Casecmp
cop, so I think it makes sense to considercasecmp(...).zero?
to be acceptable. (It's also the fastest approach when dealing with ASCII-only text, so there are many valid use cases for it.)The text was updated successfully, but these errors were encountered: