[5001] SMF 2.1/2.0 Case sensitive usernames (russian) when login

Started by jk3, November 06, 2012, 08:31:48 AM


jk3

0005001: Login error with non-ASCII characters

Tested on a clean SMF 2.1 alpha and the 2.0.2 release, both with UTF-8.

I created two users, one with the Russian login Река and one with the English login Peka, both with the same password: 1234.
Here is the data sent from the client to the server on the first login attempt:


Russian login:
==============
Река
username: %D0%A0%D0%B5%D0%BA%D0%B0
hash_passwrd: 24e8a0938351505b928018c282d885bd5d1c29f1

река
username: %D1%80%D0%B5%D0%BA%D0%B0
hash_passwrd: ad6c3fd904d2f6ea8be628a0f6c4a020b1d8607c


English login:
==============
Peka
username: Peka
hash_passwrd: 845b321c98abc069d944bc3edc9b3a8854ffd28e

peka
username: peka
hash_passwrd: 845b321c98abc069d944bc3edc9b3a8854ffd28e


With the English username, case has no effect on the hash, but with the Russian one it does, so Russian logins are case-sensitive :(
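
The `username` values above are the percent-encoded UTF-8 bytes of each login. A minimal sketch in plain JavaScript, using only the standard `encodeURIComponent` and `toLowerCase`, reproduces them and suggests that folding case on the string before encoding removes the difference. The variable names are illustrative only:

```javascript
// Percent-encode the UTF-8 bytes of each spelling of the login.
const upper = encodeURIComponent("Река");
const lower = encodeURIComponent("река");

console.log(upper); // %D0%A0%D0%B5%D0%BA%D0%B0
console.log(lower); // %D1%80%D0%B5%D0%BA%D0%B0

// Folding case on the string (not on its bytes) makes the two identical.
const folded = encodeURIComponent("Река".toLowerCase());
console.log(folded === lower); // true
```

Only the first letter differs between the two spellings, which is why only the first two encoded bytes change.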

emanuele

I'm very sorry for the late reply here... :-[

Did you try the patch attached to the bug reports?



jk3

I tried it, with no effect.

I suspect the problem is also on the client side, in the strtolower function in script.js.

MrPhil

Does the JS strtolower() know about UTF-8, or only ASCII? Does the browser's language and character set come into play here? If the page is UTF-8, I would think that all JS string functions would operate in UTF-8. You might want to double check that your page is really in UTF-8 and not in a single byte Latin-x Cyrillic page (overridden by your server). I've heard of servers forcing Latin-1, but not other encodings. Do a View > Character Encoding (or similar on your browser) and confirm that it thinks the page is UTF-8. If it is, maybe someone could brew up a quick JS test to see if strtolower() does the right thing with Cyrillic text. It could be as simple as adding an alert() call to show the original text and the lower case version. If no one has done it in the next 12 hours, I'll take a look at it tonight. I can't input Cyrillic, so someone else will have to do the actual testing. In the meantime, if the browser itself is configured in English, see if you can switch it to Russian (for prompts and messages) and if that makes any difference.

Arantor

The JS strtolower method is modelled to work the same as PHP native strtolower, i.e. not aware of anything that isn't Latinised.
Holder of controversial views, all of which my own.


MrPhil

I took a look through the very strange code in script.js, and it appears that they convert the UTF-16 values used by Javascript to a series of UTF-8 bytes. So far, so good. Then they fold each of these bytes (8 bits) individually to lower case using a Windows-1252 table! Whatever the characters were before (say, Cyrillic text), they will now be gibberish. The problem is that a Cyrillic "P" isn't going to end up as the same sequence of 2 bytes that a Cyrillic "p" would, after folding to lower case.

If you'd like to experiment, you might try the following: change (2 places)  .php_to8bit().php_strtolower()   to   .toLowerCase().php_to8bit()   (with the parentheses, since toLowerCase is a method call).

This should do a proper UTF-16 fold to lower case, and then convert the numbers (0..65535 or maybe larger) to a stream of UTF-8 bytes which hex_sha1() should hopefully be able to digest. I hope that's all there is to it, and there isn't PHP code (on the server side) that also has to be fixed. That would be a problem, as I don't think there's UTF-16 support built into PHP. Let us know if it works or if you discover anything interesting.
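
The proposed order (fold the string first, then convert to UTF-8 bytes) can be sketched as follows. `utf8Bytes` is a hypothetical stand-in for SMF's `php_to8bit()`, built on the standard `TextEncoder`; it returns an array of byte values rather than whatever the real function returns, so treat it as an illustration only:

```javascript
// Hypothetical stand-in for php_to8bit(): the UTF-8 byte values of a string.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Fold case on the string first, then encode to bytes.
function foldThenEncode(username) {
  return utf8Bytes(username.toLowerCase());
}

console.log(foldThenEncode("Надо").join(" / "));
console.log(foldThenEncode("надо").join(" / "));
// Both lines print: 208 / 189 / 208 / 176 / 208 / 180 / 208 / 190
```

With this order, both spellings produce the same byte stream, so any hash computed from it would be the same as well.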

P.S. I'm sure all your Cyrillic passwords will have to be changed if you do this.

jk3

Going back to my example:
Quote
Russian login:
==========
Река
username: %D0%A0%D0%B5%D0%BA%D0%B0
hash_passwrd: 24e8a0938351505b928018c282d885bd5d1c29f1

река
username: %D1%80%D0%B5%D0%BA%D0%B0
hash_passwrd: ad6c3fd904d2f6ea8be628a0f6c4a020b1d8607c
When I send this data from the client to the server:

username: %D1%80%D0%B5%D0%BA%D0%B0 // == река
hash_passwrd: 24e8a0938351505b928018c282d885bd5d1c29f1 // == hash from Река

Login is successful!

The case of the username sent from the client to the server does not matter, but the hash does.

That's why I think the real problem is generating the same hash on the client side as was generated on the server side (without any changes to the PHP code).
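
This behaviour fits a scheme in which the server checks the submitted hash against a stored hash and never compares the typed username itself. A minimal Node.js sketch, assuming the server stores sha1(name + password) and verifies sha1(stored + session id); the record layout, `verify`, `clientHash`, and the session id are all hypothetical, and case folding is left out because, per this thread, it does not unify Cyrillic case:

```javascript
const { createHash } = require("crypto");

// SHA-1 of a UTF-8 string, as lowercase hex.
const sha1Hex = (text) => createHash("sha1").update(text, "utf8").digest("hex");

// Hypothetical stored record: hashed from the name exactly as registered.
const storedHash = sha1Hex("Река" + "1234");

// Hypothetical server check: only the stored hash and the session id are used.
function verify(submittedHash, sessionId) {
  return submittedHash === sha1Hex(storedHash + sessionId);
}

// What a client would submit for a given typed spelling of the username.
function clientHash(typedName, password, sessionId) {
  return sha1Hex(sha1Hex(typedName + password) + sessionId);
}

const sessionId = "example-session-id"; // made-up value
console.log(verify(clientHash("Река", "1234", sessionId), sessionId)); // true
console.log(verify(clientHash("река", "1234", sessionId), sessionId)); // false
```

Under these assumptions, the spelling only matters through the bytes that go into the hash, which matches the experiment above.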

jk3

Now, after all my experiments, I think the problem is still on both sides.


If the member name is in lowercase, I can fix the user input by applying this code in hashLoginPassword(): value.toLowerCase().php_to8bit().php_strtolower()

But if the member name is in title case, this patch is wrong and blocks login for that member entirely.

See:

username = Надо

php_to8bit()                                208 / 157 / 208 / 176 / 208 / 180 / 208 / 190
toLowerCase().php_to8bit()                  208 / 189 / 208 / 176 / 208 / 180 / 208 / 190
php_to8bit().php_strtolower()               240 / 157 / 240 / 176 / 240 / 180 / 240 / 190       // Only in this case hash is correct!
toLowerCase().php_to8bit().php_strtolower() 240 / 189 / 240 / 176 / 240 / 180 / 240 / 190

username = надо

php_to8bit()                                208 / 189 / 208 / 176 / 208 / 180 / 208 / 190
toLowerCase().php_to8bit()                  208 / 189 / 208 / 176 / 208 / 180 / 208 / 190
php_to8bit().php_strtolower()               240 / 189 / 240 / 176 / 240 / 180 / 240 / 190
toLowerCase().php_to8bit().php_strtolower() 240 / 189 / 240 / 176 / 240 / 180 / 240 / 190

This can only happen if the hash was generated on the server for "Надо", not "надо".
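
The `php_to8bit().php_strtolower()` rows above can be reproduced with a simplified byte-wise fold. Both helpers are hypothetical stand-ins, not SMF's code: `toUtf8Bytes` models `php_to8bit()`, and `foldBytesLatin1` models `php_strtolower()` as a plain Latin-1 fold (a real Windows-1252 table would also have a few pairs in the 0x80-0x9F range, which are ignored here):

```javascript
// Hypothetical stand-in for php_to8bit(): the UTF-8 byte values of a string.
function toUtf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text));
}

// Lower-case each byte as if it were one Latin-1 character:
// A-Z (0x41-0x5A) and 0xC0-0xDE, except 0xD7 (multiplication sign), move up by 0x20.
function foldBytesLatin1(bytes) {
  return bytes.map((b) => {
    const isAsciiUpper = b >= 0x41 && b <= 0x5a;
    const isLatin1Upper = b >= 0xc0 && b <= 0xde && b !== 0xd7;
    return isAsciiUpper || isLatin1Upper ? b + 0x20 : b;
  });
}

const titled = foldBytesLatin1(toUtf8Bytes("Надо"));
const lowered = foldBytesLatin1(toUtf8Bytes("надо"));

console.log(titled.join(" / "));  // 240 / 157 / 240 / 176 / 240 / 180 / 240 / 190
console.log(lowered.join(" / ")); // 240 / 189 / 240 / 176 / 240 / 180 / 240 / 190
```

The lead byte 208 (0xD0) is treated as an upper-case Latin-1 letter and becomes 240, while the second byte of the first letter (157 versus 189) is left alone, so the two spellings still differ after the fold.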

MrPhil

I can't see using both toLowerCase() and php_strtolower() together in the Javascript. Anyway, toLowerCase() should fold UTF text properly to lower case, and php_strtolower() assumes Windows-1252 encoding of each byte in the UTF-8 stream, which is ridiculous. You can see from your tests that toLowerCase().php_to8bit() gives the same result for both the capitalized input and the lower case input.

I haven't looked on the PHP (server) side to see if it's duplicating some of this work. If it is, it would be difficult to duplicate toLowerCase() for all alphabets with mixed case (you might just handle Latin and Cyrillic if you had to write it yourself).

jk3

Quote from: MrPhil on January 29, 2013, 12:39:24 PM
php_strtolower() assumes Windows-1252 encoding of each byte in the UTF-8 stream, which is ridiculous.
But why does only the php_to8bit().php_strtolower() combination give the correct hash?

Does PHP follow the same procedure (Windows-1252 folding of each byte in the UTF-8 stream) when generating the hash?

MrPhil

On what grounds are you declaring that to be the "correct" hash? If php_to8bit().php_strtolower() was used to generate the password hash in the first place, then of course it will match (probably only for the same "case" of user ID as originally used). Someone familiar with what SMF password usage is supposed to be doing will have to address anything further, especially whether the same process takes place somewhere in the server PHP code. I can assure you that taking a UTF-8 multibyte stream and folding each byte to lower case using Windows-1252 tables is definitely not the right way to do it. Javascript's toLowerCase() method should be adequate on the browser side (assuming all browsers implement it correctly), but for Unicode someone might have to write the equivalent for the server PHP side (if it is hashing passwords, too).

If SMF would simply put transmission of passwords under SSL encryption, the problem would solve itself... Or, they could simply declare user IDs to be case sensitive, and not play with trying to fold any alphabet to lower case.

Arantor

You do realise that the current process is pretty much all that can be done in SMF? I guess it hadn't occurred that there might be *reasons* why things are done the way they are.

The main problem is unreliability of server behaviour. You absolutely need the server and client to be doing the same thing given the inputs. That means the case folding has to be the same.

The problem is, PHP does not have reliable and consistent case folding. Even $smcFunc['strtolower'] will provide different results depending on what is installed (mbstring, iconv come to mind). Since these variations are not disclosed to the client, and these things can even vary between versions of a library, let alone between libraries, you have two choices.

Either, 1) you provide JS equivalents of all the possible translations the server could do and somehow indicate from the server to the client which should be in use, or 2) reduce it down to the most reliable overall behaviour, which is to use strtolower on the server, a function that behaves consistently and replicate its functionality on the client end. SMF chose to do 2).
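
The requirement that client and server fold case identically can be sketched in a few lines. `asciiLower` is a hypothetical model of an ASCII-only server-side fold (an assumption for illustration, not SMF's code), and `unicodeLower` is JavaScript's built-in `toLowerCase()`. Since two hashes match only when their inputs match, the sketch compares the hash inputs directly:

```javascript
// Hypothetical ASCII-only fold: only A-Z are lower-cased, everything else is kept.
function asciiLower(text) {
  return text.replace(/[A-Z]/g, (c) => c.toLowerCase());
}

// Unicode-aware fold provided by the JavaScript engine.
function unicodeLower(text) {
  return text.toLowerCase();
}

// The hash input is built as fold(username) + password.
function hashInput(fold, username, password) {
  return fold(username) + password;
}

// Latin username: both folds agree, so the hashes would agree.
console.log(hashInput(asciiLower, "Peka", "1234") === hashInput(unicodeLower, "peka", "1234")); // true

// Cyrillic username: the folds disagree, so the hashes would differ.
console.log(hashInput(asciiLower, "Река", "1234") === hashInput(unicodeLower, "река", "1234")); // false
```

Whichever fold is chosen, the sketch suggests it has to be the same one on both ends, including for hashes that were stored before any change.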

Quote
If SMF would simply put transmission of passwords under SSL encryption, the problem would solve itself...

Yes, because of course everyone running SMF should have their own paid-for certificate (using SSL is server configuration, with appropriate certificates, not application level)

Quote
Or, they could simply declare user IDs to be case sensitive, and not play with trying to fold any alphabet to lower case.

Yes, because we like forcing every user of any version to reset their password when upgrading or converting from other systems, and we like imposing technical restrictions (rather than compromises) on things.


jk3

I've almost beaten the problem!

I applied the patch from the developer and additionally changed script.js:

-- doForm.hash_passwrd.value = hex_sha1(hex_sha1(doForm.user.value.php_to8bit().php_strtolower() + doForm.passwrd.value.php_to8bit()) + cur_session_id);
+ doForm.hash_passwrd.value = hex_sha1(hex_sha1(doForm.user.value.toLowerCase().php_to8bit() + doForm.passwrd.value.php_to8bit()) + cur_session_id);

For new users it works perfectly!

But old users can log in only from the login2 page (where the password is sent unencrypted).
Changing the password, whether by the user or through the admin panel, doesn't fix this for that user!

I think the developer should correct the PHP code so that the hash for an old user who changes their password is identical to the hash for a new user with the same username.

hash(username) // after patch, new user
=
hash(username) // after patch, old user that changes password

So there should be no difference between registering a new user and changing an old user's password.

Arantor

Why has this been moved to fixed? Last I checked it wasn't fixed already.


Antes

Quote from: Arantor on May 18, 2014, 11:02:44 AM
Why has this been moved to fixed? Last I checked it wasn't fixed already.

http://dev.simplemachines.org/mantis/view.php?id=5001 please check the mantis, we moved that to GitHub.


motechman

I'm a new SMF admin, and I just discovered username case sensitivity on my 2.07 installation with English language.

Is this behavior by design, or is there a configuration setting that will change the default?

It's very unusual to have case sensitivity on usernames in my lengthy experience of systems that require login credentials.

Arantor

There's only case sensitivity on cases where strtolower() won't handle the transition properly. Often if the characters are accented they won't be handled properly.


motechman

I have a test account with the username TypicalUser, and unless I capitalize the T and the U I get an "invalid password" error when trying to log in with the correct password. I made certain it was due to case sensitivity of the username by cutting and pasting the password in.

This is with recently downloaded SMF version 2.07 on an American host (Siteground.com) that uses CentOS server OS.

Arantor

If it's just normal ASCII characters, there is no reason for the username to be case sensitive and on stock 2.0.7 I have absolutely no trouble with that. The password, of course, must be case sensitive.

