-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Comment tokenizer can break when using mbstring function overloading #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -54,7 +54,7 @@ public function tokenizeString($string, $eolChar, $stackPtr) | |||
*/ | |||
|
|||
for ($c = 0; $c < $numChars; $c++) { | |||
if ($string[$c] !== '/' && $string[$c] !== '*') { | |||
if (substr($string, $c, 1) !== '/' && substr($string, $c, 1) !== '*') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my knowledge old code does the same as new code. Am I missing something.
Old code won't work on PHP7, but new code will however. That's good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old code works perfectly fine on PHP7. The unit tests have been running on it most of the year.
Are you passing the |
We use |
If you've just switched to substr(), it means you're probably overloading the built-in string functions using the multibyte extension. PHP_CodeSniffer doesn't do multibyte support like that. It can't rely on people doing the overloading. PHP_CodeSniffer uses the iconv extension and the encoding that you supply instead. It uses the special iconv_* functions to do length and substring actions where needed. So in this case, there may very well be a problem, but your fix wont actually solve it for the majority of users. None of your submissions actually displayed your comment correctly, so I'll see if I can replicate it myself and add a test. Then I'll see which of those function calls needs fixing and if I can do it with iconv. |
Also, if you're getting unit test failures, you probably don't have the iconv extension installed. The failing tests are likely due to the multibyte test comments that are in the test files. |
I've tried replicating commenting problems with various multi-byte characters and I can't do it. I remember writing this code and not needing any iconv functions in it because it doesn't actually matter if the lengths of the tokens are wrong. The chars I'm looking for are fixed lengths (like So I don't know what it going on with your code. Maybe it is a result of a problem higher up. Just check you have the iconv extension installed or the encoding option is doing nothing. Besides that, if you can give me a code snippet that fails, I'll do more testing on it. |
This may not be true if say a russian letter in UTF-8 is encoded as 3 bytes, while ASCII code of individual byte matches the |
Maybe you are right. I tried to repeat the bug locally on Win7. The newly set php, cloned repository phpcs and exec: |
Nonetheless:
|
I'm well aware, but it shouldn't matter in this case because of the specific checks I am doing. As I said above, I remember extensively testing this code while I was writing it because I originally had iconv stuff everywhere and eventually realised it didn't need to support multi-byte strings because they are fixed elsewhere in the main tokenizer. And performance is a huge deal for tokenizing, so even switching to substr or preg* sees a massive decrease on large code bases. |
Show me, please, how you check description of tag phpDoc. |
I'm not really sure what you are asking. Are you reporting a related bug? It's probably going to be easier if you post code that produces errors so I can create tests and fix the problems. |
Ooops! Sorry. I forgot to say, we are using the option:
As a result, some string functions work with symbols ( |
I've added a commit that changes the way the comment tokenizer works so that references to positions inside strings work correctly. This works because those positions are only used for whitespace and known characters ( |
Just realised this should have been closed last month. |
I tried as it is written here: guidelines for contributing
And..
Tests for a clean repository:
And tests after applying my patch:
:)
P.S. Reason for changes: http://pear.php.net/bugs/bug.php?id=20943