
Unclosed strings at EOF sometimes tokenized as T_WHITESPACE by the JS tokenizer #1718

Closed
@jrfnl

Description


A JS file containing the following code, in which the closing ' of the string is missing:

alert('hi);

is being tokenized as follows:

0 :: L001 :: C1 :: T_OPEN_TAG :: (0) ::
1 :: L001 :: C1 :: T_WHITESPACE :: (0) ::

2 :: L002 :: C1 :: T_STRING :: (5) :: alert
3 :: L002 :: C6 :: T_OPEN_PARENTHESIS :: (1) :: (
4 :: L002 :: C7 :: T_WHITESPACE :: (5) :: 'hi);
5 :: L002 :: C12 :: T_CLOSE_TAG :: (0) ::

I believe that token 4 ('hi);) is incorrectly tagged as T_WHITESPACE. The parse error in the original JS code causes this, but the token's content is clearly not whitespace.

Maybe the catch-all T_STRING, or even T_UNKNOWN, would be more appropriate?
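A minimal sketch of the suggestion, written in Python purely for illustration. It assumes the tokenizer keeps a text buffer that is flushed once the input ends; the helper flush_at_eof and its fallback parameter are hypothetical and not part of the actual JS tokenizer. The idea is that the leftover buffer is only tagged T_WHITESPACE when it really consists of whitespace, and otherwise gets the catch-all type.

```python
from __future__ import annotations

# Hypothetical fallback types; which one to use is the open question in this issue.
VALID_FALLBACKS = {"T_STRING", "T_UNKNOWN"}


def flush_at_eof(tokens: list[tuple[str, str]], buffer: str,
                 fallback: str = "T_STRING") -> list[tuple[str, str]]:
    """Emit whatever is still buffered when the input ends.

    Hypothetical helper, not the real tokenizer's code: the buffer is only
    tagged T_WHITESPACE if it consists solely of whitespace; anything else,
    such as an unclosed string literal, gets the fallback type instead.
    """
    if fallback not in VALID_FALLBACKS:
        raise ValueError(f"unsupported fallback type: {fallback}")
    if buffer:  # nothing to emit for an empty buffer
        kind = "T_WHITESPACE" if buffer.strip() == "" else fallback
        tokens.append((kind, buffer))
    return tokens


print(flush_at_eof([], "'hi);"))               # unclosed string from this issue
print(flush_at_eof([], "'hi);", "T_UNKNOWN"))  # same input, stricter fallback
print(flush_at_eof([], "\n"))                  # genuine trailing whitespace
```

Whether T_STRING or T_UNKNOWN is the better choice is left open here; the sketch simply makes it a parameter.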

Full tokenizer processing log:

        *** START JS TOKENIZING ***
        Process char 0 => \n (buffer: )
        Process char 1 => a (buffer: \n)
        => Added token T_WHITESPACE (\n)
        Process char 2 => l (buffer: a)
        Process char 3 => e (buffer: al)
        Process char 4 => r (buffer: ale)
        Process char 5 => t (buffer: aler)
        Process char 6 => ( (buffer: alert)
        => Added token T_STRING (alert)
                * char is token, looking ahead 8 chars *
                => Looking ahead 1 chars => ('
                => Looking ahead 2 chars => ('h
                => Looking ahead 3 chars => ('hi
                => Looking ahead 4 chars => ('hi)
                => Looking ahead 5 chars => ('hi);
                * look ahead found nothing *
        => Added token T_OPEN_PARENTHESIS (()
        Process char 7 => ' (buffer: )
                * looking for string closer *
                Process char 8 => h (buffer: ')
                Process char 9 => i (buffer: 'h)
                Process char 10 => ) (buffer: 'hi)
                Process char 11 => ; (buffer: 'hi))
        => Added token T_WHITESPACE ('hi);)
        *** END TOKENIZING ***
        *** START TOKEN MAP ***
        *** END TOKEN MAP ***
        *** START SCOPE MAP ***
        *** END SCOPE MAP ***
        *** START LEVEL MAP ***
        Process token 0 on line 1 [col:1;len:0;lvl:0;]: T_OPEN_TAG =>
        Process token 1 on line 1 [col:1;len:0;lvl:0;]: T_WHITESPACE => \n
        Process token 2 on line 2 [col:1;len:5;lvl:0;]: T_STRING => alert
        Process token 3 on line 2 [col:6;len:1;lvl:0;]: T_OPEN_PARENTHESIS => (
        Process token 4 on line 2 [col:7;len:5;lvl:0;]: T_WHITESPACE => 'hi);
        Process token 5 on line 2 [col:12;len:0;lvl:0;]: T_CLOSE_TAG =>
        *** END LEVEL MAP ***
        *** START ADDITIONAL JS PROCESSING ***
        Process token 0: T_OPEN_TAG =>
        Process token 1: T_WHITESPACE => \n
        Process token 2: T_STRING => alert
        Process token 3: T_OPEN_PARENTHESIS => (
        Process token 4: T_WHITESPACE => 'hi);
        Process token 5: T_CLOSE_TAG =>
        *** END ADDITIONAL JS PROCESSING ***
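The log shows the string-closer search consuming characters until the input runs out, after which the leftover buffer ('hi);) is emitted as T_WHITESPACE. The toy Python model below reproduces that token sequence under one assumption the log does not confirm: that the end-of-input flush labels any leftover buffer as whitespace. The function tokenize, the punctuation table and the T_CONSTANT_ENCAPSED_STRING name for a terminated string are illustrative assumptions only; escape sequences and the open/close tag tokens are ignored.

```python
from __future__ import annotations

# Toy model only: token names not seen in the log are assumptions.
PUNCTUATION = {"(": "T_OPEN_PARENTHESIS", ")": "T_CLOSE_PARENTHESIS",
               ";": "T_SEMICOLON"}
QUOTES = {"'", '"'}


def tokenize(src: str) -> list[tuple[str, str]]:
    tokens: list[tuple[str, str]] = []
    buf = ""

    def flush_word() -> None:
        # Emit a completed run of whitespace or word characters.
        nonlocal buf
        if buf:
            kind = "T_WHITESPACE" if buf.isspace() else "T_STRING"
            tokens.append((kind, buf))
            buf = ""

    i = 0
    while i < len(src):
        ch = src[i]
        if ch in QUOTES:
            flush_word()
            # "looking for string closer" (escape sequences ignored here)
            end = src.find(ch, i + 1)
            if end != -1:
                tokens.append(("T_CONSTANT_ENCAPSED_STRING", src[i:end + 1]))
                i = end + 1
                continue
            # No closer before EOF: the rest of the input stays buffered.
            buf = src[i:]
            break
        if ch in PUNCTUATION:
            flush_word()
            tokens.append((PUNCTUATION[ch], ch))
        elif buf and buf[-1].isspace() != ch.isspace():
            flush_word()  # switching between whitespace and non-whitespace
            buf = ch
        else:
            buf += ch
        i += 1

    if buf:
        # ASSUMED naive end-of-input flush: leftover text is taken to be
        # trailing whitespace, which is how 'hi); ends up as T_WHITESPACE.
        tokens.append(("T_WHITESPACE", buf))
    return tokens


print(tokenize("\nalert('hi);"))
```

Run on the issue's input, the model yields the same four types as tokens 1 to 4 in the dump. With the closing quote restored, the string becomes a single string token and the trailing flush never fires.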
