This prevents a regex pattern from fooling the parser if it contains
some recognized constructs, such as the start of a comment or a string literal.
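Once the parser has decided that a '/' starts a regex literal, the literal's body can be skipped as a unit. Deciding that a '/' starts a regex is a separate problem and is not shown here. The sketch below assumes the input is a NUL-terminated string positioned just past the opening '/'. skip_regex_body() is a hypothetical name, not the parser's actual function:

```c
#include <stddef.h>

/* Hypothetical helper: `src` points just past the opening '/' of a
 * JavaScript regex literal. Returns the index just past the closing '/'.
 * Comment starts or quotes inside the pattern are never interpreted, so
 * they cannot be mistaken for the start of a comment or string. */
size_t skip_regex_body(const char *src)
{
    size_t i = 0;
    int in_class = 0; /* inside a [...] character class, '/' is literal */

    while (src[i] != '\0' && src[i] != '\n') {
        char c = src[i++];
        if (c == '\\' && src[i] != '\0' && src[i] != '\n')
            i++;               /* skip the escaped character, e.g. "\/" */
        else if (c == '[')
            in_class = 1;
        else if (c == ']')
            in_class = 0;
        else if (c == '/' && !in_class)
            return i;          /* index just past the closing '/' */
    }
    return i; /* unterminated literal: stop at end of line or input */
}
```

Because the scanner never looks for `//`, `/*` or quotes inside the pattern, a literal such as `/[/*]"/` is consumed whole.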
Closes #2992393 and #3398636.
If a title contained multi-byte UTF-8 characters, it wasn't properly
recognized due to the title being longer (in bytes) than the underline.
So, fix the title length computation to properly count the characters,
not the bytes.
Note that this fix only handles ASCII, one-byte charsets and UTF-8; it
won't help with other multi-byte encodings. However, the whole parser
expects an ASCII-compatible encoding anyway, and in most situations it
will be fed Geany's UTF-8 buffer.
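The sketch below counts characters rather than bytes. It assumes the title is available as a byte buffer plus its length. utf8_char_count() and title_length() are hypothetical names. The UTF-8 check is a heuristic that only examines lead bytes. If the bytes don't look like UTF-8, the byte length is used instead, which is the right count for ASCII and, in the usual case, for one-byte charsets:

```c
#include <stddef.h>

/* Hypothetical helper: count the characters in the first `len` bytes of
 * `buf`, assuming UTF-8. Returns -1 if the bytes don't look like UTF-8.
 * Only lead bytes are checked; continuation bytes are not validated. */
long utf8_char_count(const char *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *) buf;
    const unsigned char *end = p + len;
    long count = 0;

    while (p < end) {
        size_t step;
        if (*p < 0x80)
            step = 1;               /* ASCII */
        else if ((*p & 0xE0) == 0xC0)
            step = 2;               /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0)
            step = 3;               /* 3-byte sequence */
        else if ((*p & 0xF8) == 0xF0)
            step = 4;               /* 4-byte sequence */
        else
            return -1;              /* not a valid UTF-8 lead byte */
        if (step > (size_t) (end - p))
            return -1;              /* sequence cut off by `len` */
        p += step;
        count++;
    }
    return count;
}

/* Hypothetical: the length to compare against the underline length.
 * Falls back to the byte length when the title isn't UTF-8. */
size_t title_length(const char *buf, size_t len)
{
    long n = utf8_char_count(buf, len);
    return n < 0 ? len : (size_t) n;
}
```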
Closes #3578050.
If we generated methods, properties or class children tags for a
variable, generate a class tag for the variable itself so the children
aren't orphaned.
If a property value had more than one token, the parser choked on it
and failed to parse further properties of the object. Fix that by
properly skipping the property's value. If that value is a sub-object,
parse it recursively.
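The sketch below skips a multi-token value. For brevity it works on raw characters rather than on the parser's tokens. skip_property_value() is a hypothetical name. Unlike the fix described above, it treats a sub-object as an ordinary nested value and skips it rather than parsing it recursively:

```c
#include <stddef.h>

/* Hypothetical helper: `s` points at the start of a property value inside
 * an object literal, e.g. the "1 + f(2, 3)" in "{ a: 1 + f(2, 3), b: 4 }".
 * Returns the index of the ',' or '}' that ends the value, so a value made
 * of several tokens is skipped as a whole. Nested (), [] and {} are
 * tracked, so commas inside them do not end the value. */
size_t skip_property_value(const char *s)
{
    size_t i = 0;
    int depth = 0;

    for (; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == '"' || c == '\'') {
            /* skip a string literal, honouring backslash escapes */
            char quote = c;
            for (i++; s[i] != '\0' && s[i] != quote; i++)
                if (s[i] == '\\' && s[i + 1] != '\0')
                    i++;
            if (s[i] == '\0')
                break;          /* unterminated string */
        } else if (c == '(' || c == '[' || c == '{') {
            depth++;
        } else if (c == ')' || c == ']' || c == '}') {
            if (depth == 0)
                break;          /* closing brace of the enclosing object */
            depth--;
        } else if (c == ',' && depth == 0) {
            break;              /* end of this property's value */
        }
    }
    return i;
}
```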
Closes #3470609.
If an `if` didn't have braces, the code itself checked for an
`else` after it, eating the next token if it wasn't actually an `else`.
So, drop the check for the `else` altogether, since parseLine() handles
`else`s by calling parseIf() anyway.
This fixes constructs like:
    if (foo)
        bar();
    function baz() {
        // ...
    }
Closes #3568542.
This makes `Foo.bar = function()` properly report a function tag "bar"
with scope "Foo" rather than a function tag "Foo.bar" with no scope.
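The sketch below splits a dotted name into a scope and a tag name at the last '.'. It is only an illustration. split_scope() is a hypothetical name, and the real parser may build the scope from its token stream instead:

```c
#include <string.h>

/* Hypothetical helper: split a dotted name such as "Foo.bar" at its last
 * '.' into a scope ("Foo") and a tag name ("bar"). Names without a dot get
 * an empty scope. Both output buffers must hold at least strlen(full) + 1
 * bytes. */
void split_scope(const char *full, char *scope, char *name)
{
    const char *dot = strrchr(full, '.');

    if (dot == NULL) {
        scope[0] = '\0';
        strcpy(name, full);
    } else {
        size_t scope_len = (size_t) (dot - full);
        memcpy(scope, full, scope_len);
        scope[scope_len] = '\0';
        strcpy(name, dot + 1);
    }
}
```

Splitting at the last dot keeps deeper paths together, so "a.b.c" yields tag "c" with scope "a.b".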
Part of #3570192.
There is no need to set the token position information in the loop
searching for the initial token character; simply set it once we have
finally found the token start.
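The sketch below shows the same idea: skip the leading whitespace, then record the position once. The reader and token_pos types and read_token_start() are hypothetical and stand in for the parser's own structures:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical reader state. */
typedef struct {
    const char *text;
    size_t pos;          /* current read offset */
    unsigned long line;  /* current line number */
} reader;

/* Hypothetical token position. */
typedef struct {
    size_t start;        /* offset of the token's first character */
    unsigned long line;  /* line on which the token starts */
} token_pos;

/* Skip whitespace (counting newlines) without touching the token, then
 * record the position once, after the first token character is found. */
token_pos read_token_start(reader *r)
{
    token_pos tp;

    while (isspace((unsigned char) r->text[r->pos])) {
        if (r->text[r->pos] == '\n')
            r->line++;
        r->pos++;
    }
    tp.start = r->pos;
    tp.line = r->line;
    return tp;
}
```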
The external declaration of "File" in read.h (defined in read.c) was
improperly qualified as "const" so that it could not be modified outside of
read.c. Although it is good to protect this global variable against
improper modification, the use of "const" here makes it perfectly valid
for the compiler to assume that the fields of this structure never
change at runtime, and to optimize based on this
assumption. However, this assumption is wrong because this structure
actually gets modified by many of read.c's functions, which can
lead to improper and unexpected behavior if the compiler sees an
opportunity to optimize field accesses.
Moreover, protecting "File" with the "const" type qualifier
required a hack to be able to include read.h in read.c, since the "const"
and non-"const" declarations conflict.
In fact, at least the JavaScript parser did suffer from the issue,
because it uses the getSourceLineNumber() macro (which expands to a direct
"File" member access) several times in a single function, making it
easy for compilers to cache the value as an optimization. Both GCC
and Clang showed this behavior with optimization enabled. As a result,
the line numbers of JavaScript tags were often incorrect.
This ungetc() call doesn't look legitimate and actually leads to lots
of warnings about ungetc() being called when another character was
already backed up.
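The C standard guarantees only one character of pushback for ungetc(), and a reader with a single pushback slot has the same limit. The sketch below shows such a reader and why a second pushback deserves a warning: it overwrites, and so loses, the character already backed up. The names char_reader, reader_getc() and reader_ungetc() are hypothetical and are not the actual read.c functions:

```c
#include <stdio.h>

/* Hypothetical reader with a single pushback slot. */
typedef struct {
    const char *text;
    size_t pos;
    int pushed;      /* backed-up character, or EOF if the slot is empty */
    int warnings;    /* pushbacks that found the slot already occupied */
} char_reader;

int reader_getc(char_reader *r)
{
    if (r->pushed != EOF) {
        int c = r->pushed;
        r->pushed = EOF;
        return c;
    }
    if (r->text[r->pos] == '\0')
        return EOF;
    return (unsigned char) r->text[r->pos++];
}

void reader_ungetc(char_reader *r, int c)
{
    if (r->pushed != EOF) {
        /* A character is already backed up: warn, then overwrite it,
         * which loses the earlier character. */
        fprintf(stderr, "warning: ungetc() with a character already backed up\n");
        r->warnings++;
    }
    r->pushed = c;
}
```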
When reading a C macro, make sure to only use as many bytes as we actually
got, not as many as we requested. This should no longer be a problem
now that 61c5216 fixed a too-long read, but it's safer anyway.
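fread() returns the number of items it actually read. At end of file or on error, that count can be smaller than the number requested. The sketch below uses that count rather than the requested size. read_chunk() is a hypothetical helper, not the parser's code:

```c
#include <stdio.h>

/* Read up to `want` bytes into `buf` and NUL-terminate after the bytes
 * actually read. The return value of fread(), not `want`, gives the
 * valid length. `buf` must hold at least `want + 1` bytes. */
size_t read_chunk(FILE *fp, char *buf, size_t want)
{
    size_t got = fread(buf, 1, want, fp);

    buf[got] = '\0';   /* only the first `got` bytes are meaningful */
    return got;
}
```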