Instead of hand-parsing the argument list and possibly choking inside
comments, strings, heredoc, etc., use the normal token mechanism and
simply convert the tokens to a string representation as the argument
list. This technique may drop some characters from the argument list
representation in the (unlikely) case where a token appearing in the
argument list is reported as a generic type whose correct string
representation cannot be known. That can always be fixed by adding a
specific token type, and it is in any case less problematic than
breaking further parsing of the document.
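As an illustration of the idea only, here is a minimal Python sketch of rebuilding an argument list from tokens. The token kinds and the `arglist_to_string` helper are hypothetical and are not the parser's actual types or functions; a generic token with no known spelling is simply dropped, which is the "missing characters" case described above.

```python
from typing import List, Optional, Tuple

# Hypothetical token model: (kind, text). Text is None when the lexer
# did not keep the spelling of the token.
Token = Tuple[str, Optional[str]]

# Token kinds whose spelling can be rebuilt without the source text.
FIXED_SPELLING = {
    "open_paren": "(",
    "close_paren": ")",
    "ampersand": "&",
    "equal": "=",
}


def arglist_to_string(tokens: List[Token]) -> str:
    """Rebuild a printable argument list from already-lexed tokens."""
    out = []
    for kind, text in tokens:
        if kind == "comma":
            out.append(", ")
        elif kind in FIXED_SPELLING:
            out.append(FIXED_SPELLING[kind])
        elif kind == "string" and text is not None:
            # The lexer already consumed the quotes, so commas or
            # parentheses inside the string cannot confuse us here.
            out.append("'" + text + "'")
        elif text is not None:
            out.append(text)  # variables, identifiers, numbers
        # Otherwise: generic token with unknown spelling, nothing emitted.
    return "".join(out)
```

Because the lexer has already dealt with comments and strings, the conversion never has to look at raw source characters.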
PHP namespaces don't work anything like a block, so the implementation
is specific and not combined with scope management. Namespaces cannot
be nested, and they may apply either to the rest of the file (until the
next namespace declaration, if any) or to a specific block.
Namespaces applying to the rest of the file:
    namespace Foo;
    /* code in namespace Foo */
    namespace Bar\Baz;
    /* code in namespace Bar\Baz */
Namespaces applying to blocks:
    namespace Foo {
        /* code in namespace Foo */
    }
    namespace Bar\Baz {
        /* code in namespace Bar\Baz */
    }
    namespace {
        /* code in root namespace */
    }
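The two forms can be tracked with a single "current namespace" value rather than a scope stack. The following Python sketch works on a simplified, hypothetical token list (plain strings) and is not the parser's real code; `assign_namespaces` is an illustrative name.

```python
def assign_namespaces(tokens):
    """Pair every non-structural token with the namespace it belongs to.

    The root namespace is represented by the empty string.
    """
    current = ""        # namespace in effect
    block_depth = None  # brace depth closing a block namespace, if any
    depth = 0
    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "namespace":
            i += 1
            name = ""
            if tokens[i] not in (";", "{"):
                name = tokens[i]
                i += 1
            current = name
            if tokens[i] == "{":
                # Block form: applies until the matching '}'.
                depth += 1
                block_depth = depth
            else:
                # ';' form: applies until the next declaration.
                block_depth = None
        elif tok == "{":
            depth += 1
        elif tok == "}":
            if block_depth == depth:
                current = ""
                block_depth = None
            depth -= 1
        else:
            result.append((tok, current))
        i += 1
    return result
```

Since namespaces cannot be nested, one variable plus the depth of the opening brace is enough, which is why this does not need to go through scope management.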
Generate tags for variable declarations that have no assignment only
inside classes and interfaces, so the parser is not fooled by rvalues.
This prevents generation of a "$bar" tag for something like:
    $foo = $bar;
while still generating a "$bar" tag for:
    class Foo {
        var $bar;
    }
Fix parsing of function declarations with a leading ampersand (&),
used to make the function return a reference:
    function &foo($arg1, $arg2) {
        /* ... */
    }
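In token terms the fix amounts to tolerating one optional `&` between the `function` keyword and the name. A small Python sketch of that step, using a hypothetical list-of-strings token model (`parse_function_name` is an illustrative helper, not a function of the parser):

```python
def parse_function_name(tokens):
    """Return the declared function name, or None if there is none.

    `tokens` is expected to start at the `function` keyword.
    """
    if not tokens or tokens[0] != "function":
        return None
    i = 1
    # Optional '&' marking a function that returns a reference.
    if i < len(tokens) and tokens[i] == "&":
        i += 1
    if i < len(tokens) and tokens[i].isidentifier():
        return tokens[i]
    return None  # e.g. an anonymous function: `function (...) { }`
```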
Rewrite the PHP parser as a real parser, not using regexes. This is
more complex but allows for better parsing.
Visible changes:
* Scope reporting;
* Variables inside functions are no longer reported (this is a
  deliberate choice, but can be easily changed);
* Only the PHP part is parsed (e.g. it doesn't report JavaScript
  functions anymore);
* Function arguments spanning several lines are properly reported;
* Interfaces are not yet parsed.
Otherwise the new parser should behave like the old one, at least
where it used to be right. Parsing of more constructs and reporting
more details is planned.
CTags defines __unused__ and __printf__, which are not only reserved
identifiers but are actually used by GNU C as arguments of the
__attribute__() extension. This used to work because no code seeing
those definitions tried to use them as __attribute__() arguments, but
a recent change in GLib made it do so in its atomic operations. Those
are used by the tagmanager, which itself includes the CTags header
defining the macros, leading to a weird build failure -- since
__unused__ expanded to an unexpected value.
To fix this, rename CTags' __unused__ to UNUSED and __printf__ to
PRINTF.
This prevents a regex pattern from fooling the parser if it contains
some recognized constructs, like comment or string literal starts.
Closes #2992393 and #3398636.
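A rough Python sketch of skipping a regex literal as one unit, so that quotes or comment starts inside it are never seen by the rest of the parser. This is an assumption about how such skipping can be done, not the actual change; `skip_regex_literal` is a hypothetical helper, and deciding whether a `/` starts a regex or is a division operator is left out.

```python
def skip_regex_literal(src: str, start: int) -> int:
    """Return the index just past the regex literal opening at `start`.

    `src[start]` must be the opening '/'. Flags such as `g` or `i`
    following the closing '/' are skipped too.
    """
    assert src[start] == "/"
    i = start + 1
    in_class = False  # inside [...], where '/' does not end the regex
    while i < len(src):
        c = src[i]
        if c == "\\":
            i += 2  # skip the escaped character as well
            continue
        if c == "[":
            in_class = True
        elif c == "]":
            in_class = False
        elif c == "/" and not in_class:
            i += 1
            while i < len(src) and src[i].isalpha():
                i += 1  # regex flags
            return i
        elif c == "\n":
            break  # unterminated literal, give up at end of line
        i += 1
    return i
```

For example, in `var r = /"[/]\/*/g; var s = "x";` the `"` and `/*` inside the pattern are consumed as part of the literal, and parsing resumes at the `;`.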
If a title contained multi-byte UTF-8 characters, it wasn't properly
recognized due to the title being longer (in bytes) than the underline.
So, fix the title length computation to properly count the characters,
not the bytes.
Note that this fix only handles ASCII, one-byte charsets and UTF-8; it
won't help with other multi-byte encodings. However, the whole parser
expects an ASCII-compatible encoding anyway, and in most situations it
will be fed Geany's UTF-8 buffer.
Closes #3578050.
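To show the difference between the two counts, here is a minimal Python sketch. It is not the parser's code: `title_length` and `underline_matches` are hypothetical helpers, and falling back to the byte count for input that is not valid UTF-8 is an assumption made to cover the one-byte charset case.

```python
def title_length(raw: bytes) -> int:
    """Length of a title in characters rather than bytes."""
    try:
        return len(raw.decode("utf-8"))
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a one-byte charset, where the byte
        # count and the character count are the same.
        return len(raw)


def underline_matches(title: bytes, underline: bytes) -> bool:
    """A title is recognized if the underline is at least as long."""
    return len(underline) >= title_length(title)
```

With the title "Café Ünïcode" (12 characters, 15 bytes in UTF-8), a 12-character underline is accepted by the character count but would be rejected by a byte count.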
If we generated methods, properties or class children tags for a
variable, generate a class tag for the variable itself so the children
aren't orphaned.
If a property value had more than one token, the parser choked on it
and failed to parse further properties of the object. Fix that by
properly skipping the property's value. If that value is a sub-object,
parse it recursively.
Closes #3470609.
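The skipping logic can be pictured as follows. This Python sketch uses a hypothetical token list of plain strings, and `skip_value` and `parse_object` are illustrative names, not the parser's own functions.

```python
OPEN = {"(", "[", "{"}
CLOSE = {")", "]", "}"}


def skip_value(tokens, i):
    """Advance past a property value; stop at the ',' or '}' ending it."""
    depth = 0
    while i < len(tokens):
        t = tokens[i]
        if depth == 0 and t in (",", "}"):
            break
        if t in OPEN:
            depth += 1
        elif t in CLOSE:
            depth -= 1
        i += 1
    return i


def parse_object(tokens, i, scope, tags):
    """tokens[i] is '{'; collect property names, return index past '}'."""
    i += 1
    while i < len(tokens) and tokens[i] != "}":
        if i + 1 < len(tokens) and tokens[i + 1] == ":":
            name = scope + "." + tokens[i]
            tags.append(name)
            i += 2
            if i < len(tokens) and tokens[i] == "{":
                # Sub-object: parse it recursively with a deeper scope.
                i = parse_object(tokens, i, name, tags)
            # Skip whatever remains of the value, however many tokens.
            i = skip_value(tokens, i)
        else:
            i += 1  # unexpected token, move on
        if i < len(tokens) and tokens[i] == ",":
            i += 1
    return i + 1
```

The key point is that a value such as `1 + f(2, 3)` is consumed up to the comma at nesting depth zero, so the following property is still seen.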
If an `if` had no braces, the code used to check by itself for an
`else` after it, eating the next token if it wasn't actually an `else`.
So, drop the check for the `else` altogether, since parseLine() handles
`else`s by calling parseIf() anyway.
This fixes constructs like:
    if (foo)
        bar();
    function baz() {
        // ...
    }
Closes #3568542.
This makes `Foo.bar = function()` properly report a function tag "bar"
with scope "Foo" rather than a function tag "Foo.bar" with no scope.
Part of #3570192.
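Conceptually, the qualified name is split at its last dot into a tag name and a scope. A tiny Python sketch of that split (`split_scope` is a hypothetical helper used only for illustration):

```python
def split_scope(qualified: str):
    """Split 'Foo.bar' into (name, scope); scope is '' if unqualified."""
    scope, _, name = qualified.rpartition(".")
    return name, scope
```

So `Foo.bar` yields the name `bar` with scope `Foo`, and a deeper name such as `A.B.c` yields `c` with scope `A.B`.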