We should isolate ctags from Geany completely and use separate types. At
the moment langType is shared by both Geany and ctags. For Geany redefine
it as TMParserType (which was currently used as the name of the enum and
was unused) and use everywhere in Geany. At the same time convert some
ints to TMParserType where they denote the parser.
This is strictly speaking an API change but no plugin uses langType at the
moment so its renaming doesn't cause any problems.
The only remaining visible ctags type is tagEntryInfo - it is however
used only inside tagmanager (and can be later removed quite easily too
by slightly reorganizing TM source files).
At the moment the Geany code uses arbitrary combination of the following
synonyms
TM_PARSER_NONE / LANG_IGNORE / -2
TM_PARSER_AUTO / LANG_AUTO / -1
Especially using just the numbers makes things very confusing.
In the Geany code this patch replaces all occurrences of -2 and LANG_IGNORE
with TM_PARSER_NONE. It also removes LANG_IGNORE from the header which
isn't needed any more.
In addition, TM_PARSER_AUTO/LANG_AUTO shouldn't be used at all. We want
filetype detection based on Geany's definitions and not based on the
hard-coded ctags definitions. Remove it completely.
Finally, as it's clearer now what the constants mean, the patch fixes the
implementation of langs_compatible() (tag or file can never be of type
AUTO but we should rather check for NONE filetypes which we should
consider incompatible between each other).
Pop up scope completion dialog for namespaces too; e.g. for
boost::
show all symbols defined in the namespace. Determine whether the namespace
scope completion should be used based on whether user typed a scope
separator. If so, perform completion for namespaces before normal scope
completion - this seems to work better e.g. for Scintilla where
Scintilla::
would otherwise pop up the varible sci instead of showing everything
in the namespace (might be more questionable for languages where
the scope separator is identical to the dereference operator like
Java's "." but we have to make some choice anyway).
The performance seems to be reasonable - for the completion all tags
have to be walked but after testing with big C++ projects like
boost and Mozilla, the completion takes only something like 0.2s
which is acceptable as the delay happens only on typing the scope
completion separator and feels kind of expected.
Also tested with linux kernel sources which normally lack any scope
information by hacking TM a bit and injecting 10-character scope for
each tag - then the completion takes something over 0.5s.
We only perform search based on variable name so if a variable is e.g. of
the type std::Foo, we can drop the std:: prefix and search only for the
Foo type.
This is far from perfect and contains a lot of guessing. It showed
good results based on our tests cases, fixing several issues and not
introducing any more issues (admittedly, after working around a subtle
one regarding D static ifs).
Closes#845.
Also, don't perform subtractions to check pointer bounds, to avoid
unsigned value wraparound. This is very unlikely as it would either
mean a very large `nth` value or a very small value for the current
line pointer, but better safe than sorry.
See http://en.cppreference.com/w/cpp/language/string_literalCloses#877.
---
This contains a pretty ugly hack to fetch the previous character, in
order not to get fooled by string concatenation hidden behind a macro,
like in `FOUR"five"`, which is not a raw string literal but simply the
identifier `FOUR` followed by the string `"five"`.
While this may sound uncommon, it is not and lead to complaints [2][3]
when Scintilla [1] broke this when they introduced C++11 raw string
literal support themselves.
The implementation here still contains a bug with line continuations: a
raw literal of the form:
```c
const char *str = R\
"xxx(...)xxx";
```
is not properly recognized as such, although it's perfectly valid (yet
probably very uncommon). For the record, Scintilla has also suffers
from this but nobody complained about it yet.
[1] http://scintilla.org/
[2] https://sourceforge.net/p/scintilla/bugs/1207/
[3] https://sourceforge.net/p/scintilla/bugs/1454/
Because the tm_source_file_new() is an exported API, it has to be at least a
boxed type to be usable for gobject introspection. The boxed type uses
reference counting as opposed to memory duplication.
The obligatory tm_source_file_dup() is not exported (doesn't have to).
First, the search for existing type with the given scope should be done
also for namespaces.
Second, with the string operations we get no scope as empty string ""
but the rest of TM functions expect scope to be set to NULL in such
case. Fix that.
When we already have a struct-like type in namespace search,
we don't need any extra resolution - we already have the
right type. Skip the whole typedef resolution in this case.
Make sure the anonymous types are from the same file as the
variable of that type (or, when performing typedef resolution, from
the same file as the typedef).
On the way, simplify find_scope_members() a bit and fix some minor
problems.
Make the unused code compile and remove unused tm_get_current_function()
(we have similar symbols_get_current_function() and there's no reason
to keep it)
The main reason for separating m_workspace_find() into two parts is the
fact that when matching only the prefix, the result may contain too
many results and we need to go through all of them, return them and at the
end discard most of them.
For instance, when considering the linux kernel project with 2300000 tags
and when autocompletion is set to be invoked after typing a single character,
we get on average something like 100000 results (tag_num/alphabet_size).
But from these 100000 results, we get only the first 30 which we display
in the popup and discard the rest which means going through the list of
the 100000 tags and comparing them for no reason.
Thanks to using binary search for the start and the end of the sequence of
matching tags (added in a separate patch), we can get the start of the
sequence and the length of the sequence very quickly without going through
it.
For the prefix search we can limit the number of tags we are interested
in and go through at most this number of returned tags (to be precise,
times two, because we need to go both through the workspace array and
global tags array and remove the extras only after sorting the two).
It would be possible to combine both tm_workspace_find() and
tm_workspace_find_prefix() into a single function but the result is a bit
hard to read because some of the logic is used only in tm_workspace_find()
and some only in tm_workspace_find_prefix() so even though there is some
code duplication, I believe it's easier to understand this way.
Consider types with members to have the same properties everywhere (this
might differ language to language but this assumption should behave
reasonably for any language).
Don't check member type in find_scope_members_tags() - we already check
scope which should be sufficient and will work even if some language
uses function/variable instead of method/member/field.
For instance, consider
class A {
int a;
int b;
}
class B {
A c;
void foo() {
c. //<---- (3)
}
}
int main() {
c. //<---- (1)
foo.c. //<---- (2)
}
Consider cases (1) and (2) first - in the case (1) scope completion
shouldn't be performed because c isn't a global variable; however,
in case (2) it should be performed because c is a member.
To fix this, we can check whether the typed variable ('c' in this case)
is preceeded by another dot - if it is, use member tags for scope
completion; otherwise don't use them.
There's one exception from this rule - in the case (3) we are accessing
a member variable from a member function at the same scope so the
function should have access to the variable. For this we can use the
scope at the position of the cursor. It should be
B::foo
in this case, more generally ...::class_name::function_name. We need
to check if class_name exists at the given scope and if the member
variable we are trying to access is inside ...::class_name - if so,
scope completion can be performed using member tags (without explicit
invocation on a member).
See also https://sourceforge.net/p/ctags/bugs/194/
I didn't use the exact upstream patch only altering the C++ code path,
because as far as I know no c.c language recognize two consecutive
colons separated by whitespace as a single token, so there's no point
in carrying on mistakes from the past.
We just need to skip the (...) and perform autocompletion as before.
Shift pos by 1 in the whole function so we don't have to look 2 characters
back (makes the function easier to read).
Functions contain pointers in their return values - remove them before
searching for the type.
Also restrict the searched variable/function/type tags a bit only to
types which make sense for the search.
In principle this is very similar to the normal scope search. If the
provided name belongs to a type that can contain members (contrary to a
variable in scope search), perform the namespace search. With namespace
search show all possible members that are at the given scope.
Since we perform the scope search at file level, don't perform the
namespace search for tags that can span multiple files otherwise we get
incomplete results which could be confusing to users. This involves
namespaces and packages.
Rethink how to extract members from the struct types. Inspired by
the patch using the same file as the typedef to search for structs,
we can do the same to extract the members only from the file
containing the struct and not the whole workspace. This makes
this operation fast enough so we don't have to keep the extracted
members in a special array (this will become especially useful
for namespace search because for it we would have to extract
all tags and then the extracted array would have the same
size as the workspace so we'd lose the performance gain).
Since the above works only for tags having the file information,
that is, not the global tags, we'll lose some performance
when searching the global tags. I think people don't create
the tag files for complete projects but rather for header files
which contain less tags and still the performance should be
better than before this patch set because we go through the
global tag list only once (was twice before).
On the way, clean up the source a bit, add more comments and move
some code from find_scope_members() to find_scope_members_tags().
Even though enums contain members, their members are accessed in a
different way than members of classes and structs. E.g. consider:
typedef enum {A, B, C, D} MyEnum;
Variable of this type is declared as
MyEnum myVar;
myVar can be assigned a value from MyEnum; however, we don't access myVar
over the dot operator so we don't need the list of all members after
typing
myVar.
This patch eliminates some false positives after typing .
The implementation of this function is almost the same like the original
m_workspace_find_scoped_members() and there's nothing interesting here
we wouldn't be able to recreate trivially.
By comparing the file pointer in the loop we can speed it up a bit
because we can avoid the strcmp() (this function is the slowest part of
the scope completion based on profiling).
Also move the pointer array creation to this function and return it which
is a bit cleaner.
Disclaimer: I have absolutely no idea how the original function works.
After gazing into the code for one hour, I just gave up and wrote my own
version of it based on what I think the function should do
but maybe I'm just missing something what justifies the original
implementation's insanity.
The previous commit fixed the situation when e.g. anon_struct_0 was in the
current file by checking the current file first.
In the case the struct type definition isn't found in the current file,
at the moment we get all members from all anon_struct_0 which can be a
really long list. This list isn't very helpful to users because all the
members from all the structs are mixed. Moreover, not all possible members
are in the list because there are e.g. just members from anon_struct_0 but
not from anon_struct_1 etc. which from the point of view of this function
is a different type.
Instead, restrict the returned members to just a single file (anonymous
structs have unique name per file so it means there will be just one
from the file). Of course the picked file can be wrong and the returned
members might be from a different struct the user wanted but at least
the list will make more sense to users.
At the moment it can happen that even though a member is found in the
currently edited file, the search at the end of the function finds
the type inside another file. This typically happens for anonymous
structs so e.g. for anon_struct_0{...} from the current file we get
members from anon_struct_0{...} from all open documents plus gloabl tags.
Search in an increasing "circle" - start with current file only (trying
all possible types of the variable), continue with workspace array and
finally, if not found, search in the global tags.
All of these typos were found by codespell, so credits go the
the authors of this incredibly useful tool.
I manually confirmed and adapted all changes, which includes
reflowing over-long lines or filling up with spaces for alignment.
Some of these typos may need forwarding to their original authors.
codespell reported a lot words where I am unsure; I have not
included those corrections.
These appear under 64-bit Windows. Only the sciwrappers.c warning is
potentially dangerous. For win32.c, the "handle" provides some useful
information, while "lStdHandle" does not.