1) force a config reload after some initial tests.
this will allow to identify memleaks using the valgrind test,
as this will free all structures allocated for the config, and
recreate them.
2) test ErrorFile directive by adding several of them.
this should help catch regressions such as the one fixed in
4847d8cdb3bfd9b30a10bfed848174250475a69b.
it will also test memleaks in the related code paths.
3) test some scenarios that should produce errors and use the
configured ErrorFile directives.
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and kinda surprisingly, also when connection to the stathost is
done.
in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, and remembering whether a
connection to the stathost was desired in a variable, then doing
things a bit differently depending on whether it's set.
i tried to fix issues with get_request_entity in
88153e944f7d28f57cccc77f3228a3f54f78ce4e (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b18a3c0d196126bfbc5d3b6473386da9.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 got resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.
after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
additional to setting the proper connection timeout as
88153e944f7d28f57cccc77f3228a3f54f78ce4e already implemented.
the changes of 78cc5b72b18a3c0d196126bfbc5d3b6473386da9 have been
reverted.
another fallout of the config refactoring finished by
2e02dce0c3de4a231f74b44c34647406de507768.
apparently no one using the ErrorFile directive used git master
during the last months, as there have been no reports about this issue.
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.
additionally the code added in 0b9a74c29036f9215b2b97a301b7b25933054302
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
introduced in 88153e944f7d28f57cccc77f3228a3f54f78ce4e.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.
this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
tested with 32K acl rules, generated by
for x in `seq 128` ; do for y in `seq 255` ; do \
echo "Deny 10.$x.$y.0/24" ; done ; done
after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.
(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.
note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.
closes#83closes#165
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of the regex are a known variable name,
substitution is taking place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.
closes#108
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
fixes an error in travis, which makes a shallow clone of 50 commits.
if the last tag is older than 50 commits, we get:
"fatal: No names found, cannot describe anything."
this caused a premature exit due to an assert error in safe_write()
on this line: assert (count > 0);
because the version variable in tinyproxy was empty.
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.
by using inet_ntop() instead the code has been made ipv6 aware.
note that this codepath was only entered in the unlikely event that
no hosts header was being passed to the proxy, i.e. pre-HTTP/1.1.
* check return values of memory allocation and abort gracefully
in out-of-memory situations
* use sblist (linear dynamic array) instead of linked list
- this removes one pointer per filter rule
- removes need to manually allocate/free every single list item
(instead block allocation is used)
- simplifies code
* remove storage of (unused) input rule
- removes one char* pointer per filter rule
- removes storage of the raw bytes of each filter rule
* add line number to display on out-of-memory/invalid regex situation
* replace duplicate filter_domain()/filter_host() code with a single
function filter_run()
- reduces code size and management effort
with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.
the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gzcloses#20
the file docs/filter-howto.txt was removed, as it contained misleading
information since it was first checked in.
it suggests the syntax for filter rules is fnmatch()-like, when in
fact they need to be specified as posix regular expressions.
additionally it contained a lot of utterly unrelated and irrelevant/
outdated text.
a few examples with the correct syntax have now been added to
tinyproxy.conf.5 manpage.
closes#212
it turned out that the upstream section in tinyproxy.conf.5 wasn't rendered
properly, because in asciidoc items following a list item are always explicitly
appended to the last list item.
after several hours of finding a workaround, it was decided to change the
manpage generator to pod2man instead.
as pod2man ships together with any perl base install, it should be available
on almost every UNIX system, unlike asciidoc which requires installation
of a huge set of dependencies (more than 1.3 GB on Ubuntu 16.04), and the
replacement asciidoctor requires a ruby installation plus a "gem" (which is
by far better than asciidoc, but still more effort than using the already
available pod2man).
tinyproxy's hard requirement of a2x (asciidoctor) for building from source
caused rivers of tears (and dozens of support emails/issues) in the past, but
finally we get rid of it. a tool such as a2x with its XML based bloat-
technology isn't really suited to go along with a supposedly lightweight
C program.
if it ever turns out that even pod2man is too heavy a dependency, we could
still write our own replacement in less than 50 lines of awk, as the pod
syntax is very low level and easy to parse.