Commit Graph

1636 Commits (34a8b28414de419c984f832a299383f6d7149f72)

Author SHA1 Message Date
rofl0r 34a8b28414 save headers in an ordered dictionary
due to the usage of a hashmap to store headers, when relaying them
to the other side the order was not prevented.
even though correct from a standards point-of-view, this caused
issues with various programs, and it allows to fingerprint the use
of tinyproxy.

to implement this, i imported the MIT-licensed hsearch.[ch] from
https://github.com/rofl0r/htab which was originally taken from
musl libc. it's a simple and efficient hashtable implementation
with far better performance characteristic than the one previously
used by tinyproxy. additionally it has an API much more well-suited
for this purpose.

orderedmap.[ch] was implemented from scratch to address this issue.
behind the scenes it uses an sblist to store string values, and a htab
to store keys and the indices into the sblist.
this allows us to iterate linearly over the sblist and then find the
corresponding key in the hash table, so the headers can be reproduced
in the order they were received.

closes #73
2020-09-15 23:11:59 +01:00
rofl0r 9d5ee85c3e fix free()ing of config items
- we need to free the config after it has been succesfully loaded,
  not unconditionally before reloading.
- we also need to free them before exiting from the main program
  to have clean valgrind output.
2020-09-15 23:11:59 +01:00
rofl0r 372d7ff824 shutdown: free children from right place 2020-09-15 22:32:42 +01:00
rofl0r 2f3a3828ac Revert "childs.c: fix minor memory leak"
This reverts commit 6dd3806f7d.
2020-09-15 22:25:53 +01:00
rofl0r 6dd3806f7d childs.c: fix minor memory leak
this would leak only once on program termination, so it's no big
deal apart from having spurious reachable memory in valgrind logs.
2020-09-15 20:02:12 +01:00
rofl0r 7eb6600aeb main: orderly shutdown on SIGINT too
the appropriate code in the signal handler was already set up,
but for some reason the signal itself not being handled.
2020-09-14 20:59:02 +01:00
rofl0r 7014d050d9 run_tests: make travis happy, use signal nr instead of name 2020-09-14 17:02:36 +01:00
rofl0r ff23f3249b conf.c: include common.h 2020-09-14 17:02:36 +01:00
rofl0r 17e19a67cf run_tests: do some more extensive testing
1) force a config reload after some initial tests.
   this will allow to identify memleaks using the valgrind test,
   as this will free all structures allocated for the config, and
   recreate them.
2) test ErrorFile directive by adding several of them.
   this should help catch regressions such as the one fixed in
   4847d8cdb3.
   it will also test memleaks in the related code paths.
3) test some scenarios that should produce errors and use the
   configured ErrorFile directives.
2020-09-13 01:09:21 +01:00
rofl0r c64ac9edbe fix get_request_entity()
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and kinda surprisingly, also when connection to the stathost is
done.

in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, and remembering whether a
connection to the stathost was desired in a variable, then doing
things a bit differently depending on whether it's set.

i tried to fix issues with get_request_entity in
88153e944f (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b1.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 got resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.

after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
additional to setting the proper connection timeout as
88153e944f already implemented.
the changes of 78cc5b72b1 have been
reverted.
2020-09-13 00:37:19 +01:00
rofl0r bfe59856b2 tests/webclient: return error when HTTP status > 399 2020-09-13 00:35:38 +01:00
rofl0r 4847d8cdb3 add_new_errorpage(): fix segfault accessing global config
another fallout of the config refactoring finished by
2e02dce0c3.

apparently no one using the ErrorFile directive used git master
during the last months, as there have been no reports about this issue.
2020-09-12 21:38:04 +01:00
rofl0r df9074db6e vector.h: missing include <unistd.h> for ssize_t 2020-09-12 15:56:36 +01:00
rofl0r 9e40f8311f handle_connection(): print process_*_headers errno information 2020-09-10 21:13:31 +01:00
rofl0r f1bd259e6e handle_connection: replace "goto fail" with func call
this allows to see in a backtrace from where the error was
triggered.
2020-09-10 14:48:39 +01:00
rofl0r e94cbdb3a5 handle_connection(): factor out failure code
this allows us in a next step to replace goto fail with a call to that
function, so we can see in a backtrace from where the failure was
triggered.
2020-09-10 14:37:56 +01:00
rofl0r b549ba5af3 remove bogus custom timeout handling code
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.

additionally the code added in 0b9a74c290
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
2020-09-09 12:37:23 +01:00
rofl0r b4e3f1a896 fix negative timeout resulting in select() EINVAL 2020-09-09 11:59:40 +01:00
rofl0r 78cc5b72b1 get_request_entity: fix regression w/ CONNECT method
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.

this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
2020-09-08 14:45:24 +01:00
rofl0r 58cfaf2659 make acl lookup 450x faster by using sblist
tested with 32K acl rules, generated by

    for x in `seq 128` ; do for y in `seq 255` ; do \
    echo "Deny 10.$x.$y.0/24" ; done ; done

after loading the config (which is dogslow too), tinyproxy
required 9.5 seconds for the acl check on every request.
after switching the list implementation to sblist, a request
with the full acl check now takes only 0.025 seconds.
the time spent for loading the config file is identical for both
list implementations, roughly 30 seconds.

(in a previous test, 65K acl rules were generated, but every
connection required almost 2 minutes to crunch through the list...)
2020-09-07 22:09:35 +01:00
rofl0r ebc7f15ec7 acl: typedef access_list to acl_list_t
this allows to switch the underlying implementation easily.
2020-09-07 21:53:14 +01:00
rofl0r efa5892011 check_acl: do full_inet_pton() only once per ip
if there's a long list of acl's, doing full_inet_pton() over
and over with the same IP isn't really efficient.
2020-09-07 20:57:16 +01:00
rofl0r 88153e944f get_request_entity: respect user-set timeout
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
2020-09-07 20:49:07 +01:00
rofl0r f720244baa README.md: describe how transparent proxying can be used
addressing #45
2020-09-07 18:08:57 +01:00
rofl0r 487a062fcc change loglevel of start/stop/reload messages to NOTICE
this allows to see them when the verbose INFO loglevel is not desired.

closes #78
2020-09-07 16:59:37 +01:00
rofl0r 23b0c84653 upstream: fix ip/mask calculation for types other than none
the code wrongly processed the site_spec (here: domain) parameter
only when PT_TYPE == PT_NONE.
re-arranged code to process it correctly whenever passed.
additionally the mask is now also applied to the passed subnet/ip,
so a site_spec like 127.0.0.1/8 is converted into 127.0.0.0/8.
also the case where inet_aton fails now produces a proper error
message.

note that the code still doesn't process ipv6 addresses and mask.
to support it, we should use the existing code in acl.c and refactor
it so it can be used from both call sites.

closes #83
closes #165
2020-09-07 16:11:51 +01:00
Brett Randall 559faf7957 website stylesheet: added pre margin-bottom: 20px.
this improves rendering of literal code paragraphs.
2020-09-07 12:34:35 +01:00
rofl0r a8848d4bd8 html-error: substitute template variables via a regex
previously, in order to detect and insert {variables} into error/stats
templates, tinyproxy iterated char-by-char over the input file, and would
try to parse anything inside {} pairs and treat it like a variable name.
this breaks CSS, and additionally it's dog slow as tinyproxy wrote every
single character to the client via a write syscall.
now we process line-by-line, and inspect all matches of the regex
\{[a-z]{1,32}\}. if the contents of the regex are a known variable name,
substitution is taking place. if not, the contents are passed as-is to
the client. also the chunks before and after matches are written in
a single syscall.

closes #108
2020-09-07 04:32:13 +01:00
[anp/hsw] 17ae1b512c Do not give error while storing invalid header 2020-09-07 01:12:50 +01:00
rofl0r d0fae11760 config parser: increase possible line length limit
let's use POSIX LINE_MAX (usually 4KB) instead of 1KB.

closes #226
2020-09-07 01:07:00 +01:00
rofl0r 7c37a61e00 manpages: update copyright years 2020-09-06 23:16:29 +01:00
rofl0r 65e79b84a4 update documentation about signals 2020-09-06 23:15:41 +01:00
rofl0r 8c86e8b3ae allow SIGUSR1 to be used as an alternative to SIGHUP
this allows a tinyproxy session in terminal foreground mode to reload
its configuration without dropping active connections.
2020-09-06 23:11:22 +01:00
rofl0r 95b1a8ea06 main.c: remove set_signal_handler code duplication 2020-09-06 23:08:10 +01:00
rofl0r 8ba0ac4e86 do not catch SIGHUP in foreground-mode
it's quite unexpected for an application running foreground in a
terminal to keep running when the terminal is closed.
also in such a case (if file logging is disabled) there's no way to
see what's happening to the proxy.
2020-09-06 22:46:26 +01:00
rofl0r 3da66364de configure.ac: fail if version script returns empty string 2020-09-06 20:32:52 +01:00
rofl0r 0d71223a1d send_html_file(): also set empty variables to "(unknown)" 2020-09-06 20:06:59 +01:00
rofl0r f1a6d063b0 version.sh: fix empty result when git describe fails
fixes an error in travis, which makes a shallow clone of 50 commits.
if the last tag is older than 50 commits, we get:
"fatal: No names found, cannot describe anything."

this caused a premature exit due to an assert error in safe_write()
on this line: assert (count > 0);

because the version variable in tinyproxy was empty.
2020-09-06 20:04:01 +01:00
rofl0r 0d26fab317 run_tests.sh: print more diagnostic if killing tp fails 2020-09-06 17:48:14 +01:00
rofl0r 55208eb2f6 run_tests.sh: print pid if killing tp fails 2020-09-06 17:20:06 +01:00
rofl0r ab27e4c68b configure.ac: check for all "desired" CFLAGS at once
in case they're all accepted, which would be the case with any
halfways recent GCC, we save a lot of time over testing each flag
sequentially.
2020-09-06 16:58:28 +01:00
rofl0r f20681e0c6 configure.ac: remove unused checks for malloc/realloc 2020-09-06 16:40:52 +01:00
rofl0r 8685d23225 configure.ac: remove check for strdup()
it was being used unconditionally anyway.
2020-09-06 16:32:37 +01:00
rofl0r 36c9b93cfe transparent: remove usage of inet_ntoa(), make IPv6 ready
inet_ntoa() uses a static buffer and is therefore not threadsafe.
additionally it has been deprecated by POSIX.

by using inet_ntop() instead the code has been made ipv6 aware.

note that this codepath was only entered in the unlikely event that
no hosts header was being passed to the proxy, i.e. pre-HTTP/1.1.
2020-09-06 16:22:11 +01:00
rofl0r 51b8be3ee4 add tinyproxy website template to docs/web
this allows to automatically generate the website from the current
tinyproxy.conf.5 template.

    make
    cd docs/web
    make
2020-09-06 13:45:40 +01:00
Brett Randall 5e594e593a Added BasicAuth to tinyproxy.conf man page. 2020-09-06 12:25:46 +01:00
rofl0r 233ce6de3b filter: reduce memory usage, fix OOM crashes
* check return values of memory allocation and abort gracefully
  in out-of-memory situations

* use sblist (linear dynamic array) instead of linked list
  - this removes one pointer per filter rule
  - removes need to manually allocate/free every single list item
    (instead block allocation is used)
  - simplifies code

* remove storage of (unused) input rule
  - removes one char* pointer per filter rule
  - removes storage of the raw bytes of each filter rule

* add line number to display on out-of-memory/invalid regex situation

* replace duplicate filter_domain()/filter_host() code with a single
  function filter_run()
  - reduces code size and management effort

with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.

the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz

closes #20
2020-09-05 19:42:34 +01:00
rofl0r c4dc3ba007 filter: fix documentation about rules
the file docs/filter-howto.txt was removed, as it contained misleading
information since it was first checked in.

it suggests the syntax for filter rules is fnmatch()-like, when in
fact they need to be specified as posix regular expressions.

additionally it contained a lot of utterly unrelated and irrelevant/
outdated text.

a few examples with the correct syntax have now been added to
tinyproxy.conf.5 manpage.

closes #212
2020-09-05 17:33:53 +01:00
Nicolai Søborg 281488a729 Change loglevel for "Maximum number of connections reached"
I was hit by this, and did not see anything in the log, connections was just hanging.
Think warning is a better log level
2020-09-01 15:07:03 +01:00
rofl0r f825bea4c1 travis: asciidoc is no longer needed 2020-08-20 14:32:16 +01:00