Commit Graph

232 Commits (master)

Author SHA1 Message Date
rofl0r 7d1e86ccae don't try to send 408 error to closed client socket
read_request_line() is exercised on the client's fd, and it fails
when the client closed the connection. therefore it's wrong
to send an error message to the client in this situation.
additionally, the error message states that the server closed
the connection.

might fix #383
2022-05-02 14:50:42 +00:00
rofl0r 235b1c10a7 implement filtertype keyword and fnmatch-based filtering
as suggested in #212, it seems the majority of people don't understand
that input was expected to be in regex format and people were using
filter lists containing plain hostnames, e.g. `www.google.com`.

apart from that, using fnmatch() for matching is actually a lot less
computationally expensive and allows to use big blacklists without
incurring a huge performance hit.

the config file now understands a new option `FilterType` which can
be one of `bre`, `ere` and `fnmatch`.
The `FilterExtended` option was deprecated in favor of it.
It still works, but will be removed in the release after the next.
2022-05-02 13:13:40 +00:00
Malte S. Stretz 1576ee279f Return 5xx when upstream is unreachable
Currently a 404 is returned for a misconfigured or unavailable upstream
server.  Since that's a server error it should be a 5xx instead; a 404
is confusing when used as a forward proxy and might even be harmful when
used as a reverse proxy.

It is debatable if another 5xx code might be better; the misconfigured
situation might better be a 500 whereas the connection issue could be
a 503 instead (as used eg. in haproxy).
2022-02-13 21:46:03 +00:00
rofl0r eced6822f8 properly deal with client sending chunked data
this fixes OPTIONS requests sent from apache SVN client using their
native HTTP proxy support.

closes #421

tested with `svn info http://svnmir.bme.freebsd.org/ports/`
2022-02-13 21:11:37 +00:00
rofl0r 79d0b0fa79 fix timeout not being applied to outgoing connections
the fix in 0b9a74c290 was incomplete, as it
applied the socket timeout only to the socket received from accept(), but
not to sockets created for outgoing connections.
2022-01-20 20:25:42 +00:00
rofl0r 39d7bf6c70 improve error message for "Error reading readable client_fd"
maybe this helps to track down the cause of #383.
2021-07-23 20:17:18 +01:00
rofl0r 563978a3ea socks4 upstream: add safety check for hostname length 2021-06-25 02:55:22 +01:00
rofl0r 7ea9f80d3f fix segfault in socks4 upstream with unresolvable hostname
using a socks4 tor upstream with an .onion url resulted in
gethostbyname() returning NULL and a subsequent segfault.
not only did the code not check the return value of gethostbyname(),
that resolver API itself isn't threadsafe.

as pure SOCKS4 supports only IPv4 addresses, and the main SOCKS4
user to this date is tor, we just use SOCKS4a unconditionally and
pass the hostname to the proxy without trying to do any local name
resolving.

i suspect in 2021 almost all SOCKS4 proxy servers in existence use
SOCKS4a extension, but should i be wrong on this, i prefer issue
reports to show up and implement plain SOCKS4 fallback only when
i see it is actually used in practice.
2021-06-25 02:43:00 +01:00
rofl0r a869e71ac3 add support for outgoing connections with HTTP/1.1
since there are numerous changes in HTTP/1.1, the proxyserver will
stick to using HTTP/1.0 for internal usage, however when a connection
is requested with HTTP/1.x from now on we will duplicate the minor revision
the client requested, because apparently some servers refuse to accept
HTTP/1.0

addresses #152.
2021-04-16 14:51:01 +01:00
rofl0r 2529597ea0 reverse: redirect if path without trailing slash is detected
if for example:

ReversePath = "/foo/"

and user requests "http://tinyproxy/foo" the common behaviour for HTTP
servers is to send a http 301 redirect to the correct url.
we now do the same.
2021-04-16 14:41:40 +01:00
rofl0r db5c0e99b4 reqs: fix UB passing ssize_t to format string expecting int 2020-10-19 20:30:10 +01:00
rofl0r da1bc1425d tune error messages to show select or poll depending on what is used 2020-09-17 21:03:51 +01:00
rofl0r 683a354196 remove vector remains 2020-09-16 02:39:09 +01:00
rofl0r e929e81a55 add_header: use sblist
note that the old code inserted added headers at the beginning of the
list, reasoning unknown. this seems counter-intuitive as the headers
would end up in the request in the reverse order they were added,
but this was irrelevant, as the headers were originally first put
into the hashmap hashofheaders before sending it to the client.
since the hashmap didn't preserve ordering, the headers would appear
in random order anyway.
2020-09-16 02:39:09 +01:00
rofl0r 10cdee3bc5 prepare transition to poll()
usage of select() is inefficient (because a huge fd_set array has to
be initialized on each call) and insecure (because an fd >= FD_SETSIZE
will cause out-of-bounds accesses using the FD_*SET macros, and a system
can be set up to allow more than that number of fds using ulimit).
for the moment we prepared a poll-like wrapper that still runs select()
to test for regressions, and so we have fallback code for systems without
poll().
2020-09-15 23:12:00 +01:00
rofl0r 0c8275a90e refactor conns.[ch], put conn_s into child struct
this allows to access the conn member from the main thread handling
the childs, plus simplifies the code.
2020-09-15 23:12:00 +01:00
rofl0r 155bfbbe87 replace leftover users of hashmap with htab
also fixes a bug where the ErrorFile directive would create a
new hashmap on every added item, effectively allowing only
the use of the last specified errornumber, and producing memory
leaks on each config reload.
2020-09-15 23:12:00 +01:00
rofl0r 34a8b28414 save headers in an ordered dictionary
due to the usage of a hashmap to store headers, when relaying them
to the other side the order was not prevented.
even though correct from a standards point-of-view, this caused
issues with various programs, and it allows to fingerprint the use
of tinyproxy.

to implement this, i imported the MIT-licensed hsearch.[ch] from
https://github.com/rofl0r/htab which was originally taken from
musl libc. it's a simple and efficient hashtable implementation
with far better performance characteristic than the one previously
used by tinyproxy. additionally it has an API much more well-suited
for this purpose.

orderedmap.[ch] was implemented from scratch to address this issue.
behind the scenes it uses an sblist to store string values, and a htab
to store keys and the indices into the sblist.
this allows us to iterate linearly over the sblist and then find the
corresponding key in the hash table, so the headers can be reproduced
in the order they were received.

closes #73
2020-09-15 23:11:59 +01:00
rofl0r c64ac9edbe fix get_request_entity()
get_request_entity()'s purpose is to drain remaining unread bytes
in the request read pipe before handing out an error page,
and kinda surprisingly, also when connection to the stathost is
done.

in the stathost case tinyproxy just skipped proper processing and
jumped to the error handler code, and remembering whether a
connection to the stathost was desired in a variable, then doing
things a bit differently depending on whether it's set.

i tried to fix issues with get_request_entity in
88153e944f (which is basically the
right fix for the issue it tried to solve, but incomplete),
and resulting from there in 78cc5b72b1.
the latter fix wasn't quite right since we're not supposed to check
whether the socket is ready for writing, and having a return value
of 2 instead of 1 got resulted in some of the if statements not
kicking in when they should have.
this also resulted in the stathost page no longer working.

after in-depth study of the issue i realized that we only need to
call get_request_entity() when the headers aren't completely read,
additional to setting the proper connection timeout as
88153e944f already implemented.
the changes of 78cc5b72b1 have been
reverted.
2020-09-13 00:37:19 +01:00
rofl0r 9e40f8311f handle_connection(): print process_*_headers errno information 2020-09-10 21:13:31 +01:00
rofl0r f1bd259e6e handle_connection: replace "goto fail" with func call
this allows to see in a backtrace from where the error was
triggered.
2020-09-10 14:48:39 +01:00
rofl0r e94cbdb3a5 handle_connection(): factor out failure code
this allows us in a next step to replace goto fail with a call to that
function, so we can see in a backtrace from where the failure was
triggered.
2020-09-10 14:37:56 +01:00
rofl0r b549ba5af3 remove bogus custom timeout handling code
in networking, hitting a timeout requires that *nothing* happens during the
interval. whenever anything happens, the timeout is reset.
there's no need to do custom time calculations, it's perfectly fine to let
the kernel handle it using the select() syscall.

additionally the code added in 0b9a74c290
assures that read and write syscalls() don't block indefinitely and return
on the timeout too, so there's no need to switch sockets back and forth
between blocking/nonblocking.
2020-09-09 12:37:23 +01:00
rofl0r b4e3f1a896 fix negative timeout resulting in select() EINVAL 2020-09-09 11:59:40 +01:00
rofl0r 78cc5b72b1 get_request_entity: fix regression w/ CONNECT method
introduced in 88153e944f.
when connect method is used (HTTPS), and e.g. a filtered domain requested,
there's no data on readfds, only on writefds.

this caused the response from the connection to hang until the timeout was
hit. in the past in such scenario always a "no entity" response
was produced in tinyproxy logs.
2020-09-08 14:45:24 +01:00
rofl0r 88153e944f get_request_entity: respect user-set timeout
get_request_entity() is only called on error, for example if a client
doesn't pass a check_acl() check. in such a case it's possible that
the client fd isn't yet ready to read from.
using select() with a timeout timeval of {0,0} causes it to return
immediately and return 0 if there's no data ready to be read.
this resulted in immediate connection termination rather than returning
the 403 access denied error page to the client and a confusing
"no entity" message displayed in the proxy log.
2020-09-07 20:49:07 +01:00
[anp/hsw] 17ae1b512c Do not give error while storing invalid header 2020-09-07 01:12:50 +01:00
rofl0r 233ce6de3b filter: reduce memory usage, fix OOM crashes
* check return values of memory allocation and abort gracefully
  in out-of-memory situations

* use sblist (linear dynamic array) instead of linked list
  - this removes one pointer per filter rule
  - removes need to manually allocate/free every single list item
    (instead block allocation is used)
  - simplifies code

* remove storage of (unused) input rule
  - removes one char* pointer per filter rule
  - removes storage of the raw bytes of each filter rule

* add line number to display on out-of-memory/invalid regex situation

* replace duplicate filter_domain()/filter_host() code with a single
  function filter_run()
  - reduces code size and management effort

with these improvements, >1 million regex rules can be loaded with
4 GB of RAM, whereas previously it crashed with about 950K.

the list for testing was assembled from
http://www.shallalist.de/Downloads/shallalist.tar.gz

closes #20
2020-09-05 19:42:34 +01:00
rofl0r 0b9a74c290 enforce socket timeout on new sockets via setsockopt()
the timeout option set by the config file wasn't respected at all
so it could happen that connections became stale and were never released,
which eventually caused tinyproxy to hit the limit of open connections and
never accepting new ones.

addresses #274
2020-07-15 09:59:25 +01:00
rofl0r 3230ce0bc2 anonymous: fix segfault loading config item
unlike other functions called from the config parser code,
anonymous_insert() accesses the global config variable rather than
passing it as an argument. however the global variable is only set
after successful loading of the entire config.

we fix this by adding a conf argument to each anonymous_* function,
passing the global pointer in calls done from outside the config
parser.

fixes #292
2020-03-16 13:19:39 +00:00
rofl0r c63d5d26b4 access config via a pointer, not a hardcoded struct address
this is required so we can elegantly swap out an old config for a
new one in the future and remove lots of boilerplate from config
initialization code.

unfortunately this is a quite intrusive change as the config struct
was accessed in numerous places, but frankly it should have been
done via a pointer right from the start.

right now, we simply point to a static struct in main.c, so there
shouldn't be any noticeable changes in behaviour.
2020-01-15 16:09:41 +00:00
rofl0r cd005a94ce implement detection and denial of endless connection loops
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since there's a potentially infinite number of potential DNS/ip addresses
resolving to the proxy, it is impossible to detect an endless loop by simply
looking at the destination ip address and port.

what *is* possible though is to record the ip/port tuples assigned to outgoing
connections, and then compare them against new incoming connections. if they
match, the sender was the proxy itself and therefore needs to reject that
connection.

fixes #199.
2019-12-21 00:43:45 +00:00
rofl0r f6d4da5d81 do hostname resolution only when it is absolutely necessary for ACL check
tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in #198).

there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since it is very likely this feature is not very widely used,
it makes absolute sense to only do the costly resolution when it is unavoidable.
2019-12-21 00:43:45 +00:00
rofl0r 734ba1d970 fix usage of stathost in combination with basic auth
http protocol requires different treatment of proxy auth vs server auth.

fixes #246
2019-06-14 01:18:19 +01:00
rofl0r c651664720 fix socks5 upstream user/pass subnegotiation check
RFC 1929 specifies that the user/pass auth subnegotation repurposes the version
field for the version of that specification, which is 1, not 5.
however there's quite a good deal of software out there which got it wrong and
replies with version 5 to a successful authentication, so let's just accept both
forms - other socks5 client programs like curl do the same.

closes #172
2018-05-29 21:59:11 +02:00
rofl0r b8c6a2127d implement user/password auth for socks5 upstream proxy
just like the rest of the socks code, this was stolen from
proxychains-ng, of which i'm happen to be the maintainer of,
so it's not an issue (the licenses are identical, too).
2018-02-27 20:13:07 +00:00
rofl0r 39132b9787 rename members of proxy_type enum to have a common prefix
and add a NONE member.
2018-02-25 23:52:23 +00:00
rofl0r bf76aeeba1 implement HTTP basic auth for upstream proxies
loosely based on @valenbg1's code from PR #38

closes #38
closes #96
2018-02-25 15:13:45 +00:00
rofl0r bd04ed00d8 Basic Auth: send correct response codes and headers acc. to rfc7235
as reported by @natedogith1
2018-02-06 16:57:02 +00:00
rofl0r 8db511b9bf add support for basic HTTP authentication
using the "BasicAuth" keyword in tinyproxy.conf.

base64 code was written by myself and taken from my own library "libulz".
for this purpose it is relicensed under the usual terms of the tinyproxy
license.
2018-02-06 16:57:02 +00:00
rofl0r 7a3fd81a8d fix types used in SOCKS4/5 support code
the line

    len = buff[0]; /* max = 255 */

could lead to a negative length if the value in buff[0] is > 127.
2018-02-06 16:11:39 +00:00
Gonzalo Tornaria 8906b0734e add SOCKS upstream proxy support (socks4/socks5)
original patch submitted in 2006 to debian mailing list:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=392848%29#12

this version was rebased to git and updated by Russ Dill <russ.dill@gmail.com>
in 2015 (the original patch used a different config file format).

as discussed in #40.

commit message by @rofl0r.
2018-02-06 16:11:39 +00:00
Stephan Leemburg c5da1cc934 Continue with forward proxy if ReverseOnly is not true and no mapping available (#35)
allow non-reverse mappings if reverseonly is not enabled
2016-09-10 19:22:45 +02:00
Michael Adam 800c3a250c BB#110 Increase number of hash buckets from 32 to 256.
This should make hash processing generally faster.

There is a treadeoff between memory footprint and
speed of processing. 10 KB instead of 1.2 KB of
hash table per process should not be a huge problem
even on very limited current systems.

Who really needs to stick to 32 buckets could
recompile. We could also think about making
this configurable at some point.

Signed-off-by: Michael Adam <obnox@samba.org>
2014-12-13 01:41:56 +01:00
Michael Adam 545463c75d BB#110 limit the number of headers per request to prevent DoS
Based on patch provided by gpernot@praksys.org on bugzilla.

Signed-off-by: Michael Adam <obnox@samba.org>
2014-12-13 01:28:07 +01:00
Michael Adam 76bd008cf9 reqs: fix typo in a debug message in get_request_entity()
Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-23 11:59:47 +01:00
Michael Adam 3710accf72 reqs: Fix CID 1130969 (part 3) - unchecked return value from library.
Check the return value of socket_blocking (fcntl) at the
end of relay_connection() for client socket.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 21:56:39 +01:00
Michael Adam e07c363df2 reqs: Fix CID 1130969 (part 2) - unchecked return value from library.
Check the return value of socket_blocking (fcntl) at the
end of relay_connection().

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 21:44:12 +01:00
Michael Adam c82840bfcb reqs: Fix CID 1130972 - remove logically dead code.
url == NULL is caught above.

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:58:19 +01:00
Michael Adam 0a99803425 reqs: Fix CID 1130967 - unchecked return value from library.
Check the return code of fcntl via socket_blocking
in pull_client_data().

Found by coverity.

Signed-off-by: Michael Adam <obnox@samba.org>
2013-11-22 18:49:45 +01:00