These were intended to make it possible to access these settings
in the MT settings UI. However, apparently it's not possible to
have nil as a default, and MT throws warnings in this scenario.
Remove these settings until that's properly supported.
- Fix: don't trim off one extra sample (size instead of size+1) of
sample buffer.
- Restructure to fix scope proliferation of variables.
- Support publishing to JSON and MT log on custom intervals,
enable publishing to log by default.
There was a redundant expiration check that was hitting the auth
database rather intensely every 5 seconds ... reduce that to check
only one player name per globalstep to spread it out.
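As a rough illustration of the "one name per globalstep" idea, here
is a minimal sketch in Python (the mod itself is not written in
Python). The class name, the `is_expired` callback and the list of
names are all hypothetical stand-ins, not the mod's real API.

```python
class ExpiryChecker:
    """Checks one name per step, cycling through all known names."""

    def __init__(self, names, is_expired):
        self.names = list(names)      # snapshot of names to cycle through
        self.is_expired = is_expired  # hypothetical callback: name -> bool
        self.index = 0                # position of the next name to check

    def step(self):
        """Check a single name; return it if expired, else None."""
        if not self.names:
            return None
        name = self.names[self.index]
        # Advance and wrap around so every name is eventually revisited.
        self.index = (self.index + 1) % len(self.names)
        return name if self.is_expired(name) else None
```

Calling `step()` once per globalstep costs one lookup per step,
instead of checking every name in a burst every 5 seconds.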
- The number of total steps is not very useful, and is redundant
with the average across steps when uptime is >= report_period.
- Add "mean time between steps" statistic, which is averaged
across time over the reporting period instead of across the
count of steps. It more accurately represents the subjective
impact of rare large outliers on the overall experience.
- Rename "mean" to "avg" since it's shorter, and we now have two
statistics named "mean".
- It's now possible to toggle the histogram separately, so it can
show 3 effective modes: off, basic, and full.
- There's now a separate priv to give users access to only the
summary lines.
- User preferences are tracked independently of privileges so that
they can be adjusted even when limited by privs, and modes can
be restored automatically when privs are changed.
- User preferences are applied immediately when changing modes,
rather than waiting for the next interval.
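One plausible reading of the "mean time between steps" statistic
above is a time-weighted average, where each step counts in
proportion to how long it lasted. The Python sketch below assumes
that reading; the formula sum(dt^2) / sum(dt) is an assumption, not
taken from the mod's code, and the function name is hypothetical.

```python
def step_averages(dtimes):
    """Return (avg across steps, avg across time) for step durations."""
    total = sum(dtimes)
    if not dtimes or total <= 0:
        return 0.0, 0.0
    # Plain average: every step counts equally.
    avg = total / len(dtimes)
    # Time-weighted average: each step is weighted by its own duration,
    # so one long stall dominates, as it does for a player waiting on it.
    time_weighted = sum(dt * dt for dt in dtimes) / total
    return avg, time_weighted
```

With nine 0.1 s steps and a single 1.1 s stall, the per-step average
is 0.2 s while the time-weighted value is 0.65 s, which is closer to
how the stall feels.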
The first number should always be only the very first, lowest
sample, and the top the very last, largest. There are only 4
different "intervals" between the numbers.
This calculation is now verified to get the top and bottom numbers
correct. It should logically follow that the inner numbers are also
roughly correct. They may be biased in where they land between
samples, but that is probably irrelevant.
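A minimal Python sketch of the property described here: five marks
spanning four intervals, with the first pinned to the lowest sample
and the last to the highest. The index rounding is an assumption and
may differ from what the mod actually does for the inner marks.

```python
def quantile_marks(samples, marks=5):
    """Pick `marks` evenly spaced values from the sorted samples."""
    ordered = sorted(samples)
    if not ordered:
        return []
    last = len(ordered) - 1
    # i = 0 maps to index 0 (lowest sample); i = marks - 1 maps to
    # index `last` (highest sample); inner marks are rounded to the
    # nearest sample index.
    return [ordered[round(i * last / (marks - 1))] for i in range(marks)]
```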
Before, we were accumulating time into "buckets" during globalstep
to limit the maximum amount of data we were handling, and make the
job of building the histogram a lot easier.
This unfortunately caused some issues with the new statistics like
mean (which can't see into the topmost bucket that captures
everything oversized) and quantiles (that are similarly capped, and
limited to bucket granularity).
Instead, just do the simpler thing of capturing each globalstep
dtime as a distinct sample in an unordered list. Each reporting
interval, we shift the list to prune off old samples, and then
compute statistics the "hard way", including sorting to produce the
quantiles. This is a bit cheaper on the globalstep side and a bit
more expensive on the reporting side, but should still take only
about 1 millisecond on real-world systems.
The statistics produced, especially mean and quantiles, are much
more accurate, and a bit more precise.
Also add a warning in documentation about the summary statistics
being approximate, in case somebody tries to do the actual math
and finds it doesn't quite match up.
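The sample-list approach can be sketched roughly as follows, in
Python. This version prunes by timestamp; the class, its field names
and the choice of statistics are illustrative assumptions rather than
the mod's actual structure.

```python
class StepSamples:
    """Keeps raw per-step samples and computes stats at report time."""

    def __init__(self, period):
        self.period = period  # reporting window, in seconds
        self.samples = []     # (timestamp, dtime) pairs in arrival order

    def record(self, now, dtime):
        # Cheap path, run every globalstep: just append.
        self.samples.append((now, dtime))

    def report(self, now):
        # Expensive path, run once per interval: prune, sort, summarize.
        cutoff = now - self.period
        self.samples = [s for s in self.samples if s[0] >= cutoff]
        dtimes = sorted(dt for _, dt in self.samples)
        if not dtimes:
            return None
        return {
            "count": len(dtimes),
            "avg": sum(dtimes) / len(dtimes),
            "min": dtimes[0],
            "median": dtimes[len(dtimes) // 2],
            "max": dtimes[-1],
        }
```

Because every sample is kept, the maximum and the mean are no longer
capped by an "oversized" bucket.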
The documentation says something about "indexed by ID", and we
shouldn't assume that IDs will remain sequential/compact, so we
should technically count the keys rather than assuming that the
"array part length" indicator will always be correct.
If a player quits before agreeing to the terms, and gets added to
the purge queue, but then reauths immediately, they can get
re-purged before they finish emerging, causing an assert fail on
trying to record their last login time.
If a player re-registers, immediately remove them from the purge
retry queue, so their login time can safely be recorded.
Also, if the server shuts down with incomplete players in the
"lobby", then add them all to the purge queue too.
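A small Python sketch of the queue handling described above. The
hook names (`on_leave`, `on_register`, `on_shutdown`) are
hypothetical and only mirror the three cases in this message.

```python
class PurgeQueue:
    """Tracks names whose accounts are waiting to be purged."""

    def __init__(self):
        self.pending = set()

    def on_leave(self, name, agreed):
        # Players who quit before agreeing to the terms get queued.
        if not agreed:
            self.pending.add(name)

    def on_register(self, name):
        # Re-registering cancels any pending purge immediately, so the
        # login time can be recorded without racing the purge.
        self.pending.discard(name)

    def on_shutdown(self, lobby):
        # Anyone still in the "lobby" at shutdown is queued as well.
        self.pending.update(lobby)
```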
After we kick a player, an async process of shutting down that
player's connection is initiated, and their player data is flushed
to the database AFTER that. If we remove the player data at the
same time as kicking, it's not actually removed, but the auth IS
removed, so a bunch of player data is left behind without our
knowing, and there's no mod-API way to enumerate it.
For now, just disallow this use case. The time between calling
/kick and /destroy_player should be sufficient to avoid the race
condition, at least, for now.
There are a number of mods in the pack that store metadata about
players "inside out", in mod_storage instead of player:get_meta,
because player meta is not accessible when the player is not online
and that's necessary for some mods. Player meta is also tied to
the "player character" and not to the auth account, so inside out
storage is necessary when we want data to survive resetting the
player character while keeping the account.
Unfortunately, this prevents MT from automatically cleaning up
the metadata when a player is destroyed, which can happen for
various reasons outside the control of the mod. This can cause
keys to accumulate in the mod storage database, which may hurt
performance over time.
To keep things tidy, periodically scan the mod storage keys for
each such mod (add a scan cycle, or integrate with an existing
one) and automatically clean up old mod storage keys.
szutil_xplevel was already doing this on startup and command;
convert it to do periodic scans like the other mods.
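The cleanup scan can be sketched along these lines in Python. It
assumes each storage key is a player name and that some
`player_exists` check is available; both are assumptions made for
illustration, and a plain dict stands in for mod storage.

```python
def scan_stale_keys(storage, player_exists):
    """Delete entries whose player no longer exists; return their keys."""
    # Collect first, then delete, so the mapping is never modified
    # while it is being iterated.
    stale = [key for key in storage if not player_exists(key)]
    for key in stale:
        del storage[key]
    return stale
```

Running this from a periodic timer, rather than only at startup,
keeps the key count bounded on long-running servers.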
- Rearchitect so that we generate a list of lines as a separate
process from displaying them, so that we can add new things
beyond just the bucket graphs.
- Display cached lagometer to players as soon as they join instead
of having to wait for the 2 second "cold start" before seeing
anything.
- Add a status line to the bottom of the lagometer with live
display of things that would be in /status.
- Add the status line things to the JSON dump too.
When an MT server shuts down, it doesn't always kick the players
and announce their departure before the shutdown process runs,
so it's possible for players to be "connected" at the time the
server actually stops running mod code. This means that there is
no opportunity to announce that the server is now empty once it's
actually shut down.
Here we assume that in most cases, the server will be brought
back up immediately, and better late than never. If there were any
players online at the time the server was shut down, announce the
server status (i.e. that it's now empty) upon the next startup.
This should mitigate the issue where a player joins the server,
then the server is shut down silently, and then the same player
joins again, making chat logs nonsensical. At least this way
you will see that the player is no longer connected at some point
before they reconnect.
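One way to get this behavior, sketched in Python: persist the online
count whenever it changes, then check it on the next startup. The
`state` dict stands in for some persistent store, and the function
names and message text are hypothetical.

```python
def on_player_count_changed(state, online):
    # Persist the count on every join/leave, since mod code may not
    # get a chance to run at the moment the server actually stops.
    state["online"] = online

def on_startup(state, announce):
    """Announce an empty server if players were online at shutdown."""
    if state.get("online", 0) > 0:
        state["online"] = 0
        announce("Server restarted: 0 players online")
        return True
    return False
```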
If the szutil_lagometer_publish_json setting is enabled, write a
JSON file to the world path with lagometer data every publish
cycle, so that external systems can access it.
Also fix formatting issues in the lagometer format string.
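A minimal Python sketch of publishing a JSON file for external
consumers. The file name `lagometer.json` and the write-then-rename
step are assumptions; the rename is there so that a reader polling
the file never sees a half-written document.

```python
import json
import os

def publish_json(world_path, data, filename="lagometer.json"):
    """Write `data` as JSON into the world directory; return the path."""
    final = os.path.join(world_path, filename)
    temp = final + ".tmp"
    with open(temp, "w") as f:
        json.dump(data, f)
    # Replace the old file in one step once the new one is complete.
    os.replace(temp, final)
    return final
```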
It seems like auth was being purged, but not player data.
I found 2 possible issues:
- The "cache" table was having keys removed while it was being
iterated, which could cause skipped pairs.
- I noticed a delay in player data being written to the database,
so I wonder if write-behind player data was being inserted
after the deletion was attempted.
The net effect was that player auth data was apparently being
purged reliably, but player character data was being left behind
and cluttering up the database.
To try to fix this, fix the table mutation/iteration conflict,
and schedule multiple deletion retries (for about 15 seconds) to
ensure we really get it.
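Both parts of the fix can be sketched in Python as follows. The
function names, the `schedule` callback and the default of 15
attempts at 1 second apart are illustrative assumptions; the message
above only says the retries cover about 15 seconds.

```python
def purge_expired(cache, is_expired):
    """Remove expired entries without mutating during iteration."""
    expired = [name for name in cache if is_expired(name)]
    for name in expired:
        del cache[name]
    return expired

def schedule_retries(name, delete_player, schedule,
                     attempts=15, interval=1.0):
    """Queue repeated deletions so late write-behind data is caught."""
    for i in range(attempts):
        # schedule(delay, fn) is a hypothetical "run fn after delay
        # seconds" hook supplied by the caller.
        schedule(i * interval, lambda: delete_player(name))
```

Repeating the deletion is harmless if the data is already gone, and
catches the case where it is written back after the first attempt.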