'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
An URL alone isn't good enough to distinguish between a gallery or a
gallery-listing, so the new extractor decides what to do based on the
page's content.
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.