Univention Bugzilla – Bug 37716
UMC-Webserver: decodes filenames of uploads in latin-1
Last modified: 2023-06-20 20:56:50 CEST
When a file is uploaded to the UMC web server, the filename is passed to the UMC module processes as a unicode object that was decoded as latin-1. Instead of decoding b'\xc3\x84nderung.pdf' as UTF-8 to u'Änderung.pdf', the server decodes it as latin-1, which yields u'\xc3\x84nderung.pdf'. In UCS we changed the Python default encoding from ascii to UTF-8. If this unicode object is then passed to filesystem calls such as shutil.move() or open(), Python encodes it as UTF-8, which yields '\xc3\x83\xc2\x84nderung.pdf'. → If the file is then saved to disk, we end up with corrupt, double-encoded filenames (UTF-8 bytes mis-decoded as latin-1) on a UTF-8 filesystem.
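The corruption chain described above can be reproduced in a few lines. This is a minimal Python 3 sketch, not code from the UMC web server; the variable names are illustrative only.

```python
# UTF-8 bytes for 'Änderung.pdf' as a browser would send them.
raw = b'\xc3\x84nderung.pdf'

# What the web server does: decode the bytes as latin-1.
# Each byte becomes one code point, so 'Ä' turns into two characters.
wrong = raw.decode('latin-1')

# What a filesystem call does afterwards: encode the text as UTF-8.
# The two bogus characters are each encoded as two bytes.
on_disk = wrong.encode('utf-8')

# What should have happened: decode the bytes as UTF-8.
right = raw.decode('utf-8')

print(repr(wrong))    # latin-1 misreading
print(repr(on_disk))  # double-encoded bytes that end up on disk
print(repr(right))    # intended filename
```

The double-encoded byte string is what ends up as the filename on disk, which is why the file later appears with garbage characters.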
This then causes bugs like Bug #36846.
Fix would be either to enable the cherrypy plugin which guesses the correct encoding by specifying this in the config:
    tools.decode.on=True
    tools.decode.encoding="UTF-8"
Alternatively we can add it by hand:
    try:
        filename = filename.encode('latin1').decode('UTF-8')
    except UnicodeDecodeError:
        pass
    self._log('info', 'Converting filename from=%r to=%r' % (store.filename, filename))
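The manual approach can be tried in isolation. The following is a sketch in Python 3 under the assumption that the filename arrives as text that was decoded as latin-1; the helper name repair_filename is hypothetical and not part of UMC or cherrypy. Unlike the snippet above, it also catches UnicodeEncodeError for input that cannot be latin-1 at all.

```python
def repair_filename(filename):
    """Undo a latin-1 mis-decoding of UTF-8 bytes, if there was one.

    Hypothetical helper for illustration. If the round trip fails,
    the filename is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return filename.encode('latin1').decode('UTF-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in latin-1, or the bytes are not valid
        # UTF-8 (e.g. the client really sent latin-1): keep as is.
        return filename


print(repr(repair_filename('\xc3\x84nderung.pdf')))  # mis-decoded UTF-8
print(repr(repair_filename('\xc4nderung.pdf')))      # genuine latin-1
print(repr(repair_filename('report.pdf')))           # plain ASCII
```

A filename that was genuinely sent as latin-1 is left alone, because its bytes do not form valid UTF-8 and the decode step raises UnicodeDecodeError.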
(In reply to Florian Best from comment #2)
> Fix would be either to enable the cherrypy plugin which guesses the correct
> encoding by specifying this in the config:
> tools.decode.on=True
> tools.decode.encoding="UTF-8"

Cherrypy does this (correct RFC 7230 behavior):
    k, v = line.split(ntob(":"), 1)
    k = k.strip().decode('ISO-8859-1')
    v = v.strip().decode('ISO-8859-1')

The problem is simply that browser implementations are broken: they send UTF-8 although they are forbidden to do so (correct would be to send either latin-1 or use quoted-printable encoding). The manual patch from comment #2 works perfectly.
http://greenbytes.de/tech/tc2231/ http://greenbytes.de/tech/webdav/rfc5987.html
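For comparison, the second link describes the RFC 5987 extended parameter form (charset'language'percent-encoded-value), which states the charset explicitly instead of leaving the receiver to guess. The following Python 3 sketch only illustrates that syntax; decode_ext_value is a hypothetical name, and nothing here implies that browsers send this form for uploads or that the UMC web server parses it.

```python
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Decode an RFC 5987 style ext-value such as UTF-8''%C3%84nderung.pdf.

    Hypothetical helper for illustration; it does not validate the
    charset or language tag beyond splitting on the two apostrophes.
    """
    charset, _language, encoded = ext_value.split("'", 2)
    # Percent-decode the value and interpret the bytes in the given charset.
    return unquote(encoded, encoding=charset, errors='strict')


print(repr(decode_ext_value("UTF-8''%C3%84nderung.pdf")))
print(repr(decode_ext_value("iso-8859-1'de'%C4nderung.pdf")))
```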
*** This bug has been marked as a duplicate of bug 43633 ***