Bug 37716 - UMC-Webserver: decodes filenames of uploads in latin-1
Status: RESOLVED DUPLICATE of bug 43633
Product: UCS
Classification: Unclassified
Component: UMC (Generic)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 4.0-x
Assigned To: UMC maintainers
Depends on:
Blocks:
Reported: 2015-02-06 13:13 CET by Florian Best
Modified: 2023-06-20 20:56 CEST (History)
best: Patch_Available+

Description Florian Best univentionstaff 2015-02-06 13:13:14 CET
When a file is uploaded to the UMC-Webserver, its filename is passed to the UMC module processes as a unicode object that was decoded as latin-1.
Instead of decoding b'\xc3\x84nderung.pdf' as UTF-8 to u'Änderung.pdf', it decodes the bytes as latin-1, which results in u'\xc3\x84nderung.pdf'.
In UCS we changed the Python default encoding from ASCII to UTF-8.
If this unicode object is then passed to filesystem calls such as shutil.move() or open(), Python encodes it as UTF-8, which results in '\xc3\x83\xc2\x84nderung.pdf'. → If we e.g. save the file to the hard disk, we end up with corrupt, doubly encoded filenames (decoded as latin-1, re-encoded as UTF-8) on a UTF-8 filesystem.
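
The chain described above can be reproduced in isolation. This is a minimal Python 3 sketch, not the UMC-Webserver code; the variable names are made up for illustration, and the explicit .encode("utf-8") stands in for what a filesystem call does on a UTF-8 system.

```python
# Bytes a browser sends for the filename "Änderung.pdf" (UTF-8 encoded).
raw = "Änderung.pdf".encode("utf-8")

# The server decodes the header value as latin-1:
# every single byte becomes one code point, so "Ä" turns into two characters.
misdecoded = raw.decode("latin-1")

# A later filesystem call encodes the string as UTF-8 again,
# which yields doubly encoded bytes on disk.
on_disk = misdecoded.encode("utf-8")

print(raw)               # b'\xc3\x84nderung.pdf'
print(repr(misdecoded))  # 'Ã\x84nderung.pdf'
print(on_disk)           # b'\xc3\x83\xc2\x84nderung.pdf'
```

The last line matches the corrupt byte sequence quoted in the description.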
Comment 1 Florian Best univentionstaff 2015-02-06 13:14:35 CET
This then causes bugs like Bug #36846.
Comment 2 Florian Best univentionstaff 2015-02-06 13:22:37 CET
One fix would be to enable the CherryPy plugin which guesses the correct encoding, by specifying this in the config:
tools.decode.on=True
tools.decode.encoding="UTF-8"

Alternatively we can add it by hand:
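# Undo the latin-1 decoding and decode the raw bytes as UTF-8 instead.
try:
    filename = filename.encode('latin1').decode('UTF-8')
except UnicodeDecodeError:
    # Not valid UTF-8: the name really was latin-1, so keep it unchanged.
    pass
self._log('info', 'Converting filename from=%r to=%r' % (store.filename, filename))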
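
As a self-contained illustration of the manual approach, the sketch below wraps the same round trip in a helper. The name fix_upload_filename is hypothetical and does not exist in UMC. It is written for Python 3 and catches UnicodeError, the common parent of UnicodeEncodeError and UnicodeDecodeError, which is a slightly broader assumption than the patch above.

```python
def fix_upload_filename(filename: str) -> str:
    """Undo a latin-1 mis-decoding of a UTF-8 filename, if there is one.

    Hypothetical helper for illustration only.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return filename.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # UnicodeEncodeError: the name contains code points above U+00FF,
        # so it cannot be the result of a latin-1 decode.
        # UnicodeDecodeError: the bytes are not valid UTF-8,
        # so the name really was latin-1.
        return filename


print(fix_upload_filename("\xc3\x84nderung.pdf"))  # repaired name
print(fix_upload_filename("Änderung.pdf"))         # genuine latin-1 name, unchanged
print(fix_upload_filename("report.pdf"))           # plain ASCII, unchanged
```

One caveat: a genuine latin-1 filename whose bytes happen to form valid UTF-8 would also be converted. Such names are rare in practice, but the heuristic cannot tell the two cases apart.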
Comment 3 Florian Best univentionstaff 2015-02-16 17:54:56 CET
(In reply to Florian Best from comment #2)
> Fix would be either to enable the cherrypy plugin which guesses the correct
> encoding by specifying this in the config:
> tools.decode.on=True
> tools.decode.encoding="UTF-8"
CherryPy decodes header lines like this, which is the correct behavior according to RFC 7230:
k, v = line.split(ntob(":"), 1)
k = k.strip().decode('ISO-8859-1')
v = v.strip().decode('ISO-8859-1')
The actual problem is that browser implementations are broken: they send UTF-8 although they are forbidden to do so (correct would be to send either latin-1 or to use quoted-printable encoding).

The manual patch from comment #2 works perfectly.
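
Why the round trip in the manual patch loses nothing can be checked directly: ISO-8859-1 maps each of the 256 byte values to exactly one code point, so encoding the decoded header value again always restores the original bytes. The following Python 3 sketch mimics the header parsing quoted above on a made-up Content-Disposition line. The header line and the variable names are assumptions for illustration, not CherryPy internals.

```python
# ISO-8859-1 is a lossless one-to-one mapping for all 256 byte values.
all_bytes = bytes(range(256))
assert all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes

# Made-up multipart header line as a browser might send it (UTF-8 filename).
line = b'Content-Disposition: form-data; name="upload"; filename="\xc3\x84nderung.pdf"'

# Same steps as the parsing code quoted above.
key, value = line.split(b":", 1)
key = key.strip().decode("ISO-8859-1")
value = value.strip().decode("ISO-8859-1")

# The filename is now mis-decoded, but the original bytes are still recoverable.
recovered = value.encode("iso-8859-1").decode("utf-8")

print(key)        # Content-Disposition
print(recovered)  # form-data; name="upload"; filename="Änderung.pdf"
```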
Comment 5 Florian Best univentionstaff 2023-06-20 20:56:50 CEST

*** This bug has been marked as a duplicate of bug 43633 ***