Bug 37716 – UMC-Webserver: decodes filenames of uploads in latin-1


Summary:	UMC-Webserver: decodes filenames of uploads in latin-1

Status:	RESOLVED DUPLICATE of bug 43633

Product:	UCS
Classification:	Unclassified
Component:	UMC (Generic)
Version:	UCS 5.0
Hardware:	Other Linux

Importance:	P5 normal
Target Milestone:	UCS 4.0-x
Assigned To:	UMC maintainers
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-02-06 13:13 CET by Florian Best
Modified:	2023-06-20 20:56 CEST
CC List:	1 user

See Also:	36846
What kind of report is it?:	---
What type of bug is this?:	---
Who will be affected by this bug?:	---
How will those affected feel about the bug?:	---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:

Flags:	best: Patch_Available+

Description Florian Best

2015-02-06 13:13:14 CET

When a file is uploaded to the umc-webserver, its filename is passed to the UMC module processes as a unicode object that was decoded as latin-1.
Instead of decoding b'\xc3\x84nderung.pdf' as UTF-8 to 'Änderung.pdf', the webserver decodes it as latin-1, which results in u'\xc3\x84nderung.pdf'.
In UCS we changed the Python default encoding from ASCII to UTF-8.
If we now pass this unicode object to filesystem calls such as shutil.move() or open(), Python encodes it as UTF-8, which results in '\xc3\x83\xc2\x84nderung.pdf'. → If we then save the file to the hard disk, for example, we end up with corrupt, double-encoded filenames on a UTF-8 filesystem.
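
The sequence described above can be reproduced in a few lines. This is only a minimal sketch in Python 3 syntax (the report itself dates from Python 2), and the variable names are made up for illustration:

```python
# Bytes a browser sends for the filename "Änderung.pdf" (UTF-8 encoded).
raw = b'\xc3\x84nderung.pdf'

# What the webserver passes on: the same bytes decoded as latin-1.
misdecoded = raw.decode('latin-1')

# What ends up on disk once that text is encoded as UTF-8 again.
on_disk = misdecoded.encode('utf-8')

# What should have been passed on instead.
expected = raw.decode('utf-8')

print(repr(misdecoded), repr(on_disk), repr(expected))
```

Each of the two original bytes of "Ä" becomes its own character and is then encoded as two bytes, so the name on disk is four bytes where two were sent.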

Comment 1 Florian Best

2015-02-06 13:14:35 CET

This then causes bugs like Bug #36846.

Comment 2 Florian Best

2015-02-06 13:22:37 CET

The fix would be either to enable the cherrypy plugin, which guesses the correct encoding, by specifying this in the config:
tools.decode.on=True
tools.decode.encoding="UTF-8"

Alternatively, we can convert the filename by hand:
                try:
                    filename = filename.encode('latin1').decode('UTF-8')
                except UnicodeDecodeError:
                    pass
                self._log('info', 'Converting filename from=%r to=%r' % (store.filename, filename))
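
As a self-contained illustration of the manual conversion, the following sketch wraps the same round trip in a function. The name repair_filename is hypothetical and not part of UMC, and catching UnicodeEncodeError as well is an added precaution for callers that pass text which did not come from a latin-1 decode:

```python
def repair_filename(filename):
    """Undo a latin-1 mis-decoding of a UTF-8 filename, if there was one.

    Hypothetical helper for illustration; not taken from the UMC code.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return filename.encode('latin-1').decode('utf-8')
    except UnicodeEncodeError:
        # Characters outside latin-1: the text was never latin-1 decoded.
        return filename
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8: the name really was latin-1.
        return filename


print(repr(repair_filename('\xc3\x84nderung.pdf')))
```

A genuine latin-1 filename whose bytes happen to form valid UTF-8 would also be converted, so this remains a heuristic.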

Comment 3 Florian Best

2015-02-16 17:54:56 CET

(In reply to Florian Best from comment #2)
> Fix would be either to enable the cherrypy plugin which guesses the correct
> encoding by specifying this in the config:
> tools.decode.on=True
> tools.decode.encoding="UTF-8"
Cherrypy does this (correct RFC 7230 behavior):
                k, v = line.split(ntob(":"), 1)
                k = k.strip().decode('ISO-8859-1')
                v = v.strip().decode('ISO-8859-1')
The problem is simply that browser implementations are broken: they send UTF-8 although they are forbidden to do so (correct would be either to send latin-1 or to use quoted-printable encoding).
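
Decoding headers as ISO-8859-1 loses no information, because that codec maps every byte to exactly one code point, so the bytes the browser sent can always be recovered afterwards. A minimal sketch in Python 3, assuming a made-up header line and using b":" in place of CherryPy's ntob(":") helper:

```python
# Hypothetical header line of an upload part; the filename bytes are UTF-8.
line = b'Content-Disposition: form-data; name="upload"; filename="\xc3\x84nderung.pdf"'

# Same steps as the CherryPy excerpt above.
k, v = line.split(b":", 1)
raw_value = v.strip()
k = k.strip().decode('ISO-8859-1')
v = raw_value.decode('ISO-8859-1')

# Encoding the decoded value again yields exactly the bytes that were sent.
roundtrip_ok = v.encode('ISO-8859-1') == raw_value

print(k, roundtrip_ok)
```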

The manual patch from comment #2 works perfectly.

Comment 4 Florian Best

2015-02-16 17:59:38 CET

http://greenbytes.de/tech/tc2231/

http://greenbytes.de/tech/webdav/rfc5987.html

Comment 5 Florian Best

2023-06-20 20:56:50 CEST


*** This bug has been marked as a duplicate of bug 43633 ***