Bug 53357 - Migrate univention-squid to Python 3
Migrate univention-squid to Python 3
Status: CLOSED FIXED
Product: UCS
Classification: Unclassified
Component: Squid
UCS 5.0
Other Linux
: P5 normal (vote)
: UCS 5.0-1-errata
Assigned To: Arne Bernin
Florian Best
https://git.knut.univention.de/univen...
: python3-migration
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2021-05-31 10:56 CEST by Florian Best
Modified: 2022-03-09 13:43 CET (History)
1 user (show)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Best univentionstaff 2021-05-31 10:56:53 CEST
univention-squid should be migrated to be Python 3 compatible.

> services/univention-squid/basic_ldap_auth_encoding_wrapper
> services/univention-squid/squid_ldap_ntlm_auth
Comment 2 Florian Best univentionstaff 2021-10-29 17:06:22 CEST
Some context, we have to understand about the script `services/univention-squid/basic_ldap_auth_encoding_wrapper`:

This is a helper/wrapper around the original /usr/lib/squid/basic_ldap_auth (https://github.com/squid-cache/squid/blob/SQUID_4_6/src/auth/basic/LDAP/basic_ldap_auth.cc).

Helpers are (a little bit) described in https://wiki.squid-cache.org/Features/AddonHelpers#Basic_Scheme.

That says "All messages from squid are URL-escaped (the  rfc1738_unescape  from rfc1738.h can be used to decode them"
→ for key-value pairs but this also seems to be the case for the username/password:
https://github.com/squid-cache/squid/blob/SQUID_4_6/src/auth/basic/UserRequest.cc#L129-L147

> 140        xstrncpy(usern, rfc1738_escape(user()->username()), sizeof(usern));
> 141        xstrncpy(pass, rfc1738_escape(basic_auth->passwd), sizeof(pass));
> …
> 147        sz = snprintf(buf, sizeof(buf), "%s %s\n", usern, pass);

And rfc1738_escape is `#define rfc1738_escape(x)  rfc1738_do_escape(x, RFC1738_ESCAPE_UNSAFE|RFC1738_ESCAPE_CTRLS)`:
https://github.com/squid-cache/squid/blob/SQUID_4_6/lib/rfc1738.c#L110-L112

> 103        if ((flags & RFC1738_ESCAPE_CTRLS) && do_escape == 0) {
> …
> 110            /* RFC 1738 says any non-US-ASCII are encoded */
> 111            else if (((unsigned char) *src >= (unsigned char) 0x80))
> 112                do_escape = 1;
→ It is ensured, that the resulting string is always `ASCII` only but percent encoded!

The helper has 3 result responses:
OK | Success. Valid credentials.
ERR | Success. Invalid credentials.
BH | Failure. The helper encountered a problem.

See also Bug #51817 comment 4 why we currently have "BH" instead of "ERR".

What the script seems to achieve is the combination of both `auth_param basic utf8` `on` and `off` - because IE uses `ISO8859-1` and Firefox/Chrome uses `UTF-8`.

The latest HTTP basic authentication standards say about the encoding:

https://datatracker.ietf.org/doc/html/rfc7617#section-2
> The original definition of this authentication scheme failed to
> specify the character encoding scheme used to convert the user-pass
> into an octet sequence.  In practice, most implementations chose
> either a locale-specific encoding such as ISO-8859-1 ([ISO-8859-1]),
> or UTF-8 ([RFC3629]).
And:

https://datatracker.ietf.org/doc/html/rfc7617#section-3
> User-ids or passwords containing characters outside the US-ASCII
> character repertoire will cause interoperability issues, unless both
> communication partners agree on what character encoding scheme is to
> be used.
Comment 3 Florian Best univentionstaff 2021-10-29 17:38:04 CEST
Some notes on the Python 2 vs 3 behavior:

* in UCS we don't have ASCII as standard encoding for every string but UTF-8 (See Bug #49009)
* so everything passed to a file/socket/process which is of type unicode is encoding implicitly via UTF-8
* In Python2 there is `str`/`bytes`.encode(…) and `unicode`.decode(…) - which totally does not make sense. In Python 3 they have been removed.
* When this is used, python makes something like bytes.decode(*sys.get_default_encoding()).encode(…) / unicode.encode(sys.get_default_encoding()).decode(…)
* Our squid helper script uses this broken mechanism:
`encoded = raw_unquoted.encode('iso-8859-1')`
* Since Python 3 urllib.parse.unquote() does more than Python 2. In Python 2 it just did a bytes to bytes conversion - leaving the decoding up to you. In Python 3 it decodes the values by default with UTF-8 and strictness='replace' - which just replaces every invalid non-UTF-8 aka ISO8859-1 thing.

We are also not re-urlencoding the content we pass to the original script, which we should do!
Comment 4 Florian Best univentionstaff 2021-10-29 17:39:44 CEST
The test cases cover the functionality:
test/ucs-test/tests/43_proxy/07_basic_auth_encoding
(test/ucs-test/tests/43_proxy/00_http_proxy_basic_ntlm_auth_check)
Comment 5 Florian Best univentionstaff 2022-03-04 17:43:27 CET
I merged the changes of Arne:

univention-squid.yaml
4da94e7b1c91 | Bug #53357: migrate univention-squid to Python 3

univention-squid (13.0.3-5)
0e835f38621f | Bug #53357: fix shell quoting of command line arguments
4da94e7b1c91 | Bug #53357: migrate univention-squid to Python 3
Comment 6 Florian Best univentionstaff 2022-03-07 15:18:07 CET
OK: changes
OK: YAML
OK: ucs-test (well, they don't test all combinations of UCR variables :-( )