Univention Bugzilla – Bug 52878
Add turkish umlauts to list of umlauts that are converted to ASCII via <:umlauts>
Last modified: 2021-05-12 13:37:34 CEST
# univention-app info UCS: 4.4-7 errata907 Installed: ucsschool=4.4 v8 Upgradable: Scenario: I want to create users (via usertemplate or UCS@school import) and use the ":umlauts" function to replace umlauts in usernames. Expected behaviour: turkish umlauts (Ç,ç,Ğ,ğ,İ,ı,Ş,ş) are replaced with their ASCII look-alikes (C,c,G,g,I,i,S,s), just like french and scandinavian umlauts. Observed behaviour: turkish umlauts are stripped. Someone named "Pınar Ağrı" will end up with a username like "pnar.ar" instead of "pinar.agri".
Created attachment 10636 [details] patch (git:fbest/52878-unidecode-umlauts) We should change the implementation to use python3-unidecode: >>> import unidecode >>> unidecode.unidecode('Ç,ç,Ğ,ğ,İ,ı,Ş,ş,Pınar Ağrı') 'C,c,G,g,I,i,S,s,Pinar Agri'
Ah, Florian was faster than me and I like his approach better. For the record: in the meanwhile I created https://git.knut.univention.de/univention/ucs/commit/2f1570ae
The fix is implemented in [twenzel/4.4/52878_turkish_umlauts] 53ae6b4967 Bug #52878: fix umlauts in patterns In python 2 we need text.decode() to make this work. Thanks for the patch! # univention-app info UCS: 4.4-7 errata947 Installed: cups=2.2.1 samba4=4.10 squid=3.5 ucsschool=4.4 v9 /var/lib/ucs-school-import/configs/user_import.json ... "username": { "default": "<:umlauts><firstname>.<lastname><:lower>[COUNTER2]" } ... I got the following before the fix: Pınar Ağrı -> pinar.aara After the fix: Pınar Ağrı -> pinar.agri ÇçĞğİıŞş -> ccggiiss
Unfortunately, the unidecode-patch is not enough. ä would be converted to a and not to ae as before We tried to implement this with py-icu, but we didn't succeed: transliterator = icu.Transliterator.createInstance('de-ascii') should do the trick but for some reason this didn't work. To catch cases like 'Pınar Ağrı' and 'Jürgen Deveaux' we added the suggested patch -> comment 2 after the original string replacement. A solution with icu would be better, but this 'does the trick' as well. [twenzel/4.4/52878_turkish_umlauts] 7b27cab427 Bug #52878: merge old and new conversion methods New tests were added in ucs & ucsschool: [twenzel/4.4/52878_turkish_umlauts] 304a8abe4 Bug #52878: umlauts tests -> this commit will be dropped: [twenzel/4.4/52878_turkish_umlauts] 6c4797176 Bug #52878: tests for turkish and other umlauts, selectable test methods
The icu versions for python2 & python3 seem to be different. python2 ``` transliterator.transliterate('Ä Ö Ü ä ö ü ß ł ŧ ø þ æ ſ ð đ ŋ ħ ĸ Ł Ŧ Ø ...: Þ Æ ẞ Ð Ŋ Ħ') u'A O U a o u ss l t o th ae s d d n h q L T O TH AE SS D N H' ``` python3 ``` In [4]: transliterator = icu.Transliterator.createInstance('de-ascii;') In [5]: transliterator.transliterate('Ä Ö Ü ä ö ü ß ł ŧ ø þ æ ſ ð đ ŋ ħ ĸ Ł Ŧ Ø Þ Æ ẞ Ð Ŋ Ħ') Out[5]: 'AE OE UE ae oe ue ss l t o th ae s d d n h q L T O TH AE SS D N H' ``` → This is still not what's expected, we want Ä → Ae -------------- fixed encoding/ decoding [twenzel/4.4/52878_turkish_umlauts] b8256eeca3 fixup! Bug #52878: remove encoding [twenzel/4.4/52878_turkish_umlauts] ab22929716 Bug #52878: remove encoding
The package python-unidecode has been move to maintained in 4.4-8errata and will be released next Wednesday (05.05.2021).
The UDM package was built and an advisory created: [4.4-8] 6d8685f127 Bug #52878: fix umlauts in patterns [4.4-8] b6002480ee Bug #52878: changelogs and advisory [4.4-8] 064cdc757d Bug #52878: advisory update univention-directory-manager-modules (14.0.20-6)
A merge request for UCS 5.0-0 was created: https://git.knut.univention.de/univention/ucs/-/merge_requests/88
Extracted code of loop in [4.4-8] 6106bd360e Bug #52878: extract code from loop Package: ucs-test Version: 9.0.7-30A~4.4.0.202105040926 Branch: ucs_4.4-0 Scope: errata4.4-8
Jenkins: OK Changelog&Advisories: OK Manual Tests: OK Packages install: OK
REOPEN: The merge request doesn't reflect the latest changes REOPEN: The comments in the merge request aren't integrated.
(In reply to Florian Best from comment #13) > REOPEN: The merge request doesn't reflect the latest changes > REOPEN: The comments in the merge request aren't integrated. The work on the 5.0 merge has been split into bug 53224.
(In reply to Daniel Tröder from comment #14) > (In reply to Florian Best from comment #13) > > REOPEN: The merge request doesn't reflect the latest changes > > REOPEN: The comments in the merge request aren't integrated. > > The work on the 5.0 merge has been split into bug 53224. The comments belong also to the UCS 4.4 code.
Changes according to remarks have been added in [4.4-8] a616d35a73 Bug #52878: qa remarks [4.4-8] a92dcb0966 Bug #52878: advisory update Package: univention-directory-manager-modules Version: 14.0.20-7A~4.4.0.202105061110 Branch: ucs_4.4-0 Scope: errata4.4-8 Package: ucs-test Version: 9.0.7-31A~4.4.0.202105061112 Branch: ucs_4.4-0 Scope: errata4.4-8 merge request has been updated in [twenzel/5.0/52878_turkish_umlauts] ae31c8a30d Bug #52878: qa remarks [twenzel/5.0/52878_turkish_umlauts] fec7dfb059 Bug #52878: fix umlauts in patterns uas QA will check and verify
Due to this commit, all UCS 4.4-8 Tests failed: a616d35a73a40f8d9a05db9e6e3cee9ac90b10f5 Multiple join scripts could not be executed. [master090] 2021-05-06T23:57:03.074057 + univention-check-join-status [master090] 2021-05-06T23:57:03.407947 Warning: 'univention-ldap-server' is not configured. [master090] 2021-05-06T23:57:03.427876 Warning: 'univention-join' is not configured. [master090] 2021-05-06T23:57:03.536059 Warning: 'univention-saml' is not configured. [master090] 2021-05-06T23:57:03.538767 Warning: 'univention-management-console-web-server' is not configured. [master090] 2021-05-06T23:57:03.543950 Error: Not all install files configured: 4 missing In the join.log this traceback a bunch of times: Traceback (most recent call last): File "/usr/share/univention-directory-manager-tools/univention-cli-server", line 219, in doit output = univention.admincli.admin.doit(arglist) File "/usr/lib/python2.7/dist-packages/univention/admincli/admin.py", line 409, in doit out = _doit(arglist) File "/usr/lib/python2.7/dist-packages/univention/admincli/admin.py", line 755, in _doit dn = object.create() File "/usr/lib/python2.7/dist-packages/univention/admin/handlers/__init__.py", line 557, in create dn = self._create(response=response, serverctrls=serverctrls) File "/usr/lib/python2.7/dist-packages/univention/admin/handlers/__init__.py", line 1235, in _create self.set_default_values() File "/usr/lib/python2.7/dist-packages/univention/admin/handlers/__init__.py", line 1352, in set_default_values if p.default(self): File "/usr/lib/python2.7/dist-packages/univention/admin/__init__.py", line 312, in default return self._replace(base_default, object) File "/usr/lib/python2.7/dist-packages/univention/admin/__init__.py", line 301, in _replace return pattern_replace(copy.copy(res), object) File "/usr/lib/python2.7/dist-packages/univention/admin/__init__.py", line 189, in pattern_replace value = modify_text(value, global_commands) File "/usr/lib/python2.7/dist-packages/univention/admin/__init__.py", line 139, in modify_text text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii') TypeError: normalize() argument 2 must be unicode, not str
The reason is the missing second u'' string in the UMLAUTS mapping.
I'm very sorry: I only had fixed this in the merge request. [4.4-8] e50825bb5d Bug #52878: change advisory [4.4-8] 35073d260b Bug #52878: add u-string
As discussed, I removed the line text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii') → it is covered by unidecode. This should fix our test. [4.4-8] bd6cc9a12b Bug #52878: modify advisory [4.4-8] 421d6e552d Bug #52878: remove unicodedata.normalize call Package: univention-directory-manager-modules Version: 14.0.20-9A~4.4.0.202105111251 Branch: ucs_4.4-0 Scope: errata4.4-8 [twenzel/5.0/52878_turkish_umlauts] 8bad51859a Bug #52878: remove unicodedata.normalize call @QA please check if tests pass e.g. 01_base/01_ignore_disabled_users_in_license_count
Package: univention-directory-manager-modules Version: 14.0.20-10A~4.4.0.202105111403 Branch: ucs_4.4-0 Scope: errata4.4-8 [4.4-8] 5b40e539ea Bug #52878: modify advisory [4.4-8] 9b7bc63e03 Bug #52878: remove unused import
OK: code changes OK: automated tests - 11_use_usertemplate_umlauts - 90_ucsschool/219_import_users_umlauts.py - 90_ucsschool/219_import_users_umlauts_normalized.py Also OK, but IMHO unrelated: 01_base/01_ignore_disabled_users_in_license_count OK: manual test: --------------------------------------------------------------------- # dpkg -l python-univention-directory-manager ii python-univention-directory-m 14.0.20-5A~4.4.0.20 # ipython In [1]: from univention.admin import property as uadmin_property In [2]: prop = uadmin_property("_replace") In [3]: prop._replace("<:umlauts>Pınar Ağrı", {}) Out[3]: 'Pnar Agr' -------------- update -------------- # dpkg -l python-univention-directory-manager ii python-univention-directory-m 14.0.20-10A~4.4.0.2 # ipython In [1]: from univention.admin import property as uadmin_property In [2]: prop = uadmin_property("_replace") In [3]: prop._replace("<:umlauts>Pınar Ağrı", {}) Out[3]: 'Pinar Agri' --------------------------------------------------------------------- OK: advisories The funcionality in 5.0 was not tested.
<https://errata.software-univention.de/#/?erratum=4.4x973> <https://errata.software-univention.de/#/?erratum=4.4x974>