Univention Bugzilla – Bug 47580
Normalize in user templates of names with umlauts do not work completely
Last modified: 2020-06-22 17:01:25 CEST
There are still characters, especially "`", which are not normalized and therefore remain in the mail address. E.g. "Vivian D` Muster" should be convertible to VivianDMuster@schule.example.de

+++ This bug was initially created as a clone of Bug #44370 +++

If you use <:umlauts> in user templates (example of primary mail address: <firstname>[0].<lastname><:strip><:umlauts>@demo.univention.de), there will be a problem with names like "Ýlang Mustermann".

+++ This bug was initially created as a clone of Bug #44367 +++

Importing "Ýlang Müstèrmánn" produces "?" in the username and email address with default settings. I think some umlauts are missing in "class property" in /usr/share/pyshared/univention/admin/__init__.py. It may be better to use something like unicodedata.normalize() (see https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize) instead of hard-coding UMLAUTS!
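For illustration, the unicodedata.normalize() suggestion could look roughly like the following. This is only a sketch in Python 3, not the code in univention/admin/__init__.py; the function name strip_diacritics is made up for this example.

```python
import unicodedata


def strip_diacritics(text):
    """Hypothetical helper: decompose with NFKD, then drop all non-ASCII code points."""
    # NFKD splits e.g. "Ý" into "Y" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" silently discards whatever is not ASCII,
    # including the combining marks produced above.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_diacritics("Ýlang Müstèrmánn"))  # Ylang Mustermann
print(strip_diacritics("Straße"))            # Strae
```

Two limits are visible here: "ü" becomes "u" rather than the conventional German "ue", and "ß" has no decomposition, so it is dropped entirely.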
Also the strip/trim command seems to not work.

As far as I understand the Unicode documentation, Unicode normalization is intended to ensure that the same character is not represented by different code sequences. That is useful if you always want the same representation of a character in your data(base), but it is not really intended to remove special characters from strings or to replace them with 'similar' ASCII characters via a normalization-encoding chain. A better solution in my opinion would be unidecode (https://pypi.org/project/Unidecode/), which tries to do exactly what we intend: represent Unicode strings in ASCII as closely as possible. The library contains a lot of hand-crafted replacements (similar to our umlaut replacement code, but much more extensive).

Since ` is an ASCII character, neither Unicode normalization+encoding nor unidecode would remove it. For that we should use Python's built-in string methods: isalpha() checks whether a string consists entirely of alphabetic characters, and isalnum() does the same for alphanumeric characters. If we filter the given string/name after applying asciification, we should get quite a solid result.
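A minimal sketch of the "asciify first, then filter" idea, in Python 3. It uses a tiny hand-written replacement table plus NFKD as a stand-in for unidecode, so no third-party package is needed. REPLACEMENTS, asciify and mail_local_part are hypothetical names for this example and are not the UCS implementation.

```python
import unicodedata

# Hypothetical, deliberately tiny replacement table; NOT the UCS UMLAUTS dict.
REPLACEMENTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def asciify(text):
    """Apply the hand-written table, then fall back to NFKD + ASCII encoding."""
    # Compose first so that "u" + combining diaeresis is found in the table as "ü".
    composed = unicodedata.normalize("NFC", text)
    replaced = "".join(REPLACEMENTS.get(ch, ch) for ch in composed)
    decomposed = unicodedata.normalize("NFKD", replaced)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def mail_local_part(name):
    """Keep only alphanumeric characters of the asciified name."""
    return "".join(ch for ch in asciify(name) if ch.isalnum())


print(mail_local_part("Vivian D` Muster"))   # VivianDMuster
print(mail_local_part("Ýlang Müstèrmánn"))   # YlangMuestermann
```

The filter runs after asciification on purpose: applied first, a Unicode-aware isalnum() would let "Ý" through and leave the ASCII conversion to deal with it later.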
'Also the strip/trim command seems to not work.' --Ignore that comment
Package: univention-management-console-module-udm
Version: 8.0.5-16A~4.3.0.201809101250

Package: univention-directory-manager-modules
Version: 13.0.22-3A~4.3.0.201809101251

The option :umlauts was not altered at all, since symbols like '`# etc. do not fall into the category of the umlauts option (it takes umlauts and transforms them into ASCII representations).

Instead there is the new option :alphanum, which removes all symbols that are not alphanumeric or spaces. In the UCRV directory/manager/templates/alphanum/whitelist you can save a string containing all symbols that should be ignored by that option.

:alphanum should be used carefully, since it removes even the @ sign of an email address if it is applied to the entire email field, for example. It is better to use it on specific attributes only, like <firstname>.
Looks like jenkins doesn't like your docu commit: http://jenkins.knut.univention.de:8080/job/UCS-4.3/job/UCS-4.3-2/job/HandbookUCS/71/warnings5Result/new/ I guess you need to whitelist whitelist for the spelling check... :)
Package: univention-directory-manager-modules
Version: 13.0.22-4A~4.3.0.201809120823

Remove debug entry; "whitelist" added to English dict
Package: univention-directory-manager-modules
Version: 13.0.22-5A~4.3.0.201809181105

Package: univention-management-console-module-udm
Version: 8.0.5-17A~4.3.0.201809181115

Integrated the discussed code improvements and equalized the umlauts dict for front- and backend. The discussed option to parametrize user template options was moved into a new Feature Request, Bug #47830.
(In reply to Ole Schwiegert from comment #3)
> Package: univention-management-console-module-udm
> Version: 8.0.5-16A~4.3.0.201809101250
>
> Package: univention-directory-manager-modules
> Version: 13.0.22-3A~4.3.0.201809101251
>
> The option :umlauts was not altered at all. Since symbols like '`# etc are
> not falling into the category of the umlauts option (it takes umlauts and
> transforms them into ASCII-representations).
>
> Instead there is the new option :alphanum which removes all symbols that are
> not alphanumerical or spaces. In the UCRV
> directory/manager/templates/alphanum/whitelist you can save a string
> containing all symbols that should be ignored by that option.
>
> :alphanum should be used carefully since it removes even the @-sign of an
> email address, if it is applied to the entire email field for example. It is
> better to use it on specific attributes only, like <firstname>.

The whitespace character is also filtered by the option unless it is included in the whitelist.
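The :alphanum behaviour described above, including the whitespace handling, can be sketched in Python 3 as follows. This is an illustration only: the function alphanum and its whitelist parameter are hypothetical stand-ins. In UCS the whitelist string comes from the UCR variable directory/manager/templates/alphanum/whitelist, and the actual implementation may differ, e.g. in which characters count as alphanumeric.

```python
def alphanum(value, whitelist=""):
    """Hypothetical stand-in for :alphanum: drop everything that is neither
    alphanumeric nor listed in the whitelist string."""
    # Assumption: str.isalnum() (Unicode-aware in Python 3) decides what is kept.
    return "".join(ch for ch in value if ch.isalnum() or ch in whitelist)


# Without a whitelist, spaces and the backtick are both removed.
print(alphanum("Vivian D` Muster"))                  # VivianDMuster
# Whitelisting the space keeps it; the backtick is still removed.
print(alphanum("Vivian D` Muster", whitelist=" "))   # Vivian D Muster
# Applied to a whole mail address, "@" and "." are lost as well.
print(alphanum("v.muster@schule.example.de"))        # vmusterschuleexamplede
```

The last call shows why the option is better applied to single attributes such as <firstname> than to the entire mail address field.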
Package: univention-directory-manager-modules Version: 13.0.22-6A~4.3.0.201809181151 Fix error in postinst script
Package: univention-directory-manager-modules Version: 13.0.23-4A~4.3.0.201809191058 fixed unicode bug
OK: only alphanumeric characters are kept (in the frontend this is constrained to ASCII letters, basic Latin-1 letters, and the digits 0-9). When IE11 is no longer supported, we can use Unicode regexes to also keep all Unicode alphanumeric characters in the frontend.
OK: characters defined in the UCR variable directory/manager/templates/alphanum/whitelist are also kept
OK: Code
OK: YAML
-> verified
<http://errata.software-univention.de/ucs/4.3/252.html> <http://errata.software-univention.de/ucs/4.3/253.html>
*** Bug 45387 has been marked as a duplicate of this bug. ***