Bug 54589 - Wrong regex for UDM syntax gid allows wrong characters
Summary: Wrong regex for UDM syntax gid allows wrong characters
Status: CLOSED FIXED
Alias: None
Product: UCS
Classification: Unclassified
Component: UDM (Generic)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Target Milestone: UCS 5.0-10-errata
Assignee: Arvid Requate
QA Contact: Felix Botner
URL: https://git.knut.univention.de/univen...
Keywords:
Duplicates: 18332 24137 39776
Depends on:
Blocks: 58898
Reported: 2022-03-25 09:59 CET by Sönke Schwardt-Krummrich
Modified: 2025-12-11 10:04 CET

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 1: Nuisance – not a big deal but noticeable
User Pain: 0.034
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:


Description Sönke Schwardt-Krummrich univentionstaff 2022-03-25 09:59:23 CET
The regular expression for the UDM syntax gid does not express what the author intended, nor what is useful.

Inside the character class, " -." allows not only the characters space, dash and dot but every character in the ASCII range " " (32) up to "." (46) → see the list of wrongly allowed characters below.

IIRC the correct regex would be: u"(?u)^\\w([\\w .-]*\\w)?$"
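
The difference between the two patterns can be checked outside of UDM. A minimal sketch, assuming plain Python 3 `re`; the names `CURRENT_GID`, `PROPOSED_GID` and `wrongly_allowed` are illustrative only and do not exist in the UDM code:

```python
import re

# Pattern currently used by the gid syntax: " -." inside the brackets is a range.
CURRENT_GID = re.compile("(?u)^\\w([\\w -.’]*\\w)?$")
# Pattern proposed in this report: the dash comes last, so it is a literal dash.
PROPOSED_GID = re.compile("(?u)^\\w([\\w .-]*\\w)?$")


def wrongly_allowed():
    """Return printable ASCII characters accepted mid-name by the current pattern only."""
    result = []
    for code in range(32, 127):
        char = chr(code)
        name = "a" + char + "b"
        if CURRENT_GID.match(name) and not PROPOSED_GID.match(name):
            result.append(char)
    return result


print(wrongly_allowed())
print(bool(CURRENT_GID.match("Group (name) + cool2")))
print(bool(PROPOSED_GID.match("Group (name) + cool2")))
```

Under these assumptions the first line printed should list exactly the punctuation between "!" and ",", and the example group name from this report should be accepted by the current pattern but rejected by the proposed one.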

ALSO: Please check why single ticks are currently allowed in group names!

Also note: this is a breaking change. We have to find an appropriate release for it and announce it beforehand (in case a customer has used a lot of e.g. plus signs in group names).

class gid(simple):
	min_length = 1   # TODO: not enforced here
	max_length = 32  # TODO: not enforced here
	regex = re.compile(u"(?u)^\\w([\\w -.’]*\\w)?$")
	# FIXME: The " -." in "[\w -.]" matches the ASCII character range(ord(' '),  ord('.')+1) == range(32, 47)
	error_message = _(
		"A group name must start and end with a letter, number or underscore. In between additionally spaces, dashes "
		"and dots are allowed."
	)

$ python3
>>> for i in range(ord(' '), ord('.')+1): print(i, repr(chr(i)))
... 
32 ' '
33 '!'
34 '"'
35 '#'
36 '$'
37 '%'
38 '&'
39 "'"
40 '('
41 ')'
42 '*'
43 '+'
44 ','
45 '-'
46 '.'
>>> 

root@master:~# udm groups/group create --position cn=groups,$(ucr get ldap/base) --set name="Group (name) + cool2"
Object created: cn=Group (name) \+ cool,cn=groups,dc=dev,dc=nstx,dc=de
Comment 1 Florian Best univentionstaff 2022-03-25 10:11:48 CET
*** Bug 39776 has been marked as a duplicate of this bug. ***
Comment 2 Florian Best univentionstaff 2022-03-25 10:18:56 CET
*** Bug 33656 has been marked as a duplicate of this bug. ***
Comment 3 Florian Best univentionstaff 2022-03-25 10:24:45 CET
(In reply to Florian Best from comment #1)
> *** Bug 39776 has been marked as a duplicate of this bug. ***

Same for the syntax classes: `uid_umlauts` and `uid_umlauts_lower_except_first_letter`.
Comment 4 Florian Best univentionstaff 2022-03-25 10:28:21 CET
(In reply to Florian Best from comment #2)
> *** Bug 33656 has been marked as a duplicate of this bug. ***

(In reply to Frank Greif from comment #7)
> Currently (4.4.1.239) the regexp of class 'gid' is somewhat weird:
> 
> 1334         regex = re.compile(ur"(?u)^\w([\w -.’]*\w)?$")
> 
> The bracketed character class would match (left to right):
> 
> * anything deemed a 'word character' in Unicode
> * the range of characters between 0x20 (space) and 0x2E (dot)
> * the Unicode char with codepoint U+2019 (right single quotation mark)
> 
> I'd propose to change:
> 
> * position the minus at the right end (so it can't be misunderstood as a
> range)
> * remove the spurious U+2019 char
> 
> This would not solve this bug, but at least it would make the regexp really
> match what the description says.

We probably have to keep the U+2019 (’) for French installations.
Comment 5 Frank Greif univentionstaff 2022-03-30 12:14:12 CEST
> We probably have to keep the U+2019 (’) for French installations.

Really? U+2019 is a punctuation character. (If I understood the issue correctly, space, dot and dash are the only punctuation characters intended to be allowed.)

French accented vowels are already covered by \w in Unicode context.
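
This can be verified quickly. A small sketch, assuming Python 3, where `str` patterns are Unicode-aware by default; the sample name is made up for illustration:

```python
import re
import unicodedata

# "Thérèse", written with escapes so the accented letters are precomposed.
sample = "Th\u00e9r\u00e8se"
all_word_chars = re.fullmatch(r"\w+", sample) is not None
print(all_word_chars)

# U+2019 (right single quotation mark) is punctuation, not a word character.
apostrophe = "\u2019"
category = unicodedata.category(apostrophe)
is_word_char = re.fullmatch(r"\w", apostrophe) is not None
print(category, is_word_char)
```

So accented letters need no extra entry in the character class, while U+2019 is only matched because it is listed explicitly.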
Comment 6 Florian Best univentionstaff 2022-04-05 16:48:35 CEST
*** Bug 24137 has been marked as a duplicate of this bug. ***
Comment 7 Florian Best univentionstaff 2022-04-05 16:48:47 CEST
*** Bug 18332 has been marked as a duplicate of this bug. ***
Comment 8 Arvid Requate univentionstaff 2025-11-11 16:15:06 CET
Also: https://learn.microsoft.com/en-us/windows/win32/adschema/a-samaccountname says the following characters are *not* allowed:

> " / \ [ ] : ; | = , + * ? < >


But a dash should be allowed and I guess the original intention of the regex was to allow that.
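
To illustrate that constraint, a sketch that checks a name against the characters quoted above. The helper name `forbidden_sam_chars` is hypothetical, and the check covers only this character list, not any other sAMAccountName rule:

```python
# Characters quoted above as not allowed in sAMAccountName.
SAM_FORBIDDEN = set('"/\\[]:;|=,+*?<>')


def forbidden_sam_chars(name):
    """Return the sorted list of listed forbidden characters that occur in name."""
    return sorted(set(name) & SAM_FORBIDDEN)


print(forbidden_sam_chars("Group (name) + cool2"))
print(forbidden_sam_chars("domain-admins"))
```

The group name created in the description contains a plus sign, which is on that list, whereas a name containing only a dash passes this particular check.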
Comment 9 Jürn Brodersen univentionstaff 2025-11-25 11:11:44 CET
Analysis by the systemd developers of which characters should be allowed:
https://systemd.io/USER_NAMES/
Comment 11 Arvid Requate univentionstaff 2025-11-28 13:44:02 CET
387d1ba3d30 | feat(udm): Allow trailing dash in uid and gid syntaxes
ab2d520edac | chore(udm): changelog and advisory

Package: univention-directory-manager-modules
Version: 15.0.29-7
Release: ucs_5.0-0-errata5.0-10
Scope: errata5.0-10

cherry-picked for 5.2-3 into https://git.knut.univention.de/univention/dev/ucs/-/issues/3226
Comment 12 Felix Botner univentionstaff 2025-12-02 13:36:10 CET
OK - tests
OK - advisory
OK - univention-directory-manager-modules
Comment 13 Christian Castens univentionstaff 2025-12-03 13:31:35 CET
<https://errata.software-univention.de/#/?erratum=5.0x1359>