54589 – Wrong regex for UDM syntax gid allows wrong characters

Summary: Wrong regex for UDM syntax gid allows wrong characters

Status:	CLOSED FIXED

Alias:	None

Product:	UCS
Classification:	Unclassified
Component:	UDM (Generic)
Version:	UCS 5.0
Hardware:	Other Linux

Importance:	P5 normal
Target Milestone:	UCS 5.0-10-errata
Assignee:	Arvid Requate
QA Contact:	Felix Botner

URL:	https://git.knut.univention.de/univen...
Keywords:

Duplicates (3):	18332 24137 39776
Depends on:
Blocks:	58898

Reported:	2022-03-25 09:59 CET by Sönke Schwardt-Krummrich
Modified:	2025-12-11 10:04 CET
CC List:	6 users

See Also:	54522 40113 58206 19441
What kind of report is it?:	Bug Report
What type of bug is this?:	3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?:	2: Will only affect a few installed domains
How will those affected feel about the bug?:	1: Nuisance – not a big deal but noticeable
User Pain:	0.034
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Customer ID:
Max CVSS v3 score:

Description Sönke Schwardt-Krummrich

2022-03-25 09:59:23 CET

The regular expression for the UDM syntax gid does not express what the author evidently intended, nor what is useful.

Inside the character class, " -." allows not only the characters space, dash and dot, but every character in the ASCII range " " (32) up to "." (46) → see the list of wrongly allowed characters below.

IIRC the correct regex would be: u"(?u)^\\w([\\w .-]*\\w)?$"

ALSO: Please check why single ticks are currently allowed in group names!

Also note: this is a breaking change. We have to find an appropriate release for it and announce it beforehand (in case a customer has used a lot of, e.g., plus signs in group names).

class gid(simple):
	min_length = 1   # TODO: not enforced here
	max_length = 32  # TODO: not enforced here
	regex = re.compile(u"(?u)^\\w([\\w -.’]*\\w)?$")
	# FIXME: The " -." in "[\w -.]" matches the ASCII character range(ord(' '),  ord('.')+1) == range(32, 47)
	error_message = _(
		"A group name must start and end with a letter, number or underscore. In between additionally spaces, dashes "
		"and dots are allowed."
	)

$ python3
>>> for i in range(ord(' '), ord('.')+1): print(i, repr(chr(i)))
... 
32 ' '
33 '!'
34 '"'
35 '#'
36 '$'
37 '%'
38 '&'
39 "'"
40 '('
41 ')'
42 '*'
43 '+'
44 ','
45 '-'
46 '.'
>>> 

root@master:~# udm groups/group create --position cn=groups,$(ucr get ldap/base) --set name="Group (name) + cool2"
Object created: cn=Group (name) \+ cool,cn=groups,dc=dev,dc=nstx,dc=de
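
To make the difference concrete, here is a minimal sketch that runs a few sample names through the pattern quoted from the gid class and through the pattern proposed above. The sample names other than "Group (name) + cool2" are made up for illustration, and the sketch uses plain `re` only; it does not go through UDM.

```python
import re

# Pattern as quoted from the gid class: " -." is read as a range (32..46).
BUGGY = re.compile("(?u)^\\w([\\w -.’]*\\w)?$")
# Pattern proposed in this report: the dash comes last, so it is a literal.
PROPOSED = re.compile("(?u)^\\w([\\w .-]*\\w)?$")

# Hypothetical sample names, apart from the one created with udm above.
samples = ["Domain Users", "Group (name) + cool2", "dev-team.eu", "a+b", "-leading"]

# Map each name to (accepted by current pattern, accepted by proposed pattern).
results = {
    name: (bool(BUGGY.match(name)), bool(PROPOSED.match(name)))
    for name in samples
}

for name, (old, new) in results.items():
    print(f"{name!r:26} current={old!s:5} proposed={new}")
```

Names containing parentheses or plus signs pass the current pattern but not the proposed one, which is the breaking change mentioned above.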

Comment 1 Florian Best

2022-03-25 10:11:48 CET

*** Bug 39776 has been marked as a duplicate of this bug. ***

Comment 2 Florian Best

2022-03-25 10:18:56 CET

*** Bug 33656 has been marked as a duplicate of this bug. ***

Comment 3 Florian Best

2022-03-25 10:24:45 CET

(In reply to Florian Best from comment #1)
> *** Bug 39776 has been marked as a duplicate of this bug. ***

Same for the syntax classes: `uid_umlauts` and `uid_umlauts_lower_except_first_letter`.

Comment 4 Florian Best

2022-03-25 10:28:21 CET

(In reply to Florian Best from comment #2)
> *** Bug 33656 has been marked as a duplicate of this bug. ***

(In reply to Frank Greif from comment #7)
> Currently (4.4.1.239) the regexp of class 'gid' is somewhat weird:
> 
>         regex = re.compile(ur"(?u)^\w([\w -.’]*\w)?$")
> 
> The bracketed character class would match (left to right):
> 
> * anything deemed a 'word character' in Unicode
> * the range of characters between 0x20 (space) and 0x2E (dot)
> * the Unicode char with codepoint U+2019 (right single quotation mark)
> 
> I'd propose to change:
> 
> * position the minus at the right end (so it can't be misunderstood as a
> range)
> * remove the spurious U+2019 char
> 
> This would not solve this bug, but at least it would make the regexp really
> match what the description says.

We probably have to keep the U+2019 (’) for french installations.
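
The range reading described in the quote can be checked in isolation. The following sketch lists the ASCII characters matched by a character class, depending on where the dash is placed; the helper name `members` is made up for this illustration.

```python
import re

def members(char_class, limit=128):
    """Return the characters below `limit` that the character class matches."""
    pattern = re.compile(char_class)
    return [chr(i) for i in range(limit) if pattern.fullmatch(chr(i))]

range_form = members("[ -.]")      # dash between two characters: a range
literal_form = members("[ .-]")    # dash at the right end: a literal dash
escaped_form = members(r"[ \-.]")  # escaped dash: also a literal dash

print(len(range_form), range_form)
print(len(literal_form), literal_form)
print(escaped_form == literal_form)
```

Moving the dash to the end (or escaping it) shrinks the class from fifteen characters to the three that the error message names.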

Comment 5 Frank Greif

2022-03-30 12:14:12 CEST

> We probably have to keep the U+2019 (’) for french installations.

Really? U+2019 is a punctuation character. (If I understood the issue correctly, space, dot and dash are the only punctuation characters intended to be allowed.)

French accented vowels are already covered by \w in Unicode context.
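
Both statements can be verified with the Python standard library. This is only a quick check of the Unicode properties; the sample string of accented letters is an arbitrary selection.

```python
import re
import unicodedata

apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK
accented = "éèàùâêîôûëïüç"  # arbitrary sample of French accented letters

# General category of U+2019; categories starting with "P" are punctuation.
category = unicodedata.category(apostrophe)
print(category)

# \w matches Unicode word characters for str patterns in Python 3.
accented_ok = bool(re.fullmatch(r"\w+", accented))
apostrophe_ok = bool(re.fullmatch(r"\w", apostrophe))
print(accented_ok, apostrophe_ok)
```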

Comment 6 Florian Best

2022-04-05 16:48:35 CEST

*** Bug 24137 has been marked as a duplicate of this bug. ***

Comment 7 Florian Best

2022-04-05 16:48:47 CEST

*** Bug 18332 has been marked as a duplicate of this bug. ***

Comment 8 Arvid Requate

2025-11-11 16:15:06 CET

Also: https://learn.microsoft.com/en-us/windows/win32/adschema/a-samaccountname says the following characters are *not* allowed:

> " / \ [ ] : ; | = , + * ? < >


But a dash should be allowed and I guess the original intention of the regex was to allow that.
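
As a rough cross-check, the sketch below intersects the character list quoted above with the ASCII range 32 to 46 that the current regex admits. The names `FORBIDDEN` and `forbidden_in` are made up for this illustration, and the set contains only the characters quoted in this comment.

```python
# Characters quoted above as not allowed in sAMAccountName.
FORBIDDEN = set('"/\\[]:;|=,+*?<>')

def forbidden_in(name):
    """Return the quoted forbidden characters that occur in `name`, sorted."""
    return sorted(set(name) & FORBIDDEN)

# Characters admitted by the " -." range of the current gid regex.
admitted_by_range = {chr(i) for i in range(ord(" "), ord(".") + 1)}

overlap = sorted(admitted_by_range & FORBIDDEN)
print(overlap)
print(forbidden_in("Group (name) + cool2"))
print("-" in FORBIDDEN)
```

The range admits four of the quoted characters, while the dash itself is not on the list.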

Comment 9 Jürn Brodersen

2025-11-25 11:11:44 CET

Analysis by the systemd developers of which characters should be allowed:
https://systemd.io/USER_NAMES/

Comment 10 Florian Best

2025-11-26 12:50:31 CET

MR: https://git.knut.univention.de/univention/dev/ucs/-/merge_requests/1708

Comment 11 Arvid Requate

2025-11-28 13:44:02 CET

387d1ba3d30 | feat(udm): Allow trailing dash in uid and gid syntaxes
ab2d520edac | chore(udm): changelog and advisory

Package: univention-directory-manager-modules
Version: 15.0.29-7
Release: ucs_5.0-0-errata5.0-10
Scope: errata5.0-10

cherry-picked for 5.2-3 into https://git.knut.univention.de/univention/dev/ucs/-/issues/3226

Comment 12 Felix Botner

2025-12-02 13:36:10 CET

OK - tests
OK - advisory
OK - univention-directory-manager-modules

Comment 13 Christian Castens

2025-12-03 13:31:35 CET

<https://errata.software-univention.de/#/?erratum=5.0x1359>