Bug 54589 - Wrong regex for UDM syntax gid allows wrong characters
Status: NEW
Product: UCS
Classification: Unclassified
Component: UDM (Generic)
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal
Assigned To: UMC maintainers
UMC maintainers
Duplicates: 18332 24137 39776
Depends on:
Blocks:
Reported: 2022-03-25 09:59 CET by Sönke Schwardt-Krummrich
Modified: 2022-05-25 09:51 CEST

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?: 2: Will only affect a few installed domains
How will those affected feel about the bug?: 1: Nuisance – not a big deal but noticeable
User Pain: 0.034
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Sönke Schwardt-Krummrich univentionstaff 2022-03-25 09:59:23 CET
The regular expression for the UDM syntax gid does not represent what the author really meant, nor what is useful.

Inside the character class, " -." is parsed as a range, so it allows not only the characters space, dash and dot but every character in the ASCII range " " (32) up to "." (46) → see list of wrongly allowed characters below.

IIRC the correct regex would be: u"(?u)^\\w([\\w .-]*\\w)?$"
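
To make the difference concrete, here is a minimal sketch that runs the current pattern and the proposed one against a few names. The sample names and the helper `accepted` are made up for illustration and are not part of UDM; only the two patterns are taken from this report.

```python
import re

# Pattern currently used by the gid syntax class (copied from this report).
CURRENT = re.compile(u"(?u)^\\w([\\w -.’]*\\w)?$")
# Pattern proposed above: the dash is last in the class, so it is a literal.
PROPOSED = re.compile(u"(?u)^\\w([\\w .-]*\\w)?$")

# Hypothetical sample names, chosen only for illustration.
SAMPLES = [
    "Domain Users",
    "dev-team.eu",
    "Group (name) + cool2",
    "a!b",
    "it's",
    "-lead",
]


def accepted(pattern, name):
    """Return True if the name matches the given pattern."""
    return pattern.match(name) is not None


for name in SAMPLES:
    print(repr(name), "current:", accepted(CURRENT, name), "proposed:", accepted(PROPOSED, name))
```

Names containing only spaces, dashes and dots between word characters should be accepted by both patterns, while names with parentheses, plus signs, exclamation marks or ASCII single ticks should only pass the current one.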

ALSO: Please check why single ticks are currently allowed in group names!

Also note: this is a breaking change. We have to find an appropriate release for it and announce it beforehand (in case a customer has used a lot of e.g. plus signs in group names).

class gid(simple):
	min_length = 1   # TODO: not enforced here
	max_length = 32  # TODO: not enforced here
	regex = re.compile(u"(?u)^\\w([\\w -.’]*\\w)?$")
	# FIXME: The " -." in "[\w -.]" matches the ASCII character range(ord(' '),  ord('.')+1) == range(32, 47)
	error_message = _(
		"A group name must start and end with a letter, number or underscore. In between additionally spaces, dashes "
		"and dots are allowed."
	)

$ python3
>>> for i in range(ord(' '), ord('.')+1): print(i, repr(chr(i)))
... 
32 ' '
33 '!'
34 '"'
35 '#'
36 '$'
37 '%'
38 '&'
39 "'"
40 '('
41 ')'
42 '*'
43 '+'
44 ','
45 '-'
46 '.'
>>> 

root@master:~# udm groups/group create --position cn=groups,$(ucr get ldap/base) --set name="Group (name) + cool2"
Object created: cn=Group (name) \+ cool,cn=groups,dc=dev,dc=nstx,dc=de
Comment 1 Florian Best univentionstaff 2022-03-25 10:11:48 CET
*** Bug 39776 has been marked as a duplicate of this bug. ***
Comment 2 Florian Best univentionstaff 2022-03-25 10:18:56 CET
*** Bug 33656 has been marked as a duplicate of this bug. ***
Comment 3 Florian Best univentionstaff 2022-03-25 10:24:45 CET
(In reply to Florian Best from comment #1)
> *** Bug 39776 has been marked as a duplicate of this bug. ***

Same for the syntax classes: `uid_umlauts` and `uid_umlauts_lower_except_first_letter`.
Comment 4 Florian Best univentionstaff 2022-03-25 10:28:21 CET
(In reply to Florian Best from comment #2)
> *** Bug 33656 has been marked as a duplicate of this bug. ***

(In reply to Frank Greif from comment #7)
> Currently (4.4.1.239) the regexp of class 'gid' is somewhat weird:
> 
>         regex = re.compile(ur"(?u)^\w([\w -.’]*\w)?$")
> 
> The bracketed character class would match (left to right):
> 
> * anything deemed a 'word character' in Unicode
> * the range of characters between 0x20 (space) and 0x2E (dot)
> * the Unicode char with codepoint U+2019 (right single quotation mark)
> 
> I'd propose to change:
> 
> * position the minus at the right end (so it can't be misunderstood as a
> range)
> * remove the spurious U+2019 char
> 
> This would not solve this bug, but at least it would make the regexp really
> match what the description says.

We probably have to keep the U+2019 (’) for French installations.
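
If U+2019 has to stay, one possible variant (only a sketch, not a decided fix) keeps it in the character class and moves the dash to the end so it is no longer read as a range. The name `KEEP_APOSTROPHE`, the helper `is_valid` and the sample names are assumptions made for this example.

```python
import re

# Assumption: dash moved to the end of the class (literal dash),
# U+2019 (right single quotation mark) deliberately retained.
KEEP_APOSTROPHE = re.compile(u"(?u)^\\w([\\w .\u2019-]*\\w)?$")


def is_valid(name):
    """Return True if the name matches the sketched pattern."""
    return KEEP_APOSTROPHE.match(name) is not None


# Hypothetical sample names, for illustration only.
for name in ["Groupe d\u2019\u00e9tude", "a-b", "it's", "Group (name) + cool2"]:
    print(repr(name), is_valid(name))
```

With this variant the typographic apostrophe is still accepted, but the ASCII single tick and the other characters from the accidental range are not.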
Comment 5 Frank Greif univentionstaff 2022-03-30 12:14:12 CEST
> We probably have to keep the U+2019 (’) for French installations.

Really? U+2019 is a punctuation character. (If I understood the issue correctly, space, dot and dash are the only punctuation chars intended to be allowed?)

French accented vowels are already covered by \w in Unicode context.
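
This can be checked quickly in Python 3, where \w is Unicode-aware for str patterns by default. The helper `is_word_char` is a name made up for this sketch.

```python
import re
import unicodedata

WORD = re.compile(r"\w")


def is_word_char(ch):
    """Return True if the single character ch is matched by \\w."""
    return WORD.fullmatch(ch) is not None


# Accented letters, then U+2019 and the ASCII apostrophe for comparison.
for ch in ["\u00e9", "\u00e0", "\u00e7", "\u2019", "'"]:
    print(repr(ch), unicodedata.category(ch), is_word_char(ch))
```

The accented letters fall into a letter category and match \w; U+2019 and the ASCII apostrophe are punctuation and do not.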
Comment 6 Florian Best univentionstaff 2022-04-05 16:48:35 CEST
*** Bug 24137 has been marked as a duplicate of this bug. ***
Comment 7 Florian Best univentionstaff 2022-04-05 16:48:47 CEST
*** Bug 18332 has been marked as a duplicate of this bug. ***