Bug 45626 - User is removed and recreated on every import
User is removed and recreated on every import
Status: CLOSED FIXED
Product: UCS@school
Classification: Unclassified
Component: Import scripts
UCS@school 4.2
Other Linux
: P5 normal (vote)
: UCS@school 4.2 (HTTP-API-MVP)
Assigned To: Daniel Tröder
Sönke Schwardt-Krummrich
:
Depends on:
Blocks: 45019 45844
  Show dependency treegraph
 
Reported: 2017-11-01 13:19 CET by Sönke Schwardt-Krummrich
Modified: 2017-12-21 12:23 CET (History)
2 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 3: Simply Wrong: The implementation doesn't match the docu
Who will be affected by this bug?: 4: Will affect most installed domains
How will those affected feel about the bug?: 2: A Pain – users won’t like this once they notice it
User Pain: 0.137
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments
Test import data (730 bytes, text/csv)
2017-11-01 13:19 CET, Sönke Schwardt-Krummrich
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sönke Schwardt-Krummrich univentionstaff 2017-11-01 13:19:12 CET
Created attachment 9269 [details]
Test import data

If this file is imported multiple times, the user "hanna.venckh" from the attached CSV file is removed and recreated with a new username during every import starting with the second import.

/usr/share/ucs-school-import/scripts/ucs-school-user-import \
  --conffile /usr/share/ucs-school-import/configs/user_import_http-api.json  \
  --user_role student --school gsost \
  --infile test_users_2017-10-31_02:48:54.csv \
  --sourceUID gsost-student --verbose

"gsost","Hanna","Venckhuß","1c,1a","A teacher.","+16-990-25713"

May be related to the umlaut in the last name.

Config /usr/share/ucs-school-import/configs/user_import_http-api.json:

{
   "factory": "ucsschool.importer.default_user_import_factory.DefaultUserImportFactory",
   "input": {
      "type": "csv"
   },
   "csv": {
     "mapping": {
        "Schule": "school",
        "Vorname": "firstname",
        "Nachname": "lastname",
        "Klassen": "school_classes",
        "Beschreibung": "description",
        "Telefon": "phone",
        "EMail": "mailPrimaryAddress"
     }
   },
   "scheme": {
     "email": "<firstname>.<lastname>@<maildomain>",
     "recordUID": "<firstname>.<lastname>",
     "username": {
        "default": "<:umlauts><firstname>.<lastname><:lower>[COUNTER2]"
     }
   },
   "verbose": false
}
Comment 1 Daniel Tröder univentionstaff 2017-11-01 14:08:16 CET
It is a configuration error.

The problem ist with the ucsschoolRecordUID that gets saved as SGFubmEuVmVuY2todcOf because of the ß. When the configuration is changed, it works:

19c19
< 		"recordUID": "<firstname>.<lastname>",
---
> 		"recordUID": "<:umlauts><firstname>.<lastname>",

I guess we should either make it a default in the configuration or in the code.
Comment 2 Sönke Schwardt-Krummrich univentionstaff 2017-11-08 16:26:27 CET
(In reply to Daniel Tröder from comment #1)
> It is a configuration error.

I used the default config/example → bug in the MVP example
 
> The problem ist with the ucsschoolRecordUID that gets saved as
> SGFubmEuVmVuY2todcOf because of the ß. When the configuration is changed, it
> works:

I think, the recordUID is not saved as "SGFubmEuVmVuY2todcOf". This string is base64 encoded:
$ echo "SGFubmEuVmVuY2todcOf" | base64 -d
Hanna.Venckhuß
$
univention-ldapsearch returns this value base64 encoded and indicates this by two colons "::" between LDAP attribute name and value.
 
> 19c19
> < 		"recordUID": "<firstname>.<lastname>",
> ---
> > 		"recordUID": "<:umlauts><firstname>.<lastname>",
> 
> I guess we should either make it a default in the configuration or in the
> code.

Is there a problem with UTF-8/string vs. unicode encoding?
If yes, we should fix it. If no, we should fix the default config.
Comment 3 Daniel Tröder univentionstaff 2017-11-09 08:14:08 CET
(In reply to Sönke Schwardt-Krummrich from comment #2)
> (In reply to Daniel Tröder from comment #1)
> > It is a configuration error.
> 
> I used the default config/example → bug in the MVP example
Yes - fixed in commit ed05fd74.

> > The problem ist with the ucsschoolRecordUID that gets saved as
> > SGFubmEuVmVuY2todcOf because of the ß. When the configuration is changed, it
> > works:
> 
> I think, the recordUID is not saved as "SGFubmEuVmVuY2todcOf". This string
> is base64 encoded:
> $ echo "SGFubmEuVmVuY2todcOf" | base64 -d
> Hanna.Venckhuß
> $
> univention-ldapsearch returns this value base64 encoded and indicates this
> by two colons "::" between LDAP attribute name and value.
>  
> > 19c19
> > < 		"recordUID": "<firstname>.<lastname>",
> > ---
> > > 		"recordUID": "<:umlauts><firstname>.<lastname>",
> > 
> > I guess we should either make it a default in the configuration or in the
> > code.
> 
> Is there a problem with UTF-8/string vs. unicode encoding?
> If yes, we should fix it. If no, we should fix the default config.
Yes - the problem is not with the storage, but with the search: ldap.filter.filter_format() decodes 'ß' to '\xc3\x9f', which doesn't match the saved value.
We have to either always use "<:umlauts>" or base64-encode all recordUIDs and LDAP-searches.
Comment 4 Daniel Tröder univentionstaff 2017-11-27 09:54:53 CET
RFC in branch dtroeder/45626_rUID_filter:

Commit 3ed35f11 (https://git.knut.univention.de/univention/ucsschool/commit/3ed35f11a53b1fa588f5a674143e52800afb65d9) stores recordUID and sourceUID as already-escaped strings, so that an LDAP search with the same names should match.

If the solution is OK, I'll write a test, advisory etc. Please comment.
Comment 5 Florian Best univentionstaff 2017-11-27 12:10:45 CET
I think values should be escaped where they are inserted into a filter, not in self.record_uid.
Also: For escaping single values there is ldap.filter.escape_filter_chars().
Comment 6 Florian Best univentionstaff 2017-11-27 20:34:42 CET
@Daniel:
Which ldap search is failing? I can't find the code.
Comment 7 Daniel Tröder univentionstaff 2017-11-28 09:37:49 CET
Found it. Fixed in 91758e34: https://git.knut.univention.de/univention/ucsschool/commit/91758e3438df1f35cd98de2eb681830e30bfbee7

The problem is the comparison of a unicode object and a string containing encoded unicode. The {str,unicode}.__eq__ can handle it:

In [11]: u'ß' == 'ß', u'\xdf' == 'ß', u'\xdf' == '\xc3\x9f', u'\xdf' == u'ß', '\xc3\x9f' == 'ß', '\xc3\x9f' == u'ß'
Out[11]: (True, True, True, True, True, True)

But the hashes are different:

In [14]: hash('ß'), hash('\xc3\x9f'), hash(u'ß'), hash(u'\xdf')
Out[14]: (24960149699224340, 24960149699224340, 28544085598, 28544085598)

This comes into play, when the sets were compared:


In [17]: 'ß' == u'ß', set('ß') == set(u'ß')
Out[17]: (True, False)

With 91758e34 both sets contain unicode strings, and the comparison works again even when "<:umlauts>" is not used in the configuration scheme.
Comment 8 Daniel Tröder univentionstaff 2017-11-30 09:52:59 CET
[4.2 3f15130a] Bug #45626: handle non-ASCII in recordUID and sourceUID
[4.2 b6629533] Bug #45626: advisory update

ucs-school-import 15.0.3-11
Comment 9 Sönke Schwardt-Krummrich univentionstaff 2017-12-09 01:26:38 CET
(In reply to Daniel Tröder from comment #3)
> (In reply to Sönke Schwardt-Krummrich from comment #2)
> > (In reply to Daniel Tröder from comment #1)
> > > It is a configuration error.
> > 
> > I used the default config/example → bug in the MVP example
> Yes - fixed in commit ed05fd74.

I don't think that this is correct in all cases:
    "recordUID": "<:umlauts><firstname>.<lastname>",

The following users will create a recordUID collision with this scheme:
- Martin Dräger → Martin.Draeger
- Martin Draeger → Martin.Draeger

I suggest to remove the "<:umlauts>" in user_import_http-api.json to make collisions less likely.
→ REOPEN


Otherwise:
OK: code change for the actual fix
OK: functional test
OK: advisory
Comment 10 Daniel Tröder univentionstaff 2017-12-11 11:39:23 CET
81cd8678: Bug #45626: remove "<:umlauts>" from default configuration to lower the chance of recordUID collisions

ucs-school-import (15.0.3-17)

Package version in advisory will be adapted when Bug #45577 is fixed.
Comment 11 Sönke Schwardt-Krummrich univentionstaff 2017-12-11 12:07:13 CET
OK: code change
OK: functional test

> Package version in advisory will be adapted when Bug #45577 is fixed.

OK
→VERIFIED
Comment 12 Sönke Schwardt-Krummrich univentionstaff 2017-12-21 12:23:04 CET
UCS@school 4.2 v6 has been released.

http://docs.software-univention.de/changelog-ucsschool-4.2v6-de.html

If this error occurs again, please clone this bug.