Bug 48655 - Allow missing columns in CSV
Allow missing columns in CSV
Status: CLOSED FIXED
Product: UCS@school
Classification: Unclassified
Component: Import scripts
UCS@school 4.3
Other other
: P5 normal (vote)
: UCS@school 4.3 v8
Assigned To: Daniel Tröder
Ole Schwiegert
:
Depends on: 47681 47691
Blocks: 49050
  Show dependency treegraph
 
Reported: 2019-02-12 12:13 CET by Michael Grandjean
Modified: 2019-04-01 00:44 CEST (History)
5 users (show)

See Also:
What kind of report is it?: Bug Report
What type of bug is this?: 5: Major Usability: Impairs usability in key scenarios
Who will be affected by this bug?: 3: Will affect average number of installed domains
How will those affected feel about the bug?: 3: A User would likely not purchase the product
User Pain: 0.257
Enterprise Customer affected?:
School Customer affected?: Yes
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Michael Grandjean univentionstaff 2019-02-12 12:13:31 CET
With Bug #47681 we also introduced a "missing_columns" check in csv_reader.py. This checks if the given CSV contains all columns that are specified in the csv mapping of the config file(s) - independently of the "mandatory attributes".

While I do understand this approach from a technical perspective, it is quite problematic in real world scenarios:
Customers (even end users) do import different CSV files with different columns, but want or need (HTTP-API) to use the same configuration for all of them.

Example:
- The CSV files for teachers contain a "mail" column, but those for students don't
- The CSV files for students contain a "birthday" column, but those for teachers don't

This leads to several customer environments where the "missing_columns" check is commented out in "csv_reader.py", which is not a sustainable solution.

IMHO it should be possible to deactivate this check (via UCR). This way, we could have a more loose approach if the customer wants this flexibility regarding the CSV columns and we could have a more strict mode, where the columns get checked even if they are not marked as mandatory.


+++ This bug was initially created as a clone of Bug #47681 +++

Scenario: I uploaded a CSV file to the HTTP-API UMC User Import module with a column name "Primarykey" that contained the record_uid. But in the user_import.json config file, the mapping was "ID": "record_uid". So, I had a mismatch between Primarykey (file) <-> ID (config).
In this case, the error message / Traceback in /var/lib/ucs-school-import/jobs/*/*/ucs-school-import.log is not helpful at all:

> 2018-08-28 09:45:10 ERROR cmdline.main:138  Outer Exception catcher: AttributeError("'NoneType' object has no attribute 'replace'",)
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/frontend/cmdline.py", line 118, in main
>     self.do_import()
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/frontend/cmdline.py", line 96, in do_import
>     importer.mass_import()
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/mass_import/mass_import.py", line 70, in mass_import
>     self.import_users()
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/mass_import/mass_import.py", line 100, in import_users
>     user_import.create_and_modify_users(imported_users)  # 90% - 100%
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/mass_import/user_import.py", line 122, in create_and_modify_users
>     user = self.determine_add_modify_action(imported_user)
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/mass_import/user_import.py", line 197, in determine_add_modify_action
>     user = imported_user.get_by_import_id(self.connection, imported_user.source_uid, imported_user.record_uid)
>   File "/usr/lib/pymodules/python2.7/ucsschool/importer/models/import_user.py", line 338, in get_by_import_id
>     (source_uid, record_uid)
>   File "/usr/lib/python2.7/dist-packages/ldap/filter.py", line 59, in filter_format
>     return filter_template % (tuple(map(escape_filter_chars,assertion_values)))
>   File "/usr/lib/python2.7/dist-packages/ldap/filter.py", line 43, in escape_filter_chars
>     s = assertion_value.replace('\\', r'\5c')
> AttributeError: 'NoneType' object has no attribute 'replace'

We should check and catch this. I think something like "No record_uid given." as error message should be fine.
Comment 2 Daniel Tröder univentionstaff 2019-02-12 12:40:36 CET
The test prevents user errors and thus unnecessary support cases.
I'd prefer a more explicit solution.
Explicit in the sense, that the intend of the user can be found in the configuration and thus the system can verify the input.

I can imagine:
* A configuration key that allows certain columns to be optional (explicitly deactivating the test for only those).
* Another layer of JSON files like with $OU for $ROLE. Combining can be confusing for the user though.
* A post-read-hook that adds missing columns to CSV files.
Comment 3 Daniel Tröder univentionstaff 2019-03-12 09:33:14 CET
A configuration key csv:allowed_missing_columns was added whos value can be a list of column names that are allowed to be missing in the input data.

A ucs-test for the feature was also added.

[dtroeder/48655.allow.missing.columns 7a54c3ad6] Bug #48655: allow the configuration of columns missing in the input file

Please reopen, when starting QA (missing merge, changelog and package build).
Comment 4 Ole Schwiegert univentionstaff 2019-03-13 14:46:26 CET
Package installs: OK
Test passes: OK
allowed_missing_columns: OK

Please merge and create changelog and advisory
Comment 5 Daniel Tröder univentionstaff 2019-03-13 16:59:04 CET
[4.3] fd450cd5c Bug #48655: changelog and advisory
[4.3] 3ce2972b8 Bug #48655: advisory update

Merged to 4.4:

[4.4] 0db4c2c22 Bug #48655: allow the configuration of columns missing in the input file

ucs-school-import (16.0.3-5)
Comment 6 Ole Schwiegert univentionstaff 2019-03-18 09:00:17 CET
4.4:
Changelog: No mention of this bug
Advisory: OK
ucs-test-ucsschool: not build, test not included
manual copied test fails:

Traceback (most recent call last):
  File "245_import_user_allowed_missing_columns", line 69, in <module>
    Test().run()
  File "/usr/lib/pymodules/python2.7/univention/testing/ucsschool/importusers_cli_v2.py", line 293, in run
    self.test()
  File "245_import_user_allowed_missing_columns", line 64, in test
    person.verify()
  File "/usr/lib/pymodules/python2.7/univention/testing/ucsschool/importusers.py", line 352, in verify
    utils.verify_ldap_object(self.dn, expected_attr=self.expected_attributes(), strict=True, should_exist=True)
  File "/usr/lib/pymodules/python2.7/univention/testing/decorators.py", line 39, in __call__
    self.func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.7/univention/testing/utils.py", line 177, in verify_ldap_object
    raise LDAPObjectValueMissing(msg)
univention.testing.utils.LDAPObjectValueMissing: DN: uid=sf5rwiruhz,cn=mitarbeiter,cn=users,ou=or6,dc=realm1,dc=intranet
ucsschoolSourceUID: ['sourceDB'], missing: 'sourceUID-wjgmuncasl'
ucsschoolSourceUID: ['sourceDB'], unexpected: 'sourceDB'
Starting 1 ucs-test at 2019-03-13 09:55:04 to /dev/null
UCS 4.4-0-e5 ucs-test 9.0.2-13A~4.4.0.201903111720
Test csv:allowed_missing_columns config............................... Test failed

manual test: OK
  - new field works as expected
  - works with multiple keys and generates no false positives

4.3:
Changelog&Advisory: OK
ucs-test-ucsschool: OK
manual test: OK
  - new field works as expected
  - works with multiple keys and generates no false positives
Comment 7 Daniel Tröder univentionstaff 2019-03-18 11:01:18 CET
[4.4 74f7ba0b0] Bug #48655: fix configuration

ucs-test-ucsschool (6.0.0-31)
Comment 8 Ole Schwiegert univentionstaff 2019-03-18 11:11:58 CET
Testpackage now build and test passes: OK
Comment 9 Sönke Schwardt-Krummrich univentionstaff 2019-04-01 00:44:22 CEST
UCS@school 4.3 v8 has been released.

https://docs.software-univention.de/changelog-ucsschool-4.3v8-de.html

If this error occurs again, please clone this bug.