Bug 51498 - Rework NTP configuration
Status: NEW
Product: UCS
Classification: Unclassified
Component: NTP
Version: UCS 5.0
Hardware: Other Linux
Importance: P5 normal with 2 votes
Target Milestone: ---
Assigned To: UCS maintainers
QA Contact: UCS maintainers
Depends on: 30854 42171 chrony 50269 ucs445ec2
Blocks: 51493
Reported: 2020-06-16 09:08 CEST by Philipp Hahn
Modified: 2022-01-21 17:47 CET (History)

See Also:
What kind of report is it?: Development Internal
What type of bug is this?: ---
Who will be affected by this bug?: ---
How will those affected feel about the bug?: ---
User Pain:
Enterprise Customer affected?:
School Customer affected?:
ISV affected?:
Waiting Support:
Flags outvoted (downgraded) after PO Review:
Ticket number:
Bug group (optional):
Max CVSS v3 score:


Description Philipp Hahn univentionstaff 2020-06-16 09:08:53 CEST
Our NTP configuration has many deficiencies:

- running NTP inside a VM is not recommended (Bug #47939) and leads to errors like Bug #51493 or Bug #49455

- DCs always configure a local clock; if only one external NTP server is configured, NTPd cannot decide which of the two is correct and needs a long time to synchronize. At least 3 external servers should be configured. (Bug #30854)

- All systems except the DC Master configure all other DCs as time servers first and then add the servers explicitly configured via the UCRVs "timeserver[23]" on top (the order is not important). Switch to peer mode between Master and Backups? (Bug #42171)

- No encryption

- Better support for pools: a pool should not be configured through "server", as that can lead to the same server being selected multiple times by accident (Bug #50269)

- By default no external NTP server is configured (Bug #27728)

- NTP configuration is not provided via DHCP option 042

- Also support PTP?
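
To make the "at least 3 external servers" point checkable, here is a minimal sketch that counts the external time sources in an ntp.conf-style file. The function name `count_external_sources` is hypothetical and not part of UCS; the sketch assumes that the local clock is configured through the usual `127.127.x.y` pseudo address and ignores such lines.

```shell
# Hypothetical helper (not part of UCS): count the external time sources in
# an ntp.conf-style file. Lines for the local clock driver (127.127.x.y)
# and commented-out lines are not counted.
count_external_sources() {
    awk '$1 ~ /^(server|peer|pool)$/ && $2 !~ /^127\.127\./ { n++ }
         END { print n + 0 }' "$1"
}

# Possible use; the threshold of 3 follows the recommendation above:
#   [ "$(count_external_sources /etc/ntp.conf)" -ge 3 ] ||
#       echo "fewer than 3 external NTP sources configured" >&2
```

Note that a `pool` line counts as a single source here, although NTPd may resolve it to several servers.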
Comment 1 Philipp Hahn univentionstaff 2020-07-03 05:55:07 CEST
(In reply to Philipp Hahn from comment #0)
> - NTP configuration is not provided via DHCP option 042

Actually this already works through `/etc/dhcp/dhclient-exit-hooks.d/ntp`: it uses `/etc/ntp.conf` as its base, replaces the `server|pool|peer` configuration with the servers from DHCP, writes the result to `/run/ntp.conf.dhc` and restarts `ntpd` through `/etc/init.d/ntp`, which prefers that file if it exists. The UCRVs `timeserver[23]` are thus IGNORED.
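
The rewriting step can be sketched roughly as follows. This is not the code of the actual hook, only an illustration of the behaviour described here; the function name `render_dhcp_ntp_conf` and the `iburst` option on the generated lines are assumptions.

```shell
# Hypothetical sketch of the rewrite step (NOT the real dhclient hook):
# print the base configuration without its server/pool/peer lines, then
# append one "server" line per NTP server announced via DHCP.
render_dhcp_ntp_conf() {
    base_conf=$1
    shift
    sed -E '/^[[:space:]]*(server|pool|peer)[[:space:]]/d' "$base_conf"
    for srv in "$@"; do
        printf 'server %s iburst\n' "$srv"
    done
}

# Possible use, with the paths named above:
#   render_dhcp_ntp_conf /etc/ntp.conf 10.0.0.13 > /run/ntp.conf.dhc
```

Any server that UCR wrote into the base file is dropped by the `sed` call, which is why `timeserver[23]` have no effect once DHCP announces NTP servers.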

This led to the strange situation that NTP was NOT WORKING at all in AWS EC2 during my last 3 TTs:
TT 2020-07-01/02
TT 2020-06-18/19
TT 2020-05-14/15
The reason is that our CloudFormation template `schulung_v1.json` contains this chunk:

    "DhcpOptions" : {
      "Type" : "AWS::EC2::DHCPOptions",
      "Properties" : {
        "DomainName" : "schulung.ucs",
        "DomainNameServers" : ["10.0.0.13"],
        "NtpServers" : ["10.0.0.13"],
        "NetbiosNameServers" : ["10.0.0.13"],
        "NetbiosNodeType" : "2"
      }
    },

This also gets applied to the "DC Master", which then uses ITSELF as the ONLY time source. NTPd refuses to do so and stays stuck in the INIT state forever:

$ ntpq  -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 dc1.schulung.uc .INIT.          16 u    - 1024    0    0.000    0.000   0.000

This then cascades to the "DC Slave", which receives the same NTP configuration and only contacts NTPd on the Master. As the Master never gets a stable time, it does not answer any requests, so the Slave is also stuck in the INIT state forever.

This is then flagged as an error by `check_univention_ntp` in Nagios.
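
For a quick manual test of the stuck state, the `reach` column of `ntpq -p` can be evaluated. The following is only a sketch and not the logic of `check_univention_ntp`; the function name `ntp_has_reachable_peer` is hypothetical, and the sketch assumes the 10-column layout shown above with two header lines.

```shell
# Hypothetical helper: read "ntpq -p" output on stdin and succeed only if
# at least one peer has a non-zero reach register (7th column).
ntp_has_reachable_peer() {
    awk 'NR > 2 && NF >= 10 && $7 != "0" { ok = 1 }
         END { exit !ok }'
}

# Possible use:
#   ntpq -p | ntp_has_reachable_peer || echo "no reachable NTP peer" >&2
```

With the output shown above the function fails, because the only peer has a reach value of 0.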

As AWS nowadays runs its own internal NTP server network <https://docs.aws.amazon.com/de_de/AWSEC2/latest/UserGuide/set-time.html>, "169.254.169.123" should be used by default in AWS EC2. It should be configured through
  ucr set timeserver='169.254.169.123 iburst'
to also speed up the initial sync.
As those servers are internal to AWS, no extra SecurityGroup rule like this is needed:
            { "IpProtocol" : "udp", "FromPort" : "123", "ToPort" : "123", "CidrIp" : "0.0.0.0/0" },
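
Independent of AWS, the self-reference problem described above could be avoided by filtering the host's own addresses out of the announced server list before using it. A minimal sketch follows; the function name `drop_own_addresses` is hypothetical and nothing like it exists in UCS as far as this report states.

```shell
# Hypothetical helper: print every given NTP server that is not one of the
# host's own addresses. $1 is a space-separated list of own addresses,
# the remaining arguments are the candidate servers.
drop_own_addresses() {
    own=$1
    shift
    for srv in "$@"; do
        case " $own " in
            *" $srv "*) ;;                  # own address: skip it
            *) printf '%s\n' "$srv" ;;
        esac
    done
}

# Possible use (assumes "hostname -I" is available to list own addresses):
#   drop_own_addresses "$(hostname -I)" 10.0.0.13 169.254.169.123
```

On the Master with address 10.0.0.13 this would leave only 169.254.169.123, while the Slave would keep both servers.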
Comment 2 Philipp Hahn univentionstaff 2020-11-26 11:48:06 CET
With UCS 5.0-0 `apt-get update` is broken after a suspend, as the timestamp of the downloaded file is too far in the future: Bug #51493
Comment 3 Philipp Hahn univentionstaff 2021-08-31 09:44:45 CEST
For AWS we set 169.254.169.123 by default since Bug #51558
Comment 4 Philipp Hahn univentionstaff 2022-01-21 17:47:41 CET
(In reply to Philipp Hahn from comment #0)
> - running NTP inside a VM is not recommended (Bug #47939) and leads to
> errors like Bug #51493 or Bug #49455

With Linux-4.11+ on the host there is "ptp-kvm", which exports the time of the host into any VM; "chrony" can then use "/dev/ptp_kvm" inside the VM to keep the clock of the VM up to date. "chrony" is also needed on the host to keep the host itself synchronized to wall-clock time.
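
Inside the VM this could look roughly like the following sketch, which prints a chrony `refclock PHC` line only if the device exists. The function name `chrony_ptp_kvm_refclock` and the `poll 2` value are assumptions; whether the device shows up as "/dev/ptp_kvm" or only as "/dev/ptpN" depends on the udev rules of the guest.

```shell
# Hypothetical helper: print a chrony refclock line for the KVM PTP device
# if it exists, otherwise fail. The "poll 2" value is an arbitrary choice.
chrony_ptp_kvm_refclock() {
    dev=${1:-/dev/ptp_kvm}
    [ -e "$dev" ] || return 1
    printf 'refclock PHC %s poll 2\n' "$dev"
}

# Possible use inside the VM (the target path is an assumption):
#   chrony_ptp_kvm_refclock > /etc/chrony/conf.d/ptp_kvm.conf ||
#       echo "ptp_kvm device not found; is the ptp_kvm module loaded?" >&2
```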