Consumer project outage
Incident Report for LiveBuzz
Postmortem

We experienced an outage across registration forms and exhibitor hubs for a prolonged period. We appreciate the significant impact this had on users attempting to register for an event at that time, and we apologise. We have learned lessons from this incident that will allow us to provide a more robust service going forward.

What happened

The root cause of this issue was the expiry of a Sectigo (formerly Comodo) root SSL certificate. Generally speaking, this affected older, non-browser clients, notably those using OpenSSL 1.0.x.

When you connect to a TLS server the server sends a certificate that proves its identity. The client builds up a chain from the server certificate to a root certificate it knows it trusts. A list of trusted roots is maintained by the client, as part of the web browser, SSL library, or operating system.
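As a rough illustration of where a client's list of trusted roots comes from, the sketch below uses Python's standard `ssl` module to report the default trust store locations and how many CA certificates a default context has loaded. The helper name `describe_trust_store` is our own invention, and the output depends entirely on the machine it runs on.

```python
import ssl


def describe_trust_store() -> dict:
    """Summarise the trust store this Python/OpenSSL build uses by default."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()  # loads the system's default CA certificates
    stats = context.cert_store_stats()      # counts only certificates loaded so far
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,
        "capath": paths.capath,
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_trust_store().items():
        print(f"{key}: {value}")
```

Certificates stored in a `capath` directory are loaded on demand, so the count can be lower than the number of roots the client would actually accept.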

In this case the newer root, USERTrust RSA Certification Authority, was created in 2010 and took many years to become trusted by all clients. Some clients still did not trust it as recently as 2019.

Our certificate was issued in 2018 and therefore included a certificate linking it to the older AddTrust External CA Root in its own chain, as opposed to relying solely on the clients' own trust stores.

Unfortunately this root-level certificate has now expired. Fortunately, modern clients with well-written certificate validators (this includes all mainstream web browsers) do not have a problem with the expiration. Because they now trust the newer root directly, they build a chain to that root and ignore the fact that the server sent an expired certificate as part of the chain.

Other clients do have a problem: their validation fails with an expired certificate error.
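The difference between the two kinds of client can be sketched with a toy model. This is a simplified illustration and not how OpenSSL or any browser is actually implemented: certificates are matched by name only, no signatures are checked, and all names and dates (`Newer Root`, `Older Root`, `www.example.test`) are hypothetical. The `prefer_trusted` flag is our own device for switching between a validator that stops as soon as its trust store can complete the chain and one that first follows whatever the server sent.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    not_after: date


def build_chain(cert: Cert, sent: List[Cert], trusted: List[Cert],
                prefer_trusted: bool) -> Optional[List[Cert]]:
    """Return the first chain found from `cert` to a trusted certificate, or None.

    Issuers are matched by name only; no signatures are checked, and the
    certificates sent by the server are assumed to contain no cycles.
    """
    in_store = [c for c in trusted if c.subject == cert.issuer]
    from_server = [c for c in sent if c.subject == cert.issuer and c != cert]
    # A "modern" validator stops as soon as the trust store can finish the chain.
    if prefer_trusted and in_store:
        return [cert, in_store[0]]
    # Otherwise follow the certificates the server sent, as far as they lead.
    for issuer in from_server:
        rest = build_chain(issuer, sent, trusted, prefer_trusted)
        if rest is not None:
            return [cert] + rest
    # Fall back to the trust store only when the server-sent path is a dead end.
    if in_store:
        return [cert, in_store[0]]
    return None


def validates(leaf: Cert, sent: List[Cert], trusted: List[Cert],
              today: date, prefer_trusted: bool) -> bool:
    """Build one chain, then reject it if any certificate in it has expired."""
    chain = build_chain(leaf, sent, trusted, prefer_trusted)
    return chain is not None and all(c.not_after >= today for c in chain)


# Hypothetical certificates; names and dates are illustrative only.
LEAF = Cert("www.example.test", "Intermediate CA", date(2021, 1, 1))
INTERMEDIATE = Cert("Intermediate CA", "Newer Root", date(2030, 1, 1))
CROSS_SIGNED = Cert("Newer Root", "Older Root", date(2020, 5, 30))
NEWER_ROOT = Cert("Newer Root", "Newer Root", date(2038, 1, 1))
OLDER_ROOT = Cert("Older Root", "Older Root", date(2020, 5, 30))

SENT = [INTERMEDIATE, CROSS_SIGNED]  # what the server sends alongside the leaf
TODAY = date(2020, 6, 1)

if __name__ == "__main__":
    store = [NEWER_ROOT, OLDER_ROOT]
    print("modern client:", validates(LEAF, SENT, store, TODAY, prefer_trusted=True))
    print("legacy client:", validates(LEAF, SENT, store, TODAY, prefer_trusted=False))
    print("legacy client, older root removed:",
          validates(LEAF, SENT, [NEWER_ROOT], TODAY, prefer_trusted=False))
```

In this model the modern client accepts the chain, the legacy client rejects it because the path it follows includes the expired cross-signed certificate, and the legacy client succeeds once the older root is no longer in its trust store.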

Remediation

Client operator

Our OpenSSL implementation on internal Ubuntu LTS servers was within the scope of this issue. To resolve it, we updated our local trust stores to ignore the AddTrust External CA Root certificate. The library then built the final part of the chain itself and succeeded without error. At this point all registration forms and exhibitor hubs returned to operational status.
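One way to spot candidates for this kind of trust store clean-up is to list the CA certificates in a store that have already expired. The sketch below is an illustration using Python's standard `ssl` module, not the procedure we followed on our servers; the helper names `expired_ca_certs` and `common_name` are our own. It works on the list-of-dicts format returned by `SSLContext.get_ca_certs()`.

```python
import ssl
from datetime import datetime, timezone


def common_name(cert: dict) -> str:
    """Extract the commonName from a certificate dict, if it has one."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return "<no common name>"


def expired_ca_certs(ca_certs, now=None):
    """Return the certificate dicts whose notAfter date has passed.

    `ca_certs` uses the format returned by SSLContext.get_ca_certs();
    `now` should be a timezone-aware datetime and defaults to the current time.
    """
    now_ts = (now or datetime.now(timezone.utc)).timestamp()
    return [
        cert for cert in ca_certs
        if ssl.cert_time_to_seconds(cert["notAfter"]) < now_ts
    ]


if __name__ == "__main__":
    context = ssl.create_default_context()
    # get_ca_certs() only lists certificates already loaded into the context.
    for cert in expired_ca_certs(context.get_ca_certs()):
        print(common_name(cert), "expired on", cert["notAfter"])
```

How an expired root is then removed or disabled depends on the operating system's own trust store tooling, which this sketch does not cover.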

Server operator

Although web browser clients experienced no issue at any point, other clients, such as external parties integrated with our API, may have been affected depending on their own SSL library implementations and trust stores.

We have updated all affected certificates throughout our infrastructure with the expired root CA now removed from all chains.

A deep analysis of our SSL configuration now shows it as valid. A report showing this, including an A+ rating, can be accessed at https://www.ssllabs.com/ssltest/analyze.html?d=www.control.buzz&latest

In the future

This summer we are executing an infrastructure migration from our current VMware vSphere Private Cloud to Microsoft Azure. Significantly, SSL issues such as this will then sit within the scope of the relevant Azure service and as such will be fully mitigated.

Posted Jun 03, 2020 - 08:56 BST

Resolved
The issue has been identified. A fix has been applied and all systems are operational. We're continuing to investigate the root cause of the issue.
Posted Jun 01, 2020 - 10:00 BST
Update
Workaround has been applied to all consumer projects. We're continuing to investigate the issue.
Posted Jun 01, 2020 - 09:58 BST
Update
We are applying a fix to all projects and expect to complete within 15 minutes.
Posted Jun 01, 2020 - 09:49 BST
Update
The error being encountered is related to SSL certificate expiration, on a non-expired certificate. We're continuing to investigate.
Posted Jun 01, 2020 - 09:25 BST
Investigating
We are currently investigating this issue.
Posted Jun 01, 2020 - 09:19 BST
This incident affected: LiveControl (Registration Pages, Exhibitor Hubs).