
An apology, an explanation and a commitment

In the last 30 days, Yola has had far too many incidents of downtime. The engineers at Yola who build our systems and take responsibility for keeping them up and available find this unacceptable.

Let me highlight the area where we performed worst in terms of uptime – the Published Sites. These are the sites that you have created and host with us. If you have a live site, it has been unavailable for a total of 3 hours over the past 30 days. The longest single outage lasted 43 minutes, and the shortest just one minute.

As engineers, we take the uptime of your site very seriously, even personally. We are extremely proud when we get to the end of a month and there has been no downtime. Published Sites is usually the area where we have the least downtime – because we put an extraordinary amount of effort into keeping your site up and fast. We know what it means when your site is unavailable – it could happen at a time critical to your business or personal brand, and you could be losing money. When your site is unavailable, you feel helpless and frustrated.

As the engineer who takes responsibility for everyone involved in keeping your site up, I would like to apologize to you, explain why it has been such a rough month and tell you what we are doing to make sure it never happens again.

After such a bad month, you might expect that there has been a disaster in our service infrastructure. There has not. In fact, most of the issues have been unpredicted (and, honestly, highly unlikely) side effects of some real, meaningful upgrades to the infrastructure that runs our millions of hosted sites. No matter how well you plan for upgrades to happen ‘transparently’ and professionally, there is always a very small chance that something could go wrong. It feels like we have been quite unlucky in this regard.

The upgrades we have been doing involve removing our reliance on infrastructure that is known to fail sometimes, so that we are left with infrastructure that can cope when things go wrong without causing downtime. But more than once in the last month, that failure-prone infrastructure failed while we were removing it, causing downtime. For those of you interested in detail, the issues have included operating system failures and network filesystem failures.

Here is our commitment to you:

  • Your site’s availability and uptime are our number one priority.
  • Your site data is safe and secure. We take care of backing it up and making sure we can recover it in the event of things like natural disasters.
  • We will continue to improve your site’s uptime and response time.
  • We strive for 100% uptime on your site.
  • The same applies to our Site Builder and MyYola, although the uptime of your site will always take precedence over these if we need to make that call.

Please contact me if you have any questions whatsoever.

Lisa Retief
VP of Engineering at Yola