About Yesterday's Balsamiq Outage

March 1, 2017 · Posted by Peldi in Products

Hello friends of Balsamiq.

We had a MAJOR outage yesterday.

Our websites (balsamiq.com, support.balsamiq.com, docs.balsamiq.com, and uxapprentice.com) were up and running, but you couldn't download Mockups for Desktop or, in some cases, use our website's search functionality.

These inconveniences were nothing compared to what happened to myBalsamiq, our web app. It was completely unavailable for over 6 hours, possibly our biggest outage ever.

We Are Sorry

We are really sorry about this, and if you know us, you know that we're not just saying it. Hosting your data is a big responsibility, and we know that when you cannot get to it when you need it, it sucks. People miss meetings, or lessons, or generally cannot do their work.

What Went Wrong

One of our service providers (Amazon Web Services) had a major outage in their S3 service, which is what we use to serve static files. For more than 6 hours, these files were not accessible by our applications.

The S3 service is known in tech circles as being very reliable. So much so that we took it for granted and didn't prepare for a possible outage of this particular component of our applications. Clearly, that was a mistake.

What We're Doing About It

We have already started looking at how to make our services more resilient. A first step will be to use the S3 features that replicate data across data centers. A second step will be to keep copies of our static files with other vendors (Google, Microsoft) and not just on AWS.

This is a big effort and it will take time, but it's something we look forward to working on.
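
To give a rough idea of what keeping copies with multiple vendors could look like on the client side, here is a minimal Python sketch. It is an illustration only and not our actual implementation: the function name `fetch_with_fallback`, the `http_fetch` helper, and the mirror URLs are all hypothetical.

```python
from typing import Callable, Iterable, Optional
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Download a URL with the standard library (10 second timeout)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_with_fallback(
    path: str,
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes] = http_fetch,
) -> bytes:
    """Try each mirror in order and return the first successful download."""
    last_error: Optional[Exception] = None
    for base_url in mirrors:
        # Join the mirror and the path with exactly one slash between them.
        url = base_url.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
    raise RuntimeError(f"all mirrors failed for {path}") from last_error


# Hypothetical mirror list: the primary copy first, other vendors after it.
MIRRORS = [
    "https://static-primary.example.com",
    "https://static-backup-1.example.com",
    "https://static-backup-2.example.com",
]
```

Calling `fetch_with_fallback("/images/logo.png", MIRRORS)` would only reach the backup copies when the primary one fails, so an outage at a single vendor would slow a request down rather than break it. Keeping the copies in sync is the harder part, and this sketch does not cover it.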

We Want to Make Things Right

If you were affected by yesterday's downtime, please have your myBalsamiq site owner email us at sales@balsamiq.com by March 15th and we'll be more than happy to add 3 months of free myBalsamiq credit to your account. It's the least we can do.

Moving On

We know we let you down, but we hope we'll be able to stay friends through this rough patch.

The way we see it is this: what doesn't kill us makes us stronger!

More About the Outage

It was a doozy. We weren't the only ones unprepared for such an outage.

Here are some of the other websites and services that were affected: Adobe services, Airbnb, Twitch, HipChat, Buffer, Business Insider, Citrix, Coursera, Docker, Expedia, Flipboard, Giphy, GitLab, Heroku, Imgur, Lonely Planet, Mailchimp, Medium, News Corp, Quora, Slack, Trello, Twilio, the U.S. Securities and Exchange Commission (SEC), Zendesk, Freshdesk, Pinterest, Time Inc., Xero, Apple App Store, Apple Music, Apple iCloud services... and about 120,000 more.

And here's some news about it: