About Yesterday's Balsamiq Outage
Hello friends of Balsamiq.
We had a MAJOR outage yesterday.
Our websites (balsamiq.com, support.balsamiq.com, docs.balsamiq.com, and uxapprentice.com) were up and running, but you couldn't download Mockups for Desktop or, in some cases, use our website's search functionality.
These inconveniences were nothing compared to what happened to myBalsamiq, our web app. It was completely unavailable for over 6 hours, possibly our biggest outage ever.
We Are Sorry
We are really sorry about this, and if you know us, you know that we're not just saying it. Hosting your data is a big responsibility, and we know that when you cannot get to it when you need it, it sucks. People miss meetings, or lessons, or generally cannot do their work.
What Went Wrong
One of our service providers (Amazon Web Services) had a major outage in their S3 service, which is what we use to serve static files. For more than 6 hours, these files were not accessible to our applications.
The S3 service is known in tech circles for being very reliable. So much so that we took it for granted and didn't prepare for a possible outage of this particular component of our applications. Clearly, that was a mistake.
What We're Doing About It
We have already started looking at how to make our services more resilient. A first step will be to use the features of S3 that allow for duplication across data centers. A second step will be to keep copies of our static files with multiple vendors (Google, Microsoft) and not just with AWS.
This is a big effort and it will take time, but it's something we look forward to working on.
We Want to Make Things Right
If you were affected by yesterday's downtime, please have your myBalsamiq site owner email us at firstname.lastname@example.org by March 15th and we'll be more than happy to add 3 months of free myBalsamiq credit to your account. It's the least we can do.
We know we let you down, but we hope we'll be able to stay friends through this rough patch.
The way we see it is this: what doesn't kill us makes us stronger!
More About the Outage
It was a doozy. We weren't the only ones unprepared for such an outage.
Here are some other websites and services that were affected: Adobe services, Airbnb, Twitch, HipChat, Buffer, Business Insider, Citrix, Coursera, Docker, Expedia, Flipboard, Giphy, GitLab, Heroku, Imgur, Lonely Planet, Mailchimp, Medium, News Corp, Quora, Slack, Trello, Twilio, The U.S. Securities and Exchange Commission (SEC), Zendesk, Freshdesk, Pinterest, Time Inc., Xero, Apple App Store, Apple Music, Apple iCloud services... and about 120,000 more.
And here's some news about it:
- Massive Amazon cloud service outage disrupts sites - USA Today
- AWS's S3 outage was so bad Amazon couldn't get into its own dashboard to warn the world - The Register
- Amazon AWS S3 outage is breaking things for a lot of websites and apps - TechCrunch
- AWS is investigating S3 issues, affecting Quora, Slack, Trello (updated) - VentureBeat
- AWS Takes Down Hundreds of Sites in Massive S3 Outage - The Whirl
Once again, we're really, really sorry about the outage.