Simulating cloud provider downtime with Cloudburst

Available as an open-source tool

Thursday 18 January 2024
Article by: Elmer Lastdrager, Caspar Schutijser, Ralph Koning

Cloudburst is a tool that simulates (public) cloud outages by blocking interactions with selected cloud operators. It allows enterprises to test what-if scenarios and to assess under practical conditions the extent to which they rely on cloud services and can cope with cloud outages. We have open-sourced Cloudburst so everyone can simulate such disruptions, use the results to further improve their security practices and customise the tool to support their environments. We believe Cloudburst is also relevant for individual users, who might want to use it to get a better feel for ongoing centralisation of the internet.

Reliance on the cloud

Organisations are increasingly migrating their ICT operations to cloud providers. The cloud allows businesses to easily adjust their resources according to fluctuating demands, often optimising cost-efficiency. It additionally provides ease of maintenance and reduced need for physical infrastructure, further streamlining operations and enabling IT teams to focus on strategic initiatives instead of routine management.

However, dependencies on external service providers also introduce supply chain risks. For example, Slack customers experienced outages for several hours in January 2021 because of a scaling problem at AWS. Such events may even have a global impact if they occur at large cloud providers, as exemplified by the many sites that became unavailable in June and July 2021 because of disruptions at content delivery networks Fastly and Akamai.

The problem is that enterprises usually need to wait for the cloud operator to fix the failure, which might reduce the availability of their services. They therefore often work with multiple cloud operators (e.g. in the banking industry) or fall back to on-premises infrastructure (e.g. at RIPE NCC), amongst other strategies. Many organisations want to regularly test whether such mitigations work as intended as part of their business continuity management. That is where Cloudburst could play a role.

What is Cloudburst?

Cloudburst is a tool that enables IT teams to simulate the outage of a cloud provider under practical conditions by blocking network connections with that provider. Using Cloudburst, a security team could, for instance, discover that the messaging application that they rely on in emergencies has a hidden dependency on the same cloud services that the organisation uses for normal operations. If such a dependency goes unnoticed, coordination and crisis response will be difficult, since both primary and backup communications will be affected. When such an issue is identified using Cloudburst, the IT team can take action before the issue becomes a problem, by updating their security procedures and switching to another messaging service for crisis communication.

Figure 1 is a screenshot of Cloudburst. It shows the web interface where connected clients can select the cloud services for which they want to simulate an outage. Cloudburst uses simple, off-the-shelf techniques, such as a VPN to connect clients and a firewall to simulate outages (more details below).

Figure 1: Cloudburst screenshot.

Internet centralisation

Cloudburst can also act as an awareness tool to give end users insight into the ongoing centralisation of the internet towards a small number of economically very powerful companies. Internet centralisation, in which a relatively small number of large service providers attract a large number of customers and their network traffic, reduces the decentralised nature of the open internet. That is often considered undesirable. Those large service providers are deploying their own wide area networks, accessible only to them, which could lead to less competition on the internet and thus to reduced innovation. Additionally, in the Domain Name System (DNS), for example, centralisation increases the risk of downtime if a domain's name servers are not distributed across multiple networks.

In the remainder of this blog post, we discuss how Cloudburst blocks cloud providers and how it can be deployed by IT teams and individuals.

Methods of deployment

Cloudburst should be deployed in such a way that users (IT teams and individuals) understand what is going on and can easily control the blocking of cloud providers. We have developed two methods of deployment.

The first method, and the easiest, is through a WireGuard VPN tunnel. Users simply need to install the WireGuard VPN client software, which is available for all major platforms, on their devices. To deploy Cloudburst in this way, we recommend using a virtual machine running Debian 12. Detailed installation instructions are available on GitHub.

Once Cloudburst is installed, users can browse to the public Cloudburst website and scan the QR code (shown on the left in Figure 1) with the WireGuard app on their PC or mobile device. Browsing to the same address again should then return the internal Cloudburst website, which means the user is successfully connected to Cloudburst. Through this internal website, the user can control Cloudburst (shown on the right in Figure 1).
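
To make the connection step more concrete, the sketch below renders the kind of WireGuard client configuration that such a QR code could encode. It is only an illustration and not taken from the Cloudburst source: the function name, the keys, the addresses and the endpoint are hypothetical placeholders, and it assumes that all client traffic is routed through the tunnel so that the server can filter it.

```python
def render_client_config(private_key, address, server_public_key, endpoint,
                         allowed_ips=("0.0.0.0/0", "::/0")):
    """Return the text of a WireGuard client configuration.

    All argument values are placeholders chosen for illustration; a real
    deployment would use generated keys and its own addressing plan.
    """
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"Address = {address}",
        "",
        "[Peer]",
        f"PublicKey = {server_public_key}",
        f"Endpoint = {endpoint}",
        # Assumption: route everything via the tunnel so that the VPN
        # server sees, and can block, traffic to cloud providers.
        f"AllowedIPs = {', '.join(allowed_ips)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_client_config(
        private_key="<client-private-key>",
        address="10.8.0.2/32",
        server_public_key="<server-public-key>",
        endpoint="vpn.example.org:51820",
    ))
```

Routing all traffic through the tunnel is a design choice in this sketch: it keeps the client side trivial, because every blocking decision is then taken on the server.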

The second method of deployment is using a Wi-Fi router. This method requires more technical know-how and knowledge of the local network environment. In essence, we set up a WireGuard tunnel between the Wi-Fi router and a Cloudburst virtual machine (acting as a VPN server and running the Cloudburst software) and set some routes to connect Wi-Fi users via the shared tunnel. We use this configuration as our mobile setup, so that we can run Cloudburst at conferences with a Turris Omnia Wi-Fi router together with a Raspberry Pi.

Methods of blocking

To simulate a cloud outage, we need to know which IP addresses a cloud service uses, so that we can block them. For that, we use so-called IP blocklists, which are lists of IP addresses that can be blocked in a firewall. Even though individual cloud providers publish lists of the IP addresses that they use, our experience is that those lists are not always complete, and that they often come in different formats.

We therefore provide Cloudburst users with the ability to compile IP blocklists. We provide a few options for generating a blocklist:

Online published blocklists. Depending on the format used, a lightweight parser may need to be implemented.
Offline blocklists, specified as filenames. You can use your own preferred method to compile them.
Autonomous system numbers. We use them to obtain IP prefixes using the tool 'bgpq4'.
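
As an illustration of the first option, here is a minimal sketch of what a lightweight parser could look like. It assumes a JSON document in the style of the address ranges that AWS publishes (a "prefixes" array whose entries have "ip_prefix" and "service" fields). The function name and the sample data are hypothetical, and the code is not taken from Cloudburst itself.

```python
import ipaddress
import json


def parse_aws_style_ranges(document, service=None):
    """Extract IPv4 networks from an AWS-style JSON ranges document.

    If `service` is given, only entries for that service are returned.
    Duplicate prefixes are removed and the result is sorted.
    """
    data = json.loads(document)
    networks = set()
    for entry in data.get("prefixes", []):
        if service is None or entry.get("service") == service:
            networks.add(ipaddress.ip_network(entry["ip_prefix"]))
    return sorted(networks)


# Hypothetical sample using documentation address ranges (RFC 5737).
SAMPLE = """
{"prefixes": [
  {"ip_prefix": "192.0.2.0/24",    "service": "EC2"},
  {"ip_prefix": "198.51.100.0/24", "service": "S3"},
  {"ip_prefix": "192.0.2.0/24",    "service": "AMAZON"}
]}
"""

if __name__ == "__main__":
    for network in parse_aws_style_ranges(SAMPLE):
        print(network)
```

Other providers use different formats, so each format would need a similar small function that returns a list of networks.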

For our purposes, those options provide sufficient flexibility for blocking clouds, but users can supplement them with other methods (such as DNS-based blocking) if necessary. Technically, our implementation uses ipsets and iptables rules to load the lists into netfilter.
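
The sketch below shows one way a list of prefixes could be turned into input for 'ipset restore', plus a matching iptables rule. It is a simplified illustration under assumptions, not the actual Cloudburst implementation: the set name, the chain and the function names are hypothetical, only IPv4 is handled, and the commands are generated as text rather than executed.

```python
import ipaddress


def load_ipv4_prefixes(lines):
    """Parse one prefix per line, skip blanks and '#' comments,
    keep IPv4 only, and merge adjacent or overlapping networks."""
    networks = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        network = ipaddress.ip_network(text, strict=False)
        if network.version == 4:
            networks.append(network)
    return list(ipaddress.collapse_addresses(networks))


def ipset_restore_script(set_name, networks):
    """Build text that can be piped into `ipset restore`."""
    out = [f"create {set_name} hash:net family inet"]
    out += [f"add {set_name} {network}" for network in networks]
    return "\n".join(out) + "\n"


def iptables_block_rule(set_name, chain="FORWARD"):
    """Return an iptables command (as an argument list) that rejects
    forwarded traffic whose destination is in the given set."""
    return ["iptables", "-I", chain, "-m", "set",
            "--match-set", set_name, "dst", "-j", "REJECT"]


if __name__ == "__main__":
    prefixes = ["192.0.2.0/25", "192.0.2.128/25", "# comment", "198.51.100.0/24"]
    nets = load_ipv4_prefixes(prefixes)
    print(ipset_restore_script("demo_cloud", nets), end="")
    print(" ".join(iptables_block_rule("demo_cloud")))
```

Keeping the addresses in a set rather than in one firewall rule per prefix means that switching a simulated outage on or off only requires adding or removing a single rule.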

Limitations

Cloudburst works well as a means of testing a scenario in which a cloud goes offline completely. However, it cannot simulate a cloud that is only partially offline. Additionally, there may be dependencies in the private network of cloud providers that Cloudburst is unable to simulate. For example, if a company hosts a website on its own server, but the website uses a cloud-based API in the backend, then Cloudburst will not be able to block that API call, because the call is made by the server rather than by a client connected to Cloudburst.

Demonstrations

We have demonstrated Cloudburst at SIDN Inspire, Public Spaces and ICT.Open, and received a lot of positive feedback from both technical and non-technical users. Based on that feedback, we are currently exploring how we can use Cloudburst at SIDN, because we run some of our own components in the cloud as well, such as the LogoMotive algorithm.

Source code

We have made the source code of Cloudburst publicly available under an open-source licence so you can try it yourself. You can download the source code from GitHub.