Uptime is becoming increasingly critical for cloud-based apps, as they take center stage in the business world. Expectations for availability are high, and every moment feels urgent. And if we’re going to make ourselves available at all times, we expect no less from our apps—both at work and at home.
But what does availability mean for an app, and how do cloud providers deliver on their uptime promises?
Let’s get into it:
- What is “uptime”?
- What is “Five 9s availability”?
- The growing importance of uptime for business communications
- 5 uptime questions to ask your service provider
- How cloud providers ensure high uptime
What is “uptime”?
Simply put, uptime is the percentage of time, per quarter, that a given cloud-based app is up and running. Most enterprise communications providers offer Service Level Agreements (SLAs) that commit to a certain minimum percentage of uptime in a given period (or, conversely, maximum downtime). The closer the percentage is to 100%, the less projected downtime per quarter.
What is “Five 9s availability”?
99.999% availability (also known as “Five 9s”) means an app has roughly 78 seconds of downtime per quarter at most, or a little over five minutes per year. This is widely considered the holy grail of availability: remaining online virtually the entire time.
Of course, not every company can guarantee Five 9s, and lower guarantees can translate to significant downtime. For example, 95% availability, which sounds like a high number, equates to roughly 18 days of downtime annually.
Downtime for cloud communications apps can have devastating consequences, especially in certain industries. For example:
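The arithmetic behind these figures can be checked with a short sketch. It assumes a 90-day quarter and a 365-day year, and the helper name `downtime_budget` is purely illustrative rather than part of any provider’s tooling.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime budget.
    return period * (1 - availability_pct / 100)


QUARTER = timedelta(days=90)   # assumption: a 90-day quarter
YEAR = timedelta(days=365)     # assumption: a non-leap year

five_nines_quarter = downtime_budget(99.999, QUARTER)
ninety_five_year = downtime_budget(95, YEAR)

print(f"99.999% per quarter: {five_nines_quarter.total_seconds():.1f} seconds")
print(f"95% per year: {ninety_five_year.total_seconds() / 86400:.2f} days")
```

Under these assumptions the budget works out to about 77.8 seconds per quarter for Five 9s and 18.25 days per year for 95%. A longer quarter (91 or 92 days) pushes the Five 9s figure slightly above 78 seconds.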
- Healthcare: Patients can’t reach doctors for critical information
- Education: Students are not able to access lessons remotely
- Public sector: Citizens can’t reach critical government services
- Financial services: Clients cannot execute their desired trade
- Retail: When consumers are unable to reach an associate, 46% of shoppers will not buy the intended product, 35% will switch to another retailer, and 17% will write a negative review
The growing importance of uptime for business communications
For unified communications, the importance of continuous availability only increases. Because communication is at the heart of any successful organization, communications solutions need to withstand a multitude of obstacles.
These include natural disasters, seasonal surges (such as the first day of school or holiday buying), unexpected surges (such as what we experienced with the pandemic), or company-specific issues (such as hosting a large all-hands session online).
In addition to these variables, Unified Communications as a Service (UCaaS) and Contact Center as a Service (CCaaS) providers also need to remain available across many different devices (laptop, mobile, or tablet) and connectivity options (WiFi, 3G/4G/5G, or a switch from one to the other) that customers might use to connect.
5 uptime questions to ask your service provider
Providers’ SLAs vary, with differing levels of commitment to uptime. When evaluating cloud communication and collaboration solutions, be sure to get detailed responses to the following questions about uptime:
- How is the service provider ensuring data redundancy?
- How is the infrastructure prepared for events and surges your business might experience?
- Does the provider conduct in-depth and frequent disruptive testing (the process of simulating failures in real-world situations) and disaster recovery tests? Are the test results and findings shared with customers?
- What are the provider’s business continuity plans? Be sure to go beyond whether the provider has a business continuity plan to determine how often they test and revise it, for example.
- Can the provider share supporting third-party test reports and accreditations, wherever applicable?
How cloud providers ensure high uptime
There are some critical elements that all highly available Software as a Service (SaaS) companies need to get right, starting with building a scalable, redundant, and secure infrastructure. Here are a few of the hallmarks of highly available solutions:
- It’s critical to host cloud solutions in top-tier data centers with geographic redundancy, meaning in the event of an outage in one data center, another data center in another location is already set up to automatically handle the load with no issues.
- Providers must also ensure this kind of capability within each data center by using similar architectures that feature multiple layers of redundancy in case problems arise.
- Maintaining high levels of uptime requires providers to build advanced system monitoring capabilities that let them spot emerging issues before they affect service and quickly resolve and remediate them when they do.
- Highly available solutions providers have strong internal controls and policies in place to minimize risk and ensure uptime.
Let’s dig in even further on this with a real company, to see how we at RingCentral deliver on our uptime promise.
How RingCentral builds Five 9s availability
RingCentral’s cloud architecture is built on what’s known as a multi-cloud, multi-network, point-of-delivery (PoD) design. In other words, we use a modular approach that allows our solutions to intelligently scale and manage increases in usage across messaging, video calls, and phone, while also providing resiliency and redundancy.
The multi-tenant network is designed with built-in 2x capacity, which means customers can double their usage overnight without an issue. Also, systems are designed with concurrent usage in mind. This ensures that the service is always available even when there are usage fluctuations at the customer’s end.
Data centers
RingCentral maintains “geo-redundant data centers,” which means they’re similarly configured across multiple regions to ensure that service continues despite possible outages.
In the event of a data center failure, RingCentral’s automated systems (built with active-active design), in conjunction with an always-on and world-class network operations center (NOC), ensure a rapid transition to back-up systems as needed to maintain uninterrupted service availability.
Simply put, should an issue arise in any one data center, another data center automatically assumes the load with no downtime.
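As a rough illustration of the active-active idea described above, the sketch below spreads sessions across every healthy site and shifts the whole load to the survivor when one site fails. The `DataCenter` class, the `rebalance` function, and the site names are hypothetical and are not a description of RingCentral’s actual systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCenter:
    name: str              # hypothetical site label
    healthy: bool = True
    load: int = 0          # sessions currently served by this site


def rebalance(sites: List[DataCenter], total_sessions: int) -> None:
    """Spread sessions evenly across all healthy sites (active-active)."""
    live = [s for s in sites if s.healthy]
    if not live:
        raise RuntimeError("no healthy data center available")
    for s in sites:
        s.load = 0         # unhealthy sites serve nothing
    share, extra = divmod(total_sessions, len(live))
    for i, s in enumerate(live):
        s.load = share + (1 if i < extra else 0)


west = DataCenter("west")
east = DataCenter("east")
sites = [west, east]

rebalance(sites, 1000)     # both sites share the load
west.healthy = False       # simulate an outage at one site
rebalance(sites, 1000)     # the surviving site takes every session
print(east.load, west.load)
```

In this toy model the surviving site must be able to carry the full load on its own, which is why spare capacity at each site matters for failover to work without downtime.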
RingCentral employs three layers of network and service redundancy to ensure that customers’ phone systems remain up and running:
- Our data centers provide the first layer of redundancy. Data between bi-coastal locations is synchronized consistently, with latency of less than one minute. Each component has a redundant power supply, which delivers seamless operation and 99.999% availability in case of geographic outages or any natural disaster.
In fact, RingCentral has delivered twenty consecutive quarters of 99.999% uptime SLA for our flagship product RingCentral MVP. The data centers share hosted facilities space with some of the world’s largest Internet companies and financial institutions. In addition, they’re in close physical proximity to the world’s top 20 Internet exchange points, and are co-located with all the major U.S. telecommunications carriers to maintain the fastest response times and interconnect services possible.
- Our architecture is vendor-agnostic and commodity-based, meaning it’s fully replaceable and fault-tolerant, providing a second layer of redundancy.
- Our third layer of redundancy uses both load balancing and failover technology to keep our systems continuously up and running. For example, the primary and secondary server groups each contain multiple servers that back each other up.
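To make the third layer more concrete, here is a minimal sketch of load balancing combined with failover: requests rotate across the healthy servers in a primary group, and traffic moves to a secondary group only when every primary server is down. The `Pool` class, the `route` function, and the server names are assumptions made for illustration, not the provider’s implementation.

```python
from itertools import cycle
from typing import List, Optional, Set


class Pool:
    """A group of servers that back each other up via round-robin."""

    def __init__(self, servers: List[str]):
        self.servers = servers
        self.down: Set[str] = set()     # servers currently marked unhealthy
        self._ring = cycle(servers)

    def pick(self) -> Optional[str]:
        """Return the next healthy server, or None if the whole pool is down."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        return None


def route(primary: Pool, secondary: Pool) -> str:
    """Load-balance across the primary pool; fail over to the secondary pool."""
    server = primary.pick() or secondary.pick()
    if server is None:
        raise RuntimeError("no healthy server in either pool")
    return server


primary = Pool(["p1", "p2"])
secondary = Pool(["s1", "s2"])

first = [route(primary, secondary) for _ in range(3)]    # rotates p1, p2, p1
primary.down.add("p2")                                   # one primary server fails
second = [route(primary, secondary) for _ in range(2)]   # p1 absorbs the traffic
primary.down.add("p1")                                   # whole primary pool fails
after_failover = route(primary, secondary)               # secondary pool takes over
print(first, second, after_failover)
```

The two mechanisms are kept separate on purpose: round-robin handles the loss of a single server inside a pool, while the fallback to the secondary pool handles the loss of the pool as a whole.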
Beyond Five 9s: Our commitment to relentless innovation
RingCentral has concentrated on several areas in particular to continuously improve our availability:
Agile development
With decades of stable, mature operational procedures, our proven architecture enables agile development with the ability to support our growing global customer base and partners.
Application lifecycle management
Our investments here help minimize errors, disruptions, and the risk of failure. Our engineering, cloud operations, and support teams work in concert with customers to deploy new innovations while minimizing potential impacts.
Our PoD deployment architecture, combined with our rigorous testing, QA, and staging processes, ensures that changes get synchronized while isolating updates and changes as they’re rolled into production.
This tightly controlled synchronization of updates means that changes don’t inadvertently create delays, outages, or downtime. It’s also important that we work closely with customers to consider critical situational factors (such as surges in usage on the first day of virtual school or on Black Friday) and evaluate the most appropriate times for change. It’s critical to ensure that any changes have been made and tested well before these major events.
Sophisticated machine learning (ML) and artificial intelligence (AI) automation
When it comes to insights, collecting data is the easy part. RingCentral has built the supporting technology infrastructure and combined that knowledge with decades of industry expertise in messaging, video, and phone to create meaningful and actionable insights.
Our ML and AI layers are built on a single data lake that aggregates all operational, usage, and simulated testing data to identify events, correlate them, respond, and remediate. RingCentral’s sophisticated architecture is the key to enabling a data-driven approach to product development, engineering, operations, and support.
RingCentral monitors and manages every aspect of the service from top to bottom—from edge to core—to ensure the highest quality, reliability, and security. This architecture has also enabled RingCentral to provide customers with high quality-of-service analytics and insights in a single pane of glass across messaging, video meetings, and phone with tremendous detail.
Team building and a culture of trust
RingCentral teams prepare for everything using rigorous testing. Everybody brings a different opinion and skillset. Such exercises build trust in each other’s capabilities so teams can rely on one another in every situation.
Reliability for tomorrow’s workplace
As organizations continue to work remotely—and plan to even beyond the pandemic—teams are now depending on cloud communications more than ever. From team messaging to video conferencing and virtual calling, remote workforces need their cloud communications to work flawlessly under any circumstances if they want to succeed.
With a service provider that offers a 99.999% uptime SLA, you can trust that your employees’ communication tools will be available no matter where they work.
Originally published Aug 31, 2022, updated Aug 16, 2024