Insights

(Part 1) How to build SaaS Minimum Viable Products (MVPs) with Azure

Written by Pickrell Global Technologies Editorial Team | Nov 4, 2024 2:00:00 PM

While there are many important decisions to make when building your SaaS MVP, one of the most crucial is selecting an appropriate architecture based on your app's specific requirements. By picking the right provider and resources, you can ensure that your solution is secure, scalable, and cost-effective, qualities that translate directly into business results.

This article is the first in a three-part series where we will architect three sample applications of increasing complexity using Microsoft's cloud, Azure. In our experience building and launching dozens of applications across various cloud platforms, Azure's maturity, security measures, wide range of services, and relative affordability make it one of the best choices for hosting your SaaS MVP.

Without further delay, let's dive into the detailed requirements of our example B2B SaaS MVP, ContractHub.

Example Application: ContractHub – A Contract Management Tool for Small Businesses

Description: ContractHub is a lightweight B2B application that helps small businesses organize, store, and track contracts with clients, suppliers, and partners. Users can upload documents, set contract expiration reminders, and receive notifications about upcoming renewals. This product serves as a straightforward contract management solution for businesses with smaller-scale needs, focusing on simplicity and usability.

Key Features:

  • Contract storage and organization
  • Reminder emails for contract expirations and renewals
  • Basic document tagging and search functionality
  • Limited analytics for contract types and renewal rates

Load Requirements:

  • Small user base with low concurrent usage (e.g., hundreds of users rather than thousands)
  • Minimal data storage and processing requirements

 

The Approach

After reading through the initial requirements, there are a few things to take note of. First, the requirement to store documents, most likely in PDF or Word format, means we're going to need a static asset storage system, as traditional databases are often inefficient for storing files like these.

Next, the requirement that notifications must be set and sent on certain dates means we need a cron-style timed job that runs in the background and routinely checks whether any notifications need to be sent.

Finally, the low expected usage of the application means that a messaging architecture built on something like Service Bus is probably overkill: the added complexity isn't worth the minimal performance gain for an application this small.

While not explicitly mentioned in the requirements, we can also assume the end solution should be as cost-effective as possible without impacting user experience.

The Database

In general, when selecting a database for an application, there are several questions we need to answer to determine which option best fits our needs:

  1. What is the expected demand of the database?
  2. Is the database demand consistent or variable (i.e., are there times when we must handle many users and times with only a few, or is demand relatively steady)?
  3. Is the data best modeled by a NoSQL/JSON document model or a relational model?
  4. What is the ratio of application reads to writes?

For our application, the expected demand is relatively low and likely consistent. This means that horizontal scalability isn't necessary since we're not expecting millions of records per table or fluctuating demand. The data model could arguably be defined as either relational or non-relational depending on how the tagging feature is handled, but for our case, let's say relational fits better. Given this, a SQL-based database is the best choice for our application. This leaves three main options on Azure: SQL Server, PostgreSQL, and MySQL.

For application reads/writes, we can expect a high ratio of reads vs. writes; a user uploads a document once but reviews it numerous times. In general, MySQL is the go-to choice for read-optimized applications, while PostgreSQL and SQL Server are better for balanced read/write workloads.

Thus, MySQL is likely the best database option for our application on paper. However, for cost-effectiveness, and because the choice between SQL Server and MySQL won't produce noticeable performance differences for an application of this size, we'll start with a Basic (B) tier Azure SQL Database instance that provides 5 DTUs and 2 GB of storage for approximately $5/month. This should be adequate to start, and we can easily upgrade to the S0 tier (10 DTUs and 250 GB of storage) for approximately $15/month if more compute is needed.
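To make the "relational fits better" call above concrete, here's a rough sketch of the core entities in TypeScript. The names and fields are illustrative assumptions, not a final schema; tags living in their own table and joining to contracts is the part of the model that nudges the design toward SQL rather than a document store.

```typescript
// Illustrative (not final) entities for ContractHub, modeled relationally.

interface Contract {
  id: string;
  organizationId: string; // the small business that owns the contract
  title: string;
  counterparty: string;   // client, supplier, or partner
  blobName: string;       // pointer to the document in Blob Storage
  expiresOn: Date;
}

interface Tag {
  id: string;
  organizationId: string;
  name: string;           // e.g. "NDA", "supplier", "renewal-2025"
}

// Join table: many-to-many relationship between contracts and tags.
interface ContractTag {
  contractId: string;
  tagId: string;
}

interface Reminder {
  id: string;
  contractId: string;
  remindOn: Date;         // when the notification email should go out
  sentAt: Date | null;    // null until the timer job sends it
}
```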

Now that we’ve chosen our database, let’s add it to our architecture diagram below.

 

Static File Storage

Since our application needs to store documents and contracts, we'll need a storage solution in Azure. This choice is straightforward, as Blob Storage is ideal for storing files like PDFs or Word documents with fast retrieval when needed. The main decision is choosing an access tier: premium, hot, cool, cold, or archive. This affects pricing and is based on how often documents need to be retrieved. For instance, if we only need to access documents occasionally (e.g., for audits), we might choose the cold or archive tier, which is much cheaper than premium or hot.

For our needs, the Hot tier should suffice, as premium is approximately seven times as expensive. Our total storage costs are likely to be minimal, but let’s overestimate and assume $5/month.

From a security perspective, it’s essential that the blob storage container isn’t publicly accessible. Instead, it should be accessed in our application via short-lived SAS tokens.
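As a rough sketch of that pattern, assuming the @azure/storage-blob SDK, a container named "contracts", and account credentials supplied via environment variables, generating a short-lived, read-only SAS URL could look something like this:

```typescript
import {
  BlobSASPermissions,
  generateBlobSASQueryParameters,
  StorageSharedKeyCredential,
} from "@azure/storage-blob";

// Assumed names: a "contracts" container and credentials in environment variables.
const accountName = process.env.AZURE_STORAGE_ACCOUNT!;
const accountKey = process.env.AZURE_STORAGE_KEY!;
const containerName = "contracts";

const credential = new StorageSharedKeyCredential(accountName, accountKey);

// Returns a read-only URL for a single blob that expires in 15 minutes.
export function getShortLivedDownloadUrl(blobName: string): string {
  const sas = generateBlobSASQueryParameters(
    {
      containerName,
      blobName,
      permissions: BlobSASPermissions.parse("r"), // read only
      expiresOn: new Date(Date.now() + 15 * 60 * 1000),
    },
    credential
  ).toString();

  return `https://${accountName}.blob.core.windows.net/${containerName}/${blobName}?${sas}`;
}
```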

Let's update our diagram to include our Blob Storage Container:

 

Application Hosting

With data and document storage decided, let’s focus on application hosting.

Assuming we built our application using a modern full-stack framework like Next.js, we have two main hosting options: Static Web Apps and Azure App Service.

Our general advice to clients building SaaS MVPs is to use fixed-price model services, such as App Service, rather than consumption-based pricing (like Static Web Apps) due to the potential for unexpectedly high costs, such as in the case of a DDoS attack (see this article on how a Netlify user received a $104k bill due to a cyberattack).

This leaves us with App Service, and for this relatively simple MVP, we’ll start with the Basic B1 Linux App Service plan at $12.50/month, which is adequate for our low-traffic app. Let’s add it to our diagram below, including arrows indicating the direction of network requests.

 

 

Timer / Cron Job

With our application layer, storage container, and database in place, let’s look at implementing the scheduled notifications functionality.

To send alerts to users based on scheduled reminders, we need a job that runs periodically and checks for users needing notifications via email.

In Azure, we have two main options: WebJobs or Functions. WebJobs run on an existing App Service plan and can be scheduled on a timed cadence (e.g., every day, every hour), while Functions can run on a serverless Consumption plan whose monthly free grant easily covers a job like this.

For our needs, the Consumption plan's free grant will suffice, so we'll go with an Azure Function with a Timer Trigger.
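A minimal sketch of such a function using the Node.js v4 programming model might look like the following; the function name, schedule, and steps inside the handler are placeholders:

```typescript
import { app, InvocationContext, Timer } from "@azure/functions";

// Runs every day at 08:00 UTC (NCRONTAB format: sec min hour day month day-of-week).
app.timer("sendContractReminders", {
  schedule: "0 0 8 * * *",
  handler: async (timer: Timer, context: InvocationContext): Promise<void> => {
    // 1. Query the database for reminders due today that haven't been sent.
    // 2. Send an email for each one via SendGrid (covered in the next section).
    // 3. Mark each reminder as sent so it isn't delivered twice.
    context.log("Checked for contract reminders due today.");
  },
});
```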

 

Sending Emails

To send users email reminders about contracts nearing expiration or renewal, we recommend SendGrid, which offers easy setup and a generous free tier that includes basic email authentication and a 100-email-per-day limit.
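As a sketch of how that integration might look with the @sendgrid/mail package (the sender address and helper name here are hypothetical):

```typescript
import sgMail from "@sendgrid/mail";

sgMail.setApiKey(process.env.SENDGRID_API_KEY!);

// Sends one renewal reminder. The "from" address must be a sender you've
// verified in SendGrid; the one below is a placeholder.
export async function sendRenewalReminder(to: string, contractTitle: string, expiresOn: Date) {
  await sgMail.send({
    to,
    from: "reminders@contracthub.example.com",
    subject: `Contract "${contractTitle}" expires on ${expiresOn.toDateString()}`,
    text: `Your contract "${contractTitle}" is approaching its expiration date. Log in to ContractHub to review or renew it.`,
  });
}
```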

With SendGrid, we send API requests to their email service, so let’s add this to our architecture diagram.

 

Logging

Logging is essential for any application, providing human-readable messages about system events. Logging levels such as informational, error, and critical indicate the severity of those events.

In Azure, Application Insights is the most logical choice for centralized logging, offering robust out-of-the-box logging for both our App Service application and Function App.
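Wiring it up in a Node.js app with the classic applicationinsights SDK takes only a few lines; the connection-string variable name below is an assumption:

```typescript
import * as appInsights from "applicationinsights";

// One-time setup at server startup. The connection string comes from the
// Application Insights resource; here it's assumed to be in an env variable.
appInsights
  .setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true)     // incoming HTTP requests
  .setAutoCollectDependencies(true) // outgoing calls (SQL, Blob Storage, SendGrid)
  .setAutoCollectExceptions(true)
  .start();

// Custom, leveled messages show up as traces in Application Insights.
appInsights.defaultClient.trackTrace({
  message: "Reminder job completed",
  severity: appInsights.Contracts.SeverityLevel.Information,
});
```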

 

 

Monitoring

Now that we have logging set up via Application Insights, we need to decide what to listen for so we know when there are signs of trouble that need to be addressed.

The first step is to decide what we should be listening to (our Service Level Indicators), what those signals should look like when things are healthy (our Service Level Objectives), and when what we're hearing is worrying enough to act on (our Alerts).

Defining Service Level Indicators (SLIs)

While not explicitly mentioned in the requirements above, it's always a best practice to define a set of Service Level Indicators (SLIs): measurements that directly reflect the quality of the application. While there are many things that can be measured, three of the most important, in no particular order, are performance, availability, and error rate.

Performance simply measures how fast users get a response when they take an action. For example, when a user navigates to the home page, how fast should the page load? How fast should a document upload when a user submits it?

This is the most subjective of the three as a "slow" request may be 1000ms in some contexts and 500ms in others. For this specific application, let's define a "slow" request as anything that takes longer than 500ms and a "really slow" request as anything that takes longer than 2000ms.

Next, let's define our SLI for availability, which according to ScienceDirect refers to the uninterrupted accessibility and functionality of a system or software, ensuring it meets the specified availability requirements, supports failover mechanisms, proactive monitoring, and redundancy to minimize downtime.

In simpler terms, this means whether our application is available to serve requests from users. In our case, this can be measured simply by creating a health check endpoint that is pinged automatically by Azure and returns whether the system is available. 
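A minimal sketch of such an endpoint, assuming a Next.js App Router route handler at a hypothetical /api/health path:

```typescript
// app/api/health/route.ts -- a hypothetical Next.js App Router route handler.
// An App Service health check or Application Insights availability test can
// ping /api/health and treat anything other than a 200 as "unavailable".
import { NextResponse } from "next/server";

export async function GET() {
  try {
    // Optionally verify critical dependencies here (e.g. a cheap SELECT 1
    // against the database) before declaring the app healthy.
    return NextResponse.json({ status: "ok" }, { status: 200 });
  } catch {
    return NextResponse.json({ status: "unavailable" }, { status: 503 });
  }
}
```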

Finally, we need to define our SLI for error rate. For our application, we can measure its inverse, the success rate: the number of requests that return either a 2xx or 4xx status code divided by the total number of requests received by the system. If you know your status codes, this may sound counter-intuitive, since a 4xx code indicates a failure such as 404 Not Found or 400 Bad Request. The reason 4xx responses aren't counted against our error rate is that they generally indicate the failure is the caller's fault, not the application's.
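To make that definition concrete, a small sketch of the calculation:

```typescript
// Treats 2xx and 4xx responses as "successful" for SLI purposes: a 4xx means
// the caller made a bad request, not that the application failed.
function successRate(statusCodes: number[]): number {
  if (statusCodes.length === 0) return 1;
  const ok = statusCodes.filter(
    (code) => (code >= 200 && code < 300) || (code >= 400 && code < 500)
  ).length;
  return ok / statusCodes.length;
}

// Example: [200, 201, 404, 500, 200] -> 4 of 5 count as successes -> 0.8
```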

 

Defining Service Level Objectives (SLOs)

Now that we have our SLIs, we can define a Service Level Objective (SLO) for each one. An SLO simply defines the percentage threshold we expect each SLI to meet.

To start with performance, we need to define a percent of the time that requests should not be "slow" or "really slow" based on our definition in the previous section.

For our application, it would be reasonable to say that 90% or more of our requests should be faster than "slow" (under 500ms) and 99% or more should be faster than "really slow" (under 2000ms).

Next, we need to define how available our system should be, i.e., what percentage of health check requests should report that the system is available. When deciding this, the first thing to consider is the availability guarantees (SLAs) of the application's dependencies. For example, we're going to be using Azure App Service, which has a 99.95% availability SLA, and Azure SQL Database, which has a 99.99% availability SLA at the time of writing this article.

This matters because one of the most important rules of application availability is that, in general, our application cannot be more available than its least available dependency.

To estimate the expected availability of our system, we can multiply the availability percentages of our dependencies, which gives us a solid starting point. For us, multiplying the figures for App Service, SQL Database, SendGrid, and the Storage container, we arrive at a number slightly over 99.9%.
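A quick sketch of that multiplication; the App Service and SQL Database figures are the SLAs quoted above, while the Blob Storage and SendGrid values are placeholders you'd replace with each provider's published SLA:

```typescript
// Serially-dependent services: composite availability is (at best) the product
// of the individual availabilities.
const dependencyAvailability = [
  0.9995, // App Service
  0.9999, // Azure SQL Database
  0.9999, // Blob Storage (assumed)
  0.9999, // SendGrid (assumed)
];

const composite = dependencyAvailability.reduce((acc, a) => acc * a, 1);
console.log(composite.toFixed(4)); // ≈ 0.9992, i.e. slightly over 99.9%
```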

For non-mission-critical applications like ours, a 99.9% availability target is perfectly fine, so let's set it there.

Finally, to determine our SLO for error rate we can simply look at comparable B2B SaaS products in the industry, which typically aim for a 99.9% successful request rate. This is a great number for us to aim for, so let's use it as our SLO.

 

Creating Alerts

Now that we know the performance, availability and success rate SLOs that we're aiming for we need to determine when the engineers supporting the application need to be notified that something is wrong.

For example, if 1 request out of 10,000 fails, there is likely no reason to alert, as we would still be well within our "budget" for errors to meet our 99.9% target. But on the flip side, if 500 out of those 10,000 fail, it is important for an engineer to be notified so they can investigate and take corrective action to preserve the SLO.

While this article won't go too deep into the methodology behind creating effective and valuable alerts, Google's SRE team wrote what is arguably the most valuable resource on SLOs and alerting; it details how they determine an alerting strategy based on a short/long-window approach, which we'll follow here.

The following are example alert thresholds that we can use for our 99.9% Availability and Success Rate SLOs.

Short-Window Alert Threshold

  • Time Window: 5 minutes
  • Error Budget Threshold: 2% of monthly budget in 5 minutes.
  • Threshold Condition: Trigger alert if errors reach 2%

Long-Window Alert Threshold

  • Time Window: 1 hour
  • Error Budget Threshold: 10% of monthly budget in 1 hour.
  • Threshold Condition: Trigger alert if errors reach 10%
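One way to read the thresholds above is in terms of error-budget burn. As a rough sketch, assuming roughly uniform traffic across the month (real alerting would be configured in Azure Monitor rather than hand-rolled), the check for each window could look like this:

```typescript
// Sketch of evaluating the short/long alert windows against a 99.9% SLO.
const SLO = 0.999;
const MINUTES_PER_MONTH = 30 * 24 * 60;

// Fraction of the monthly error budget consumed by a window's error rate.
function budgetConsumed(errors: number, total: number, windowMinutes: number): number {
  const errorRate = total === 0 ? 0 : errors / total;
  const burnRate = errorRate / (1 - SLO); // how fast the budget is being spent
  return burnRate * (windowMinutes / MINUTES_PER_MONTH);
}

// Short window: e.g. 180 errors out of 1,000 requests in the last 5 minutes.
const shortWindowAlert = budgetConsumed(180, 1000, 5) >= 0.02;  // 2% of budget

// Long window: e.g. 1,500 errors out of 10,000 requests in the last hour.
const longWindowAlert = budgetConsumed(1500, 10000, 60) >= 0.1; // 10% of budget

console.log({ shortWindowAlert, longWindowAlert }); // both true in this example
```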

Next, we need to define our Alerts for our Performance SLO, which is based on two SLIs, "slow" requests and "really slow" requests.

Short-Window Alert

  • Thresholds:
    • 90% Threshold: Set an alert if more than 15% of requests exceed 500ms within a 5-minute window. This is slightly more lenient than the 10% target to allow for transient spikes but will catch more significant issues.
    • 99% Threshold: Set an alert if more than 2% of requests exceed 2000ms within a 5-minute window.
  • Conditions:
    • Trigger an alert if:
      • More than 15% of requests exceed 500ms in a 5-minute window.
      • More than 2% of requests exceed 2000ms in a 5-minute window.
  • Example Threshold Conditions:
    • If 1000 requests are received in 5 minutes, an alert triggers if:
      • More than 150 requests exceed 500ms.
      • More than 20 requests exceed 2000ms.

Long-Window Alert

  • Thresholds:
    • 90% Threshold: Alert if more than 12% of requests exceed 500ms over a 1-hour window. This is close to the 10% threshold but accounts for minor deviations over longer periods.
    • 99% Threshold: Alert if more than 1.5% of requests exceed 2000ms over a 1-hour window.
  • Conditions:
    • Trigger an alert if:
      • More than 12% of requests exceed 500ms in a 1-hour window.
      • More than 1.5% of requests exceed 2000ms in a 1-hour window.
  • Example Threshold Conditions:
    • If 10,000 requests are received in 1 hour, an alert triggers if:
      • More than 1200 requests exceed 500ms.
      • More than 150 requests exceed 2000ms.
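In practice these conditions would live in Application Insights alert rules rather than application code (see the next section), but as a sketch of the conditions themselves:

```typescript
// Sketch of the example performance alert conditions over a window of request
// durations in milliseconds. Thresholds mirror the short/long windows above.
function latencyAlerts(durationsMs: number[], windowIsShort: boolean) {
  const total = durationsMs.length;
  if (total === 0) return { slowAlert: false, verySlowAlert: false };

  const slowShare = durationsMs.filter((d) => d > 500).length / total;      // "slow"
  const verySlowShare = durationsMs.filter((d) => d > 2000).length / total; // "really slow"

  return windowIsShort
    ? { slowAlert: slowShare > 0.15, verySlowAlert: verySlowShare > 0.02 }   // 5-minute window
    : { slowAlert: slowShare > 0.12, verySlowAlert: verySlowShare > 0.015 }; // 1-hour window
}
```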

 

Now that we have our alerts defined, we can create them in Azure. To do this, we can use Alerts under Azure Monitor to encode the thresholds above. We can write custom Kusto (KQL) queries that trigger based on Application Insights telemetry and alert via email, SMS, or whatever other channel you prefer. Azure's pricing for this is consumption-based, but for our application we can safely assume it will come to less than $5/month.

 

Wrapping It All Up

As you can see, it doesn’t take thousands of dollars per month to architect a high-quality SaaS MVP on Azure. With a starting cost of $5/month for the SQL Database, $12.50/month for App Service, $5/month for Blob Storage, and $5/month for Logging and Alerting, our total monthly cost will likely be under $30—sufficient for our small user base.

By defining and alerting based on SLOs for performance, availability, and success rate, we ensure a smooth user experience and prompt issue resolution.

If you have an idea for a SaaS MVP and want to see what your application's architecture would look like, feel free to Contact Us anytime for a free consultation and architecture design document.

Stay tuned for our next article in this series, where we’ll follow a similar approach to architect a higher-complexity application with different needs.