Why a poorly architected Relational DB could be costing you $100s a month

The Elephant in the room

Databases…oh databases. The thing that every business tends to put the least amount of time into, but expects to work perfectly, and in the most efficient way possible, right out of the gate.

Why is that the case? Why do we tend to take for granted the backbone of nearly every great application?

Instead of changing the oil of the powerhouse of your app, companies allow mediocrity in architecture to take over and drive up their operational expenses. Coupled with more and more developers entering the market without serious database experience (if this is you, we've got you covered in our DB fundamentals course!), poor architectures are on the rise, and it's imperative that we combat them with knowledge.

Overview

It’s time to finally look under the hood at what’s going on with your app’s data architecture. Are you doing everything you should be to drive down your OPEX costs? Or are you sitting back and praying that your RDB architecture is the cream of the crop and the silver bullet you need?

I’m hoping you selected the first one, but if not, it’s totally fine! By the end of this article, you’ll have a better idea of how to pinpoint weaknesses in your current RDB architecture to start driving down your OPEX costs and give your customers a better experience!

I’m going to discuss my top 3 signs of an RDB that has been set up for failure at scale. If you have 10 people hitting your database, you can probably get away with any of these. But we’re go-getters, so I know you’ll have thousands of requests hitting your site. Let’s jump in so you’re ready for when your app goes big!


You have Unnormalized Data

Although the big craze right now is NoSQL databases with highly denormalized data (DynamoDB, MongoDB, etc.)…you want the opposite of that in your relational database. See, when RDBs are unnormalized, you essentially have a ticking time bomb in your hands. Sure, you can look at the bright side and be happy that it hasn’t blown up on you yet, but in due time, my friend, it sure will.

To better understand why it’s an issue to have data that isn’t in 3NF (more power to you if it’s in BCNF), we’ll look at the three anomalies that come from unnormalized data and how they affect your application.

Insertion Anomaly: This is when you can’t insert new data into your table without also supplying other, unrelated data.

For instance, if your relation for a College Course required a professor to be provided when the course is created, this would be an insertion anomaly. You shouldn’t need a professor when you’re creating a course. Correlating a professor to a course should be a separate relation called TeachesCourse.

Insertion anomalies cause tight coupling of data that should not be coupled. Although not as much of a direct hit to your business’s costs, it’s a pain in the ass as technical debt and will cause your developers nightmares in the future. That technical debt then leads to more money spent fixing it later, when it could’ve just been done right from the start! Now, let’s see why this tight coupling can lead to more issues.
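To make the fix concrete, here’s a minimal sketch using Python’s built-in sqlite3 module (table and column names are my own illustration, not from any real schema): Course stands alone, and TeachesCourse carries the professor-course link, so a course can be created with no professor attached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE Professor (
    professor_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
-- TeachesCourse relates professors to courses; a course can
-- exist before any professor is assigned to teach it.
CREATE TABLE TeachesCourse (
    professor_id INTEGER REFERENCES Professor(professor_id),
    course_id    INTEGER REFERENCES Course(course_id),
    PRIMARY KEY (professor_id, course_id)
);
""")

# A course can now be inserted with no professor attached --
# no insertion anomaly.
conn.execute("INSERT INTO Course (course_id, title) VALUES (1, 'Databases 101')")
conn.commit()
```

When a professor is eventually assigned, that fact goes into TeachesCourse as its own row, leaving both Course and Professor untouched.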

Deletion Anomaly: A deletion anomaly occurs when deleting information that is unwanted also unintentionally deletes other information.

Let’s continue with the story of the Course and Professor. If a professor leaves and we delete the records containing the professor, the course is gone too. What are you going to do? Now your students can’t sign up for a class they need to graduate, and your phone lines are buzzing out the wazoo! With just a simple example, I’m able to describe the destruction that a single unnormalized table would cause. Now imagine if that was a doctor and all the patients got deleted…πŸ’₯
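Here’s a tiny sqlite3 sketch of that failure mode (the table and names are hypothetical): course and professor live on the same denormalized row, so deleting the professor’s records takes the course with them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized: course info lives on the same row as the professor.
conn.execute("CREATE TABLE CourseOffering (course TEXT, professor TEXT)")
conn.execute("INSERT INTO CourseOffering VALUES ('Databases 101', 'Dr. Smith')")

# The professor leaves, so we delete her rows...
conn.execute("DELETE FROM CourseOffering WHERE professor = 'Dr. Smith'")

# ...and the course itself vanished along with her.
remaining = conn.execute("SELECT COUNT(*) FROM CourseOffering").fetchone()[0]
print(remaining)  # 0 -- the deletion anomaly in action
```

With the course in its own relation, that DELETE would only have touched the professor-course link, and the course row would survive.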

Update Anomaly: Update anomalies are probably my favorite. The reason is that update anomalies are easy to spot as a potential problem when creating a database schema, but people just overlook them. So, what are they?

Update anomalies are when you have multiple sources of truth for some data and only certain sources of truth get updated.

Let’s think of the scenario with the Professor again. The professor’s name and home address get stored in various relations throughout your database. Now let’s suppose she gets married and moves to a new home. We update her record and we’re all good, right? WRONG.

Because we have unnormalized data, we have to update that information EVERYWHERE IT’S FOUND IN THE DATABASE. Imagine how much of a pain it will be if the professor’s information is scattered across multiple relations in the DB. Not only that, but we’re also updating more tables in the database, which uses more computing power, which means more πŸ’Έ is flying out of your company’s pocket. And even if you somehow manage to update all of the relations today, what happens when newer relations are added and the professor’s info is copied over again? You will never have a guarantee of complete update coverage unless that data is normalized.
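As a sketch of the normalized alternative (names are illustrative): the address lives in exactly one relation, everything else refers to the professor by key, so a single UPDATE covers every consumer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The professor's address lives in exactly one relation...
CREATE TABLE Professor (
    professor_id INTEGER PRIMARY KEY,
    name         TEXT,
    home_address TEXT
);
-- ...and every other relation refers to her by key only.
CREATE TABLE TeachesCourse (
    professor_id INTEGER REFERENCES Professor(professor_id),
    course       TEXT
);
""")
conn.execute("INSERT INTO Professor VALUES (1, 'Dr. Smith', '12 Old Rd')")
conn.execute("INSERT INTO TeachesCourse VALUES (1, 'Databases 101')")

# She moves: one UPDATE on one row, and every join sees the new address.
conn.execute("UPDATE Professor SET home_address = '34 New Ave' WHERE professor_id = 1")
addr = conn.execute("""
    SELECT p.home_address
    FROM TeachesCourse t
    JOIN Professor p ON p.professor_id = t.professor_id
""").fetchone()[0]
print(addr)  # 34 New Ave
```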

Now let’s think of the business ramifications of update anomalies. When you have update anomalies, information goes stale and becomes incorrect. What if someone’s medication is set to autopay and they just updated their home address, but you ship the medication to the wrong address because you forgot to update it in the other relation? They could die. Or their updated name doesn’t match in every relation in your database, so they can’t take the certification exam on your platform that would get them a job next week. Now they’re potentially jobless.

As developers, we have to think about the cost not only to our company but also to the users of our platform. I can guarantee you, unnormalized data in RDBs will cost your company hundreds if not thousands of dollars in damages, fixing shipping mistakes, dropped customers, etc. If you do one thing from this article, please: normalize your RDBs.


You don’t cache queries

One of the biggest architecture mistakes I see is allowing a single DB to take the full workload of a read-intensive application. It would go something like this for a bookstore app:

Want the latest books? Hit the main DB. Want to add a book? Hit the DB. Want to view the top 10 books? Hit the DB.

Out of all of those requests, probably only one of them should’ve hit the main DB, and that’s solely because it was a write (a POST) to add a new book. No matter what, that type of request has to hit the main DB. But what about the other two? Considering they sound like something that would be on the homepage of everyone’s app, I don’t think they should’ve made it past a cache!

For read-intensive applications, the read-to-write ratio is usually around 70/30. That means 70% of the time, the user is asking for information! Reads are an operation that can be offloaded from the master DB through caching! All of the requests asking for the top 10 books would’ve been served by the cache and never gone any further. Because they’re common requests, the cache stores their result and returns the data to the caller faster than any database transaction could.
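A minimal sketch of the cache-aside pattern this describes (in production the cache would be something like Redis; the dict, TTL, and `query_db` stand-in here are my own placeholders):

```python
import time

cache = {}          # key -> (value, expiry_timestamp)
TTL_SECONDS = 60

def query_db(key):
    # Placeholder for the expensive round trip to the main DB.
    return f"result-for-{key}"

def get(key):
    """Cache-aside read: try the cache first, fall back to the DB."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                      # cache hit -- DB never touched
    value = query_db(key)                    # cache miss -- one DB trip
    cache[key] = (value, time.time() + TTL_SECONDS)
    return value

get("top-10-books")   # first call hits the DB and fills the cache
get("top-10-books")   # second call is served entirely from the cache
```

The TTL keeps the cached top-10 list from going stale forever; picking that expiry is the classic trade-off between freshness and load on the master.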

So you may be thinking, πŸ€” “Brady, why is this even important? The same outcome happens and my customers get the data.”

Which, honestly, is a great question. But to answer it, I want you to think of a time you opened up an application and it took forever to load the data being displayed to you. How did you feel? I probably would’ve left the site, which means that business just lost my money. All because of a slow load. Who knows exactly why it loaded slowly? Maybe they’re using a bloated frontend framework, maybe they had unnormalized data and their DB was struggling, or maybe they just took in 1,000 orders at once and the master DB was flooded with ACID transactions. In most of these scenarios (besides the frontend decision), the master DB would’ve been grateful to have any read request taken off its hands by a cache.

Caching can also bring down the cost of data transfer out of the DB. Although you will be paying for a server instance to run the cache, in most cases it will be more cost-effective for your business to cache common requests. Beyond cost efficiency, the performance of your application drives the customer experience. Caches decrease request latency for common requests, which means your performance goes πŸ†™ . Consequently, if you can ensure you have the best-performing application possible for the scale of your customer base, you won’t be worried about losing thousands of dollars to customers like me leaving over LONG LOADING times.


You have a single compute bottleneck

Although caching requests is amazing, it does not take care of every issue that comes with read-heavy DBs. The issue it does not take care of is the single point of routed database traffic. To fix this, we can look at two approaches that help alleviate the single point of congestion: Master-Servant Replication and Sharding. Both of these architectures have their advantages and disadvantages, but both split the workload across multiple machines (horizontal scaling), which allows for more throughput thanks to their concurrent nature. I’m going to give a high-level overview of both architectures and describe how they loosen the single-server bottleneck.

Master-Servant Replication
The way I think of this architecture is that we have one database (the master) that takes in all of the Create, Update, and Delete (CUD) requests. We then have other databases that replicate the current state of the master database. The purpose of these replicas is to take the read-intensive requests off the master DB and allow the master to focus all of its attention on the CUD transactions. Once the master takes in a CUD transaction, the effects ripple out to the replicated DBs so they have the most up-to-date data.


Having this architecture works perfectly for read-intensive applications because we can sit all of the databases behind a load balancer. The load balancer will then direct request traffic to balance the work across multiple databases, thus allowing us to answer more requests! The more requests we can answer in a smaller period of time, the happier our customers are! Keep your customer first, and everything else will fall into place.
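The routing decision the load balancer (or a database driver) makes can be sketched in a few lines; the connection names here are placeholders, not real endpoints. Writes go to the master, reads round-robin across the replicas:

```python
import itertools

# Hypothetical connection handles; in practice these would be
# real database connections or DSNs.
MASTER = "master-db"
REPLICAS = ["replica-1", "replica-2", "replica-3"]
_round_robin = itertools.cycle(REPLICAS)

def route(statement: str) -> str:
    """Send writes (CUD) to the master; spread reads across replicas."""
    verb = statement.strip().split()[0].upper()
    if verb in ("INSERT", "UPDATE", "DELETE"):
        return MASTER
    return next(_round_robin)

route("SELECT * FROM books ORDER BY published DESC LIMIT 10")  # -> a replica
route("INSERT INTO books (title) VALUES ('New Release')")      # -> the master
```

One caveat worth knowing: because replication takes time to ripple out, a read routed to a replica can briefly see slightly stale data.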

Database Sharding
Database sharding is similar to the previous architecture in that multiple servers sit behind an application router (you’ll see why it’s more than a load balancer) and a database runs on each server. The difference is that sharding is a horizontal scaling technique where we split the tables horizontally and dedicate each server to a shard of the tables based on a range (range of values), key (hashed value), or directory (lookup table on the router). This mapping, decided at implementation time, tells the router which server to send each request to for fulfillment!

For instance, an application like Slack could map requests to databases based on the workspace ID. Then, all requests for a workspace go to the specific database that maintains the shard covering that workspace’s range, key, or directory entry.
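A key-based shard router can be sketched like this (the shard names and hashing scheme are illustrative, not Slack’s actual design): hashing the workspace ID always lands the same workspace on the same database.

```python
import hashlib

# Hypothetical shard layout: four databases, keyed by workspace ID.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(workspace_id: str) -> str:
    """Key-based sharding: hash the workspace ID to pick a database."""
    digest = hashlib.sha256(workspace_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every request for the same workspace lands on the same shard,
# so all of that workspace's rows live together.
shard_for("acme-corp") == shard_for("acme-corp")  # always True
```

The catch with a plain modulo scheme is that adding a shard remaps most keys, which is why real systems often reach for consistent hashing or a directory lookup instead.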

As you can see, sharding also alleviates the single-server bottleneck and promotes high availability for your customers now, and into the future when their number triples! Creating a highly available architecture means there is a minuscule chance that customers are lost due to slow server and database responses. This alone can drive up sales and create conversions you weren’t seeing before!

Comparisons
One of the big differences between the two architectures is that with sharding, each database handles all types of transactions, whereas in Master-Servant replication, only one database handles the CUD transactions. The drawback to sharding, however, is that it can be very complex to implement, whereas replication is relatively simple. Some databases support sharding out of the box, but implementing it on your own can be cumbersome, BUT very rewarding for the customer if done right.


Wrap Up

As we’ve seen, making sure that we have a well architected RDB can be make or break for businesses. Ensuring that you have highly available applications that can scale to meet the demands of your users is essential for the longevity of your business.

We also saw that having a well-architected RDB starts at the core. Ensuring that your database is normalized should be the first step everyone takes (if you don’t know how to do this, stay tuned for next week’s blog on normalization). Lastly, we focused on making sure your RDB has the architecture to ensure its success. We showed how to do this through caching common queries as well as alleviating the single-server RDB bottleneck that most apps start with.

I hope you were able to learn a thing or two about RDBs! I know this topic can be a little daunting at first, so take a day away and come back to it if it’s not clicking right away.

πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰
Also, make sure to pat yourself on the back for taking one step forward today!
πŸŽ‰ πŸŽ‰ πŸŽ‰ πŸŽ‰

If you liked today’s content, make sure you subscribe to the newsletter down below! As always, thanks for taking the time to unwrap some bytes with me. Cheers! 🍻
