Example of DynamoDB Implementation using Single Table Design



The idea of Single Table Design was shared in AWS re:Invent 2019. It is a great idea of solving the problem that typically handled by SQL join tables; at the same time optimise the implementation of DynamoDB from cost & performance perspective.


Logical Partition through Partition Key & Sort Key

In a nutshell, Single Table Design for DynamoDB uses Partition Key & Sort Key as a logical partition. If you create a DynamoDB table to store booking records for your customer the Partition Key (PK) & Sort Key (SK) can looks like this:

  1. PK: CUSTOMER#00001  SK: BOOKING#12345
  2. PK: CUSTOMER#00001 SK: CUSTOMERPROFILE#123
  3. BOOKSLOT#45678 SK: BOOKING#12345
  4. BOOKSLOT#45679 SK: BOOKING#12345

Use PK and SK to link entity together 


Within the same DynamoDB Table, you can technically store Booking Record (BOOKING#), Booking Slot/ Details Record (BOOKSLOT#) and Customer Record (CUSTOMER#) who created the booking. All thanks to the concept of NoSQL, data with different attributes can all fit into a single table as long as all data include Partition Key (PK) and Sort Key (SK)  By doing so, you actually overcome the challenge  of doing SQL join in DynamoDB by retrieving different entities from one table. Keeping them all in one table, you can now retrieve all the data needed in your booking screen through one retrieval call ( potentially 1 RCU; depending on the size of the data) rather than multiple retrieval calls. ( retrieve Booking Record then retrieve Booking Slot/Details Record) 

In the access pattern shared in AWS re:Invent 2019 by Alex Debrie, you can perform the following from the single table: 
  • Retrieve Booking from a specific Customer by filtering PK = 'CUSTOMER#0001' and BEGIN_WITH(SK, 'BOOKING#')
  • Retrieve Booking with Booking Slot by filtering SK='BOOKING#12345' 

Isn't it great?

RCU and WCU

Minimise RCU and WCU is important. Pricing of DynamicDB is primarily based on the consumption of RCU (Read Capacity Unit) and WCU (Write Capacity Unit). You are being charged based on how many retrieval / write call you made to DynamoDB. Single Table Design reduce your RCU and WCU, hence reducing your cost.

Is having everything in one table the best design?

Let's face the reality that you hardly have a scenario to retrieve and update all entities at the same time. If you are building an Enterprise grade OLTP transactional system, you will have a clear segregation of role and responsibility. There will be different user role update different entities. Hence, you will then have different screens to perform different update. (make sense?) Having Single Table for all entities won't benefit much in terms of RCU and WCU. Eventually, you need to make different API calls for data retrieval as the functionalities are residing at different screens.

   

One Domain Data in One Table

With the above idea in mind, let's extend the scope of our example. We are now building an OLTP transactional system doing the following:

  1. Booking Management
  2. Payment Management
  3. Customer Management
In this case, we probably need 3 sets of screens to manage the data. As there will be separate API call for this, having the domain data split into 3 DynamoDB table will not be so much of cost impact. (You may benefit to have separate DynamoDB Stream for each table) We can have the 3 tables with the following setup:

3 Tables and their Partition Key and Sort Key

This is the expansion of the Single Table Design, we remain to group relevant data (e.g. Booking, Booking Slot and Customer Profile) into one DynamoDB table. So, we can retrieve all the data in one retrieval call. 

Duplicate Record in Multiple Tables
If you noticed, this design seems to introducing another problem; we started to replicate Customer Profile across multiple tables. The reason for this is we might need to show Customer Information in conjunction with Booking / Payment data. Replicate the information will allow us to make 1 retrieval call for all information. But, you would now need 3 WCU when you update a single Customer Profile record as the same data need to be updated across BookingData, PaymentData and CustomerData.

Assuming Customer Profile being a master record, chance of updating can be relatively low compare to transactional data such as Booking and Payment. If the update (Customer Profile) that potentially happened is 10% of the transactional volume. (100k transaction update vs 10k master record update ) We might eventually have more cost saving by having duplicate record of Customer Profile across multiple tables.

Below is an estimated cost calculation for monthly cost based on RCU and WCU of the 3 tables:
Breakdown of Number of RCU & WCU for Approach A & B

Above table is comparing cost of Approach A - Duplicate Customer Profile in 3 tables vs Approach B - Keep Customer Profile in 1 Table. The cost of Approach A is 7% ($255.42-$209.98=$15.44 based on 1,000,000 monthly booking)  less than Approach B. The difference can be higher if the ratio is less than 10%. (1,000,000 month booking, 100,000 update to Customer Profile)

Conclusion 
Fundamentally, Design of a NoSQL database like DynamoDB is a complete mind shifting compare to traditional RDBMS database design. It will be nice to think out of the box if you are a traditional application designer like me. Hope my little sharing is useful for you.




 

Comments

Popular posts from this blog

API Versioning with Node.JS

Build Retry-able API using idempotency key

Build a Batch Framework with Go