Database Sharding : Managing Data Effectively

Posted by: admin
Category: Database, Design Pattern, SQL
Data Shrading
Data Shrading

Let’s say an application is popular and have heavy traffic from across the world. This leads to a situation where your database gets loaded with thousands of new records every day. Now the questions arises about managing a data of users and for the users. Database Sharding is the best way to proceed with in this scenario.

What is Database Sharding?

Sharding is partitioning of data. Sharding is a method to split a huge dataset into multiple databases through specific technique of partitioning. In general, sharding can be done in two ways Vertical & Horizontal.

Vertical sharding : A technique of storing a data by into multiple sets by diving properties. Let’s say we have user data with around 25 different properties into single table/dataset. We can divide these properties to multiple tables/sets like separate tables/set for personal information, login credentials, health details, biographic details etc.

Horizontal sharding divides the data based on rows into multiple tables or location. i.e., We have 4 million users’ data. We have applied sharding and divided data into 4 different sets of 1 million users based on let’s say regions.

Strategy

Shard or Partition Key is a portion of primary key which help us to plan how the data can be distributed. Plan on how to work with read-writes over the data.

Each database type has a different way to handle sharding process but it can be divided into two types in general scenarios. A logical shard is a technique to store the data with same partition keys shared between multiple data. A physical shard is something with multiple logical shards. To know more about what MciroSoft says about this please Click Here

Algorithmic Sharding

It takes data as input and apply hash function over it to generate hash output and based on the hash it stores that record to appropriate shard. As per my knowledge generally modulus operator is more popular in taking as subset of data as input and generate number of shards.

As we would be having partition key along with us. Reads will be performed over single database. In absence of partition key , then every database needs to be be searched to get the records. Generally, this kind of sharding is suitable for key-value databases as sharding function uniformly distributes data according the logic specified in function.

Dynamic Sharding

In dynamic sharding, there will be a locator service which interacts for each read-write entries to locate the right database. As per different partition keys available, locator service will handle the queries for read and write efficiently. Same as algorithmic sharding, here all queries will have the partition key available. Queries without that have to search the records in all databases.

It is not as uniformed as algorithmic one, has multiple challenges in implementation. Here locator service is point where all queries comes first, so if that fails, entire structure fails. In case of routing failures it is hard to identify those records in database and is a time consuming process.

Here, another concept has also become popular called entity groups which means storing all kind of data for single user with same partition key. This will allow us to read the things very easily and efficiently where as cross partitioned queries might take place but frequency of the same would be lower so it would give better results.

Advantages

High Availability & optimized results

As partitioned databases with partition key will be specified with each query. Query is dedicated to single database only. Results into a quick and optimized response.

Easy failure recovery

The core of Data Shrading is distributed architecture. If any of the partition failed, only specific partitioned data will be impacted, rest will work perfectly. This can be resolved through replica or database backups of that partition only.

Points to Consider

Selection of strategy

Appropriate strategy to handle sharding process should be selected. Changing strategy mid way will be too complex.

Complicated architecture

Complex Problem with Simple Solution is not what implement sharding is all about. As database and application architecture interaction will become more complicated.

Difficulty in back to Non-Sharding Architecture

The level of Complexity in Database Sharding is high. Reverting back to non-sharding architecture after implementation will be for sure not easy.

Microsoft also has given few pointers to be taken care, the same can be checked over here

Conclusion

Database Sharding definitely helps when the large amount of data are to be managed. Needs concrete efforts in devising a sharding strategy and implementation. It is much complex to manage data with sharding. I recommend to go for Database sharding process only when required and Opt for horizontal sharding.

If you’re planning for the Effectively managing your data, get in touch with Ansi ByteCode LLP team

 

Author : Naishadh R. Patel

Let’s build your dream together.