Change Shard Key and Migrate to Atlas Using MongoPush

Changing the shard key of a large collection in MongoDB is time-consuming and painful.  Teams planning to migrate their clusters to Atlas often wish they could change the shard keys of their collections as part of the same migration.  MongoPush makes this possible in four simple steps.  For an introduction to MongoPush, please see MongoPush - Push-Based MongoDB Atlas Migration Tool.

Use Case

The atlanta.vehicles collection is one of many collections in a 2-shard MongoDB cluster, and its shard key is {"year": 1, "brand": 1}.  The collection has grown to more than 5TB.  The engineering team plans to migrate to Atlas with a new 3-shard cluster configuration.  However, with the existing shard key, they have experienced a "hot shard" problem when importing inventory data for new vehicles, because of the "year" field in the shard key.  The marketing department advises that buying a car is both a practical and a passionate decision, and that customers choose vehicles by style and color, for example a red convertible.  After reviewing all their use cases, the team decides on a new shard key of {"style": 1, "color": 1}.
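To make the "hot shard" effect concrete, here is a minimal sketch in plain JavaScript that assigns compound key values to range-based chunks.  The year-based split points and the sample vehicles are hypothetical, and the comparison only approximates how MongoDB orders simple numbers and ASCII strings; it is an illustration, not how the cluster actually routes writes.

```javascript
// Compare two compound shard-key values field by field, left to right.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return the index of the chunk that owns `key`.
// A chunk's lower bound is inclusive, so a key equal to a split point
// belongs to the chunk that starts at that split point.
function chunkIndex(key, splitPoints) {
  let index = 0;
  for (const point of splitPoints) {
    if (compareKeys(key, point) >= 0) index++;
  }
  return index;
}

// Hypothetical split points for the old key {year, brand}.
const yearSplits = [[2010, "A"], [2020, "A"]];
// Split points for the new key {style, color}.
const styleSplits = [["I", "N"], ["R", "N"]];

// Hypothetical batch of new inventory, all from the same model year.
const newInventory = [
  { year: 2024, brand: "Ford", style: "Convertible", color: "Red" },
  { year: 2024, brand: "Honda", style: "Pickup", color: "White" },
  { year: 2024, brand: "Toyota", style: "SUV", color: "Black" },
];

const oldChunks = newInventory.map((v) => chunkIndex([v.year, v.brand], yearSplits));
const newChunks = newInventory.map((v) => chunkIndex([v.style, v.color], styleSplits));

console.log(oldChunks); // [ 2, 2, 2 ]: every insert lands in the same chunk
console.log(newChunks); // [ 0, 1, 2 ]: inserts spread across the chunks
```

With the year-led key, every new vehicle sorts into the last range, so a single shard absorbs the whole import.  With the style and color key, the same batch is distributed across all three ranges.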

How can we make this process less painful without using the mongodump and mongorestore commands?  Let’s explore the MongoPush solution.

Copy All Configurations

Because of the shard key change, we can’t use the convenient -push all parameter with the mongopush command and instead have to divide the process into four steps.  The first step is to copy all configurations and allow mongopush to automatically configure the target cluster.  Let's use {source} to represent the source cluster connection string and {target} for the target cluster connection string.  The command is as follows:

mongopush -push config -drop -source {source} -target {target}

Copy and Review Indexes

Next, copy all indexes by using the command below:

mongopush -push index -source {source} -target {target}

Configure New Shard Key

The next step is to change the shard key.  Let’s assume the primary shard is shard01, and the IDs of the other two shards are shard02 and shard03.  Connect to the target cluster using the mongo shell and execute the commands below:

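use admin
db.getSiblingDB("atlanta").vehicles.drop()
sh.shardCollection("atlanta.vehicles", {"style": 1, "color": 1})
sh.splitAt("atlanta.vehicles", {"style": "I", "color": "N"})
sh.splitAt("atlanta.vehicles", {"style": "R", "color": "N"})
sh.moveChunk("atlanta.vehicles", {"style": "I", "color": "N"}, "shard02")
sh.moveChunk("atlanta.vehicles", {"style": "R", "color": "N"}, "shard03")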

The above commands do the following:

  • Change to the admin database

  • Drop the atlanta.vehicles collection

  • Shard the collection using a new shard key {"style": 1, "color": 1}

  • Split chunks

  • Move chunks

A couple of important notes about the above commands.  We create only one chunk per shard and let mongod split chunks automatically as data is inserted.  For the split points passed to the splitAt command, you’ll have to examine your data to come up with the best values.
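As one way to choose split points, the sketch below takes per-key document counts, already sorted by (style, color), and picks boundaries that divide the documents into roughly equal chunks.  The function name pickSplitPoints and the sample counts are hypothetical; in practice the counts might come from an aggregation on the source collection that groups by style and color and sorts the result.

```javascript
// counts: array of { style, color, count }, sorted by (style, color).
// Returns up to numChunks - 1 split points. Each split point is the lower
// bound of the next chunk, chosen once the documents before it reach the
// next 1/numChunks share of the total.
function pickSplitPoints(counts, numChunks) {
  const total = counts.reduce((sum, c) => sum + c.count, 0);
  const points = [];
  let seen = 0; // documents with keys strictly before the current key
  for (const c of counts) {
    const threshold = (total * (points.length + 1)) / numChunks;
    if (points.length < numChunks - 1 && seen >= threshold) {
      points.push({ style: c.style, color: c.color });
    }
    seen += c.count;
  }
  return points;
}

// Hypothetical distribution of documents per (style, color) pair.
const sampleCounts = [
  { style: "Convertible", color: "Red", count: 500 },
  { style: "Coupe", color: "Blue", count: 500 },
  { style: "Hatchback", color: "White", count: 400 },
  { style: "Pickup", color: "Black", count: 600 },
  { style: "Sedan", color: "Silver", count: 500 },
  { style: "Wagon", color: "Gray", count: 500 },
];

const splitPoints = pickSplitPoints(sampleCounts, 3);

// Print the corresponding mongo shell commands for review.
for (const point of splitPoints) {
  console.log(`sh.splitAt("atlanta.vehicles", ${JSON.stringify(point)})`);
}
```

With this sample, the split points fall at Hatchback/White and Sedan/Silver, leaving 1,000 documents in each of the three chunks.  Real data is rarely this even, so treat the output as a starting point to review rather than a final answer.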

Once the index copy has completed, review the indexes on the target, and add new ones or remove unwanted ones as needed.

Copy Data

The last step is to copy the data using the -push data parameter.  The command is shown below:

mongopush -push data -source {source} -target {target}

Other Use Cases

The same steps also apply to migrating from a replica set to a sharded cluster, beginning with configuring the new shard keys.  Another use case is to copy data to a different namespace within the same sharded cluster, combined with the -include parameter.  See Exploring MongoPush Using a Case of Filtering and Rename for the details of the renaming feature.

What's Next?

Migrating big data to MongoDB Atlas is a time-consuming task.  I hope this post helps with your Atlas migration project.  In the next blog, I'll discuss a case of MongoPush Your GridFS to Atlas.  Please post your feedback in the comments.

