Exploring MongoPush Using a Case of Filtering and Rename

A common use of MongoPush is copying a subset of a collection and renaming its namespace in the target cluster. In this post, I use a simple example to describe that usage, along with the snapshot file and the task metadata of a migration. If you are unfamiliar with MongoPush, see MongoPush - Push-Based MongoDB Atlas Migration Tool for an introduction.

Use Case

The use case presented here is to copy a subset of a collection from a 2-shard MongoDB cluster to a replica set. I used Keyhole to populate the vehicles collection. The requirements of this use case are:

  • Locate all red vehicles, using filter {"color": "Red"}, from namespace atlanta.vehicles of the source cluster

  • Copy the data to the cars collection under the austin database in the target replica set, using the optional "to" field

To satisfy the described requirements, use the command below:

mongopush -push data -source "${source_uri}" -target "${target_uri}" \
   -include '{"namespace": "atlanta.vehicles", "filter": {"color": "Red"}, "to": "austin.cars"}'
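When the command is generated by a script rather than typed by hand, the quoting of the -include value is easy to get wrong. The following is a minimal Python sketch that assembles the same command line. It assumes only the flags shown above (-push, -source, -target, -include); the helper name build_mongopush_args and the placeholder URIs are hypothetical.

```python
import json
import shlex


def build_mongopush_args(source_uri, target_uri, namespace, query_filter, to):
    """Assemble the argv list for the mongopush command shown above.

    Hypothetical helper: it only uses the flags that appear in this post.
    """
    include = {"namespace": namespace, "filter": query_filter, "to": to}
    return [
        "mongopush",
        "-push", "data",
        "-source", source_uri,
        "-target", target_uri,
        "-include", json.dumps(include),
    ]


args = build_mongopush_args(
    "mongodb://source.example.com/",  # placeholder URI
    "mongodb://target.example.com/",  # placeholder URI
    "atlanta.vehicles",
    {"color": "Red"},
    "austin.cars",
)
# shlex.join quotes the JSON so the line can be pasted into a shell.
print(shlex.join(args))
```

Passing the list to subprocess.run(args) avoids shell quoting altogether, which is usually the safer choice when the filter comes from user input.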

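Upon completion, verify that the document counts match: the number of red vehicles in atlanta.vehicles on the source should equal the number of documents in austin.cars on the target.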
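One way to do that check from a script is sketched below. It assumes the austin.cars collection was empty before the migration and that no writes happen while counting. The helper counts_match is hypothetical; it accepts any objects with a count_documents(filter) method, which PyMongo collections provide. The main function is not called here and requires PyMongo to be installed.

```python
def counts_match(source_coll, target_coll, query_filter):
    """Compare filtered source count with the total target count.

    Returns (source_count, target_count, counts_are_equal).
    """
    source_count = source_coll.count_documents(query_filter)
    target_count = target_coll.count_documents({})
    return source_count, target_count, source_count == target_count


def main(source_uri, target_uri):
    # Requires PyMongo; imported here so counts_match works without it.
    from pymongo import MongoClient

    source = MongoClient(source_uri)["atlanta"]["vehicles"]
    target = MongoClient(target_uri)["austin"]["cars"]
    src, tgt, ok = counts_match(source, target, {"color": "Red"})
    print(f"source={src} target={tgt} match={ok}")
```

Call main with the same source and target URIs that were passed to mongopush.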

Execution

The above command executes the following steps:

  1. Connect to the source, a 2-shard sharded cluster

  2. Connect to the target, a replica set

  3. Query documents from namespace atlanta.vehicles of the source using filter {"color": "Red"}. There are two possible scenarios:

    1. If the total number of matched documents is less than the default block size, 10,000, begin copying documents to the target cluster

    2. Otherwise, divide them into small blocks and process them in parallel using multiple threads

  4. Copy the documents to a different namespace, austin.cars, in the target cluster
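The branching in step 3 can be illustrated with a short Python sketch. This is not MongoPush's implementation, only a model of the idea: fewer matches than the block size are copied directly, and anything larger is split into blocks that are processed by a thread pool. The function names, the worker count, and the stand-in copy_block are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 10_000  # default block size mentioned in step 3


def split_into_blocks(ids, block_size=BLOCK_SIZE):
    """Divide a sequence of document ids into consecutive blocks."""
    ids = list(ids)
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]


def copy_block(block):
    """Stand-in for copying one block; returns how many ids it handled."""
    return len(block)


def copy_all(ids, block_size=BLOCK_SIZE, workers=4):
    """Copy directly when small, otherwise copy blocks in parallel."""
    ids = list(ids)
    if len(ids) < block_size:
        return copy_block(ids)
    blocks = split_into_blocks(ids, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(copy_block, blocks))


print([len(b) for b in split_into_blocks(range(25_000))])
print(copy_all(range(25_000)))
```

With 25,000 matched ids and the default block size, the sketch produces two full blocks and one partial block.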

It is very important to have an index on the color field of the source collection atlanta.vehicles. Without a suitable index, the query performs a collection scan, resulting in poor performance.
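Before starting a filtered migration, it can be worth confirming that such an index exists. The sketch below checks index metadata in the shape returned by PyMongo's Collection.index_information(); the helper name has_index_on and the sample data are hypothetical. If the index is missing, PyMongo's collection.create_index([("color", 1)]) would create it.

```python
def has_index_on(index_info, field):
    """Return True if some index has `field` as its leading key.

    `index_info` maps index names to specs of the form
    {"key": [(field_name, direction), ...]}.
    """
    return any(
        bool(spec["key"]) and spec["key"][0][0] == field
        for spec in index_info.values()
    )


# Sample metadata for illustration only.
sample = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "color_1": {"v": 2, "key": [("color", 1)]},
}
print(has_index_on(sample, "color"))
```

Only the leading key is checked because a compound index whose first field is color can still serve an equality filter on color.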

In addition to the console messages, status can also be viewed in a browser. The default port is 5226; for example, use http://hostname.example.com:5226 to view progress and status.

Review and Audit

To stop MongoPush, click the Stop MongoPush button in the UI; an HTML status report is downloaded automatically. The report is a formatted view of the snapshot file, a compressed BSON file with the suffix -mongopush.bson.gz under the snapshot directory. You can view the progress of the migration and regenerate the HTML report using the following command:

mongopush -print snapshot/hostname.example.com-mongopush.bson.gz

Here, hostname.example.com is the FQDN of the host where mongopush was executed.
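To locate the snapshot file from a script, the path can be derived from the naming pattern described above. This is a small sketch under the assumption that the name reported by Python's socket.getfqdn() matches the FQDN mongopush used; the helper names are hypothetical.

```python
import socket
from pathlib import Path


def snapshot_path(fqdn=None, snapshot_dir="snapshot"):
    """Build <snapshot_dir>/<fqdn>-mongopush.bson.gz."""
    fqdn = fqdn or socket.getfqdn()  # assumption: same name mongopush used
    return Path(snapshot_dir) / f"{fqdn}-mongopush.bson.gz"


def print_command(fqdn=None):
    """Return the argv list for the -print command shown above."""
    return ["mongopush", "-print", snapshot_path(fqdn).as_posix()]


print(print_command("hostname.example.com"))
```

If the derived path does not exist, listing the snapshot directory for files ending in -mongopush.bson.gz is the more reliable fallback.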

Splitting a large collection into smaller tasks makes high parallelism possible and makes the migration resumable and auditable, with progress and status reporting. Tasks are stored as JSON documents in the _mongopush.tasks.<replica> collections in the target cluster. Each document has four fields:

  • _id: task ID

  • ids: an array of _id from the source shard (or replica set)

  • ns: namespace

  • replica_set: shard (or replica set) name
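As an illustration of how these four fields can be used for auditing, the Python sketch below models a task document as a plain dictionary and totals the number of ids per shard. All values are made up; real task IDs, id values, and shard names will differ, and the helper ids_per_replica_set is hypothetical.

```python
from collections import Counter

# Illustrative task documents with the four fields listed above.
example_tasks = [
    {"_id": "task-1", "ids": [101, 102, 103], "ns": "atlanta.vehicles", "replica_set": "shard0"},
    {"_id": "task-2", "ids": [201, 202], "ns": "atlanta.vehicles", "replica_set": "shard1"},
    {"_id": "task-3", "ids": [104], "ns": "atlanta.vehicles", "replica_set": "shard0"},
]


def ids_per_replica_set(tasks):
    """Total the number of source _id values recorded per shard."""
    totals = Counter()
    for task in tasks:
        totals[task["replica_set"]] += len(task["ids"])
    return dict(totals)


print(ids_per_replica_set(example_tasks))
```

Comparing such totals with the document counts on each source shard is one simple way to audit that every matched document was assigned to a task.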

This information is used to report status and to resume a migration. As for resuming, unless the server hosting mongopush crashes, mongopush keeps retrying when network connectivity is interrupted or a cluster is out of service.

Pause and Resume

As the names imply, the Pause and Resume buttons pause and resume a data migration. The Pause button pauses the worker threads, but not the task splitters. The task splitters divide large collections into small tasks, which makes it possible to migrate a collection's data in parallel.
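The behavior described above, where workers pause while the splitter keeps producing tasks, can be modeled with a queue and an event flag. The Python sketch below is only a conceptual model under that assumption, not MongoPush's code; the names run_demo, running, and the task counts are invented for illustration.

```python
import queue
import threading


def run_demo(num_tasks=5, num_workers=2):
    """Return (queued while paused, processed while paused, processed at end)."""
    tasks = queue.Queue()
    running = threading.Event()  # cleared means "paused"
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            running.wait()       # workers block here while paused
            task = tasks.get()
            if task is None:     # sentinel: no more tasks
                return
            with lock:
                processed.append(task)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Paused: the splitter (this thread) still produces tasks.
    for n in range(num_tasks):
        tasks.put(n)
    queued_while_paused = tasks.qsize()
    processed_while_paused = len(processed)

    # Resume: queue one sentinel per worker, then release the workers.
    for _ in threads:
        tasks.put(None)
    running.set()
    for t in threads:
        t.join()
    return queued_while_paused, processed_while_paused, len(processed)


print(run_demo())
```

Because splitting continues during a pause, the backlog of tasks is already prepared when the workers resume.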

Reverse Oplogs Streaming

MongoPush provides a one-time opportunity to reverse oplog streaming after all data has been copied to the target cluster. Once oplog streaming begins, the Reverse Oplogs Streaming button becomes available in the UI. This comes in handy after applications cut over to the target cluster and you still want to keep the data in sync between the two clusters.



What's Next?

Migrating big data to MongoDB Atlas is a time-consuming task. I hope this post provides helpful information for your Atlas migration project. In the next post, Change Shard Key and Migrate to Atlas Using MongoPush, I'll discuss changing a shard key and migrating to Atlas at the same time. Please post your feedback in the comments below.

