Feel The Pulse of Mongo

When diagnosing performance issues in a MongoDB cluster, I usually begin by reviewing the logs, which keep a history of server operations over time.  This post focuses on slow database operations.  With the -loginfo option, Keyhole reads MongoDB logs and prints a summary of slow operations grouped by query pattern.  Below is an example.

keyhole -loginfo mongod.log

It writes the results to two files.  One is in compressed BSON format (with a -log.bson.gz suffix) and the other stores the results as tab-separated values (TSV).  You can either import the TSV output into a spreadsheet to view the results, or use Maobi to generate an HTML report.

The -redact option can be used together with the -loginfo option to exclude the top n slowest database operations from the output file.  This prevents exposing sensitive data such as PII or PHI.  Other options are available, and the usage is as follows:

keyhole -loginfo [-collscan] [-regex {regex}] [-redact] mongod.log[.gz] ...
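
To make "grouped by query patterns" more concrete, here is a minimal Python sketch of one way such grouping can work: every value in a query filter is replaced with a placeholder so that queries with the same shape fall into the same bucket.  This is an illustration only, not Keyhole's actual implementation.  The function names and the sample operations are hypothetical.

```python
import json
from collections import defaultdict


def query_pattern(value):
    """Replace every leaf value with 1 so queries of the same shape compare equal."""
    if isinstance(value, dict):
        # Sort keys so that field order does not produce different patterns.
        return {key: query_pattern(val) for key, val in sorted(value.items())}
    if isinstance(value, list):
        if any(isinstance(item, (dict, list)) for item in value):
            # Lists of sub-documents (e.g. $and / $or clauses) keep their structure.
            return [query_pattern(item) for item in value]
        # Lists of plain values (e.g. $in operands) collapse to one placeholder.
        return [1] if value else []
    return 1


def group_by_pattern(slow_ops):
    """Group (namespace, filter, millis) tuples by namespace and query pattern."""
    groups = defaultdict(lambda: {"count": 0, "total_ms": 0, "max_ms": 0})
    for namespace, query_filter, millis in slow_ops:
        key = (namespace, json.dumps(query_pattern(query_filter)))
        stats = groups[key]
        stats["count"] += 1
        stats["total_ms"] += millis
        stats["max_ms"] = max(stats["max_ms"], millis)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical slow operations, made up for illustration.
    sample = [
        ("shop.orders", {"status": "open", "customer_id": 42}, 180),
        ("shop.orders", {"customer_id": 7, "status": "shipped"}, 320),
        ("shop.orders", {"total": {"$gt": 100}}, 950),
    ]
    for (namespace, pattern), stats in group_by_pattern(sample).items():
        print(namespace, pattern, stats)
```

In this sketch the first two sample operations share one pattern even though their values and field order differ, which is the property that makes a per-pattern summary useful.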

Indexes Are Your Best Friends

Queries without supporting indexes are perhaps the most common mistake I find.  The cause can be either oversight on the developers’ part or simple unfamiliarity with database technologies.  Maobi summarizes commands, time impacted, indexes used, and query patterns of collections in a tabular view.  Query patterns that are missing indexes are flagged as COLLSCAN in red.  See an example below.

Watch Viking Raids

If the server is not performing well even though the data is properly indexed, the cause may be an under-provisioned server or an operation rate higher than the server can support.  I would look for spikes in the number of database operations.  Sudden bursts of database operations are like Viking raids.  These short attacks can raid the server, burn up disk IOPS, and bring the oplog collection to the ground before retrieval is complete.

The HTML report includes an Ops Counts chart that shows the counts for all database commands.  This chart helps administrators understand the pulse of the server under stress.  Note that a server should be provisioned for its peak transaction rate.  Below is an example of an Ops Counts chart.
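
Outside of the HTML report, the idea behind an ops count chart can be approximated by hand.  The Python sketch below buckets operation timestamps by minute and flags the minutes whose count is well above the typical level.  It is a rough illustration, not how Keyhole or Maobi builds the chart.  The function names, the sample timestamps, and the threshold of three times the median are all assumptions.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def ops_per_minute(timestamps):
    """Count operations in each one-minute bucket."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def find_spikes(counts, factor=3.0):
    """Return the minutes whose count exceeds `factor` times the median count."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(minute for minute, n in counts.items() if n > factor * baseline)


if __name__ == "__main__":
    # Hypothetical timestamps: a quiet server with one burst at 10:02.
    start = datetime(2021, 1, 1, 10, 0, 0)
    per_minute = [(0, 2), (1, 2), (2, 10), (3, 2)]
    stamps = [
        start.replace(minute=minute, second=second)
        for minute, ops in per_minute
        for second in range(ops)
    ]
    for minute in find_spikes(ops_per_minute(stamps)):
        print("spike at", minute.isoformat())
```

The median is used as the baseline here because a few extreme minutes do not shift it much; a real deployment would likely need a threshold tuned to its own traffic.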

New Log Format Support

MongoDB v4.4 changed the log format to JSON, and Keyhole supports the new format.  This change made many other log parsing tools obsolete.  Alternatively, you can use jq to parse the JSON-formatted logs, or import them into a MongoDB instance.
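
Because each entry in the new format is a single JSON document per line, a few lines of scripting are enough for a quick look.  The Python sketch below pulls slow-query entries out of such a log.  It assumes the entries carry a "Slow query" message with ns, planSummary, and durationMillis fields under attr, as in the v4.4 structured log format.  The function name and the sample line are hypothetical, and this is not a replacement for Keyhole's summary.

```python
import json


def slow_queries(lines, min_millis=100):
    """Yield (namespace, plan summary, millis) for slow-query entries in JSON log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not JSON, e.g. legacy-format entries.
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr")
        if not isinstance(attr, dict):
            continue
        millis = attr.get("durationMillis", 0)
        if millis >= min_millis:
            yield attr.get("ns", ""), attr.get("planSummary", ""), millis


if __name__ == "__main__":
    # Hypothetical log line, shaped like a v4.4 structured log entry.
    sample = [
        '{"t":{"$date":"2020-09-01T10:00:00.000+00:00"},"s":"I","c":"COMMAND",'
        '"id":51803,"ctx":"conn12","msg":"Slow query","attr":{"type":"command",'
        '"ns":"shop.orders","planSummary":"COLLSCAN","durationMillis":950}}',
    ]
    for namespace, plan, millis in slow_queries(sample):
        print(namespace, plan, millis)
```

To run it against a real file, pass an open file object instead of the sample list, for example slow_queries(open("mongod.log")).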

What’s Next?

Reviewing logs is usually my next step in evaluating a cluster’s health after Survey Your Mongo Land.  If I am still scratching my head after these two steps, I analyze FTDC metrics to look at resource usage, as detailed in Peek at your MongoDB Clusters like a Pro with Keyhole: Part 3.

