mongo_management [2020/12/16 09:42] (current) – andonovj
===Start===
<Code:bash>
[root@tain-cx-mdb1 scripts]# ./
about to fork child process, waiting until server is ready for connections.
forked process: 67867
child process started successfully,
</Code>
Where the script contains:

**startMongoDB**
<Code:bash>
#!/bin/bash
sudo /bin/mongod --auth -f /
</Code>

and the contents of the configuration file are:

**mongo.conf**
<Code:bash>
port = 9005
logpath = /
smallfiles = true
maxConns = 16000
</Code>
The database can also be shut down as follows:

===Stop===
<Code:bash>
~/
</Code>
or you can do it manually using:
<Code:bash>
mongo admin -u '
</Code>
<Code:bash>
prompt = function() {
user = db.runCommand({connectionStatus:

Username IP Address Port Database
</Code>
=====Storage=====

The storage and memory can be overviewed as follows:

===Using Server Status===
Server status will give you a rough overview of how the memory is allocated:

<Code:bash>
> db.serverStatus().mem
{ "

> db.serverStatus().tcmalloc
... not easy to read! ...
var mem = db.serverStatus().tcmalloc;

mem.tcmalloc.formattedString
> db.serverStatus().tcmalloc.tcmalloc.formattedString
"
MALLOC:
MALLOC: + 949514240 ( 905.5 MiB) Bytes in page heap freelist
MALLOC: + 135775296 ( 129.5 MiB) Bytes in central cache freelist
MALLOC: + 2762912 ( 2.6 MiB) Bytes in transfer cache freelist
MALLOC: +
MALLOC: +
MALLOC:
MALLOC: =
MALLOC: + 147517440 ( 140.7 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: =
MALLOC:
MALLOC:
MALLOC:
MALLOC:
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
</Code>
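The MALLOC lines pair raw byte counts with MiB values. As a sanity check, the conversion can be reproduced with a small helper (1 MiB = 1024 × 1024 bytes):

```javascript
// Convert a raw byte count to a MiB string with one decimal,
// matching the formatting of the tcmalloc report above.
function toMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(1);
}

console.log(toMiB(949514240)); // page heap freelist -> "905.5"
console.log(toMiB(135775296)); // central cache freelist -> "129.5"
console.log(toMiB(2762912));   // transfer cache freelist -> "2.6"
```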

So you can see how much memory your database is using.

The storage can be checked as follows:
<Code:bash>
> db.getCollectionNames().map(name => ({totalIndexSize:

{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
{ "
</Code>
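A sketch of the same idea in plain JavaScript, with `db` stubbed out so it runs outside the shell. The collection names and sizes below are made up for illustration; in the mongo shell the real `db.getCollectionNames()` and `db[name].stats()` calls are used instead:

```javascript
// Stubbed stats, keyed by collection name.
const stats = {
  rounds: { totalIndexSize: 368611328 },
  users:  { totalIndexSize: 4096 }
};
const db = {
  getCollectionNames: () => Object.keys(stats),
  getCollection: (name) => ({ stats: () => stats[name] })
};

// Map every collection name to its total index size.
const sizes = db.getCollectionNames()
  .map(name => ({ name, totalIndexSize: db.getCollection(name).stats().totalIndexSize }));

console.log(sizes);
```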

Of course you can investigate it further with:

<Code:bash>
> db.rounds.stats().indexSizes
{
	"
	"
	"
	"
	"
	"
	"
	"
}
>
</Code>
===Get Storage Size for each Collection===
<Code:bash>
var collectionNames = db.getCollectionNames(), stats = [];
collectionNames.forEach(function (n) { stats.push(db[n].stats());
for (var c in stats) {
print(stats[c]['
}
</Code>

<Code:bash>
var collectionNames = db.getCollectionNames()
var col_stats = [];

// Get stats for every collection
collectionNames.forEach(function (n) {
    col_stats.push(db.getCollection(n).stats());
});

for (var item of col_stats) {
    print(`${item['
    (${(item['
    (${(item['
}
</Code>
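A runnable sketch of the loop above, with the stats stubbed in. The namespaces, sizes, and the MiB output format are illustrative; the real script reads each document from `db.getCollection(n).stats()`:

```javascript
// Stubbed stats documents (ns = namespace, sizes in bytes).
const col_stats = [
  { ns: "test.rounds", size: 52428800, storageSize: 104857600 },
  { ns: "test.users",  size: 1048576,  storageSize: 4194304 }
];

// Print one line per collection: data size and on-disk size in MiB.
const lines = col_stats.map(item =>
  `${item.ns}: ${(item.size / 1048576).toFixed(1)} MiB data ` +
  `(${(item.storageSize / 1048576).toFixed(1)} MiB on disk)`
);
lines.forEach(l => console.log(l));
```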

====Compact Data====
Data in a Mongo database can easily become fragmented, especially with frequent additions and removals. Let's check one example:

<Code:bash>
> db.files.chunks.stats()
{
	"
	"
	"
	"
	"
	"
	"
</Code>

In our example, this collection "
In order to defragment a collection we have two options, depending on the storage engine:

  * MMAPv1: Repair a whole database or directory location
  * WiredTiger: Compact the collection

Let's see how this is done in each of them:

===MMAPv1===
If your storage engine is MMAPv1, this is your way forward. The repairDatabase command is used for checking and repairing errors and inconsistencies in your data. It performs a rewrite of your data, freeing up any unused disk space along with it. Like compact, it will block all other operations on your database. Running repairDatabase can take a lot of time depending on the amount of data in your db, and it will also completely remove any corrupted data it finds.

==Linux Shell==
<Code:bash>
mongod --repair --repairpath /mnt/vol1
</Code>

==Mongo Shell==
<Code:bash>
db.repairDatabase()
</Code>

or

<Code:bash>
db.runCommand({repairDatabase:1})
</Code>

repairDatabase needs free space equivalent to the data in your database plus an additional 2 GB. It can be run either from the system shell or from within the mongo shell. Depending on the amount of data you have, it may be necessary to assign a separate volume for this using the --repairpath option.
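The free-space rule of thumb above (data size plus roughly 2 GB) can be sketched as a quick calculation; the input size here is just the example collection size from this page:

```javascript
// Estimate the free space repairDatabase needs: dataSize + 2 GB.
function requiredRepairSpace(dataSizeBytes) {
  const TWO_GB = 2 * 1024 * 1024 * 1024;
  return dataSizeBytes + TWO_GB;
}

// Using the storageSize of the example collection below as input.
console.log(requiredRepairSpace(394018263040)); // 396165746688 bytes
```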
===Wired Tiger===
The compact command works at the collection level, so each collection in your database will have to be compacted one by one. This completely rewrites the data and indexes to remove fragmentation. In addition, if your storage engine is WiredTiger, the compact command will also release unused disk space back to the system. You're out of luck if your storage engine is the older MMAPv1 though; it will still rewrite the collection, but it will not release the unused disk space. Running the compact command places a block on all other operations at the database level, so you have to plan for some downtime.

<Code:bash>
db.runCommand({compact:'
</Code>

===Example===
So let's check it in action with WiredTiger:

<Code:bash>
> db.files.chunks.storageSize()
394018263040
> db.runCommand({compact:'
{ "
> db.files.chunks.storageSize()
162991509504
>
</Code>

Wow, we saved more than half of the space :)
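A quick back-of-the-envelope check of how much was reclaimed, using the two storageSize values from the run above:

```javascript
// Space reclaimed by the compact run, as a percentage.
const before = 394018263040; // storageSize before compact
const after  = 162991509504; // storageSize after compact
const savedPct = ((before - after) / before) * 100;

console.log(savedPct.toFixed(1) + "% of the storage was released"); // "58.6% ..."
```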

====Change storage engine====
From 4.2, MMAPv1 is deprecated, so if you are thinking of upgrading to 4.2, you have to change the storage engine. Changing the engine isn't easy though.
In a nutshell we should:
The export can be done to wherever you want; in our case we will create a new directory for that backup:
<Code:bash>
[root@localhost mongo]# mkdir -p /
[root@localhost mongo]# cd /app/data
drwxr-xr-x. 4 root root 33 Sep 10 15:22 .
drwxr-xr-x. 6 root root 62 Sep 10 15:25 backup <- The backup (export) location
</Code>
The export is done very easily:
<Code:bash>
[root@localhost backup]# mongodump
2019-09-10T15:
2019-09-10T15:
2019-09-10T15:
</Code>
===Modify the mongo config file===
We have to modify the config file so it will point to the new folder and use the correct engine:
<Code:bash>
# Where and how to store data.
storage:
enabled: true
engine: wiredTiger <- The New engine
</Code>
P.S. Disable the authentication for now, since it will be a problem later:
<Code:bash>
security:
authorization:
</Code>
===Restart the mongod===
Feel free to restart the mongod however you want :)
<Code:bash>
[root@localhost mongo]# mongo -u adminDBA -p password123 localhost:
MongoDB shell version v4.0.12
>
bye
</Code>
If the authentication isn't disabled, that will create a problem: in the config file authentication is enabled, but the user (which is in the admin database in our case) cannot be used. So ensure the authentication is disabled:
<Code:bash>
[root@localhost mongo]# mongorestore /
2019-09-10T15:
2019-09-10T15:
2019-09-10T15:
</Code>
===Verify the data===
We can verify the data. For some reason mongo was showing me 0000, even though there was data in the databases:
<Code:bash>
[root@localhost mongo]# mongo
MongoDB shell version v4.0.12
{ "
{ "
</Code>
=====Logical Structure=====
Like any database, Mongo has objects to store information in; we already discussed them:

====Data Objects====
  * Collections
  * Documents

As you know, a collection can store many documents, which don't have to have the same structure.

====Index====
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.

Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.

The following diagram illustrates a query that selects and orders the matching documents using an index:

{{:mongoindexoverview.jpg?
===Index types===

==Single Field==
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.

{{:

==Compound Index==
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.

The order of fields listed in a compound index has significance. For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each userid value, sorts by score.

For compound indexes and sort operations, the sort order (i.e. ascending or descending) of the index keys can determine whether the index can support a sort operation. See Sort Order for more information on the impact of index order on results in compound indexes.

{{:
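The { userid: 1, score: -1 } ordering described above can be illustrated with a plain-JavaScript comparator. This only mimics the ordering of the index entries; it is not how MongoDB stores the index internally:

```javascript
// Compare two documents the way a { userid: 1, score: -1 } index orders them:
// ascending by userid, then descending by score within each userid.
function compoundCompare(a, b) {
  if (a.userid !== b.userid) return a.userid < b.userid ? -1 : 1; // userid: 1
  if (a.score !== b.score) return a.score > b.score ? -1 : 1;     // score: -1
  return 0;
}

const docs = [
  { userid: "ca2", score: 75 },
  { userid: "aa1", score: 50 },
  { userid: "aa1", score: 90 }
];
docs.sort(compoundCompare);
console.log(docs); // aa1/90, aa1/50, ca2/75
```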
==Multikey Index==
MongoDB uses multikey indexes to index the content stored in arrays.

{{:mongomultikeyindex.jpg?

==Geospatial Index==
To support efficient queries of geospatial coordinate data, MongoDB provides two special indexes: 2d indexes that use planar geometry when returning results and 2dsphere indexes that use spherical geometry to return results.
==Text Indexes==
MongoDB provides a text index type that supports searching for string content in a collection. These text indexes do not store language-specific stop words (e.g. "the", "a", "or") and stem the words in a collection to only store root words.

==Hashed Indexes==
To support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of the value of a field. These indexes have a more random distribution of values along their range, but only support equality matches and cannot support range-based queries.

===Index Properties===

==Unique Indexes==
The unique property for an index causes MongoDB to reject duplicate values for the indexed field.

==Partial Indexes==
New in version 3.2.

Partial indexes only index the documents in a collection that meet a specified filter expression. By indexing a subset of the documents in a collection, partial indexes have lower storage requirements and reduced performance costs for index creation and maintenance.

==Sparse Indexes==
The sparse property of an index ensures that the index only contains entries for documents that have the indexed field. The index skips documents that do not have the indexed field.
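The skip behavior of a sparse index can be illustrated in plain JavaScript; the field name `score` and the entry shape are just for illustration:

```javascript
const docs = [
  { _id: 1, score: 30 },
  { _id: 2 },            // no "score" field -> skipped by a sparse index
  { _id: 3, score: 95 }
];

// A sparse index only creates entries for documents that have the field.
const entries = docs.filter(d => "score" in d).map(d => ({ key: d.score, id: d._id }));
console.log(entries.length); // 2
```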

You can combine the sparse index option with the unique index option to prevent inserting documents that have duplicate values for the indexed field(s) and to skip indexing documents that lack the indexed field(s).

==TTL Indexes==
TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for certain types of information like machine-generated event data, logs, and session information that only need to persist in a database for a finite amount of time.
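The expiry rule can be sketched in plain JavaScript. The `createdAt` field name and the timestamps are illustrative; in MongoDB the server-side TTL monitor does this check, driven by the indexed date field and expireAfterSeconds:

```javascript
// A document is eligible for TTL deletion once the indexed timestamp
// is at least expireAfterSeconds old.
function isExpired(doc, expireAfterSeconds, now) {
  return now - doc.createdAt.getTime() >= expireAfterSeconds * 1000;
}

const doc = { createdAt: new Date("2020-01-01T00:00:00Z") };
const now = new Date("2020-01-01T02:00:00Z").getTime(); // two hours later

console.log(isExpired(doc, 3600, now));  // true  (older than 1 h)
console.log(isExpired(doc, 86400, now)); // false (not older than 24 h)
```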
==Hidden Indexes==
New in version 4.4.

Hidden indexes are not visible to the query planner and cannot be used to support a query.

By hiding an index from the planner, users can evaluate the potential impact of dropping an index without actually dropping the index. If the impact is negative, the user can unhide the index instead of having to recreate a dropped index. And because indexes are fully maintained while hidden, they are immediately available for use once unhidden.

Except for the _id index, you can hide any index.

====Management====
Let's see how to get what indexes a collection has and how to create or re-create an index:

===Get Indexes===
To get the indexes of a certain collection, we can use the following function on any collection:

<Code:bash|Get Indexes>
> db.files.files.getIndexes()
[
	{
		"
		"
		"
		},
		"
	},
	{
		"
		"
		"
		"
		"
		},
		"
	}
]
>
</Code>
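The getIndexes() result is just an array of plain documents, so it can be post-processed like any other array. A plain-JavaScript sketch, e.g. picking out the unique indexes (the stub entries below are illustrative, not the real output above):

```javascript
// Stub shaped like a getIndexes() result.
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, unique: true, key: { files_id: 1, n: 1 }, name: "files_id_1_n_1" }
];

// Keep only the names of unique indexes.
const uniqueIndexes = indexes.filter(ix => ix.unique === true).map(ix => ix.name);
console.log(uniqueIndexes); // [ "files_id_1_n_1" ]
```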

===Create an Index===
To create an index we can use:

<Code:bash>
db.files.chunks.createIndex(
	{
		"
	},
	{
		"
		"
		"
		"
		}
	}
)
</Code>

===Reindex===
Reindex will re-create all indexes on a collection (after a mongoimport, for example):

<Code:bash>
> db.files.files.reIndex()
{
	"nIndexesWas" : 2,
	"nIndexes" : 2,
	"indexes" : [
		{
			"
			"key" : {
				"
			},
			"name" : "
		},
		{
			"
			"unique" : true,
			"
			"
			"
			},
			"
		}
	],
	"ok" : 1
}
>
</Code>
=====Execute a script=====
If you have a JSON file "
<Code:bash>
[root@tbp-mts-mdb02 ~]# /
MongoDB shell version v4.2.2
{ "
{ "
</Code>