Introduction to MongoDb with .NET part 37: indexing text fields

Introduction

In the previous post we looked at various aspects of the query plan produced by the explain function. There are three verbosity modes, each giving more detail than the previous one: query planner mode, execution stats mode and all plans execution mode. With the query planner mode we can quickly verify which strategy MongoDb will use to execute a query. Execution stats mode also tells us the number of documents and index keys examined and returned in each step. Finally, the all plans execution mode shows how MongoDb selected a particular execution path by also including the details of the rejected plans.

In this post we’ll look at how to index text fields.

Text searches

A MongoDb document may contain a free-text field, as in the following example:

db.texts.insertMany([
  {"news" : "Decisively surrounded all admiration and not you."},
  {"news" : "Out particular sympathize not favourable introduced insipidity but ham."},
  {"news" : "Evil true high lady roof men had open."},
  {"news" : "surrounded all admiration"},
  {"news" : "sympathize not favourable introduced insipidity admiration and not you lady roof men had"},
  {"news" : "Decisively surrounded all not favourable introduced."},
  {"news" : "Knew as miss my high hope quit true high lady roof men had"},
  {"news" : "particular sympathize not favourable admiration and not you lady roof all not favourable"},
  {"news" : "insipidity but ham admiration surrounded all not favourable"},
  {"news" : "sympathize not favourable admiration and not sympathize not favourable introduced insipidity"},
  {"news" : "Decisively surrounded all not favourable but ham admiration surrounded all not"},
  {"news" : "as miss my high hope quit true high surrounded all not favourable"}
])

That’s random English text produced by the Random text generator web site.

Let’s see if we can find bits of text using a simple search:

db.texts.find({"news" : "high hope"})

No, that won’t find anything. It’s an equality match, and no document’s “news” field is exactly equal to “high hope”. We could use a regular expression search, but we’ll look at another option instead: text indexes. A text index indexes the individual words within the text, similar to how a multikey index indexes the individual elements of an array. Normally when we create an index we provide the name of the indexed property followed by 1 or -1, depending on whether the index is ascending or descending. A text index is slightly different: instead of an integer we pass the special “text” keyword:

db.texts.createIndex({"news" : "text"})
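To get a feeling for what a text index stores, here is a deliberately simplified model in plain JavaScript (runnable with Node.js). It is an inverted index that maps every lowercased word to the ids of the documents containing it.

This is a sketch under stated assumptions, not how MongoDb implements text indexes. A real text index also removes stop words and stems words according to the index language. The tokenize and buildTextIndex helpers and the numeric ids are hypothetical.

```javascript
// Simplified model of a text index: an inverted index from word to document ids.
// Assumption: no stemming or stop-word removal, unlike a real MongoDb text index.

// Split text into lowercase words, ignoring punctuation (hypothetical helper).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build a Map from each distinct word to the Set of ids of the documents containing it.
function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of tokenize(doc[field])) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// A few of the sample documents, with hypothetical numeric ids instead of ObjectIds.
const docs = [
  { _id: 1, news: "Evil true high lady roof men had open." },
  { _id: 2, news: "Knew as miss my high hope quit true high lady roof men had" },
  { _id: 3, news: "surrounded all admiration" },
];

const index = buildTextIndex(docs, "news");
console.log([...index.get("high")]); // [ 1, 2 ]
console.log([...index.get("hope")]); // [ 2 ]
```

As with a multikey index on an array, a single document appears under several keys, one per distinct word.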

The query syntax is different too. This is how we can rewrite the above find command using the $text operator and its $search field:

db.texts.find({$text : {$search : "high hope"}})

It returns 3 documents:

{ "_id" : ObjectId("574a71613935a48996923e45"), "news" : "as miss my high hope quit true high surrounded all not favourable" }
{ "_id" : ObjectId("574a71603935a48996923e40"), "news" : "Knew as miss my high hope quit true high lady roof men had" }
{ "_id" : ObjectId("574a71603935a48996923e3c"), "news" : "Evil true high lady roof men had open." }

Here the documents came back in the order of how well they match the search term. The third document contains “high” but not “hope”, so it came last. However, MongoDb doesn’t guarantee this order unless we explicitly sort by the text score, which we’ll look at shortly. The search is also flexible: a document matches if it contains any of the search terms, in any order. For example, the search term “hope high” returns the same documents.

Note that casing makes no difference. The search…:

db.texts.find({$text : {$search : "HOPE HIGH"}})

…returns the same set of documents.
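The matching behaviour described above can be modelled the same way. The sketch below is again plain JavaScript with hypothetical helper names and ids. It treats the search string as a set of lowercased terms and returns every document that contains at least one of them. That is why neither the order nor the casing of the terms matters. It ignores stemming and stop words, which MongoDb’s $text operator does take into account.

```javascript
// Simplified model of $text matching: a document matches if it contains ANY search term.
// Assumption: plain word comparison; MongoDb also stems words and ignores stop words.

// Split text into lowercase words, ignoring punctuation (hypothetical helper).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Return the ids of documents whose field contains at least one of the search terms.
function textSearch(docs, field, search) {
  const terms = new Set(tokenize(search)); // lowercased, so casing is irrelevant
  return docs
    .filter((doc) => tokenize(doc[field]).some((word) => terms.has(word)))
    .map((doc) => doc._id);
}

// Some of the sample documents, with hypothetical numeric ids.
const docs = [
  { _id: 1, news: "Evil true high lady roof men had open." },
  { _id: 2, news: "Knew as miss my high hope quit true high lady roof men had" },
  { _id: 3, news: "surrounded all admiration" },
  { _id: 4, news: "as miss my high hope quit true high surrounded all not favourable" },
];

console.log(textSearch(docs, "news", "high hope")); // [ 1, 2, 4 ]
console.log(textSearch(docs, "news", "HOPE HIGH")); // [ 1, 2, 4 ]
```

Document 1 matches only because of “high”, which mirrors why “Evil true high lady roof men had open.” is among the results in MongoDb as well.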

Another interesting feature is that we can ask MongoDb how well each document matches the search term. The $meta operator with the “textScore” keyword returns the relevance score that MongoDb computed for each matching document. Below we project it into a field called “degree” and also sort on it. The text score is a double, and the higher the score, the better the match:

db.texts.find({$text : {$search : "insipidity roof introduced"}}, {"degree" : {$meta : "textScore"}}).sort({"degree" : {$meta : "textScore"}})

The above command produces the following documents:

{ "_id" : ObjectId("574a71603935a48996923e3e"), "news" : "sympathize not favourable introduced insipidity admiration and not you lady roof men had", "degree" : 1.6875 }
{ "_id" : ObjectId("574a71603935a48996923e3b"), "news" : "Out particular sympathize not favourable introduced insipidity but ham.", "degree" : 1.1666666666666667 }
{ "_id" : ObjectId("574a71603935a48996923e43"), "news" : "sympathize not favourable admiration and not sympathize not favourable introduced insipidity", "degree" : 1.1428571428571428 }
{ "_id" : ObjectId("574a71603935a48996923e3f"), "news" : "Decisively surrounded all not favourable introduced.", "degree" : 0.625 }
{ "_id" : ObjectId("574a71603935a48996923e42"), "news" : "insipidity but ham admiration surrounded all not favourable", "degree" : 0.6 }
{ "_id" : ObjectId("574a71603935a48996923e3c"), "news" : "Evil true high lady roof men had open.", "degree" : 0.5714285714285714 }
{ "_id" : ObjectId("574a71603935a48996923e41"), "news" : "particular sympathize not favourable admiration and not you lady roof all not favourable", "degree" : 0.5714285714285714 }
{ "_id" : ObjectId("574a71603935a48996923e40"), "news" : "Knew as miss my high hope quit true high lady roof men had", "degree" : 0.55 }
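The scores above follow a simple pattern, at least for these documents. Every matching search term seems to add 0.5 + 0.5 / n to the score, where n is the number of words left in the text after stop words are removed. The JavaScript sketch below reproduces the listed scores under that assumption.

Treat it as a rule of thumb inferred from this output, not as MongoDb’s documented scoring. The real calculation also involves stemming, field weights and terms that occur more than once. The stop-word set in the code is an assumed subset of MongoDb’s English list, and approximateTextScore is a hypothetical helper.

```javascript
// Rough model of the "textScore" values above, INFERRED from that output rather than
// taken from the MongoDb documentation. Assumptions:
//  - each matching search term adds 0.5 + 0.5 / numTokens (default weight of 1),
//  - numTokens counts the words left after removing stop words,
//  - no stemming, and each matching term occurs once in the document.
// ASSUMED_STOP_WORDS is a hypothetical subset of MongoDb's English stop-word list.
const ASSUMED_STOP_WORDS = new Set(["all", "and", "as", "but", "had", "my", "not", "out", "you"]);

// Lowercase, split into words and drop the assumed stop words.
function tokenize(text) {
  return (text.toLowerCase().match(/[a-z0-9]+/g) || [])
    .filter((word) => !ASSUMED_STOP_WORDS.has(word));
}

// Sum the per-term contribution for every distinct search term found in the text.
function approximateTextScore(text, search) {
  const tokens = tokenize(text);
  const terms = new Set(tokenize(search));
  let score = 0;
  for (const term of terms) {
    if (tokens.includes(term)) {
      score += 0.5 + 0.5 / tokens.length;
    }
  }
  return score;
}

const search = "insipidity roof introduced";
// About 0.5714: only "roof" matches, and 7 words remain after removing "had".
console.log(approximateTextScore("Evil true high lady roof men had open.", search));
// 1.6875: all three terms match, and 8 words remain.
console.log(approximateTextScore(
  "sympathize not favourable introduced insipidity admiration and not you lady roof men had", search));
```

Longer texts dilute each match, which is why “Knew as miss my high hope quit true high lady roof men had” scores lower than the shorter “Evil true high lady roof men had open.” even though both match only “roof”.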

The text score is useful when building a search feature, such as a search box, that should show the most relevant matches first.



About Andras Nemes
I'm a .NET/Java developer living and working in Stockholm, Sweden.
