DocumentDB Indexing Policies–Portal Choices

I have been using Azure DocumentDB for a good while now and think it is a very interesting service.  The fact that all attributes by default are able to be indexed with no real sign of performance impact on insertion speaks volumes for the team in Redmond.
As I said by default all paths within a document are indexed.  DocumentDB allows you, when creating a collection in the portal, to specify how you want to index documents for that collection.  In days gone by everything was indexed using a hash index and whilst this was good for equality joins it wouldn’t work with range queries (Yes I know you can specify ScanInQuery to get around it but this is a performance hole).
You now get a choice and I for one think this is great.  You can still create your own custom indexing policy whereby you may exclude certain paths from being indexed or you may put a range index on a string field but I think a large number of indexing cases will be satisifed by the options in the portal.  Here are a couple of examples of where range queries help.

 

“Get me all documents where people working in HR earn more than £50k per year”
“Show me documents where people who sat the music exam scored between 45 and 65 percent”

 

Example

 

I have created my database and collection and I also specified the default indexing policy from this screen.

 

image

 

This is a sample document from my collection

 

{
"candidateid": 123456,
"school": "Castleford Academy",
"results": 
[
{
"exam": "english",
"gradepct": 76
},
{
"exam": "maths",
"gradepct": 80
},
{
"exam": "music",
"gradepct": 55
}
]
}

 

And finally here is my query with results that asks for documents where candidates have a gradepct value greater than 78.

 

SELECT c.candidateid, examresult FROM c JOIN examresult IN c.results
WHERE examresult.gradepct > 78

 

[
{
"candidateid": "123456",
"examresult": {
"exam": "maths",
"gradepct": 80
}
}
]

 

Conclusion

 

I really like that DocumentDB has sensible/expected defaults when it comes to indexing your documents as well as still being able to be very precise with custom indexing policies.