Monday, 31 July 2017

Introduction to Azure Cosmos DB

Azure Cosmos DB is one of the latest cloud-based service offerings from Microsoft. It is a superset of the DocumentDB service. Earlier, Microsoft offered DocumentDB as Data as a Service (DaaS), which supported a limited set of features. Microsoft’s engineers shared the challenges they faced running the company’s cloud-based services, such as Bing, Azure and Office 365, on DocumentDB. Microsoft understood its engineers’ challenges and saw the opportunity to take DocumentDB to the next level. As a result, Microsoft came up with Azure Cosmos DB, a globally distributed, multi-model database.

Azure Cosmos DB is a schema-free database system designed for scalable, broadly distributed, highly responsive and highly available applications. It supports several NoSQL APIs, including DocumentDB SQL, MongoDB, Gremlin, and Azure Tables. The Gremlin and Table APIs are currently in preview. Azure Cosmos DB can store a variety of data, such as key-value, document, columnar, and graph types, for a variety of workloads, including Internet of Things (IoT).

Azure Cosmos DB

Azure Cosmos DB is a schema-agnostic database engine that supports multiple data models. It is a schema-on-read database: no schema is enforced when data is written, which makes write operations faster.

In the rest of this article, we will talk about some of the key capabilities of Azure Cosmos DB, which make it stand out from other members of the NoSQL family.

Azure Cosmos DB supports global distribution and multiple data models, with a rich set of APIs to access and query data, for the highly available and highly responsive critical applications of any organization.

Some of the key capabilities are:

1) Global Distribution
2) Multi-model, multi-API support
3) Horizontal scaling of storage and throughput
4) Low latency
5) Transparent multi-homing
6) Multiple, well-defined consistency models
7) Schema-free

The global distribution capability helps to distribute an application's data across regions quickly. Azure Cosmos DB provides two kinds of distribution:

1) Local distribution: all resources available in a region are horizontally partitioned using resource partitions.
2) Global distribution: resource partitions are distributed across geographical regions.
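The two kinds of distribution can be illustrated with a small sketch. This is not how Cosmos DB is implemented; it assumes a simple hash-based mapping from a key to a resource partition, and it assumes that global distribution means the same partition exists in every associated region. The function names, the partition count and the region names are all hypothetical.

```python
import hashlib
from typing import List, Tuple


def partition_for(partition_key: str, partition_count: int) -> int:
    """Local distribution (assumed): hash a key onto one of the resource partitions."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def placement(partition_key: str, partition_count: int, regions: List[str]) -> List[Tuple[str, int]]:
    """Global distribution (assumed): the same partition is present in every region."""
    index = partition_for(partition_key, partition_count)
    return [(region, index) for region in regions]


# Hypothetical regions and partition count, for illustration only.
regions = ["West US", "North Europe", "Southeast Asia"]
copies = placement("customer-42", 4, regions)
```

Each entry in `copies` names a region and the partition index that holds the key in that region.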

Global distribution of resources in Cosmos DB is turnkey. At any time, with a few button clicks or programmatically with a single API call, a user can associate any number of geographical regions with their database account.

Azure Cosmos DB natively supports multiple data models including documents, key-value, graph and column family. This database engine is based on the atom-record-sequence (ARS) data model.

ARS data model:

Atom (A) - Atoms are a small set of primitive data types, such as number, string and Boolean.

Record (R) - Records are structs whose fields are built from the atom types.

Sequence (S) - Sequences represent arrays consisting of atoms, records, or sequences.
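To make the three ARS building blocks concrete, here is a minimal sketch that models them with ordinary Python types. It is an illustration only, not the engine's internal representation. The type aliases and the `classify` and `count_atoms` helpers are hypothetical, and letting records nest other records and sequences is an assumption borrowed from JSON.

```python
from typing import Dict, List, Union

# Atoms: the primitive types named above (number, string, Boolean).
Atom = Union[bool, int, float, str]
# Records are modelled as dicts and sequences as lists (an assumption).
Value = Union[Atom, "Record", "Sequence"]
Record = Dict[str, "Value"]
Sequence = List["Value"]


def classify(value) -> str:
    """Return which ARS building block a Python value corresponds to."""
    if isinstance(value, dict):
        return "record"
    if isinstance(value, list):
        return "sequence"
    if isinstance(value, (bool, int, float, str)):
        return "atom"
    raise TypeError(f"not an ARS value: {value!r}")


def count_atoms(value) -> int:
    """Count the atoms reachable from a value by walking records and sequences."""
    kind = classify(value)
    if kind == "record":
        return sum(count_atoms(child) for child in value.values())
    if kind == "sequence":
        return sum(count_atoms(child) for child in value)
    return 1


# A hypothetical document: a record holding atoms and a sequence of atoms.
document = {"id": "1", "active": True, "scores": [3, 5.5]}
```

A JSON document maps naturally onto this model, which is one way to see how a single engine can serve document, key-value and graph style data.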

The database engine supports multiple database APIs for data access and querying like DocumentDB SQL, MongoDB, Gremlin (preview), and Azure Tables (preview).

Microsoft assures Cosmos DB users of very low end-to-end latency at the 99th percentile within the same Azure region. Cosmos DB also gives users a way to distinguish between transactions with high latency and a database that is unavailable.

Azure Cosmos DB also offers granular control over the trade-off between database performance and consistency. Consistency describes how up to date and how well ordered the data returned by a read is guaranteed to be, relative to the writes already made; it can be set at different levels, trading speed against accuracy when returning data to the user. Microsoft allows engineers to choose between five well-defined consistency models along the consistency spectrum: strong, bounded staleness, session, consistent prefix, and eventual.
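The following sketch is one way to picture the five levels. It is a deliberately simplified toy model, not the Cosmos DB protocol: each write is assumed to bump a version counter, replicas are assumed to apply writes in order, and the function decides whether a replica at a given version may answer a read. The `can_serve_read` function and its parameters are hypothetical.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded staleness"
    SESSION = "session"
    CONSISTENT_PREFIX = "consistent prefix"
    EVENTUAL = "eventual"


def can_serve_read(level: Consistency, replica_version: int, latest_version: int,
                   session_version: int = 0, max_lag: int = 0) -> bool:
    """Toy rule for whether a replica may answer a read at the given level."""
    if level is Consistency.STRONG:
        # The replica must have seen every committed write.
        return replica_version == latest_version
    if level is Consistency.BOUNDED_STALENESS:
        # The replica may lag, but by at most max_lag versions.
        return latest_version - replica_version <= max_lag
    if level is Consistency.SESSION:
        # The replica must include the caller's own latest write.
        return replica_version >= session_version
    # Consistent prefix and eventual: any in-order replica may answer in this model.
    return True
```

For example, a replica at version 8 when the latest write is version 10 could not serve a strong read, but could serve a bounded-staleness read with `max_lag=5`.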

The Azure Cosmos DB engine is designed to scale throughput elastically, based on application traffic patterns across different geographical regions, to support workloads that fluctuate by both geography and time.

When you design and develop an application, do you always think about how your system should behave in the event of a failure or disaster? Transparent multi-homing is the capability that keeps an application running in the unlikely event of a regional failure or disaster. Azure Cosmos DB automatically fails over in the order of defined priority. Priorities direct requests to specific available regions in the event of regional failures. One of the best parts is that the “priority” of each region associated with an Azure Cosmos DB database account can be changed dynamically.
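Priority-ordered failover can be sketched in a few lines. This is only an illustration of the idea, not the service's actual failover logic. The `pick_region` helper is hypothetical, the region names are examples, and treating a lower number as a higher priority is an assumption of this sketch.

```python
from typing import Dict, Optional, Set


def pick_region(priorities: Dict[str, int], available: Set[str]) -> Optional[str]:
    """Return the available region with the best (lowest) priority number, or None."""
    candidates = [region for region in priorities if region in available]
    if not candidates:
        return None
    return min(candidates, key=lambda region: priorities[region])


# Hypothetical priorities for three regions on one database account.
priorities = {"West US": 0, "East US": 1, "North Europe": 2}

# All regions healthy: requests go to the top-priority region.
normal = pick_region(priorities, {"West US", "East US", "North Europe"})

# West US is down: requests fail over to the next region in priority order.
degraded = pick_region(priorities, {"East US", "North Europe"})
```

Changing a region's priority dynamically corresponds to updating the `priorities` mapping; the next call to `pick_region` uses the new order.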

Azure Cosmos DB’s database engine is fully schema-agnostic and schema-free. The prime benefit of this is that it automatically indexes all the data it ingests without requiring any schema or index management.
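One way to see how indexing can work without a schema is to flatten each document into (path, value) pairs and build an inverted index over them. The sketch below does this for plain Python dictionaries. It is a simplified illustration under that assumption, not the engine's real index structure, and the function names and path format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple


def paths(value, prefix: str = "") -> Iterator[Tuple[str, object]]:
    """Yield a (path, atom) pair for every leaf value in a nested document."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, f"{prefix}/{key}")
    elif isinstance(value, list):
        for child in value:
            yield from paths(child, f"{prefix}/[]")
    else:
        yield prefix, value


def build_index(docs: Dict[str, dict]) -> Dict[Tuple[str, object], Set[str]]:
    """Map every (path, value) pair to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in paths(doc):
            index[(path, value)].add(doc_id)
    return index


# Two hypothetical documents with different shapes; no schema is declared.
docs = {
    "1": {"name": "Ana", "tags": ["iot", "sensor"]},
    "2": {"name": "Bo", "city": {"name": "Oslo"}},
}
index = build_index(docs)
```

Because the index is keyed by path, documents with different shapes can sit side by side, and a lookup such as `index[("/city/name", "Oslo")]` finds matching documents without any declared schema.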

Overall, Azure Cosmos DB gives a level of transparency over the choices required to design an enterprise-level application.