Thursday, 20 April 2017

Database as a Service: A Practicum

Vendors now offer Database as a Service (DBaaS) as part of a bundled solution of managed IT services. The selling point is that businesses will not need to purchase and install database software, create and maintain databases, hire database administrators, or worry about upgrades and scaling up when their business expands. Delegating database management services to an external provider may speed up development time, and allows a company to pay only for the data they store and the database services they consume.

However, delegating database administration services has many hidden dangers. In this article, we focus on how delegating data modeling to an outside service can cause problems when making application updates, changing business rules or doing performance tuning.

Which Part of the Database is the Service?

Many vendors provide information technology solutions in the cloud. These are commonly called managed services, and include data storage, transaction processing and hosting application execution. The idea is to relieve organizations from the resources and time spent on setting up and supporting an entire IT infrastructure for their businesses. Your company can now sell new products and services with a shorter time-to-market, as they can effectively outsource some of their information technology requirements.

Oracle Database Tutorials and Materials, Oracle Database Certifictions

Consider this example. You wish to develop an application that provides retail products to new customers via the internet. As you hand the requirements to your application development team you also provide them to a DBaaS provider. The provider creates (or re-uses a current) database instance using its own hardware and software platform, then takes responsibility for typical application-related tasks such as creating tables and indexes. Finally, the provider schedules regular database backups and guarantees some level of data availability and data recovery in case of disasters.

The provider then charges you fees for the initial environment, fees for database development and deployment, and a schedule of future fees for additional resources should they become necessary.

In theory, this relieves you of the time and effort spent on multiple tasks:
  • Creating and maintaining a hardware environment, including hardware purchase and installation, operating system maintenance and upgrades, system backup and recovery, and performance tuning;
  • Creating and maintaining a database environment, including possible hardware acquisition, software licensing and installation, software maintenance including version upgrades, and database management system performance tuning;
  • Performance monitoring and tuning, include considerations for adding additional hardware, storage, and database objects (such as indexes) in order to react to increases in customer traffic, database size, or application growth.
When considering using DBaaS, most companies consider the costs of this service as an investment that is more than offset by the decrease in getting their new application up and running. Regrettably, there remains one issue that cannot be delegated: proper data modeling.

Data Models and Business Rules

A data model specifies data element domains (numeric? text? currency?), entity types (unique identifier?), and how data elements relate to one another. The data model is usually derived from the business rules. For example, “an Order Number is a unique, numeric identifier that must be related to a valid Customer”.  In this example, some of the business rules are:
  • An Order Numbers is numeric;
  • Each Order Number is unique;
  • There must be a way of specifying or determining valid Customers;
  • There must be a process or definition that specifies which Orders belong to which Customers.
Each of the business rules must be enforced by some process. Consider the rule about unique Order Numbers. One method is for the database administrator to define a unique index on this data item. Any attempt to insert or create an existing Order Number will fail. If this method is used, the application must be coded to handle this error situation.

Another method is for the developers to code the application logic that adds new Orders to use a special “next order” counter. The value of the counter becomes the next Order Number, and after use the counter is incremented by one.

Notice with this example that the application logic is different based upon how and where the business rule is to be enforced. Question: Who makes this decision?  If the developers make the decision (i.e., they do the data modeling), then they may end up creating work for the DBaaS staff.  If the DBaaS staff make the decision, they affect the application logic.

Another example is referential integrity. In our example, orders belong to a customer. Enforcing this means handling several events, including what to do when a customer is deleted: what happens to that customer’s orders?  From the database perspective, there are several choices:
  • Create primary and foreign key relationships between the data elements; the foreign key definitions will specify what to do when a customer is deleted (prevent the deletion? delete all the now-invalid orders? mark each order as having an invalid customer?)
  • Create database code in the form of a trigger that will execute whenever a customer is deleted; the trigger code specifies what to do as with the previous bullet;
  • Tell the application developers that they are responsible for handling all of a customer’s orders correctly before deleting the customer.

Data Models Affects Scalability

Another promise of DBaaS is that the database will “grow” as the business grows. There are several kinds of growth, including database size (more customers, more products, more orders), transaction rates (more orders per hour), and application size and complexity (multiple applications in a suite; more and differing products and services; more complex data types such as images or XML data). The DBaaS provider can implement quick fixes for capacity growth such as faster hardware, more CPUs, more disk storage, faster networks, and so forth. However, there are two growth-associated issues that are more difficult to handle: reduced data availability, and increased transaction response times.

Larger Database -- Harder to Access It All

As your business grows the database grows with it, sometimes in ways that you do not expect. Indeed, this is an issue that the DBaaS provider will struggle with. A competent database designer will review future data and capacity projections and use appropriate features of the database management system to prepare for these needs. Two examples are performance indexes and data partitioning.

A performance index is a database object that supports fast access to keyed data, especially data used in search functions. For example, if your application allows a customer to search for products by name, the DBaaS provider may consider creating an index on product name.

Data partitioning is a way to physically separate data in order to speed up performance. Partitions can be logically defined on a single disk drive, be separated onto two or more drives, or even spread across multiple machines. This allows the database designer to take advantage of data meanings and usage in order to increase performance. For example, consider partitioning an Order table by Order Date, with orders older than 30 days in one partition and the remainder in a second. This isolates current orders in a single physical place, potentially giving faster results for some searches.

Who decides to implement features like these, and when?  Should they be considered at the beginning of the project, or should they be postponed until they are required?  If they are implemented at a later date, who makes the decision and who pays how much for it to happen?

More Orders Per Hour -- Application Slows Down

A significant increase in new and returning customers is a business’ dream. More customers and more orders will translate into more profits. What could go wrong?  From the database perspective, increased activity will at some point lead to data access-related issues such as locking and hot spots.

Locking results when two or more processes attempt to access or change the same portion of data, or when each is attempting to read data that has been changed or deleted by another. Let’s consider a simple case. User A begins a transaction to buy Product X, and there is only one available; so, User A adds it to their shopping cart. While User A continues to purchase other items, User B logs on and is interested in Product X.  How many are available? Until A completes their transaction, User B cannot access the record of Product X, so B is put on hold.  Thus, increased activity leads to locking, which leads to perceived slowdowns.

A hot spot is a portion of the database that is frequently accessed and can cause contention. Consider our earlier example where we used a special counter to remember our Next Order. The result will be orders with ascending numeric values. Where are Orders stored in the database?  One simple implementation would be to create an Order table with the expectation that orders would be stored in ascending physical order, low-numbered orders at the beginning of the tables, recent orders at the end.

As the number of transactions increases, the physical end of the Order table becomes a hot spot. Most of the disk activity for this table will be at this one spot, with the remainder of the table rarely accessed. Indeed, the beginning of the table containing the oldest orders may not get referenced at all once these orders are filled.

Hot spots lead to physical resource contention. In our example, they can lead to skewed disk activity for a particular table. The database administrator has several design alternatives that can relieve hot spots. Again, as with locking, who is responsible for choosing these options, when will they be implemented, and who will pay for it?