How Kubernetes and DB Operators are Driving the Data Revolution – The New Stack

Dmitrii Chechetkin

Dmitrii Chechetkin is a senior software engineer and developer advocate at Couchbase. With 14 years of experience as a full-stack software engineer building web and mobile applications, Dmitrii has extensive knowledge of IT technologies. Prior to Couchbase, Dmitrii was a software architect at Media Trust, as well as an API solutions architect at Marriott International.

These are tough days for IT. The amount of data our systems have to process is increasing exponentially. This challenge is further amplified by the growing complexity of data. Information is useless without its context, and context is established by relationships between different data points, but each relationship also requires logic and processing resources. As a result, the demands on our data storage and retrieval systems and their management complexity are increasing, making manual database management practices increasingly unviable.

Luckily, this isn’t the first time engineers have come across a problem like this. Our history is full of inspiring examples of how to meet growing demand. From the first windmills to steam engines, screw lathes and Ford’s conveyor belt, the industrial revolution showed that well-executed automation can raise productivity to new levels and drive economic growth.

Kubernetes as tower

Another example of successful automation comes from the early 2010s. At that time, we faced a similar problem, but in software architecture: the internet had changed everything about how user-facing applications operate. Our first approaches, rooted in the well-known centralized architectures of the client/server era, did not work. Large centralized application backends simply couldn’t provide the flexibility to scale from thousands to millions of requests per second. Most of us probably remember at least a few instances where “monolithic” web applications had serious performance issues after going viral.

The solution came from splitting these monoliths into smaller “micro” services running in Docker containers, which can be scaled horizontally, independently of each other, and much faster than monoliths. Because each microservice adds to the demands on development operations, however, this strategy would not have been as successful without container orchestration frameworks like Kubernetes. Publicly introduced in 2014, Kubernetes, which grew out of Google’s experience with its internal Borg system, quickly established itself as the best choice for automating deployment workflows and is now one of the industry standards for modern development operations.

Additionally, being an open source cloud-native component, Kubernetes continues to evolve and improve. Echoing the idea of automated software installation packages, Kubernetes not only abstracts specific infrastructure implementations, but also automates environment creation and deployment procedures. Most organizations that use Kubernetes trust it to run at least 50% of their overall workload.

Autonomous operators

Today we are in the early days of the data revolution. And, as at the dawn of the industrial revolution, the world expects us to meet the demand for data processing by automating the management functions of our data platforms.

When it comes to working with data and databases, automating business operations can give a growth impetus to any organization that relies on data literacy and decision making, because repeatability brings both stability and agility. Human operators, while good at problem solving and innovating, are not good at routine tasks and quickly become error prone. Operational tasks such as scaling up and down, backups, patches, and routine database maintenance are examples of such activities. Hearing “oops, wrong command” from your DBA can be a nerve-wracking experience.

The problem to solve is to take the best practices of these human operators and automate them efficiently in a standardized way.

Ten years ago, creating an automated database management system took a lot of effort because it had to be built from scratch. This naturally prompted the emergence of managed database-as-a-service (DBaaS) solutions. AWS was among the first major companies to offer such a service, launching DynamoDB in 2012. Following its success, other big players also rushed into the new market. However, a generic DBaaS comes with its own issues (e.g., vendor lock-in, requirements to use specific versions, and minimal customization for specialized workloads).

The evolution of Kubernetes into a Swiss army knife of automation has changed all that by providing an excellent and stable software management framework. A particularly important step in this evolution was support for stateful sets and persistent volumes, since databases are a classic example of a stateful application.

Drawing on elements of control theory, operators work as Kubernetes extensions or plugins and use custom resource definitions (CRDs) to define and control the state of your services. Building your database environment with declarative custom resources is quite simple: what you type is literally what you get. The operator reads the desired state of the system from your custom resource and not only creates it for you, but also monitors the environment using internal events and continually works to keep the system close to the desired state. No more complicated configuration scripts: your entire database system is standardized, described in a declarative language (YAML) and self-explanatory.
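The control loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not how any particular operator is implemented: `DesiredState`, `FakeCluster`, `reconcile` and `run_until_converged` are hypothetical names, the version strings are made up, and the in-memory cluster stands in for what a real operator would observe and change through the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesiredState:
    """Hypothetical stand-in for the declared spec of a custom resource."""
    replicas: int
    version: str


@dataclass
class FakeCluster:
    """In-memory stand-in for the observed database cluster (an assumption)."""
    nodes: list = field(default_factory=list)  # version string of each running node

    def add_node(self, version: str) -> None:
        self.nodes.append(version)

    def remove_node(self) -> None:
        self.nodes.pop()

    def upgrade_node(self, index: int, version: str) -> None:
        self.nodes[index] = version


def reconcile(desired: DesiredState, cluster: FakeCluster) -> list:
    """Compare observed and desired state and take at most one corrective step.

    Returns the list of actions taken; an empty list means nothing was needed.
    """
    if len(cluster.nodes) < desired.replicas:
        cluster.add_node(desired.version)
        return ["scale-up"]
    if len(cluster.nodes) > desired.replicas:
        cluster.remove_node()
        return ["scale-down"]
    for index, version in enumerate(cluster.nodes):
        if version != desired.version:
            # Upgrade one node per pass, in the spirit of a rolling upgrade.
            cluster.upgrade_node(index, desired.version)
            return [f"upgrade-node-{index}"]
    return []


def run_until_converged(desired: DesiredState, cluster: FakeCluster,
                        max_steps: int = 100) -> list:
    """Call reconcile repeatedly until the cluster matches the desired state."""
    history = []
    for _ in range(max_steps):
        actions = reconcile(desired, cluster)
        if not actions:
            break
        history.extend(actions)
    return history


if __name__ == "__main__":
    cluster = FakeCluster(nodes=["1.0", "1.0"])
    desired = DesiredState(replicas=3, version="1.1")
    print(run_until_converged(desired, cluster))
    print(cluster.nodes)
```

The design point is that the user never lists the steps. Only the desired state is declared, and the loop derives the steps by comparing it with what it observes. In this sketch a real operator's event-driven triggers are simplified into a plain loop.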

Couchbase Autonomous Operator was one of the first products to make extensive use of this framework for database automation. Many more community-built database operators for Kubernetes have also become popular in recent years. Several communities and interest groups have also sprung up around the technology, for example, the DoK (Data on Kubernetes) community.

The new horizons

The rise of DevOps, DBaaS, Kubernetes, and Operators creates a compelling end-to-end platform for distributed applications. Developers don’t have to worry about how their code is deployed or how different components communicate with each other. Instead, developers can focus on the data and the logic that governs its evolution to deliver improved insights and decision-making capabilities for the business. Finally, the same consistent tool/framework can be used to manage all layers of the application stack, including the critical database layer. By freeing up significant organizational resources from routine labor-intensive tasks, automation creates space and time for innovation and new progress.

More broadly within the industry, the future looks bright. Cloud providers are turning to fully managed services to offer a new business model. Since most of these database innovations are open source, some of the cloud providers have added a wrapper around these open source technologies and offered it as DBaaS. Unfortunately, this strategy has not been good news for everyone. It has had a huge impact on the revenues of other database vendors, forcing them to change their licenses. While the precise approach has differed from vendor to vendor, with some opting for the Business Source License (BSL) and others for the Server Side Public License (SSPL), the end goal is the same. However, this is an evolving landscape, and the hope is that it will stabilize in a state that benefits most participants and rewards original research and innovation.
