How graph technology is making inroads into the database market

When Emil Eifrem, founder and CEO of Neo4j, worked for an enterprise content management start-up in Sweden in the mid-2000s, he struggled to map the relationships between files, folders, and the people who owned all that content in a relational database.

On a flight to Mumbai, he sketched what is now known as the property graph model, laying the foundation for Neo4j to become one of the largest specialized graph database providers on the market.

About five years later, in 2012, Yu Xu, a former Teradata programmer who was inspired by PageRank, the graph algorithm Google uses to rank search results, launched TigerGraph to make graph data easier to use and more scalable through a distributed model.

Building a new database platform from scratch is admittedly difficult. Graph databases not only need to support ACID transactions (atomicity, consistency, isolation, and durability), but they also need to scale across multiple machines and large datasets.

“ACID transactions are really important for graphs, because if you’re writing two different nodes and the relationship between them, you’d better be able to write that relationship in a transaction,” Eifrem said. “Otherwise, you end up with a dangling relationship, and you don’t want corrupted data.”
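To make Eifrem’s point concrete, here is a minimal sketch of all-or-nothing writes in a toy in-memory graph. The `Graph` and `Transaction` classes are hypothetical and do not reflect how Neo4j or TigerGraph implement transactions; the sketch only models atomicity and a basic consistency check (no dangling relationships), not isolation or durability.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Hypothetical toy store: node id -> properties, plus (src, rel_type, dst) edges.
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)


class Transaction:
    """Buffers writes and applies them together on commit, or not at all."""

    def __init__(self, graph):
        self.graph = graph
        self.pending_nodes = {}
        self.pending_edges = set()

    def create_node(self, node_id, **props):
        self.pending_nodes[node_id] = props

    def create_relationship(self, src, rel_type, dst):
        self.pending_edges.add((src, rel_type, dst))

    def commit(self):
        known = set(self.graph.nodes) | set(self.pending_nodes)
        # Validate every relationship before touching the graph, so a failed
        # commit leaves the stored data exactly as it was.
        for src, _, dst in self.pending_edges:
            if src not in known or dst not in known:
                raise ValueError(f"dangling relationship {src} -> {dst}")
        self.graph.nodes.update(self.pending_nodes)
        self.graph.edges |= self.pending_edges


g = Graph()
tx = Transaction(g)
tx.create_node("alice", kind="Person")
tx.create_node("acct-1", kind="Account")
tx.create_relationship("alice", "OWNS", "acct-1")
tx.commit()

bad = Transaction(g)
bad.create_node("bob", kind="Person")
bad.create_relationship("bob", "OWNS", "acct-999")  # acct-999 was never created
try:
    bad.commit()
except ValueError:
    pass  # the whole transaction is rejected, including the "bob" node

print(sorted(g.nodes))  # ['acct-1', 'alice']
```

Because the failed transaction is validated before anything is written, neither the orphaned relationship nor the `bob` node it depended on ends up in the graph.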

These relationships can be between people, entities, and things like bank accounts, making graph databases suitable for applications such as anti-money laundering and fraud detection. This opened the door for TigerGraph and Neo4j to some of the biggest financial institutions in the world.

Retailers also use graph databases to improve product recommendations and fulfillment rates through sophisticated supply chain analytics. According to Gartner, graph technologies will be used in 80% of data and analytics innovations by 2025, up from 10% in 2021, facilitating rapid decision-making across the enterprise.

Merv Adrian, a vice president analyst on Gartner’s data management team who tracks developments in operational database management systems (DBMS), Apache Hadoop, Spark, non-relational DBMS, and adjacent technologies, said the healthcare industry has also been a big believer in graph databases.

“There’s so much about medical technology and the pharmaceutical industry that is about understanding correlations and being able to look at large populations and find factors that improve outcomes,” he said, adding that finding correlations in cancers, for example, could help genomic therapy.

Another example is logistics, where companies manage hundreds or thousands of nodes and touchpoints across the global supply chain. “Understanding all the possible correlations becomes complex very quickly, and the ability to quickly find the least expensive or fastest alternative route can make a huge difference in business results,” Adrian said.
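Finding the least expensive alternative route is a classic shortest-path problem over a weighted graph. The sketch below uses Dijkstra’s algorithm from Python’s standard `heapq` module; the supply chain network and its costs are made up for illustration, and this is not a description of any vendor’s routing engine. Swapping shipping costs for transit times would find the fastest route instead.

```python
import heapq


def cheapest_route(graph, start, goal):
    """Dijkstra's algorithm. graph maps node -> list of (neighbor, cost)."""
    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            # Walk the predecessor links back to the start to rebuild the route.
            path = [goal]
            while path[-1] != start:
                path.append(previous[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return None  # goal is unreachable


# Hypothetical network: sites are nodes, shipping legs are weighted edges.
network = {
    "factory": [("port_a", 4), ("port_b", 2)],
    "port_a": [("warehouse", 5)],
    "port_b": [("port_a", 1), ("warehouse", 8)],
    "warehouse": [("store", 3)],
}

print(cheapest_route(network, "factory", "store"))
# (11, ['factory', 'port_b', 'port_a', 'warehouse', 'store'])
```

Note that the cheapest route detours through `port_b` even though `factory` ships directly to `port_a`; surfacing that kind of non-obvious alternative is what makes the graph view useful.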

Besides financial services, healthcare, and logistics, governments are also very interested in using graph technology to identify threats against their populations, he added. “It’s sort of the same problem as fraud, which in many ways is about finding bad actors. The same also happens in the political environment.”

Despite the growing adoption of graph databases, most people don’t assume they need a specialized graph database to perform graph analysis, Adrian said.

“It’s usually only when they’ve gained some experience and found that the applications or use cases they’re pursuing are complex enough, or used by enough people at the same time, that they realize it’s time to get into specialized technology,” he said.

“That’s because you can do a fair amount of graph analysis yourself on data that isn’t in a graph database. But if there are five other people doing it at the same time, everything will grind to a halt.”

Adrian noted that the difference between a graph database and other databases, such as Oracle’s and IBM’s multimodel databases, is that the former stores relationships, whereas the others don’t, which means the relationships must be computed at query time.

“That requires computation and is time-consuming if it’s complex,” he said. “And if multiple people are doing something like that at the same time, and they’re doing different kinds of joins, you can see it’s almost a geometric explosion of complexity.

“With graph databases, relationships are stored and managed, even if you are performing multiple different analyses with different relationships.”
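The contrast Adrian draws can be illustrated with a deliberately simplified toy model. Below, the “relational” functions rebuild relationships by scanning a join table on every query and on every extra hop, while the “graph” functions follow neighbor lists stored with each node at write time. All names and data are hypothetical, and real relational engines use indexes and hash joins rather than full scans, so this sketch only shows where the work happens, not actual performance.

```python
# Toy data: three people and who knows whom.
people = {1: "Alice", 2: "Bob", 3: "Carol"}

# Relational style: relationships are rows in a separate table, matched at query time.
knows_rows = [(1, 2), (2, 3), (1, 3)]  # (person_id, friend_id)


def friends_via_join(person_id):
    # Scans the join table to reconstruct this person's relationships.
    return sorted(people[f] for p, f in knows_rows if p == person_id)


def friends_of_friends_via_join(person_id):
    # Each additional hop repeats the scan for every intermediate result.
    first_hop = [f for p, f in knows_rows if p == person_id]
    return sorted({people[f2] for f1 in first_hop for p, f2 in knows_rows if p == f1})


# Graph style: each node keeps its own neighbor list, stored when data is written.
adjacency = {1: [2, 3], 2: [3], 3: []}


def friends_via_adjacency(person_id):
    # Follows stored references directly; work depends only on this node's degree.
    return sorted(people[f] for f in adjacency[person_id])


def friends_of_friends_via_adjacency(person_id):
    return sorted({people[f2] for f1 in adjacency[person_id] for f2 in adjacency[f1]})


print(friends_via_join(1), friends_via_adjacency(1))  # ['Bob', 'Carol'] ['Bob', 'Carol']
print(friends_of_friends_via_join(1), friends_of_friends_via_adjacency(1))  # ['Carol'] ['Carol']
```

Both approaches return the same answers; the difference is that the join-based version recomputes relationships on every query, which is the cost that compounds when many users run different multi-hop analyses at once.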

Yet how far specialized graph databases take off is tempered by the addition of graph capabilities to other popular multimodel databases on the market.

“When MongoDB, Microsoft, and Oracle add graph functionality to their products, people find that, at least initially, they can use those products for a while until they start switching to specialized graph databases,” Adrian said.

As a testament to the growth of niche players, Neo4j surpassed $100 million in revenue in 2021, placing it on the list of the top 30 database vendors, which includes multimodel database vendors Oracle and IBM as well as cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud.

Global cloud providers may well be the next big players in the graph database space. Although the cloud dominates new DBMS deployments globally, not all dominant vendors are competing aggressively in the graph DBMS space, according to a new report from Gartner.

But that should change as the market develops. Gartner expects the percentage of revenue attributable to cloud in the overall DBMS market to exceed 50% by 2023.

“As incumbent cloud service providers begin to take market share, they will pose a formidable barrier for smaller providers who will need to both partner and compete with them,” Gartner said.

“As with other specialty markets, smaller vendors’ agility and focus on specific requirements, such as enterprise insight graphs, domain solutions, or analytics and AI, will help them keep a step ahead.”

Xu, CEO of TigerGraph, said the company started with “visionary customers” who were adept at using graph databases, but to scale the business, it was important to remove the friction standing in the way of wider adoption, much as data analytics tools like Tableau did for SQL queries.

“We’re innovating in graph business intelligence and user interface with a product called Query Builder to provide a visual approach to building your graph business logic,” he said. “People unfamiliar with any query language can easily ask questions through a browser and, if fraud is detected, receive alerts of potentially fraudulent transactions.”

TigerGraph is also working on emerging domain solutions such as entity resolution to help organizations like e-commerce companies create “customer identity graphs,” Xu said, to understand the number and types of devices a customer uses to access the same service, among other things. “It will help them personalize and contextualize their services for their customers,” he said.
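One simple way to picture a customer identity graph is to link account IDs through the devices they log in from and then treat each connected group as one customer. The sketch below does this with a union-find (disjoint-set) structure. It is a hypothetical illustration, not TigerGraph’s entity resolution method; production systems typically add fuzzy matching on names, addresses, and other attributes rather than relying on exact shared device IDs.

```python
class UnionFind:
    """Disjoint-set structure that groups items connected by any chain of links."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve_identities(logins):
    """Group account IDs that share a device, directly or transitively."""
    uf = UnionFind()
    for account, device in logins:
        # Tag node types so an account and a device with the same ID never collide.
        uf.union(("account", account), ("device", device))
    groups = {}
    for account, _ in logins:
        root = uf.find(("account", account))
        groups.setdefault(root, set()).add(account)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical login events: (account ID, device ID).
logins = [
    ("user-1", "phone-A"),
    ("user-2", "phone-A"),   # same phone as user-1
    ("user-2", "laptop-B"),
    ("user-3", "laptop-B"),  # same laptop as user-2
    ("user-4", "tablet-C"),
]

print(resolve_identities(logins))
# [['user-1', 'user-2', 'user-3'], ['user-4']]
```

`user-1` and `user-3` never share a device, but they are linked through `user-2`; following such chains of relationships is exactly the kind of question graph databases are built to answer.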

Gartner’s Adrian said small graph database vendors will also need to focus on integrating with other systems and proving they are enterprise-ready.

This includes fitting into the workflows of an emerging group of graph technology users: the data scientists who are building AI and machine learning models for the use cases graph databases are designed to serve.

Neo4j, for example, has built what it calls Graph Data Science (GDS), a connected data analytics and machine learning platform that helps users understand connections in big data to answer critical questions and improve forecasts.

GDS is now one of the fastest-growing segments of Neo4j’s business since the product launched in early 2020 and presents a “huge opportunity” for the company, with tech leaders like Google having already moved to graph-based machine learning to generate user insights, Eifrem said. “We think that as Google goes, so goes the enterprise.”

Another development that could expand the graph database market is GQL (Graph Query Language), a standard graph query language intended to make graph database queries portable across different front-end tools and databases. The International Organization for Standardization (ISO) standard, which many vendors, including large enterprise DBMS vendors, are helping to develop, is expected to be ready by the end of 2023.

“This will likely drive a significant increase in growth and make it easier to transfer skills and programs from one product to another,” Adrian said.