Bridging the gap between open source database and database business

It’s relatively easy to get a group of people to create a new database management system or data store. We know this because over the past five decades of computing, tools for structuring data have proliferated, and seemingly at an increasing rate, thanks in large part to the innovation of hyperscalers and cloud builders as well as academics who simply like to dig into the guts of a database to prove a point.

But it’s another thing to take an open source database or data store project and turn it into a business that can deliver enterprise-grade fit and finish and support a wider variety of use cases and customer types and sizes. It’s hard work that takes a lot of people, focus, money and luck.

This is the task that Dipti Borkar, Steven Mih and David Simmen took on when they launched Ahana two years ago to commercialize the PrestoDB variant of the Presto distributed SQL engine created by Facebook. Not coincidentally, it is a similar task to the one that the original creators of Presto took on with the PrestoSQL variant, now called Trino, which is commercialized by their company, Starburst. Either way, these Presto variants federate databases and data stores and provide a universal SQL layer that allows them to be queried in place, a very powerful capability made necessary by the persistence of legacy databases and by data gravity.
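To make the idea of querying separate stores in place through one SQL layer concrete, here is a minimal sketch. It does not use Presto at all: it uses Python’s built-in sqlite3 module and SQLite’s ATTACH DATABASE as a stand-in, with two in-memory databases playing the role of two independent data stores. The store names (orders_db, crm_db), the tables, and the sample rows are all hypothetical. In Presto the equivalent query would address tables through catalogs backed by connectors rather than attached SQLite files.

```python
import sqlite3


def build_federated_connection():
    """Open one connection that can see two separate (hypothetical) stores."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent in-memory database; they stand in
    # for two distinct data stores that are never copied into one another.
    conn.execute("ATTACH DATABASE ':memory:' AS orders_db")
    conn.execute("ATTACH DATABASE ':memory:' AS crm_db")
    conn.execute(
        "CREATE TABLE orders_db.orders (customer_id INTEGER, amount REAL)"
    )
    conn.execute("CREATE TABLE crm_db.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO orders_db.orders VALUES (?, ?)",
        [(1, 120.0), (2, 35.5), (1, 60.0)],
    )
    conn.executemany(
        "INSERT INTO crm_db.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def revenue_by_customer(conn):
    """Join across both stores with a single SQL statement."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM orders_db.orders AS o
        JOIN crm_db.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_federated_connection()
    for name, total in revenue_by_customer(connection):
        print(name, total)
```

The point of the sketch is only the shape of the query: one statement names tables living in different stores, and the engine does the join without a prior load into a central warehouse.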

It’s just too difficult to move everything to one place to query it, which is what companies have tried to do by creating data warehouses. Even then, data warehouses usually held only summary data, and while they are convenient once the data is in the warehouse, getting data into the warehouse (and making sure it was not garbage) was a huge pain in the neck. In short, as we said a few months ago, you want to do data analytics without the data warehouse, which is the exact opposite of what Snowflake, the darling of the data industry, has done with its cloud data warehouse.

So much so that more and more businesses want to query data where it resides using something like PrestoDB. And that’s why Ahana was able to get an extension to the Series A funding it announced in August last year, when Google Ventures, Lux Capital, Third Point Ventures and Leslie Ventures put in $20 million on top of the $4.8 million in seed funding that Ahana raised to get started in 2020. With the Series A extension, Liberty Global Ventures, the venture capital arm of the telecommunications company of the same name that has operations across Europe, plus an additional stake from Google Ventures, are putting another $7.2 million into the Series A kitty. (We strongly suspect Liberty Global is a customer of Ahana, but chief executive Steven Mih won’t comment on that.) That brings the total to $32 million, and Mih adds that Ahana wasn’t looking to raise funds. To which we joked that in today’s economic climate, if someone offers you money, you find a reason to take it.

In the ten months since the first tranche of Series A funding arrived, Ahana has more than doubled its staff to just under 50 people, and its implementation of PrestoDB has been downloaded more than 100,000 times. Mih is not at liberty to say how many paying customers the company has on Ahana Cloud, its commercial-grade implementation of the database.

When it comes to adding to the company’s payroll, Mih is understandably cautious. “We want to understand what’s going on with the global economy and any possible headwinds associated with it,” Mih tells The Next Platform, refraining from using the R word. “And if some of the potential problems don’t occur, we will grow very quickly.”

This growth is driven by the need to perform federated queries across database platforms, which has only been made more evident by the notion of multimodal data processing, eloquently described by Matt Bornstein, Jennifer Li and Martin Casado (one of the creators of OpenFlow and one of the co-founders of Nicira, which gave VMware its NSX virtual networking stack), all of whom prowl the world for good technology investments for Andreessen Horowitz.

At the heart of this modern data processing architecture is what is called a data lakehouse, which is part old data warehouse and part Hadoop-era data lake, but really is deep and cheap storage without the need to use MapReduce to chew through unstructured data on a cluster of machines.

This chart from Mih sums up the core of that architecture a bit more neatly and legibly:

“As you know, a lot of data gets pumped into data lakes, and it’s semi-structured, structured, and unstructured data,” says Mih. “With everything being commoditized, people are wondering why they should put data in another proprietary store like a data warehouse and why they shouldn’t just leave it in open formats. And even if they tried to put that data into commodity storage, the compute on the data warehouse is proprietary. The idea of the data lakehouse is to use open source compute, and Presto for SQL query processing is one of the main options. And then for non-SQL queries and workloads, you can use ML and AI frameworks for compute and formats like Parquet for that. Storage is a commodity with the lakehouse, and the compute layer is really where the costs always are, and Presto is well positioned to play here as a query engine alongside frameworks accessing unstructured data.”

This overall multimodal data processing architecture has many moving parts, and if Ahana is to be successful in bringing PrestoDB to market and federating the distributed query engine across all kinds of data stores, it has to be easier to install and to test the SQL core of a data lakehouse. That’s what the new Community Edition of Ahana Cloud for Presto is all about. This is a free version of the database, with no time limit, that can run on a single cluster, regardless of size. (Most Presto customers have multiple clusters, and that’s where subscriptions will come in.) Here are the differences between the Community Edition and the full edition of Ahana Cloud for Presto:

The Community Edition runs on the Amazon Web Services cloud, just like the production Ahana Cloud for Presto, and as long as it only runs on a single cluster, regardless of how many EC2 instances are driving it, Community Edition is free. There are a few caveats. The Community Edition does not support Graviton, Graviton2 or Graviton3 instances and only has community support. If you want the production edition of Ahana Cloud for Presto, you can upgrade to it seamlessly, and then you can have as many clusters as you want and run on any type of AWS instance, including those based on the Graviton family of Arm server processors that AWS created for its own use. The production version also features increased security, performance enhancements such as autoscaling on AWS, and of course technical support from live human beings employed by Ahana. Ahana Cloud for Presto (which should just change its name to Enterprise Edition) costs between a few hundred dollars and thousands of dollars per month to license on a modest configuration on AWS, and this does not include the cost of EC2 instances and storage.

Now Ahana has something to help people get started quickly with Presto and save them the days or weeks it would take to set up Presto on top of a data lakehouse. Just grab that container, fire it up, and point it at the data lakehouse, and start hitting it with SQL queries. And each of those Community Edition users can use it forever and never pay until they have a second cluster or need improved security or performance.