Make Web Applications More Efficient | MIT News

Most major websites these days keep huge databases: shopping sites have inventory and customer review databases, travel sites have availability databases seats on flights and social networking sites have databases of photos and reviews. Almost all transactions on one of these sites require multiple database queries, which can slow response times.

This week, at the 38th International Conference on Very Large Databases – the premier database conference – researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new system that automatically streamlines website database access patterns, making sites up to three times faster. And where other systems that promise similar speedups require proficiency in specialized programming languages, MIT’s system, called Pyxis, works with the kinds of languages ​​already favored by web developers.

Alvin Cheung, a graduate student in the Department of Electrical Engineering and Computer Science (EECS), is the first author of the article. He is joined by his advisor, EECS Professor Sam Madden, and by Owen Arden and Andrew Myers from the Department of Computer Science at Cornell University.

A web services transaction typically involves both the retrieval of data – for example, flights on a given route with available seats – and the calculation – for example, whether the flight time difference would allow the traveler to establish a connection. Typically, data is stored on one server and computation, or “application logic,” is performed on another. The application server and the database may have to exchange information multiple times to conclude that, no, a given route will not work.

But if a few frequently used pieces of application logic could run on the database server instead, it would save time, limiting the number of cross-server transactions and bandwidth, since the only remaining transaction could be sending the single information bit “no.”

But application logic and database queries are usually written in very different languages, which are optimized to handle different types of operations. Therefore, moving code to the database may require not only rewriting it, but also rethinking how it is implemented. And it’s hard to split a program in two without introducing bugs – without, for example, losing track of which server needs to change which variable at what time.

Finally, even if the programmer invests time in establishing that partitioning the program will not introduce errors, there remains the difficulty that the demands on the database server are constantly changing. During normal operation, the database server processor might have enough capacity to handle some application logic. But a sudden spike in traffic could overload the processor so much that the extra computation would push it over its limit, causing much worse delays than shuttling data between application and database would.

Graphical results

Pyxis solves all three problems. It automatically partitions a program between the application server and the database server, and it does so in a way that can be mathematically proven not to disrupt program operation. It also monitors CPU load on the database server, giving it more or less application logic to run based on its available capacity.

Pyxis begins by transforming a program into a graph, a data construction made up of “nodes” connected by “edges”. Probably the best-known example of a graph is a network diagram, in which the nodes (represented by circles) represent computers and the edges (represented by lines connecting the circles) represent the bandwidth of the links between them. In this case, however, the nodes represent individual instructions in a program, and the edges represent the amount of data each instruction passes to the next.

“The code moves from this instruction to the next instruction, and there’s a certain amount of data that needs to be transferred from the previous instruction to the next instruction,” Madden explains. “If the next statement uses a variable that was computed in the previous statement, there is a data dependency between the two statements, and the size of that dependency is the size of the variable.” If the entire program runs on a computer, the variable is stored in main memory and each instruction simply accesses it directly. But if consecutive instructions execute on separate computers, the data must jump with them.

“There is a certain cost to send data over the network, and there is a certain cost for each network round trip you make,” Madden explains. “So we want to find a placement of these nodes on the two different servers that minimizes the overall cost – or overall runtime – of the program.”

In fact, Cheung adds, Pyxis finds several such node locations, some pushing more computations to the database server and some pushing less. “Our tool is able to dynamically switch between them based on the current load on the server,” Cheung explains.

Comparative analysis

In experiments involving a standard set of simulated database transactions, Pyxis was three times faster than a typical implementation, while reducing the bandwidth consumed by nearly half. Moreover, the improvements it offered were within a few percent of those resulting from painstaking hand-coded optimizations by software engineers.

At the moment, Pyxis works with programs written in Java, which Madden says is the preferred language of many commercial web developers. But adapting it to other popular languages ​​would require revising only the code that translates programs into graphical models; the rest of the system would remain the same.

“Usually partitioning systems are automated, not automatic – automated in the sense that some input from the programmer is taken in, and then the system handles some of the more difficult aspects of the partitioning process,” says Eli Tilevich, associate professor of ‘computer science. science at Virginia Tech. With Pyxis, however, “they partition things completely automatically, without requiring any user intervention.”

“They have a program analysis technique, called a score graph,” adds Tilevich, “which is an interesting innovation that hasn’t been applied to earlier systems.” And while even successful academic research projects typically require a lot of work before they’re ready for commercial implementation, Tilevich says, “their ideas, their technologies definitely have commercial applicability.”

In ongoing work, the four researchers in the Pyxis paper are collaborating with Armando Solar-Lezama, assistant professor of computer science and electrical engineering at MIT, to make it even easier to streamline database-dependent web applications. Most databases are written in so-called declarative languages ​​such as SQL, which allow programmers to issue high-level commands, such as finding the largest value of a variable, without specifying a computational approach. . The database system then automatically chooses the most efficient algorithm to execute the command, based on the characteristics of the data.

Web programmers who know Java better than SQL will sometimes move large batches of data from the database to the application server, only to perform operations there that SQL would have performed more efficiently anyway. The researchers are developing a system that relies on Pyxis, dubbed StatusQuo, that can identify such inefficient application logic. It then automatically converts the application code into an SQL query, which the database then executes by whatever means it deems most efficient.