Critical bugs in ‘httpd’ web server, fix them now! – Bare Security

Choose a random person, right now at the end of December 2021, and ask them these two questions:

Q1. Have you heard of Apache?
Q2. If so, can you name an Apache product?

We’re willing to bet you’ll get one of two answers:

A1. No. A2. (Not applicable.)
A1. Yes. A2. Log4j.

Two weeks ago, however, we suggest that very few people had heard of Log4j, and even among those in the know, few would have been particularly interested.

Until a group of potentially catastrophic bugs was revealed under the bug-brand name Log4Shell, the Log4j programming library was just one of many components that got sucked into and used by thousands, if not hundreds of thousands, of Java applications and utilities.

Log4j was just “part of the supply chain”, and it had been bundled into more back-end servers and cloud-based services than anyone had realised until now.

Many sysadmins, IT staff and cybersecurity teams have spent the past two weeks eradicating this programmatic blight from their demesnes. (Yes, that’s a real word. It’s pronounced “domains”, but the archaic spelling avoids implying a Windows network.)

Don’t Forget “The Other Apache”

Go back to the oh-so-recent era before Log4j, however, and we suggest you would have got a different pair of answers, namely:

A1. Yes. A2. Apache is a web server, right? (Actually, it’s a software foundation that makes a web server, among other things.)
A1. Yes. A2. Apache makes httpd, probably still the most popular web server in the world.

With over 3,000 files totaling nearly one million lines of source code, Apache httpd is a large and powerful server, with a myriad of combinations of modules and options that make it both powerful and dangerous.

Fortunately, the open source httpd product receives constant attention from its developers, getting regular updates that bring new features as well as critical security fixes.

So in all the excitement of Apache Log4j, remember that:

  • You almost certainly have Apache httpd in your network somewhere. Just like Log4j, httpd has a habit of being quietly included in software projects, for example as part of an in-house service that works so well that it rarely draws attention to itself, or as a component built unobtrusively into a product or service you sell that isn’t primarily thought of as “containing a web server”.
  • Apache just released an httpd update that fixes two CVE-numbered security bugs. These bugs might not be exposed in your configuration, because they are part of optional run-time modules that you might not actually be using. But if you are using these modules, whether you realise it or not, you could be at risk of server crashes, data leakage or even remote code execution.

What has been fixed?

The two CVE-numbered flaws are listed in the Apache changelog as follows:

  • CVE-2021-44790: Possible buffer overflow when parsing multipart content in mod_lua of Apache HTTP Server 2.4.51 and earlier.
  • CVE-2021-44224: Possible NULL dereference or SSRF in forward proxy configurations in Apache HTTP Server 2.4.51 and earlier.

The good news about the first bug is that Apache itself warns that the mod_lua server extension (which lets you adapt the behaviour of httpd using Lua scripts instead of having to write modules in C):

…holds great power over httpd, which is both a strength and a potential security risk. It is not recommended to use this module on a shared server with users you do not trust, as it can be abused to modify the internal workings of httpd.

However, as Log4j has taught us, potentially exploitable bugs, even on non-public servers, can be troublesome if those bugs can be triggered by untrusted user data that gets passed on by other internet-facing servers at your network edge.

And CVE-2021-44790 does not involve dragging untrusted add-on Lua scripts into the setup.

Instead, it merely involves tricking the “preprocessor” that prepares untrusted user content to be passed to trusted Lua scripts, so the attack doesn’t depend on bugs or flaws in any of the add-on scripts that you may have written yourself.

Splitting message into multiple parts

Simply put, bug CVE-2021-44790 exists in code that deconstructs multipart messages, common in web form uploads, which typically look like this:

Content-Type: multipart/form-data; boundary=VILC2R2IHFHLZZ

--VILC2R2IHFHLZZ
Content-Disposition: form-data; name="name"
                                 <--blank line denotes start of first data item
Paul Ducklin
--VILC2R2IHFHLZZ                 <--double-dash-plus-boundary denotes end
Content-Disposition: form-data; name="phone"
                                 <--blank line denotes start of second data item        
555-555-5555   
--VILC2R2IHFHLZZ--               <--double-dash-plus-boundary denotes end

Technically, each multi-part component consists of the data after the end of each completely empty line (see above), and before each boundary line, which consists of two dashes (hyphens) followed by the unique text of the boundary marker.

(In case you were wondering, the extra double dash at the end of the very last line above denotes the last item in the list.)

A blank line in the raw data appears as two consecutive CRLFs (carriage return plus line feed), or the ASCII codes (13,10,13,10), denoted in C by the text string "\r\n\r\n".
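
As a minimal sketch of that byte pattern, the C fragment below locates the first blank line in a message by searching for the four bytes 13,10,13,10. The helper name find_blank_line is ours, invented for illustration, and is not part of httpd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of httpd: return the offset of the first
 * CRLFCRLF (bytes 13,10,13,10) in a NUL-terminated message, or -1 if the
 * message contains no blank line. */
static long find_blank_line(const char *msg)
{
    assert(msg != NULL);                        /* caller must pass a real string */
    const char *hit = strstr(msg, "\r\n\r\n");  /* same bytes as {13,10,13,10} */
    return hit ? (long)(hit - msg) : -1;
}
```

Everything from that offset plus 4 onwards is data, up to the next boundary line.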

In mod_lua, this multipart parsing is handled, very roughly, by code that we have simplified like this:

for (start = findnext(start,boundarytext); start != NULL; start = end) {
   crlf = findnext(start,"\r\n\r\n");
   if (!crlf) break;
   end = findnext(crlf,boundarytext);
   len = end - crlf - 8;
   buff = memalloc(len+1);
   memcpy(buff,crlf+4,len);
   [. . .]
}

Don’t worry if you don’t know C: this code is impenetrable and poorly documented even if you do. (The original is even more complex and harder to follow; we’ve stripped it down to its basics here.)

Basically, it looks for a double-CRLF sequence, denoting the next blank line.

From there, it finds the next occurrence of the boundary marker text (VILC2R2IHFHLZZ in our example above).

It then assumes that the data it needs to extract consists of everything between these two landmarks, denoted by the memory addresses (pointers in C jargon) crlf and end, minus 8 bytes.

The code makes no effort to explain the meaning of that “minus 8” in the code, nor again the “plus 4” two lines later, although it’s a good immediate guess that crlf+4 is there to skip the 4 bytes that make up the data in the CRLFCRLF string itself. (The empty line is a separator and is not part of the data to be used.)

Here is where the “8” comes from:

  • 4 bytes taken up by the CRLFCRLF characters at the start, which are not part of the data itself.
  • 2 bytes from CRLF at the end of the last line of data, not included.
  • 2 bytes used by dashes (--) which indicate the beginning of the demarcation line, not included.

As you can see, the code allocates enough memory for the data between the exact start of the line after the CRLFCRLF separator and the exact end of the line before the boundary marker…

… plus 1 additional byte (len+1) to ensure a NUL character (a zero byte) at the end of the buffer to act as the terminator that text strings need in C.

The code then uses memcpy() to copy the relevant data from the incoming message into this new memory buffer, where it will be presented to the Lua script that is about to run.
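
The allocate-and-copy step can be sketched in standalone C as follows. This is an illustration under stated assumptions, not the httpd source: the function name first_part is ours, malloc() stands in for whatever allocator the server really uses, and the sketch declines to copy anything if the two landmarks are fewer than 8 bytes apart (a precaution we have added).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: extract the first data item from a multipart body
 * as a fresh NUL-terminated string, or return NULL on any problem.
 * The caller frees the result. */
static char *first_part(const char *body, const char *boundary)
{
    assert(body != NULL && boundary != NULL);
    const char *crlf = strstr(body, "\r\n\r\n");   /* blank line before the data */
    if (!crlf) return NULL;
    const char *end = strstr(crlf, boundary);      /* boundary text, just after "--" */
    if (!end || end - crlf < 8) return NULL;       /* precaution added in this sketch */
    size_t len = (size_t)(end - crlf) - 8;         /* drop CRLFCRLF, CRLF and "--" */
    char *buff = malloc(len + 1);                  /* +1 for the terminating NUL */
    if (!buff) return NULL;
    memcpy(buff, crlf + 4, len);                   /* skip the 4-byte CRLFCRLF */
    buff[len] = '\0';
    return buff;
}
```

For the first item in the example form, end - crlf is 20 bytes (4 + 12 + 2 + 2), so len is 12, which is the length of the text Paul Ducklin.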

What if there are not 8 bytes?

You’ve probably figured out the problem: what if there aren’t 8 extra bytes to remove? What if the CRLF at the end of the last line of data, or the -- at the beginning of the next line, isn’t there at all?

What if there are not 8 bytes in total between the CRLFCRLF and the boundary text?
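
To make the arithmetic concrete, here is a small sketch that performs only the length calculation, without allocating or copying anything. The function names are ours, and the use of an unsigned size_t for the stored length is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: the "end - crlf - 8" calculation with no sanity check,
 * kept in a SIGNED type so that the problem is visible.
 * Returns PTRDIFF_MIN if either landmark is missing. */
static ptrdiff_t unchecked_length(const char *body, const char *boundary)
{
    assert(body != NULL && boundary != NULL);
    const char *crlf = strstr(body, "\r\n\r\n");
    if (!crlf) return PTRDIFF_MIN;
    const char *end = strstr(crlf, boundary);
    if (!end) return PTRDIFF_MIN;
    return end - crlf - 8;       /* negative when fewer than 8 bytes exist */
}

/* What a negative length turns into if it is stored in an unsigned
 * size_t (an assumption for illustration): -1 becomes SIZE_MAX. */
static size_t as_unsigned(ptrdiff_t len)
{
    return (size_t)len;
}
```

With only 7 bytes between the landmarks, as in the input "\r\n\r\nabcVILC" with boundary text VILC, the signed result is -1. If that value were held in an unsigned variable it would become SIZE_MAX, and len+1 would wrap round to zero: a zero-byte allocation followed by an enormous copy.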

This bug would have been much more obvious if the code had been more clearly constructed or commented, and would almost certainly have been avoided if the CRLF-- separator between the data and the boundary text had been explicitly searched for by the programmer, and explicitly tested for.

The bug was fixed by adding a line, just before the memory allocation, that checks that the buffer size calculation can’t come out too small:

 if (end - crlf <= 8) break;

This ensures that the buffer length cannot be negative, although we still think that an explicit check for a correct data terminator, in the same way that there is an explicit check for CRLFCRLF, would make the code clearer.

We would also insert a useful comment referring the reader to one of the Internet RFCs on multipart messages, for example RFC 2046, which defines the multipart media types.
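
An explicit terminator check of the sort we have in mind might look something like the sketch below. It is a hypothetical helper of our own, not the fix that Apache shipped, and it assumes that crlf and end have already been located as in the simplified loop above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual httpd fix: given the blank line
 * (crlf) and the start of the boundary text (end), return the length of
 * the data between them, or -1 if the part is malformed.
 * Expected layout, as described in the MIME RFCs on multipart messages:
 *   CRLF CRLF <data> CRLF "--" <boundary text>                         */
static ptrdiff_t checked_length(const char *crlf, const char *end)
{
    assert(crlf != NULL && end != NULL);
    if (end - crlf < 8) return -1;                     /* no room for CRLFCRLF plus CRLF-- */
    if (memcmp(crlf, "\r\n\r\n", 4) != 0) return -1;   /* leading blank line */
    if (memcmp(end - 4, "\r\n--", 4) != 0) return -1;  /* trailing CRLF and dashes */
    return end - crlf - 8;                             /* now known to be >= 0 */
}
```

Unlike the one-line fix, this version accepts a zero-length data item (exactly 8 bytes between the landmarks) and rejects input where the bytes before the boundary text are not CRLF followed by two dashes. Whether an empty item should be accepted is a design choice; the shipped check skips it.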

Proxy issues

Dealing with CVE-2021-44224 involved numerous code changes, the most obvious being a fix to a file full of utility code used by the httpd proxy module.

The fact that there are over 5000 lines of C in proxy_util.c alone, which is the support code for just one of many httpd modules, speaks to the overall size and complexity of the Apache HTTP Server.

The code we are referring to above has been modified from this…

url = ap_proxy_de_socketfy(p, url);

…to code that checks that the called function actually found a URL string to work with:

url = ap_proxy_de_socketfy(p, url);
if (!url) {
   return NULL;
}

Before the “if no URL” error check was added to make the code bail out early, the program would carry on even if url were NULL, and try to access memory via the NULL address.

But reading from or writing to a NULL pointer is “undefined” by the C standard, which means you have to take care never to do it.

Indeed, on almost all modern operating systems, the value used for NULL, usually zero, is chosen so that any attempt to access the NULL address, whether for reading or writing, will not only fail but also be trapped by the operating system, which will then usually kill the offending process to avoid dangerous or unforeseen side effects.
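
The same defensive pattern can be shown in a self-contained sketch. Both functions below are hypothetical and written for illustration; after_scheme is not httpd’s ap_proxy_de_socketfy, it merely shares the property that it can legitimately return NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT httpd's ap_proxy_de_socketfy): return a pointer
 * just past "scheme://", or NULL if the input has no "://" in it. */
static const char *after_scheme(const char *url)
{
    assert(url != NULL);
    const char *sep = strstr(url, "://");
    return sep ? sep + 3 : NULL;
}

/* Length of the host part, or -1 if the URL could not be parsed.
 * The NULL test mirrors the "if (!url) return NULL;" pattern above:
 * bail out before anything dereferences the pointer. */
static long host_length(const char *url)
{
    const char *host = after_scheme(url);
    if (!host) {
        return -1;                      /* never touch memory via NULL */
    }
    return (long)strcspn(host, "/:");   /* stop at the path or the port */
}
```

Without the test on host, a string with no scheme in it would send strcspn() off to read from address zero, which is exactly the sort of crash described above.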

What to do?

  • If you are using Apache httpd anywhere, update to 2.4.52 as soon as you can.
  • If you can’t patch right away, check if your configuration is at risk. There are many bugfixes beyond these two CVEs, so you should patch as soon as possible. But you could decide to postpone the update until a more convenient time if you don’t load either the Lua scripting module or the proxy module.
  • If you are a coder, remember to check rigorously for errors in your programs. If there’s a chance to detect errors before they make things worse, such as checking that you really have enough memory to play with, or checking that the string you’re looking for is actually there, take it!
  • If you are a coder, assume that someone else will need to understand your code in the future. Write meaningful and useful comments, on the grounds that those who do not remember the past are doomed to repeat it.