Blast from the Past: Memory Corruption
Low-level languages and compiled applications will be around for a long time, because their performance and efficiency are still unrivaled by high-level, “safer” languages. As a result, vulnerabilities stemming from memory corruption are here to stay, especially in the embedded device realm. Once you understand the basics, you will come to appreciate the sophistication and complexity involved. Memory-based attacks are undoubtedly more sophisticated today, and they involve bypassing multiple preventative measures. To truly respect the difficulty of overcoming exploit mitigation techniques such as ASLR and DEP, a brief history lesson on stack-based buffer overflows is in order.
Basic Overflow
A buffer is a temporary area for data storage. When a program or system process places more data in a buffer than the buffer was allocated to hold, the extra data overflows. It spills into adjacent memory, where it can corrupt or overwrite whatever data was stored there.
In a buffer-overflow attack, the extra data sometimes holds specific instructions for actions intended by a hacker or malicious user; for example, the data could trigger a response that damages files, changes data or unveils private information.
An attacker uses a buffer-overflow exploit to take advantage of a program that is waiting on a user’s input. There are two types of buffer overflows: stack-based and heap-based. Heap-based overflows, which are harder to exploit and the less common of the two, corrupt data in the dynamically allocated memory (the heap) of a program. Stack-based buffer overflows, which are more common among attackers, target the stack: the memory region that holds a function's local variables (including buffers for user input), saved registers and return address.
Essence of the problem
Suppose a C program declares an array of length 4: char buffer[4]; What happens if it then executes the statement buffer[4] = 'a'; ? The valid indices are 0 through 3, so this write lands one element past the end of the array, and the behavior is UNDEFINED: ANYTHING can happen! If the data written (i.e., the 'a') is user input that an attacker controls, the vulnerability can be exploited so that anything the attacker wants can happen.
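C performs no index check on its own, so any check has to be written by hand. A minimal sketch of such a check follows; checked_store is a hypothetical helper introduced here for illustration, not something from the C library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store value at buf[index] only if index is in range.
   Returns false instead of writing out of bounds. */
bool checked_store(char *buf, size_t len, size_t index, char value)
{
    if (buf == NULL || index >= len) {
        return false;   /* out of bounds: refuse rather than corrupt memory */
    }
    buf[index] = value;
    return true;
}
```

With char buffer[4], a call such as checked_store(buffer, sizeof buffer, 4, 'a') is rejected, while index 3 is accepted.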
In low-level languages speed and power are important. As such, a program is responsible for its own memory management. Memory management is very error-prone: who has not had a C(++) program crash with a segmentation fault? There are no guard rails in low-level languages, and this is by design. Guard rails reduce performance and add unnecessary load when a careful developer does not truly need them. C and C++ do not offer memory safety and have typical bugs:
writing past the bounds of an array
pointer trouble, such as:
missing initialization
bad pointer arithmetic
use after deallocation (use after free)
double deallocation
failed allocation
forgotten deallocation (memory leaks)
For efficiency, these bugs are not detected at run time, and the behavior of a buggy program is undefined.
Fun with the stack
...
void f(char* buf1, char* buf2, bool b1)
{
    int i;
    bool b2;
    void (*fp)(int);
    char buf3[16];   // size chosen for illustration
    ....
}
Overflowing the stack-allocated buffer buf3 lets an attacker:
Corrupt the return address so that it points to another buffer where the attack code is placed
Corrupt function pointers, such as fp
Corrupt any other data on the stack, e.g. b2, i, b1, buf2, ...
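…
char dest[20];
strcpy(dest, src); // copies string src to dest
…
In this example strcpy assumes that dest is long enough and that src is null-terminated. An alternative is strncpy(dest, src, size), which never writes more than size bytes, but note that it leaves dest without a null terminator when src has size or more characters.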
Fun on the heap
…
struct account {
    int number;
    bool isSuperUser;
    char name[20];
    int balance;
};
…
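Overrunning name corrupts the fields stored after it in the struct (here balance), and a long enough overrun goes on to corrupt whatever lies beyond the struct on the heap.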
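The effect on the neighbouring field can be simulated without actually leaving the object. The sketch below is an illustration under assumptions: simulate_name_overrun is a hypothetical helper, and it deliberately stops at the end of the struct so that the demo itself stays well defined, which a real overflow would not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct account {
    int  number;
    bool isSuperUser;
    char name[20];
    int  balance;
};

/* Hypothetical helper: mimic an unchecked copy into name that keeps going
   until the end of the struct. Every byte from the start of name onward is
   overwritten with fill, including the bytes of balance. */
void simulate_name_overrun(struct account *acct, unsigned char fill)
{
    size_t start = offsetof(struct account, name);
    memset((unsigned char *)acct + start, fill, sizeof(*acct) - start);
}
```

After the call, number and isSuperUser are untouched because they sit before name, while balance no longer holds its original value.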
Memory Layout
Example Code
#include <string.h>

void foo (char *bar)
{
    char c[12];
    strcpy(c, bar); // no bounds checking
}

int main (int argc, char **argv)
{
    foo(argv[1]);
    return 0;
}
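One possible repair of foo is to validate the length before copying. This is a sketch only; the name foo_checked and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked variant of foo.
   Returns the number of characters copied, or -1 if bar is NULL or would
   not fit in c together with its null terminator. */
int foo_checked(const char *bar)
{
    char c[12];
    size_t len;

    if (bar == NULL) {
        return -1;
    }
    len = strlen(bar);
    if (len >= sizeof(c)) {
        return -1;              /* would overflow c: refuse the input */
    }
    memcpy(c, bar, len + 1);    /* length already validated, copies '\0' too */
    return (int)strlen(c);
}
```

An 11-character argument is accepted, a 12-character argument is rejected, and a missing argument (NULL) no longer crashes the program.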
Solution
Check array bounds at runtime – Algol 60 proposed this back in 1960! Unfortunately, C and C++ have not adopted this solution, for efficiency reasons (Perl, Python, Java, C#, and even Visual Basic have). Many languages are not prone to memory errors like C(++). These are often called safe languages, because they offer memory-safety and sometimes also type-safety. Typical characteristics of safe languages:
checking array bounds
checking for null values
default initialization
no pointer arithmetic
no dynamic memory management with malloc() and free(), but automatic memory management using a garbage collector
strong type checking
exception on integer overflow
more precisely defined semantics
What about “N” functions? Example: strcpy and strncpy
Many of the notoriously vulnerable string functions were addressed with newer “N” versions. These versions take a buffer length, but they do not outright prevent the issue. For example, don’t simply replace strcpy(dest, src) with strncpy(dest, src, sizeof(dest)): if src has sizeof(dest) or more characters, strncpy fills dest completely and writes no null terminator. The safer way to perform the copy is strncpy(dest, src, sizeof(dest)-1); dest[sizeof(dest)-1] = '\0'; so that dest is always null-terminated. (This only works when dest is an array, since sizeof applied to a pointer yields the size of the pointer.) A strongly typed programming language could of course enforce that strings are always null-terminated.
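The pattern can be wrapped in a small function. The sketch below assumes a hypothetical helper named copy_terminated that takes the destination capacity explicitly, so it also works when dest is a pointer rather than an array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dest_size - 1 characters of src into
   dest and always null-terminate. dest_size must be greater than 0.
   Returns 1 if src had to be truncated, 0 otherwise. */
int copy_terminated(char *dest, size_t dest_size, const char *src)
{
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';        /* strncpy may not have written one */
    return strlen(src) >= dest_size;
}
```

Silent truncation can itself be a bug (for example when the string is a path or a user name), which is why the sketch reports it to the caller.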
Spot the defect
EXAMPLE 1
#ifdef UNICODE
#define _sntprintf _snwprintf
#define TCHAR wchar_t
#else
#define _sntprintf _snprintf
#define TCHAR char
#endif
TCHAR buff[MAX_SIZE];
_sntprintf(buff, sizeof(buff), "%s\n", input);
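Defect: sizeof(buff) is the size of the buffer in bytes, but _snwprintf expects the size in characters. When UNICODE is defined, each wchar_t takes more than one byte, so the call claims the buffer is larger than it really is. The CodeRed worm exploited such a mismatch: code written under the assumption that 1 char was 1 byte allowed buffer overflows after the move from ASCII to Unicode.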
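The usual remedy is to pass the capacity in elements rather than bytes. The sketch below uses the standard swprintf instead of the Windows-specific _snwprintf so that it stays portable; ARRAY_COUNT, format_line and the value of MAX_SIZE are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <wchar.h>

#define MAX_SIZE 64                                  /* illustrative value */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))  /* elements, not bytes */

/* Hypothetical helper: format input plus a newline into a wide buffer.
   out_count is the capacity of out in wide characters. Returns the number
   of wide characters written, or a negative value if it did not fit. */
int format_line(wchar_t *out, size_t out_count, const wchar_t *input)
{
    return swprintf(out, out_count, L"%ls\n", input);
}
```

A call then reads format_line(buff, ARRAY_COUNT(buff), input), which stays correct whether the element type is one byte wide or several.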
EXAMPLE 2
bool CopyStructs(InputFile* f, long count)
{
    structs = new Structs[count];
    for (long i = 0; i < count; i++)
    {
        if (!(ReadFromFile(f, &structs[i])))
            break;
    }
}
Defect: “new Structs[count]” effectively does a malloc(count*sizeof(Structs)), and that multiplication may overflow when count is large. The integer overflow then leads to an allocation that is too small, and thus to a (heap) buffer overflow when the loop fills it. Since 2005 the Visual Studio C++ compiler adds a check to prevent this.
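In plain C the same check can be written by hand before calling malloc. The following is a minimal sketch; alloc_array is a hypothetical helper, and the standard calloc performs an equivalent overflow check.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count elements of elem_size bytes,
   refusing requests whose total size would wrap around SIZE_MAX. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;                 /* count * elem_size would overflow */
    }
    return malloc(count * elem_size);
}
```

The division-based comparison avoids performing the multiplication that might overflow in the first place.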
EXAMPLE 3
char buf[20];
char prefix[] = "http://";
...
strcpy(buf, prefix);
// copies the string prefix to buf
strncat(buf, path, sizeof(buf));
// concatenates path to the string buf
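Defect: strncat’s third parameter is the maximum number of chars to append, not the size of the destination buffer. Since buf already holds the prefix, the correct bound is sizeof(buf) - strlen(buf) - 1, which leaves room for the null terminator that strncat always adds. Another common mistake is giving sizeof(path) as the third argument.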
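A corrected call can be sketched as follows. The helper name append_bounded is an assumption for illustration; the bound it computes is the remaining space minus one byte for the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append path to the string already stored in buf,
   whose total capacity is buf_size bytes. buf must be null-terminated.
   Anything that does not fit is silently dropped. */
void append_bounded(char *buf, size_t buf_size, const char *path)
{
    size_t used = strlen(buf);

    if (used + 1 < buf_size) {
        /* strncat appends at most this many chars, then adds '\0' */
        strncat(buf, path, buf_size - used - 1);
    }
}
```

With a 20-byte buf that already contains the 7-character prefix, at most 12 further characters are appended.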
EXAMPLE 4
char src[9];
char dest[9];
char* base_url = "www.ru.nl";
strncpy(src, base_url, 9);
// copies base_url to src
strcpy(dest, src);
// copies src to dest
Defect: base_url is 10 chars long including its null terminator, so strncpy copies only the first 9 chars and src will not be null-terminated. strcpy then reads past the end of src and overruns the dest buffer.
EXAMPLE 5
char *buf;
int i, len;
read(fd, &len, sizeof(int));
// read sizeof(int) bytes, ie. an int,
// and store these in len
buf = malloc(len);
read(fd,buf,len); // read len bytes into buf
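Defect: len is read straight from the input, so an attacker controls it. If len is negative, it is implicitly converted to a very large unsigned value in the calls to malloc() and read(), and the read can go beyond the end of buf. The return values of malloc() and read() are never checked, and buf is not null-terminated. A check should be performed to ensure that len (and len+1, if room for a terminator is allocated) is positive and within a sensible bound.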
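The validation can be sketched on an in-memory message instead of a file descriptor, which keeps the example self-contained. Everything named here (parse_record, MAX_RECORD, the message layout of an int length followed by the payload) is an assumption made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_RECORD 4096   /* illustrative upper bound, not from the original */

/* Hypothetical parser: data holds an int length followed by that many
   payload bytes. Returns a newly allocated, null-terminated copy of the
   payload, or NULL if the length field is negative, too large, or longer
   than the data actually supplied. The caller frees the result. */
char *parse_record(const unsigned char *data, size_t data_len)
{
    int len;
    char *buf;

    if (data == NULL || data_len < sizeof(len)) {
        return NULL;
    }
    memcpy(&len, data, sizeof(len));             /* attacker-controlled */

    if (len < 0 || len > MAX_RECORD) {
        return NULL;                             /* range check */
    }
    if ((size_t)len > data_len - sizeof(len)) {
        return NULL;                             /* not enough payload */
    }
    buf = malloc((size_t)len + 1);               /* +1 for the terminator */
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, data + sizeof(len), (size_t)len);
    buf[len] = '\0';
    return buf;
}
```

The cast to size_t happens only after len is known to be non-negative, so the conversion can no longer produce a huge value.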
EXAMPLE 6
#define MAX_BUF 256
void BadCode (char* input)
{ short len;
char buf[MAX_BUF];
len = strlen(input);
if (len < MAX_BUF) strcpy(buf,input);
}
Defect: What if input is longer than 32K? strlen returns a size_t, and storing a value above 32767 in a (16-bit) short typically makes len negative. The check len < MAX_BUF then passes and strcpy() overflows buf. The integer overflow is the root problem, but the (stack) buffer overflow that it enables makes it exploitable.
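Keeping the length in the type that strlen returns removes the truncation. The sketch below is one possible repair; the name good_code and its return value are assumptions added so the outcome can be observed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_BUF 256

/* Hypothetical repaired version of BadCode.
   Returns 1 if input was copied into buf, 0 if it was rejected. */
int good_code(const char *input)
{
    char buf[MAX_BUF];
    size_t len = strlen(input);   /* size_t, as returned by strlen */

    if (len >= MAX_BUF) {
        return 0;                 /* too long, also for inputs above 32K */
    }
    memcpy(buf, input, len + 1);  /* includes the null terminator */
    return buf[len] == '\0';
}
```

A 40000-character input, which would have slipped past the check in the original, is now rejected.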
EXAMPLE 7
char buff1[MAX_SIZE], buff2[MAX_SIZE];
// make sure url is valid and fits in buff1 and buff2:
if (! isValid(url)) return;
if (strlen(url) > MAX_SIZE - 1) return;
// copy url up to first separator, ie. first '/', to buff1
out = buff1;
do {
    // skip spaces
    if (*url != ' ') *out++ = *url;
} while (*url++ != '/');
strcpy(buff2, buff1);
Defect: strlen(url) calculates length up to the first null, but what if there is no “/” in the URL? What about 0-length URLs? Is buff1 always null-terminated?
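A version of the loop that answers all three questions might look like the sketch below. copy_host is a hypothetical helper, and unlike the original loop it stops before the '/' instead of copying it, which is a design choice made here for simplicity.

```c
#include <stddef.h>

/* Hypothetical helper: copy url into out up to (not including) the first
   '/' or the end of the string, skipping spaces. out is always
   null-terminated. Returns 0 on success, -1 on bad arguments or if the
   result did not fit in out_size bytes. */
int copy_host(char *out, size_t out_size, const char *url)
{
    size_t n = 0;

    if (out == NULL || url == NULL || out_size == 0) {
        return -1;
    }
    for (; *url != '\0' && *url != '/'; url++) {
        if (*url == ' ') {
            continue;                 /* skip spaces */
        }
        if (n + 1 >= out_size) {
            out[n] = '\0';            /* keep out terminated on failure */
            return -1;
        }
        out[n++] = *url;
    }
    out[n] = '\0';
    return 0;
}
```

The loop condition tests for the end of the string as well as for '/', so a URL without a separator or an empty URL terminates normally.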
EXAMPLE 8
#include <stdio.h>
int main(int argc, char* argv[])
{ if (argc > 1)
printf(argv[1]);
return 0;
}
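Defect: This program is vulnerable to format string attacks: printf interprets its first argument as a format string, so calling the program with input that contains conversion specifiers such as %x or %n lets the caller read from or write to memory. Strings that contain such specifiers, e.g. %s in printf("Cannot find file %s", filename);, are called format strings. User input must be passed as an argument, as in printf("%s", argv[1]);, never as the format itself.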
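The difference is easy to observe by formatting into a buffer. In the sketch below, format_user_text is a hypothetical helper; it uses snprintf so the result can be inspected, but the same rule applies to printf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write user_text into out as plain data.
   The format string is a fixed literal, so specifiers inside user_text
   are copied verbatim and never interpreted. Returns what snprintf
   returns: the length the full output needs, excluding the terminator. */
int format_user_text(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "%s", user_text);
}
```

Passing the text "%x%x%n" through this helper yields exactly those six characters in out.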
Classic Mitigations
Stack Canaries
Stack canaries were introduced by StackGuard for gcc: a dummy value (the canary) is written on the stack in front of the return address and checked when the function returns. A careless stack overflow will overwrite the canary, which can then be detected. A careful attacker who knows or can leak the canary can overwrite it with the correct value.
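The idea can be modelled in ordinary C, although real canaries are inserted by the compiler and live on the actual stack. The sketch below is only a simulation under assumptions: the struct guarded_frame, the constant CANARY_VALUE and the helper copy_and_check are all invented for illustration, and the copy is clamped to the struct so the demo stays in bounds.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xC0FFEE42u   /* real canaries are random per process */

/* Simplified model of a stack frame protected by a canary. */
struct guarded_frame {
    char buf[8];
    unsigned int canary;      /* sits between buf and the return address */
    unsigned int saved_ret;   /* stand-in for the saved return address */
};

/* Hypothetical helper: copy n bytes to the start of the frame without
   checking n against sizeof(buf), then report whether the canary
   survived. Returns 1 if the canary is intact, 0 if it was overwritten. */
int copy_and_check(struct guarded_frame *f, const void *data, size_t n)
{
    f->canary = CANARY_VALUE;
    if (n > sizeof(*f)) {
        n = sizeof(*f);           /* keep the simulation inside the struct */
    }
    memcpy(f, data, n);           /* buf is the first member, at offset 0 */
    return f->canary == CANARY_VALUE;
}
```

A copy that fits in buf leaves the canary intact, while a longer copy cannot reach saved_ret without first destroying the canary.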
NX/DEP
Distinguish executable memory (for storing code) from non-executable memory (for storing data) and let the processor refuse to execute code located in non-executable memory. This can be done for the stack, or for arbitrary memory pages. How does this help? The attacker can no longer jump to his own exploit code, as any data he supplies as exploit code will be non-executable. Return-to-libc attacks are a way to get around non-executable memory: the attacker overflows the stack to jump to code that is already there, e.g. library code in libc, instead of jumping to his own attack code. libc is a rich library that offers many possibilities for an attacker, e.g. system(), exec(), fork(). Return-to-libc is the foundation for more advanced means of bypassing NX/DEP through Return-Oriented Programming (ROP).
ASLR
An attacker needs detailed information on the memory layout. By randomizing the layout every time we start a program (i.e., moving the offset of the heap, stack, etc. by some random value) the attacker’s life becomes much harder, as target addresses can no longer be easily predicted. This is most effective on x64 systems; on 32-bit x86 the smaller address space means the randomization can be brute-forced.
Prevention References
Risky function rating
Low risk i.e., “N” functions (fgets • memcpy • snprintf • strccpy • strcadd • strncpy • strncat • vsnprintf )
Moderate risk (getchar • fgetc • getc • read • bcopy )
High risk i.e., non-”N” functions (strcpy • strcat • sprintf • scanf • sscanf • fscanf • vfscanf • vsscanf • streadd • strecpy • strtrns • realpath • syslog • getenv • getopt • getopt_long • getpass )
Extreme risk (gets)
Better string libraries
libsafe.h provides safer, modified versions of e.g. strcpy; it prevents buffer overruns beyond the current stack frame in the dangerous functions it redefines
libverify is an enhancement of libsafe; it keeps copies of the stack return address on the heap, and checks if these match
glib.h provides the GString type for dynamically growing null-terminated strings in C, but failure to allocate will result in a crash that cannot be intercepted, which may not be acceptable
Strsafe.h by Microsoft guarantees null-termination and always takes the destination size as an argument
C++ string class, but data() and c_str() return low-level C strings, i.e. char*, and the result of data() is not guaranteed to be null-terminated on all platforms (before C++11)
Bug Hunting
Code reviews & Scanning
Manual review and tools that look for suspicious patterns in code; these range from CTRL-F or grep to advanced analyses in an IDE (Understand)
Free tools: RATS (also covers PHP, Python, Perl), Flawfinder, ITS4, and PREfix and PREfast by Microsoft
Commercial tools: Checkmarx, Coverity, PolySpace, Klocwork, CodeWizard, Cqual, Fortify