This is a post made to accompany my Bsides Canberra 2023 talk of the same name (+ slides and PoCs) - as a blog is often preferable to a video, and offers new insight. I will say that I am very late in writing this - the research was primarily done in June 2023, and debuted in September 2023. Since this research was done, scudo has already seen significant changes - primarily the addition of a quarantine for secondary chunks. Hopefully this blog still serves as a useful source!
I will note that this doesn’t aim to be an explanation of scudo’s internals - the talk does a better job of that, as do other articles (such as un1fuzz’s, though there are other articles on scudo internals generally).
Last thing - this is based on the Android config of scudo.
The main existing research (at the time of writing):
un1fuzz - Great overview of internals, + techniques for double-returning a chunk/getting a chunk at an arbitrary address.
infosectbr - On breaking the checksums used to verify chunk headers
Both are great articles, so thanks to the authors for writing them. I will note that some of the techniques listed here build off un1fuzz’s techniques, so reading their articles will help.
This is by far the most complex exploit in this blog, and by far my least favourite. Fully understanding it will probably require a lot of deep-diving into scudo’s code. The later two techniques are more robust and easier to apply. Nonetheless, in cases without secondary chunks, it is useful.
The first goal is to forge a TransferBatch - that is, to have a chunk we can write to allocated as a TransferBatch. This requires an Out-of-Bounds write to corrupt the header. We additionally need to corrupt the forged TransferBatch, which could be done with a Use-After-Free, or we could spray chunks in the same region so that some of those chunks look like ‘part’ of the TransferBatch (if they’re allocated after the forged chunk).
Let’s start on the exploit. First, we allocate, then free the ‘target’ chunk, so that it enters quarantine. Then, we corrupt its header. A chunk’s header typically looks like:
typedef u64 PackedHeader;
// Update the 'Mask' constants to reflect changes in this structure.
struct UnpackedHeader {
uptr ClassId : 8;
u8 State : 2;
// Origin if State == Allocated, or WasZeroed otherwise.
u8 OriginOrWasZeroed : 2;
uptr SizeOrUnusedBytes : 20;
uptr Offset : 16;
uptr Checksum : 16;
};
We want to set its ClassId to 0. When a chunk is being freed, a ClassId of 0 indicates a chunk from the secondary allocator, and it is sent to the corresponding secondary free routine (which we don’t want to happen). Otherwise, however, a ClassId of 0 is reserved for internal use with TransferBatches, which look like:
struct TransferBatch {
TransferBatch *Next;
private:
CompactPtrT Batch[MaxNumCached];
u16 Count;
};
Thus, when a chunk in quarantine is de-quarantined and returned to the appropriate region’s freelist - if it has a ClassId of 0 in its header, it goes to Region 0’s freelist.
CRITICAL NOTE - the order of regions is randomised, so region 0 may be before or after any given region. If the chunk we are smuggling in lies in a region that is before region 0 in memory, this exploit will SIGSEGV and fail. The talk covers this in more detail; alternatively, try the PoC and explore with a debugger. This is a major flaw of this technique, and will happen 50% of the time (given that the ordering of regions, as said, is random).
So, after corrupting the target’s header while it’s in quarantine, we flush the quarantine by allocating and freeing many chunks. The size of the quarantine is known from the config, so we know exactly how many allocations are needed to get our ‘target’ flushed out.
Next, we need to cause a TransferBatch allocation. When a region’s cache of chunks is empty, it’s refilled via a TransferBatch - but if there are none of those, scudo creates some TransferBatches for the region. Simply put, we need to allocate enough chunks to exhaust a region’s cache, triggering our forged chunk to be made into a proper TransferBatch.
In the PoC, this is done by the same bunch of code that flushes the quarantine.
Now - the struct responsible for storing a region’s cached chunks is called PerClass. These are all stored in an array, in the big struct that’s essentially all of the data for the primary allocator. After this array is a field called Allocator - which controls a lot of the actual allocator behaviour. We want to corrupt the Allocator pointer to point to a region of memory we control.
struct scudo::SizeClassAllocatorLocalCache {
private:
  static const scudo::uptr NumClasses;
  static const scudo::uptr BatchClassId;
  scudo::SizeClassAllocatorLocalCache::PerClass PerClassArray[45];
  scudo::LocalStats Stats;
  SizeClassAllocator *Allocator;
  ...
};
How we plan to do this: when a TransferBatch replenishes a PerClass’s cache of chunks, it’s essentially a memcpy into the PerClass, with the count taken from TransferBatch.Count. Thus, we corrupt TransferBatch.Count to be large enough that the memcpy overwrites Allocator, and place the value we want Allocator set to as an ‘entry’ in the TransferBatch. We also add a 0 for later:
//Corrupt count for our TransferBatch!! :))
((unsigned int *)chunk)[11] = 0x40a;
// Corrupt Allocator of SizeClassAllocatorLocalCache
chunk[0x202] = (unsigned long)fakeAllocator;
chunk[0x203] = 0;
More allocations from the region our TransferBatch was allocated for will cause the refill to occur. Now, we control Allocator.
SizeClassAllocator (the type of Allocator) has an array of structs called RegionInfo, which define the info for each region. Namely, the field RegionBeg of RegionInfo is critically important.
For the region we corrupted, within our fake allocator we’ll set RegionBeg to be our target address - 0x10 (the 0x10 accounts for the header). Then our allocation, the one that initially triggered the refill, will return our target address. Technically, the 0 we wrote above is the chunk’s offset from the start of the region - with an offset of 0, we pretty much nullify this.
And thus, we have a chunk at our target address.
Full PoC here
Secondary chunks have an additional, ‘secondary’ header in addition to the primary header.
struct alignas(Max<uptr>(archSupportsMemoryTagging()
? archMemoryTagGranuleSize()
: 1,
1U << SCUDO_MIN_ALIGNMENT_LOG)) Header {
LargeBlock::Header *Prev;
LargeBlock::Header *Next;
uptr CommitBase;
uptr CommitSize;
MemMapT MemMap;
};
The field we are mainly concerned about is CommitBase. This is a pointer which defines the ‘base’ of the allocation. Assuming a scenario where we can corrupt a secondary chunk’s header, if we can set CommitBase to point to where we want to corrupt, then upon freeing and reallocating that secondary chunk, it will be at our target address.
There is one downside - the header section of the chunk (which takes up 80 bytes before the user-allocation) must obviously be writeable. Meaning that if you’re trying to write to the start of a section, with unwriteable memory before it, a SIGSEGV will occur. In the related PoC, the variable target is our target, but we also define the variable big so that there’s a writeable region before target. Without big, we would SIGSEGV trying to get a chunk at target.
Some deeper details - The code to determine the address for the user-allocation and the header-position is:
const uptr CommitBase = Entries[I].CommitBase;
if (!CommitBase)
continue;
const uptr CommitSize = Entries[I].CommitSize;
const uptr AllocPos =
roundDown(CommitBase + CommitSize - Size, Alignment);
HeaderPos =
AllocPos - Chunk::getHeaderSize() - LargeBlock::getHeaderSize();
Assuming we corrupt the header, we control CommitBase (the most important part) and CommitSize. Additionally, Size is the number of bytes we’re allocating + 80, which we may or may not control.
The short of it is, we need to balance these so that AllocPos == target_addr. In the related PoC, we set CommitSize to be 80 more than we allocate, to account for the header, and point CommitBase to 80 bytes before our target.
PoC (+ here):
#include <stdio.h>
#include <stdlib.h>
unsigned long big[0x2000] = {0};
char target[64];
int main() {
unsigned long * bigChunk = malloc(0x20000);
bigChunk[-7] = 0x20050; //CommitSize
bigChunk[-8] = (unsigned long)target - 0x50; //CommitBase
free(bigChunk);
unsigned long * forgedChunk = malloc(0x20000);
printf("Address of target is %p\n", target);
printf("Address of forged chunk is %p\n", forgedChunk);
}
Similar to how we can corrupt secondary chunks to get arbitrary writes, we can also forge secondary chunks. The only important difference between a secondary chunk and a normal chunk is the addition of a secondary header, which lies before the primary header.
Assuming we can allocate and write to several primary chunks of the same region, it’s pretty easy to:
1. Allocate one designated chunk
2. Corrupt its header to set the ClassId to 0
3. Spray however many extra chunks in the same region. The contents of these chunks should all be fake secondary headers
4. Free and reallocate the designated chunk
The scudo allocator randomizes chunks of a region in batches, so that allocations aren’t contiguous. However, if we allocate enough chunks and make them all contain the fake secondary header, then one will be allocated before the designated chunk, making it look like the designated chunk has a secondary header.
Then it’s the same principle of setting CommitBase and CommitSize accordingly to get a chunk at your designated address.
PoC (full version here):
...
enum AllocationState { Available = 0, Allocated = 1, Quarantined = 2};
int make_state(int classId, enum AllocationState state, int size) {
return classId | (state << 8) | (size << 12);
}
unsigned long big[0x2000] = {0};
char globalTarget[64];
int main() {
...
unsigned long * chunk = malloc(0x40);
size_t normal_cksum = (*(chunk-2)) >> 48;
int old_state = make_state(4, Allocated, 0x40);
int new_state = make_state(0, Allocated, 0x40);
size_t corrupted_cksum = find_cksum(normal_cksum, old_state, new_state);
(*(chunk-2)) = (corrupted_cksum << 48) | new_state;
for (int i = 0 ; i < 0x1000 ; i++) {
unsigned long * tmp = malloc(0x40);
tmp[0] = 0;
tmp[1] = 0;
tmp[2] = (unsigned long)globalTarget - 0x50;
tmp[3] = 0x20050;
}
free(chunk);
unsigned long * forged = malloc(0x20000);
printf("forged chunk is at %p\n", forged);
printf("target is at %p\n", globalTarget);
}
These are things that I’ve found useful when exploiting the allocator, but that aren’t really deep/novel enough to be called ‘techniques’.
Lots of effort went into ensuring that the ‘primary’ chunk header contains no pointers, so that in the event of an Out-of-Bounds read, an info-leak wouldn’t occur (ignoring the user-contents).
The secondary header, however, does contain pointers derived from mmap, as most secondary chunks are mmapped. Since mmap chunks are often contiguous with shared libraries, this can give you an info-leak for those libraries.
It’s important to note that the header is before the user-content in memory, so one either needs to read before the user-content (e.g. with a negative index), or to read ‘between’ secondary chunks. However, there is (generally) a guard page after each secondary chunk, which makes this difficult.
Aside: To avoid unneeded memory allocations, the secondary allocator does cache and re-use chunks, so maybe one could:
1. Allocate a massive secondary chunk
2. Free it
3. Allocate two smaller secondary chunks such that they can both fit inside the first
I’m not sure off the top of my head whether there’d then be a guard page? ¯\_(ツ)_/¯
Similarly to the info-leak, because mmaped chunks may be contiguous to shared libraries, you might be able to use an Out-of-Bounds write on a secondary chunk to corrupt shared library content.
Similar to the ‘Forging TransferBatch’ technique, nothing specifies that we must corrupt the region to be 0. We can set it to any region we want, thus smuggling the chunk into a different region. We’ll still SIGSEGV if the target region is before the origin region of the chunk, and it’s overall much more situational.
Thanks for reading! I hope you found the article useful. If you have any questions, send them to me on twitter or (preferably) discord. Both can be found here