references:

* ~/linux/include/linux/mm.h
* man 2 mmap

# mm thingy

i told some people that i wanted to write a memory management post and they all thought i was going to talk about malloc. but no, it'll mention malloc maybe once and zoom right along. i've been finally connecting a lot of details in how linux implements some kinds of memory tricks, so i'm going to write it all down and you get to learn it too.

so here's a small chapter on how computers use memory, from a book i'm not writing. if you're already familiar with how hardware addresses memory, and how you'd configure an MMU, and you're just here for the interesting Linux bits: i'll get there eventually, but you might want to [skip over the buildup](link to the linux parts).

## primitives

### electricals and computer architecture (like, physical discrete chips on a breadboard)

i want to start from (approximately) scratch here: the juicy nuggets that i set out to write about are at the end, but i think they're difficult to appreciate without also holding "how we got here" in your head alongside. so, let's start with the simplest machines and see how we get to the systems we have... today.

the simplest computer might be something that just has load, store, and arithmetic instructions. maybe even just "load" for all data transfer - "load" to memory, "load" from memory, "load" from one register to another. the processor has a few registers it can directly operate on. so, to do complex workloads you attach some read-write memory (RAM) to store data into with your load and store instructions. volatile memory is expensive and requires power to maintain bits, so you also attach read-only storage (ROM) to hold some bootstrap programs like a BASIC interpreter.

a simple processor might do 8-bit operations, and use up to 16 bits to address storage. that gives it 2^16 addresses it can access, but your largest (and very expensive) storage parts might only be as large as 4kb - 2^12 bytes, needing only 12 bits to address any byte in the part.

the processor itself just knows about addresses, and might not particularly care what it's reading or writing to - an access is an access, a load is a load. it just executes a `ld A, (4567h)`, and that execution is just setting an address on a memory bus and reading whatever is on the data lines a few cycles later.

this is very flexible! say there aren't monolithic storage devices large enough to span the whole 16-bit-addressable range. if you happen to know none of your storage parts need more than 12 bits to select a byte, that leaves four address-select lines that the processor _will_ signal, if an instruction says to do so, but that you might use to select _which part_ you're addressing. to the program running on the processor, exactly what address selection does isn't really important, just that it selects an address and accesses it.

so with a single kind of load instruction, you can have a computer that addresses volatile and non-volatile memory. great! you can go even further, and have some address-select lines actually select things that aren't "memory" - a keyboard, some latches driving LEDs, who knows. even a video buffer, if you're getting wild. all at the low cost of deciding to use a few bits of address selection to select where you're actually selecting an address from. at this point you're most of the way to a [TRS-80](trs-80-schematic-diagram-goes-here). processors might have a mechanism to auto-increment after an access, so it makes sense to ensure sequential addresses go to the same storage.
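to make "an access is an access" a little more concrete, here's a rough sketch of that kind of bus in C, with the bus written as a pair of functions. everything in it - the 4kb RAM, the keyboard at `0xf000`, the LED latch at `0xf001` - is made up for illustration, not any particular machine; the point is just that the processor issues the same load or store either way, and the wiring decides what actually happens.

```c
#include <stdint.h>
#include <stdio.h>

/* a pretend little machine: mostly RAM, plus a couple of addresses
 * wired to things that aren't memory at all. the addresses and parts
 * here are invented for illustration, not any particular computer. */
static uint8_t ram[0x1000];     /* a 4kb RAM part: 12 address lines */
static uint8_t led_latch;       /* a latch driving some LEDs        */
static uint8_t fake_keyboard(void) { return 'a'; } /* pretend a key is down */

/* to the processor, "executing a load" is just: put an address on the
 * bus, wait, and read back whatever the selected device put on the
 * data lines. which device that is depends entirely on the wiring. */
uint8_t bus_read(uint16_t addr)
{
    if (addr == 0xf000) return fake_keyboard(); /* a "load" that reads a key   */
    if (addr == 0xf001) return led_latch;
    return ram[addr & 0x0fff];                  /* only 12 lines reach the RAM */
}

void bus_write(uint16_t addr, uint8_t data)
{
    if (addr == 0xf001) { led_latch = data; return; } /* a "store" that lights LEDs */
    ram[addr & 0x0fff] = data;
}

int main(void)
{
    bus_write(0x0123, 42);      /* an ordinary store to RAM             */
    bus_write(0xf001, 0x0f);    /* same instruction, but it drives LEDs */
    printf("%d %c %d\n", bus_read(0x0123), bus_read(0xf000), bus_read(0xf001));
    return 0;
}
```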
if you've got 4kb of RAM and 4kb of ROM you can use the low twelve bits to select a byte in each of those, and pick some other higher bit to select _which_ of those you're accessing. say you decide that bit 15, `0b1000_0000_0000_0000`, is how you do that device selection. a RAM address would be something low, between `0b0000_0000_0000_0000` and `0b0000_1111_1111_1111`. then a ROM address would be something higher, between `0b1000_0000_0000_0000` and `0b1000_1111_1111_1111`. as a diagram:

| DIAGRAM GOES HERE |

now you've also got enough of a computer to have problems. it's well and good to say "0x0000 to 0x0fff is RAM, 0x8000 to 0x8fff is ROM", but the processor doesn't know or care about this. the processor will still happily try executing an `ld A, (4567h)`, even if your manual says that's a nonsense address. it will still try to set the address `0100_0101_0110_0111` on the address bus, and still read a byte from the data bus a few cycles later. so what happens?

in programming languages people talk about "undefined behavior", and hardware can be undefined just the same. _probably_ what will happen is the address lines that weren't going to select something useful get ignored. so the high bit might still select RAM or ROM, and only the low twelve bits would get used for addressing. the bits in between could easily just be pins of the processor that aren't even electrically connected to anything interesting. you could also have some separate circuitry that detects an invalid address on the address bus and signals an interrupt to handle the condition. but that's a lot of circuitry to handle an error condition that you shouldn't be encountering anyway - just don't make an incorrect program that does stray memory accesses and we can all avoid the hassle.

if you decided to write out what the hardware _does_ with this "undefined behavior", you might end up with a machine that has a strange mapping of addresses to memory: bit 15 is useful, bits 12, 13, and 14 are not, then bits 0 through 11 are functional again. the earlier diagram skipped over the undefined regions, but if you wrote out how this addressing would work in practice, when you _execute_ these "nonsense" accesses, you'd have a diagram more like this:

| LONGER AND MORE DEPRESSING DIAGRAM GOES HERE |

note that the processor _really doesn't care what an address is selecting_. multiple addresses might select the same physical byte of memory in the machine! that's fine. the processor won't complain, anyway. if you designed a program in tandem with the storage it would be residing in, you might even take advantage of this: _an address now does not have to select the same byte as an address later_. as long as the program using a so-afflicted storage system knows how it will behave, things could still work!

_this_ gives you a very interesting option in designing a computer: say you want to store downright luxurious amounts of data - 256kb in total. this would need 18 bits of addressing to pick out individual bytes, but your processor is still a little tiny machine that only has 16 address bus pins. if you've designed your program along with the storage for this machine, you could work around this by arranging your storage into _banks_ of memory and reserving a specific byte in memory to select _which bank_ an address should select! now, you'd always want to be able to access the bank select byte. you probably want some RAM that's easily accessed regardless of what bank is selected.
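here's a minimal sketch of that bank-select byte, continuing the bus-as-a-function idea in C. the specific layout is an assumption made up for the sketch - a fixed 32kb low half (which holds the bank select byte), a 32kb banked window up top, and 8 banks behind it - though a split along these lines is worked through just below.

```c
#include <stdint.h>
#include <stdio.h>

/* a pretend banked machine. the numbers are assumptions for this sketch:
 * a fixed 32kb low half (including the bank-select byte), a 32kb window
 * up top, and 8 banks of 32kb sitting behind that window. */
#define BANK_SIZE   0x8000u            /* 32kb window                      */
#define NUM_BANKS   8
#define BANK_SELECT 0x7fffu            /* the reserved bank-select address */

static uint8_t fixed_ram[BANK_SIZE];   /* always reachable, any bank       */
static uint8_t banks[NUM_BANKS][BANK_SIZE];

uint8_t bus_read(uint16_t addr)
{
    if (addr < BANK_SIZE)
        return fixed_ram[addr];        /* low half: always the same bytes  */
    /* high half: which 32kb this selects depends on the bank byte, so the
     * same address can reach different storage at different times. */
    return banks[fixed_ram[BANK_SELECT] % NUM_BANKS][addr - BANK_SIZE];
}

void bus_write(uint16_t addr, uint8_t data)
{
    if (addr < BANK_SIZE)
        fixed_ram[addr] = data;        /* writing BANK_SELECT swaps banks  */
    else
        banks[fixed_ram[BANK_SELECT] % NUM_BANKS][addr - BANK_SIZE] = data;
}

int main(void)
{
    bus_write(BANK_SELECT, 0);         /* select bank 0...                 */
    bus_write(0x9000, 11);             /* ...store through the window      */
    bus_write(BANK_SELECT, 3);         /* swap to bank 3                   */
    bus_write(0x9000, 22);             /* same address, different storage  */
    bus_write(BANK_SELECT, 0);
    printf("%d\n", bus_read(0x9000));  /* back on bank 0: prints 11        */
    return 0;
}
```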
so, say you reserve 32kb for bank-select-and-other-misc-purposes, with the banked memory being the other 32kb you can address at a given time. with this scheme, eight banks selected by that reserved byte could let you have a program that spans `32kb (non-banked) + 8 * 32kb (banked)` or `288kb` (!!) of memory, while still using a processor that can only address 16 bits - 64kb - of memory! (we've now invented the [NES's Memory Management Controller/Multi-Memory Controller/MMC](https://www.nesdev.org/wiki/Mapper).)

this all is to reinforce some important points about addresses as a simple processor sees them:

* addresses don't have to go to a single block of storage
* addressing causes a computer to perform actions, electrically or otherwise
* addresses can select whatever they're physically wired up to select - maybe not even storage
* one address does not have to always select the same word in memory
* "addressing" can be a larger operation than just the address indicated on an address bus when a processor is executing an instruction
* the processor might plod along even when working with unintended addresses

one last thought before getting really into it: i talked at you about a computer design where some addresses select read-only memory, and some addresses select read-write memory. and, a bit about what might happen if you try to do an access to an address that isn't what the hardware was designed to support. but the same line of "what if" might also have you ask: what happens if you try to store a byte to read-only memory?

the simplest computer might faithfully select ROM at the address an instruction indicated, and signal that a write is occurring. and put the word on its data lines. and wait the agreed-upon number of cycles for the addressed device to complete its storage. and the memory device on the other end of the lines may just have entirely ignored it. the write would be lost. the hardware doesn't much care if it's driven incorrectly, unless it's made to care.

so, one last point about addresses:

* an address doesn't have to tell you or the processor what it's usable for

OK! hopefully that's enough about addresses electrically selecting bytes in storage somewhere. over in processors themselves, things go off the rails.

### memory management units

say you've gone forward a few years. it's