[ Index ] [ Prev: Blinking The LED ] [ Next: Taming The MMU, continued ]
So I had a slightly modified sparc kernel, which was able to control the front LED, and run a few lines of C code, then crash. It would crash because the code was trying to talk to an MMU in the wrong dialect. In order to go further, I needed to tame the MMU, and be able to program it correctly.
So I peeked (again) at the various .h files in order to gather some information. Here is a summary of what I found.
Now I'll explain these in more detail.
Load and store operations from and to memory addresses on sparc can use an optional form of the ld and st instruction families, with the ``a'' suffix, specifying as an extra parameter an Address Space Identifier (ASI). This is mostly intended for bus accesses, to request different access modes (with or without snooping, with or without cache, etc).
Since there are 256 possible ASI, it is possible to ``hide'' specific functions through them.
This is how Panasonic chose to provide access to the control regs. Accesses to specific ASI, regardless of the address provided in the instruction, would set or retrieve a specific MMU register.
idt/mmu.h comes to the rescue with the following definitions:
/*                                  +-- Translated */
/*                                  | +-- Internal/External */
/*                                  | | +-- Type */
/*                                  | | | */
/*                                  v v v */
#define ASI_LOOKUPD          128 /* Y I RO -- Data translation lookup */
#define ASI_LOOKUPI          136 /* Y I RO -- Instr. translation lookup */
#define ASI_GTLB_RANDOM      192 /* N I WO -- Random tlb dropin */
#define ASI_GTLB_DROPIN      193 /* N I WO -- tlb dropin */
#define ASI_GTLB_INVAL_ENTRY 194 /* N I WO -- invalidate entry */
#define ASI_GTLB_INVAL_PID   195 /* N I WO -- invalidate PID */
#define ASI_GTLB_INVALIDATE  196 /* N I WO -- invalidate tlb */
#define ASI_ITLB_DROPIN      200 /* N I WO -- dropin entry into ITLB */
[...]
#define ASI_MMCR             224 /* N I RW -- mmu/cache control/stat reg */
#define ASI_PDBR             225 /* N I RW -- page directory base addr */
#define ASI_FVAR             226 /* N I RW -- fault virtual addr */
#define ASI_PDER             227 /* N I RO -- page dir entry pointer */
#define ASI_PTOR             228 /* N I RO -- page table offset */
#define ASI_FPAR             229 /* N I RW -- fault physical addr */
#define ASI_FPSR             230 /* N I RW -- fault physical space */
#define ASI_PIID             231 /* N I RW -- process ID invalidation */
#define ASI_PID              232 /* N I RW -- process ID */
#define ASI_BCR              233 /* N I RW -- bus control */
#define ASI_FCR              234 /* N I RW -- fault cause */
#define ASI_PTW0             235 /* N I RW -- PTW 0 */
#define ASI_PTW1             236 /* N I RW -- PTW 1 */
#define ASI_PTW2             237 /* N I RW -- PTW 2 */
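To make the ASI mechanism concrete, here is a minimal sketch of how such a register could be read and written from C. The lda/sta wrappers and the mmu_get_pid/mmu_set_pid helpers are hypothetical names, not taken from the original headers. The sparc branch uses GCC-style inline assembly and assumes the registers are accessed as 32-bit words. The other branch is only a stand-in that lets the code be tried on an ordinary host.

```c
#include <assert.h>
#include <stdint.h>

#define ASI_PID 232 /* process ID (RW), value copied from idt/mmu.h */

#if defined(__sparc__) && !defined(ASI_SIMULATE)
/*
 * Real hardware. The ASI is encoded in the instruction itself, so it
 * has to be a compile-time constant: hence macros, not functions.
 */
#define lda(addr, asi) ({                                   \
	uint32_t lda_v_;                                    \
	__asm__ __volatile__("lda [%1] %2, %0"              \
	    : "=r" (lda_v_) : "r" (addr), "n" (asi));       \
	lda_v_; })
#define sta(addr, asi, val)                                 \
	__asm__ __volatile__("sta %0, [%1] %2"              \
	    : : "r" (val), "r" (addr), "n" (asi) : "memory")
#else
/*
 * Host stand-in (an assumption, for experimentation only): one fake
 * 32-bit register per ASI. The address is ignored, as described above.
 */
static uint32_t fake_asi[256];

static uint32_t lda(uint32_t addr, unsigned asi)
{
	(void)addr;
	assert(asi < 256);
	return fake_asi[asi];
}

static void sta(uint32_t addr, unsigned asi, uint32_t val)
{
	(void)addr;
	assert(asi < 256);
	fake_asi[asi] = val;
}
#endif

/* Hypothetical helpers: read and write the current process ID. */
static uint32_t mmu_get_pid(void)
{
	return lda(0, ASI_PID);
}

static void mmu_set_pid(uint32_t pid)
{
	sta(0, ASI_PID, pid);
}
```

Since the address operand is supposedly ignored for these ASIs, the helpers simply pass 0.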
Of particular interest are the process ID registers. They seem to imply that the MMU has hardware contexts, like all other SPARC MMU architectures.
Contexts here mean that a specific set of translation tables could be associated with a magic number (the context). Usually, one context is reserved for the supervisor mode (the kernel), and every process or lwp has one context allocated. The MMU remembers information for n contexts (the value of n depending upon the hardware), so that if switching back and forth between a small set of contexts, it does not have to reload its root page directory pointers.
So I'll experiment first with everything tied to the first context, the one the MMU was left in by the PROM; then once I have something that works, I'll try to optimize things by using this ability (assuming I am not mistaken in my analysis).
A few words on the ASI_GTLB values. They seem to not only allow the invalidation of TLB entries (necessary when you alter Page Table Entries, in case they are in the MMU internal cache), but also to insert specific entries. This makes sense since idt/trap.h defines the two constants:
#define T_ITLBMISS 0x2C
#define T_DTLBMISS 0x3C
Which makes me think the MMU will trap every time we access a page whose PTE is not in a TLB.
This might sound like a performance killer (and will be, if the TLB is small). However, since there are separate traps for instruction and data TLB miss, it should be possible to implement an X bit in the page tables, which would be enforced by the iTLB miss handler. Ain't life great?
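The X bit idea can be sketched as follows. This is only an illustration of the decision a miss handler would have to take: the PTE_V and PTE_X bits and the tlb_miss_allowed function are hypothetical, since the real page table entry layout has not been worked out at this point. Only the two trap numbers come from idt/trap.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define T_ITLBMISS 0x2C /* from idt/trap.h */
#define T_DTLBMISS 0x3C

/* Hypothetical software PTE bits; the real KAP layout is unknown here. */
#define PTE_V 0x1u /* entry is valid */
#define PTE_X 0x2u /* software-only "executable" bit */

/*
 * Decide whether a missing entry may be dropped into the TLB.
 * Returns false when the miss must be turned into a fault instead.
 */
static bool tlb_miss_allowed(int trap, uint32_t pte)
{
	assert(trap == T_ITLBMISS || trap == T_DTLBMISS);

	if ((pte & PTE_V) == 0)
		return false;	/* no mapping at all */
	if (trap == T_ITLBMISS && (pte & PTE_X) == 0)
		return false;	/* instruction fetch from a non-X page */
	return true;		/* data miss, or fetch from an X page */
}
```

The point is that the hardware never needs to know about PTE_X: a non-executable page is simply never entered in the instruction TLB.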
The other side of the coin is that some comments scattered in various files mention that some older KAP processors have bugs in the TLB management, and that it is necessary to update the TLB in a very specific way, instead of adding entries at random:
/*#define TMISS_CIRCULAR /* use a circular tlb replacement instead of random */
#if defined(KAPBUG_S444) && !defined(TMISS_CIRCULAR)
ERROR -- must use circular tlb replacement with KAPBUG_S444
#endif
Fortunately for us, neither KAPBUG_S444 nor TMISS_CIRCULAR seems to be defined in header files or kernel configuration files. I hope the problem only affected early production machines (KAP masks M2C3 and below), so I won't have to figure out how to sing and dance the non-random TLB management.
The PTW are controlled by three MMU registers, PTW0, PTW1, PTW2.
It looks from idt/mmu.h that windows only match the topmost 8 bits of an address. This implies that the size of each window is 16MB (01000000), and that it spans the xx000000-xxffffff address range, where xx is the window number.
#define PHYSICAL_WIN_MASK 0xff000000 /* mask for phys windows */
#define PHYS_WIN_SIZE     0x01000000 /* phys. window size (bytes) */
The PROM initializes them so that PTW0 maps the low 16 MB of memory (which starts with the PROM data area, at physical address 0) to window fd. This means that the PROM data is at virtual address fd000000 onwards, and explains why the kernel needs to be loaded at an address beyond fd044000, which accounts for a PROM data section of 44000 (hex) bytes, or in decimal, 278528 bytes (a bit more than 256 KB).
Comments in idt/mmu.h also tell us that window fe points to the same physical area, but with cache disabled. Finally, the last window, window 00, is said to point to the PROM text (code).
/*
 * The PTWs are set by the ROM so that:
 *   virtual space 0xfd is physical memory (lowest 16 megs)
 *   virtual space 0xfe is same as 0xfd but non-cacheable
 *   virtual space 0x0  is the bootrom text (later disabled by kernel)
 */
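The window arithmetic is simple enough to check with a few lines of C. In this sketch, the two constants come from idt/mmu.h (with a u suffix added to keep the arithmetic unsigned), while win_number, win_offset and prom_va_to_pa are hypothetical helper names. The translation only models the PROM setup described in the comment above, where windows fd and fe both point at physical address 0.

```c
#include <assert.h>
#include <stdint.h>

#define PHYSICAL_WIN_MASK 0xff000000u /* mask for phys windows */
#define PHYS_WIN_SIZE     0x01000000u /* phys. window size (bytes) */

/* Window number: the topmost 8 bits of the virtual address. */
static unsigned win_number(uint32_t va)
{
	return (va & PHYSICAL_WIN_MASK) >> 24;
}

/* Offset of the address inside its 16MB window. */
static uint32_t win_offset(uint32_t va)
{
	return va & (PHYS_WIN_SIZE - 1);
}

/*
 * Translate an address through the windows as the PROM leaves them:
 * fd (cached) and fe (uncached) both map the low 16MB at physical 0.
 */
static uint32_t prom_va_to_pa(uint32_t va)
{
	unsigned win = win_number(va);

	assert(win == 0xfd || win == 0xfe);
	return win_offset(va);	/* physical base of both windows is 0 */
}
```

For instance, the kernel load address fd044000 is in window fd at offset 44000, so it lands at physical address 44000, right after the PROM data area.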
Later in the same file, we can see the layout of these translation registers. One can indeed specify an arbitrary 16MB virtual address to physical address translation, with or without cache, read-only or read-write, and optionally supervisor only. The role of the PTW_MASK field is unclear to me; it might be a way to reduce the range further down than 16MB.
#define PTW_V           0x00000001 /* valid bit */
#define PTW_RO          0x00000002 /* read only bit */
#define PTW_UP          0x00000004 /* user protect */
#define PTW_MA(V)       ((V) << 3)
#define PTW_IO          PTW_MA(0) /* I/O Attribute */
#define PTW_CACHE       PTW_MA(1) /* Cache Attribute */
#define PTW_BYTE_SHARED PTW_MA(2) /* Byte-Writeable Shared */
#define PTW_SHARED      PTW_MA(3) /* Non-Byte-Writeable Shared */
#define PTW_MASK(V)     ((V & 0xff) << 8)
#define PTW_TPA(V)      ((V & 0xff) << 16)
#define PTW_TVA(V)      ((V & 0xff) << 24)
The sharing bits were probably intended for multiprocessor systems based upon the KAP processor; to the best of my knowledge, no such system was designed.
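As a rough illustration of how these macros combine, here is a sketch that builds a PTW register value. The ptw_make function is a hypothetical helper, and reading TVA and TPA as the virtual and physical window numbers is my interpretation of the field names. The PTW_MASK field is left at zero since its role is unclear, and nothing here has been checked against what the PROM really programs.

```c
#include <assert.h>
#include <stdint.h>

/* Macros copied from idt/mmu.h. */
#define PTW_V       0x00000001 /* valid bit */
#define PTW_RO      0x00000002 /* read only bit */
#define PTW_UP      0x00000004 /* user protect */
#define PTW_MA(V)   ((V) << 3)
#define PTW_IO      PTW_MA(0) /* I/O Attribute */
#define PTW_CACHE   PTW_MA(1) /* Cache Attribute */
#define PTW_TPA(V)  ((V & 0xff) << 16)
#define PTW_TVA(V)  ((V & 0xff) << 24)

/*
 * Build a valid PTW value mapping virtual window vwin onto physical
 * window pwin. attr is any combination of PTW_RO, PTW_UP and one of
 * the PTW_MA() attributes. Arguments are unsigned so that the shift
 * by 24 in PTW_TVA() does not overflow a signed int.
 */
static uint32_t ptw_make(unsigned vwin, unsigned pwin, uint32_t attr)
{
	assert(vwin <= 0xff && pwin <= 0xff);
	return PTW_TVA(vwin) | PTW_TPA(pwin) | attr | PTW_V;
}
```

Under these assumptions, a cached mapping of physical window 00 at window fd would be ptw_make(0xfd, 0x00, PTW_CACHE), that is fd000009, and a read-only variant would simply add PTW_RO to the attributes.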