[Lab Report] MIT 6.S081 Lab: page tables (v2022)
Speed up system calls
According to the hints, we need to map a struct usyscall at the USYSCALL address defined in kernel/memlayout.h. First, we add a field for the page to struct proc in kernel/proc.h:
struct proc {
...
struct usyscall *usyscallpage;
};
The mapping is set up in proc_pagetable in kernel/proc.c using mappages, with read-only permission for user space. According to the RISC-V manual, if the PTE_U flag is set, user code can use that PTE; otherwise only supervisor mode can use it.
pagetable_t
proc_pagetable(struct proc *p)
{
...
// An empty page table.
pagetable = uvmcreate();
if(pagetable == 0)
return 0;
if(mappages(pagetable, USYSCALL, PGSIZE,
(uint64)(p->usyscallpage), PTE_U | PTE_R) < 0){
uvmfree(pagetable, 0);
return 0;
}
// map the trampoline code (for system call return)
...
}
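To illustrate why PTE_U | PTE_R amounts to read-only access for user code, here is a minimal user-space sketch of the permission check. The flag values follow kernel/riscv.h; the user_can_access helper and the access enum are hypothetical names introduced for illustration, not part of xv6, and the model deliberately ignores details such as the accessed/dirty bits.

```c
#include <stdint.h>
#include <stdbool.h>

// Flag bit positions as defined in xv6's kernel/riscv.h.
#define PTE_V (1L << 0) // valid
#define PTE_R (1L << 1) // readable
#define PTE_W (1L << 2) // writable
#define PTE_X (1L << 3) // executable
#define PTE_U (1L << 4) // user can access

// Hypothetical: the kind of access user code attempts.
enum access { ACC_READ, ACC_WRITE, ACC_EXEC };

// Simplified model of the check applied to a user-mode access.
// Returns true if the access would succeed, false if it would fault.
bool
user_can_access(uint64_t pte, enum access kind)
{
  // User code can only use PTEs that are both valid and marked PTE_U.
  if((pte & PTE_V) == 0 || (pte & PTE_U) == 0)
    return false;
  switch(kind){
  case ACC_READ:  return (pte & PTE_R) != 0;
  case ACC_WRITE: return (pte & PTE_W) != 0;
  case ACC_EXEC:  return (pte & PTE_X) != 0;
  }
  return false;
}
```

With the flags used for the USYSCALL page (mappages adds PTE_V itself), a read succeeds while a write or an instruction fetch would fault, which is the behavior we want for a page the kernel owns.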
Next, we allocate the page in allocproc and store the current PID in it. We also free the page in freeproc:
static struct proc*
allocproc(void)
{
...
// Allocate a trapframe page.
if((p->trapframe = (struct trapframe *)kalloc()) == 0){
freeproc(p);
release(&p->lock);
return 0;
}
if ((p->usyscallpage = (struct usyscall *)kalloc()) == 0) {
freeproc(p);
release(&p->lock);
return 0;
}
p->usyscallpage->pid = p->pid;
// An empty user page table.
...
}
static void
freeproc(struct proc *p)
{
...
p->trapframe = 0;
if(p->usyscallpage)
kfree((void *)p->usyscallpage);
p->usyscallpage = 0;
if(p->pagetable)
...
}
After completing the above modifications, running ./grade-lab-pgtbl ugetpid caused the system to hang. Running make qemu resulted in the following panic:
xv6 kernel is booting
hart 2 starting
hart 1 starting
panic: freewalk: leaf
The panic message comes from the freewalk function in kernel/vm.c. The kernel panics there because it finds a leaf PTE that is still valid (PTE_V = 1), which means a page mapping was not removed before the page table was freed:
void
freewalk(pagetable_t pagetable)
{
// there are 2^9 = 512 PTEs in a page table.
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
// this PTE points to a lower-level page table.
uint64 child = PTE2PA(pte);
freewalk((pagetable_t)child);
pagetable[i] = 0;
} else if(pte & PTE_V){
panic("freewalk: leaf");
}
}
kfree((void*)pagetable);
}
We proceed by launching gdb
:
$ make qemu-gdb
We set a breakpoint at freewalk, let the kernel run into the panic, and then print a backtrace:
$ gdb-multiarch -x .gdbinit
...
(gdb) b freewalk
Breakpoint 1 at 0x80000970: file kernel/vm.c, line 273.
(gdb) c
Continuing.
^C
Thread 1 received signal SIGINT, Interrupt.
panic (s=s@entry=0x800080f8 "freewalk: leaf") at kernel/printf.c:127
127
(gdb) backtrace
#0 panic (s=s@entry=0x800080f8 "freewalk: leaf") at kernel/printf.c:127
#1 0x00000000800009c0 in freewalk (pagetable=0x87f71000) at kernel/vm.c:283
#2 0x0000000080000998 in freewalk (pagetable=0x87f72000) at kernel/vm.c:280
#3 0x0000000080000998 in freewalk (pagetable=pagetable@entry=0x87f73000) at kernel/vm.c:280
#4 0x00000000800009f2 in uvmfree (pagetable=pagetable@entry=0x87f73000, sz=sz@entry=4096) at kernel/vm.c:296
#5 0x0000000080001028 in proc_freepagetable (pagetable=0x87f73000, sz=sz@entry=4096) at kernel/proc.c:231
#6 0x000000008000431e in exec (path=path@entry=0x3fffffcf00 "/init", argv=argv@entry=0x3fffffce00) at kernel/exec.c:129
The backtrace shows that the problem arises in proc_freepagetable in kernel/proc.c, which exec calls to release the old page table. Let’s see what this function actually does:
// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz) {
...
}
The problem is that the mapping we added earlier is never removed, so freewalk still finds a valid leaf PTE for USYSCALL. Unmapping it before calling uvmfree does the trick:
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
uvmunmap(pagetable, TRAMPOLINE, 1, 0);
uvmunmap(pagetable, TRAPFRAME, 1, 0);
uvmunmap(pagetable, USYSCALL, 1, 0);
uvmfree(pagetable, sz);
}
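The effect of the missing uvmunmap can be reproduced outside the kernel. The following is a rough user-space model, not xv6 code: page_alloc, freewalk_model, and leftover_leaves are hypothetical helpers, host heap pages stand in for physical pages, the model counts leftover leaves instead of panicking, and it assumes a 64-bit host. The slot numbers 255, 511, and 509 are the ones USYSCALL selects under the lab's standard kernel/memlayout.h.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096
#define PTE_V (1L << 0)
#define PTE_R (1L << 1)
#define PTE_W (1L << 2)
#define PTE_X (1L << 3)
// Same shifts as xv6's PA2PTE/PTE2PA, applied to host addresses.
#define PA2PTE(pa) ((((uint64_t)(uintptr_t)(pa)) >> 12) << 10)
#define PTE2PA(pte) (((pte) >> 10) << 12)

typedef uint64_t pte_t;
typedef uint64_t *pagetable_t;

// Hypothetical: allocate one zeroed, page-aligned page from the host heap.
static void *
page_alloc(void)
{
  void *p = aligned_alloc(PGSIZE, PGSIZE);
  if(p)
    memset(p, 0, PGSIZE);
  return p;
}

// Like freewalk, but returns the number of leftover leaf mappings
// where xv6 would call panic("freewalk: leaf").
int
freewalk_model(pagetable_t pagetable)
{
  int leaves = 0;
  for(int i = 0; i < 512; i++){
    pte_t pte = pagetable[i];
    if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
      // this PTE points to a lower-level page table.
      leaves += freewalk_model((pagetable_t)(uintptr_t)PTE2PA(pte));
    } else if(pte & PTE_V){
      leaves++;
    }
  }
  free(pagetable);
  return leaves;
}

// Build a three-level table with one leaf in slot [255][511][509].
// If unmap_first is nonzero, clear the leaf before freeing, which is
// what uvmunmap(pagetable, USYSCALL, 1, 0) achieves in the kernel.
int
leftover_leaves(int unmap_first)
{
  pagetable_t l2 = page_alloc(), l1 = page_alloc(), l0 = page_alloc();
  if(!l2 || !l1 || !l0){
    free(l2); free(l1); free(l0);
    return -1;
  }
  l2[255] = PA2PTE(l1) | PTE_V;
  l1[511] = PA2PTE(l0) | PTE_V;
  l0[509] = PA2PTE(0x87f00000) | PTE_R | PTE_V; // made-up physical address
  if(unmap_first)
    l0[509] = 0;
  return freewalk_model(l2);
}
```

Calling leftover_leaves(0) models the buggy version and reports one remaining leaf, while leftover_leaves(1) models the fixed proc_freepagetable and reports none.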
Which other xv6 system call(s) could be made faster using this shared page? Explain how.
Any system call that directly or indirectly invokes the copyout function could be accelerated, as it saves the time spent copying data, provided the kernel keeps that data up to date in the shared page. Additionally, system calls used purely for information retrieval, such as getpid in this section, will also be faster. This is because trapping into the kernel is no longer necessary: the corresponding data can be read directly in user mode instead.
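To make the last point concrete, here is a small user-space model of the idea. It is only a sketch: the static shared_page stands in for the page that the kernel maps at USYSCALL, and kernel_publish_pid and ugetpid_model are hypothetical names, not xv6 functions.

```c
// Layout of the shared page, as in the lab's kernel/memlayout.h.
struct usyscall {
  int pid;  // Process ID
};

// Hypothetical stand-in for the read-only page mapped at USYSCALL.
static struct usyscall shared_page;

// Kernel side: done once, when the process is allocated.
void
kernel_publish_pid(int pid)
{
  shared_page.pid = pid;
}

// User side: a plain memory load, with no trap into the kernel.
int
ugetpid_model(void)
{
  const struct usyscall *u = &shared_page;
  return u->pid;
}
```

The same pattern would fit any value that the kernel can write ahead of time and that user code only reads.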
Print a page table
As the hints suggest, we start by inserting if(p->pid==1) vmprint(p->pagetable) in kernel/exec.c just before return argc:
int
exec(char *path, char **argv)
{
...
proc_freepagetable(oldpagetable, oldsz);
if(p->pid==1)
vmprint(p->pagetable);
return argc; // this ends up in a0, the first argument to main(argc, argv)
bad:
...
}
Recall that the freewalk function from the previous section walks the page tables recursively in order to free them, so a few minor modifications to that function suffice for our purposes:
void
printpgtb(pagetable_t pagetable, int depth)
{
// there are 2^9 = 512 PTEs in a page table.
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if(pte & PTE_V){
printf("..");
for(int j=0;j<depth;j++) {
printf(" ..");
}
printf("%d: pte %p pa %p\n", i, pte, PTE2PA(pte));
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0) {
uint64 child = PTE2PA(pte);
printpgtb((pagetable_t)child, depth+1);
}
}
}
}
void
vmprint(pagetable_t pagetable) {
printf("page table %p\n", pagetable);
printpgtb(pagetable, 0);
}
Remember to declare the prototype for vmprint in kernel/defs.h:
// vm.c
...
int copyinstr(pagetable_t, char *, uint64, uint64);
void vmprint(pagetable_t);
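Each nesting level in the vmprint output corresponds to one 9-bit index taken from the virtual address. As a sketch, the program fragment below repeats the PX macros from kernel/riscv.h and the address constants from kernel/memlayout.h (USYSCALL as defined for this lab) and splits an address into its three indexes. The va_indexes helper is a hypothetical name added for illustration.

```c
#include <stdint.h>

// Index extraction, as in xv6's kernel/riscv.h.
#define PGSHIFT 12
#define PXMASK 0x1FF // 9 bits
#define PXSHIFT(level) (PGSHIFT + (9 * (level)))
#define PX(level, va) ((((uint64_t)(va)) >> PXSHIFT(level)) & PXMASK)

// Address layout, as in kernel/memlayout.h for this lab.
#define PGSIZE 4096
#define MAXVA (1L << (9 + 9 + 9 + 12 - 1))
#define TRAMPOLINE (MAXVA - PGSIZE)
#define TRAPFRAME (TRAMPOLINE - PGSIZE)
#define USYSCALL (TRAPFRAME - PGSIZE)

// Hypothetical helper: store the level-2, level-1, and level-0
// page-table indexes of va into idx[0], idx[1], and idx[2].
void
va_indexes(uint64_t va, int idx[3])
{
  idx[0] = (int)PX(2, va);
  idx[1] = (int)PX(1, va);
  idx[2] = (int)PX(0, va);
}
```

For USYSCALL this yields 255, 511, and 509, and for TRAPFRAME and TRAMPOLINE the last index becomes 510 and 511. These are the entries to look for near the end of the vmprint output.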
Detect which pages have been accessed
We start by defining the access bit in kernel/riscv.h. Its bit position can be found in the RISC-V manual:
...
#define PTE_U (1L << 4) // user can access
#define PTE_A (1L << 6) // accessed
Since SYS_pgaccess has already been registered, we can jump directly into implementing the sys_pgaccess function in kernel/sysproc.c. Fetching the arguments is straightforward:
Remark: the RISC-V standard calling convention
- The return value is placed in the a0 register.
- Arguments are placed in the a0, a1, …, a7 registers from left to right. If there are more arguments, they are pushed onto the stack from right to left, with the 9th argument at the top of the stack.
uint64 startaddr; // the starting virtual address of the first user page to check
int npage; // the number of pages to check
uint64 useraddr; // a user address to a buffer to store the results into a bitmask
uint64 bitmask = 0; // one bit per page; the first page corresponds to the least significant bit
argaddr(0, &startaddr);
argint(1, &npage);
argaddr(2, &useraddr);
Now we iterate over the pages that were passed in and fill in the bits of bitmask accordingly. After that, we write bitmask out to user space with copyout.
If PTE_A = 1, we left-shift 1 by $i$ bits and perform a bitwise OR with bitmask, so that the other bits in bitmask remain unchanged and only the bit at position $i$ is set to 1. We then clear PTE_A so that the next call reports only the accesses made since this one.
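uint64 complement = ~PTE_A;
struct proc *p = myproc();
for (int i = 0; i < npage && i < 64; ++i) { // bitmask holds at most 64 pages
pte_t *pte = walk(p->pagetable, startaddr+i*PGSIZE, 0);
if (pte == 0 || (*pte & PTE_V) == 0)
continue; // skip unmapped pages instead of dereferencing a null PTE
if (*pte & PTE_A) {
bitmask |= ((uint64)1 << i); // 64-bit shift, since bitmask is a uint64
*pte &= complement; // reset PTE_A
}
}
if (copyout(p->pagetable, useraddr, (char *)&bitmask, sizeof(bitmask)) < 0)
return -1;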
The entire function looks like this:
int
sys_pgaccess(void)
{
uint64 startaddr;
int npage;
uint64 useraddr;
argaddr(0, &startaddr);
argint(1, &npage);
argaddr(2, &useraddr);
uint64 bitmask = 0;
uint64 complement = ~PTE_A;
struct proc *p = myproc();
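for (int i = 0; i < npage && i < 64; ++i) { // bitmask holds at most 64 pages
pte_t *pte = walk(p->pagetable, startaddr+i*PGSIZE, 0);
if (pte == 0 || (*pte & PTE_V) == 0)
continue; // skip unmapped pages instead of dereferencing a null PTE
if (*pte & PTE_A) {
bitmask |= ((uint64)1 << i); // 64-bit shift, since bitmask is a uint64
*pte &= complement; // reset PTE_A
}
}
if (copyout(p->pagetable, useraddr, (char *)&bitmask, sizeof(bitmask)) < 0)
return -1;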
return 0;
}
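The scan-and-clear logic can be tried out in isolation. The sketch below is a user-space model under simplifying assumptions: an array of fake leaf PTEs replaces the walk over the page table, and scan_accessed is a hypothetical helper, not an xv6 function.

```c
#include <stdint.h>

#define PTE_V (1L << 0) // valid
#define PTE_A (1L << 6) // accessed

// Hypothetical helper: scan n fake leaf PTEs, return a bitmask whose
// bit i is set if ptes[i] had PTE_A set, and clear PTE_A on the way.
uint64_t
scan_accessed(uint64_t *ptes, int n)
{
  uint64_t mask = 0;
  for(int i = 0; i < n && i < 64; i++){
    if(ptes[i] & PTE_A){
      mask |= (uint64_t)1 << i; // set only bit i, leave the rest alone
      ptes[i] &= ~PTE_A;        // reset the accessed bit
    }
  }
  return mask;
}
```

Scanning the same array twice in a row returns an empty mask the second time, because the first scan has already cleared every PTE_A bit. This is why the accessed bit has to be reset inside the system call.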