System Calls

Due to the protection mechanisms built into the processor, programs running with user privilege cannot access certain memory locations, or execute certain instructions. This, after all, is the whole point of the protection mechanisms - to protect the kernel against errant user programs. Some mechanism has to be found to let a user program indicate to the kernel that it requires certain tasks to be undertaken on its behalf. The easiest way to accomplish this (although not the only way) is to use the syscall and sysretq instructions. These instructions allow the program to switch privilege levels, run some kernel code, and then switch back again.

In the original incantation of IanOS I thought up my own list of system calls. Now I have decided that it more expedient to use the well-defined set in UNIX systems (but, of course, my implementation of these calls is completely different). Apart from the fact that they are a well thought out list it means that, should I ever get to that stage, porting programs written for UNIX or Linux will be easier. That seems unlikely to happen any time soon, but you never know!

All system calls are defined in syscalls.s (although most of the routines here call functions from elsewhere in the kernel). The syscall instruction always jumps to the same code (in our case SysCalls) with register r9 containing the number of the particular system call requested. A simple jump table routes to the appropriate routine. These routines are all very simple, and must all end with a sysretq instruction to return to the calling program. The current selection is by no means complete; in a more complete system there will be a lot more of them.

There is a very important point to note if you wish to write your own system calls. The syscall instruction is a bit like a subroutine call, but rather than storing its return address on the stack it stores it in the register rcx and the sysretq instruction expects its return address to be in rcx. It is essential that, if your code is going to change rcx (which any C routine is very likely to do), you immediately save that value and restore it just before sysretq. The flags register is also stored in r11, but it doesn't really matter whether this is preserved or not.

There is just one system call where we put a different value into rcx. This is the execve system call, which loads a new program and so needs to start at the starting address of that new program. In this case we put the correct value into rcx and discard the one that was there before.