Before we can start multitasking, a certain amount of preparation must be done, most notably the creation of an Interrupt Descriptor Table. Tasking will rely on a regular timer tick, which is produced by an interrupt routine servicing the real-time clock interrupt. We will also want to set up interrupt routines to cover the most common faults (general protection faults and page faults); my routines print a trace of the last five stack entries, which is very useful when debugging faults.
The start of the 64-bit code is contained in the file os.s. The first three instructions set up the data and stack segment registers (the code segment selector was set up by the preceding jmp instruction). Then there is a series of writes to three Model-Specific Registers; these registers (0xC0000081, 0xC0000082, and 0xC0000083) specify the address and the code and stack selectors to be used by the syscall and sysret instructions (see section 6.1.1 in AMD2). These two incredibly useful instructions allow one to easily make calls from user-mode code to privileged system code, and back again; without them our task would be very much more complicated. The routine to be called is SysCalls in the file syscalls.s; this routine does an indirect jump to the various system-call routines based on the number of the call (which is passed in register R9).
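To make the two ideas in this paragraph a little more concrete, here is a minimal sketch in C. It only computes the value that would be written to the selector register (0xC0000081) and models the dispatch-by-number step with a table of function pointers; it does not execute wrmsr, which is only possible in privileged code. The names pack_star, dispatch_syscall, and the two example calls are hypothetical and are not taken from os.s or syscalls.s, and the selector values used in the usage note are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three MSR numbers named in the text, with their architectural names. */
#define MSR_STAR  0xC0000081u  /* selectors used by syscall and sysret */
#define MSR_LSTAR 0xC0000082u  /* entry point for 64-bit callers */
#define MSR_CSTAR 0xC0000083u  /* entry point for compatibility-mode callers */

/* Pack the selector fields of a STAR value.
 * Bits 47:32: CS loaded by syscall (SS is that value + 8).
 * Bits 63:48: base used by sysret (64-bit CS = base + 16, SS = base + 8).
 * Bits 31:0 (the legacy 32-bit target) are left at zero in this sketch. */
static uint64_t pack_star(uint16_t syscall_cs, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}

/* Hypothetical system calls standing in for the real routines. */
static long sys_add_one(long arg) { return arg + 1; }
static long sys_negate(long arg)  { return -arg; }

typedef long (*syscall_fn)(long);

static const syscall_fn syscall_table[] = { sys_add_one, sys_negate };

/* Dispatch on the call number (R9 in the text) through an indirect call.
 * Returns -1 when the number is outside the table. */
static long dispatch_syscall(unsigned long number, long arg)
{
    size_t count = sizeof syscall_table / sizeof syscall_table[0];
    if (number >= count)
        return -1;
    assert(syscall_table[number] != NULL);
    return syscall_table[number](arg);
}
```

With an assumed kernel code selector of 0x08 and an assumed sysret base of 0x10, pack_star(0x08, 0x10) gives 0x0010000800000000. Bounds-checking the call number before the indirect call matters because the number comes from user-mode code.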
A call to InitIDT (in the file gates.c) sets up the Interrupt Descriptor Table, which points to various routines in interrupts.s. At this stage we also construct a Task State Segment Descriptor and load a selector for it into the Task Register. Although hardware task switching is not supported in long mode, it is a requirement that at least one 64-bit Task State Segment exists. A pointer to the IDT is now loaded into the IDT register. Although the IDT is now set up, we don't allow interrupts just yet; that would allow task switching to start, but we don't yet have any tasks to switch to.
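For readers who have not met a long-mode IDT entry before, the following is a sketch of how one 16-byte gate can be filled in from a handler address. It follows the architectural layout of a 64-bit interrupt gate, but the names idt_gate, make_gate, gate_offset, and idt_limit are hypothetical and are not the ones used in gates.c.

```c
#include <assert.h>
#include <stdint.h>

/* One long-mode IDT entry. Natural alignment already gives 16 bytes. */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector for the handler */
    uint8_t  ist;          /* bits 2:0: IST index, 0 = do not switch via IST */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_mid;   /* handler address bits 31:16 */
    uint32_t offset_high;  /* handler address bits 63:32 */
    uint32_t reserved;     /* must be zero */
};
_Static_assert(sizeof(struct idt_gate) == 16, "IDT gate must be 16 bytes");

#define GATE_INTERRUPT_DPL0 0x8Eu  /* present, DPL 0, 64-bit interrupt gate */

/* Split a 64-bit handler address across the three offset fields. */
static struct idt_gate make_gate(uint64_t handler, uint16_t selector,
                                 uint8_t ist, uint8_t type_attr)
{
    assert(ist <= 7);  /* only IST1..IST7 exist; 0 means "none" */
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = ist;
    g.type_attr   = type_attr;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Reassemble the handler address, handy when checking a table by eye. */
static uint64_t gate_offset(const struct idt_gate *g)
{
    return ((uint64_t)g->offset_high << 32) |
           ((uint64_t)g->offset_mid << 16) |
           g->offset_low;
}

/* The limit loaded into the IDT register is the table size in bytes minus 1. */
static uint16_t idt_limit(unsigned entries)
{
    return (uint16_t)(entries * sizeof(struct idt_gate) - 1);
}
```

A full table of 256 gates therefore has a limit of 4095. The ist field is where a gate selects one of the alternative stacks held in the TSS.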
Now we zero a page of memory where the task structures will reside and then initialize a few values in the first task structure - just enough to allow the first task to run. Pages are now allocated, and Page Table entries created, one each for Kernel Stack, User Stack, User Code and User Data. The Kernel Stack puzzled me for a while, but I eventually realized that it is a separate stack that can be allocated for interrupts to use. This is necessary because an interrupt may be responding to a fault caused by an invalid stack reference; having a separate, known-good stack allows the interrupt to work even if there is a problem with the User Stack. To this end an entry is made in the TSS for this stack; in fact you can have several, with different ones for different interrupts, but I just use the one. Which stack an interrupt uses is configured in the interrupt descriptor in the IDT.
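The separate interrupt stack described above can be pictured with a sketch of the 64-bit TSS. This assumes the stack is registered through the Interrupt Stack Table (the seven slots that an IDT entry can select by index), which matches the description here but is not confirmed against the actual source. The names tss64, stack_top, and tss_set_ist are hypothetical, and the packed attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural layout of the 64-bit TSS (104 bytes). It must be packed
 * because the 64-bit fields start at offset 4. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];      /* stacks loaded on a privilege change to ring 0..2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7: stacks selectable per IDT entry */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;  /* offset of the I/O permission bitmap */
} __attribute__((packed));
_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, ist) == 36, "IST1 lives at offset 36");

/* Stacks grow downwards, so the TSS holds the address just past the end of
 * the page, rounded down to a 16-byte boundary. */
static uint64_t stack_top(uint64_t base, uint64_t size)
{
    return (base + size) & ~0xFULL;
}

/* Record a known-good stack in IST slot 1..7. An IDT entry whose IST field
 * holds the same index will switch to this stack when it fires. */
static void tss_set_ist(struct tss64 *tss, unsigned index, uint64_t top)
{
    assert(index >= 1 && index <= 7);
    tss->ist[index - 1] = top;
}
```

For example, a 4 KiB kernel stack page at the assumed address 0x200000 would be registered with tss_set_ist(&tss, 1, stack_top(0x200000, 0x1000)), which stores 0x201000 in the first slot.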
The code for this first task is then moved to its correct position. A routine StartTasks is now called to set the kernel tasks (essentially device drivers) ready to run. The sysretq instruction, which is what we shall use to actually start tasking, expects the address at which execution is to continue to be in register rcx and the flags to be in register r11. When tasking starts we want the flags to be as they currently are but with the interrupt flag set to enable interrupts (it is this enabling of interrupts that allows the timer to start and thus task switching to occur). To accomplish this we push the flags with pushfq and pop them into r11; then we set the interrupt flag in r11. The sysretq instruction now sets the ball rolling by running tas1.c. Tas1 will load the other initial tasks and then kill itself.
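The register shuffle before sysretq is easier to follow with a small model. The sketch below is not the kernel code and does not execute sysretq; it is a simplified C model of the two registers involved, assuming only the documented behaviour that sysretq takes the new instruction pointer from rcx, takes the flags from r11, and returns to privilege level 3. The names cpu_model, prepare_sysretq, and model_sysretq are hypothetical, as is the entry address in the usage note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RFLAGS_IF (1ULL << 9)   /* interrupt enable flag */
#define RFLAGS_RF (1ULL << 16)  /* resume flag, cleared by sysretq */
#define RFLAGS_VM (1ULL << 17)  /* virtual-8086 flag, cleared by sysretq */

/* Just enough processor state to follow the hand-off. */
struct cpu_model {
    uint64_t rip;
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
    unsigned cpl;   /* current privilege level */
};

/* What the kernel does just before sysretq: the equivalent of
 * "pushfq; pop r11; or r11, 0x200", plus loading the entry point into rcx.
 * Interrupts stay disabled in the live flags until sysretq runs. */
static void prepare_sysretq(struct cpu_model *cpu, uint64_t entry)
{
    assert(cpu != NULL);
    cpu->rcx = entry;
    cpu->r11 = cpu->rflags | RFLAGS_IF;
}

/* Simplified effect of sysretq itself. */
static void model_sysretq(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->rip    = cpu->rcx;
    cpu->rflags = cpu->r11 & ~(RFLAGS_RF | RFLAGS_VM);
    cpu->cpl    = 3;
}
```

Starting from flags of 0x2 (interrupts off) and an assumed entry point of 0x400000, prepare_sysretq leaves r11 at 0x202 while the live flags are unchanged; only when sysretq runs do the instruction pointer and the interrupt flag change together, which is why no timer tick can arrive before the first task is actually running.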
So, now it's running and the rest is merely detail! But there's quite a lot of detail so I'll explain it as well as I can. The most important details concern memory management and tasking - this is to a large extent what an OS kernel is all about - so I'll start with those.