Writing a kernel module for speed and profit

Having fun in kernel space

31 August 2024


Recently, I was thinking about the vDSO (virtual dynamic shared object), a mechanism in the Linux kernel that exports selected kernel-space routines into user space.

The vDSO lets you make certain system calls without the latency of switching into kernel space, and I was interested in exporting some system calls through it myself.
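
To make the latency difference concrete, here is a minimal sketch that times the same call two ways: through libc, where on most Linux systems clock_gettime is served by the vDSO, and through syscall(2), which always traps into the kernel. The function names are made up for illustration, and the sketch assumes a Linux target where SYS_clock_gettime is defined.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds from a to b. */
int64_t ns_between(struct timespec a, struct timespec b) {
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

/* Time `calls` invocations of clock_gettime through libc. On most Linux
 * systems this resolves to the vDSO and does not enter the kernel
 * (an assumption about the platform, not something the code checks). */
int64_t time_vdso_path(int calls) {
    struct timespec start, end, scratch;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < calls; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ns_between(start, end);
}

/* Same loop, but syscall(2) forces a real trap into the kernel each time. */
int64_t time_syscall_path(int calls) {
    struct timespec start, end, scratch;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < calls; i++)
        syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &scratch);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return ns_between(start, end);
}
```

Calling both functions with the same count and comparing the results should show the gap, though its size depends on the machine and the kernel.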

I was reading an article on lwn.net about implementing virtual system calls, and it contained the following sentence:

Clearly, only “read-only” system calls are valid candidates for this type of emulation because user-space processes are not allowed to write into the kernel address space.

That made me ask “why?”. If I only care about performance and have no regard for security, what’s stopping me from rewriting the kernel to allow me to do this?

I turned 18 just over two years ago and I would like to use my freedom as an adult to hack the kernel.

There are lots of directions I want to take in exploring the kernel, but today I decided to write a kernel module and benchmark the speed improvement I get by doing a syscall’s work while already in kernel mode. This short project was spurred by a crazy idea: “there can’t be any cost of moving into kernel mode if we are already in kernel mode to begin with”.

While learning how to do the things I wanted to do, I encountered many warnings that certain operations are usually inadvisable. For example, one of the questions I had was “how do I open files in kernel mode?” and the most common answer was “don’t”.

I ignored every warning telling me not to open files in kernel space.

God bless curiosity

I decided to benchmark writing to a file in user space against writing to a file in kernel space.

My C code for writing in user space is pretty basic. It looks like this:

#include <fcntl.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

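int main(void) {
    unsigned long long total = 0;
    /* One 1 KiB zero-filled buffer, reused for every write. */
    char *buffer = calloc(1024, 1);
    if (buffer == NULL)
        return 1;

    for (int it = 0; it < 128; it++) {
        int fd = open("/home/sam/writespeed", O_CREAT | O_WRONLY, 0777);
        if (fd < 0) {
            perror("open");
            free(buffer);
            return 1;
        }
        struct timeval start, end;

        gettimeofday(&start, NULL);
        for (int i = 0; i < 100 * 1024; i++) {
            write(fd, buffer, 1024);
        }
        gettimeofday(&end, NULL);
        close(fd);
        /* Elapsed time for this run, in microseconds. */
        long long diff = (end.tv_sec - start.tv_sec) * 1000000LL + end.tv_usec - start.tv_usec;
        total += diff;
    }
    free(buffer);
    /* Average over the 128 runs. */
    printf("microseconds: %llu\n", total / 128);
    return 0;
}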

My buffer size is 1 KiB, which is pretty small, but I’ll use the same buffer size when writing in kernel mode. Each of the 128 runs makes 100 * 1024 writes, so it writes 100 MiB in total.

The Linux kernel is open source, which means all the documentation I need is in the torvalds/linux git repository.

Inside linux/fs.h I found filp_open, which returns a pointer to a file struct whose definition looks a bit like this:

struct file {
    union {
        struct callback_head    f_task_work;
        struct llist_node   f_llist;
        unsigned int        f_iocb_flags;
    };

    spinlock_t      f_lock;
    fmode_t         f_mode;
    atomic_long_t       f_count;
    struct mutex        f_pos_lock;
    loff_t          f_pos;
    unsigned int        f_flags;
    struct fown_struct  f_owner;
    const struct cred   *f_cred;
    struct file_ra_state    f_ra;
    struct path     f_path;
    struct inode        *f_inode;   /* cached value */
    const struct file_operations    *f_op;

    u64         f_version;
    void            *private_data;

    /* Used by fs/eventpoll.c to link all the hooks to this file */
    struct hlist_head   *f_ep;
    struct address_space    *f_mapping;
    errseq_t        f_wb_err;
    errseq_t        f_sb_err; /* for syncfs */
};

It contains some kernel data structures like a spinlock, a mutex and an inode pointer. The structure definition also contains a field used by epoll.

I found the kernel_write function and used it to write to the file. My kernel module looks like this:
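
One detail worth knowing about filp_open is that it does not return NULL on failure. Like many kernel functions that return pointers, it encodes a negative errno value in the pointer itself, and callers test for that with IS_ERR and PTR_ERR from linux/err.h. The following is a user-space model of that convention, not the kernel's own code; the demo_ names are hypothetical stand-ins so the sketch compiles outside the kernel.

```c
#include <errno.h>
#include <stdint.h>

/* The kernel reserves the top 4095 addresses for encoded error values. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (models ERR_PTR). */
void *demo_err_ptr(long error) {
    return (void *)(intptr_t)error;
}

/* True when the pointer lies in the reserved error range (models IS_ERR).
 * NULL is not in that range, so it is not treated as an error here. */
int demo_is_err(const void *ptr) {
    return (uintptr_t)ptr >= (uintptr_t)(-(intptr_t)DEMO_MAX_ERRNO);
}

/* Recover the negative errno value from an error pointer (models PTR_ERR). */
long demo_ptr_err(const void *ptr) {
    return (long)(intptr_t)ptr;
}
```

In a module, the equivalent check would be to test the result of filp_open with IS_ERR before passing it to anything else, and to return PTR_ERR of it on failure.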

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/fcntl.h>
#include <linux/ktime.h>
#include <linux/slab.h>


MODULE_AUTHOR("Sam Ezeh");
MODULE_DESCRIPTION("Hello world driver");
MODULE_LICENSE("GPL");

static int __init custom_init(void) {
    u64 total = 0;
    void *buffer;

    printk(KERN_INFO "Loaded write module\n");

    /* Module init runs in process context, so plain GFP_KERNEL is enough. */
    buffer = kzalloc(1024, GFP_KERNEL);
    if (!buffer)
        return -ENOMEM;

    for (int it = 0; it < 128; it++) {
        struct file *filp = filp_open("/home/sam/writespeed", O_CREAT | O_WRONLY, 0777);
        loff_t pos = 0;
        u64 start, end;

        /* filp_open reports failure as an error pointer, not NULL. */
        if (IS_ERR(filp)) {
            kfree(buffer);
            return PTR_ERR(filp);
        }

        start = ktime_get_ns();
        for (int i = 0; i < 100 * 1024; i++) {
            kernel_write(filp, buffer, 1024, &pos);
        }
        end = ktime_get_ns();
        /* Keep the difference in 64 bits: a nanosecond count overflows an int after about 2.1 seconds. */
        total += end - start;
        filp_close(filp, NULL);
    }
    kfree(buffer);

    /* total is in nanoseconds: convert to microseconds, then average over the 128 runs. */
    printk(KERN_INFO "microseconds: %llu\n", (total / 1000) / 128);
    return 0;
}

static void __exit custom_exit(void) {
    printk(KERN_INFO "Exiting write module\n");
}

module_init(custom_init);
module_exit(custom_exit);

Something I found interesting about writing kernel code was that I couldn’t use libc. For example, gettimeofday isn’t available in kernel development, and neither is malloc. I had to use ktime_get_ns to get the time for benchmarking and kzalloc to allocate memory.

After running both benchmarks, I found that I got about a 1.2x speedup from running my code in kernel space.

Cool!