Writing a kernel module for speed and profit
31 August 2024
Recently, I was thinking about the vDSO (virtual dynamic shared object), a mechanism in the Linux kernel that exports selected kernel-space routines to user space.
The vDSO lets you make certain system calls while avoiding the additional latency incurred by switching into kernel space, and I was interested in exporting some system calls through it myself.
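To make the latency difference concrete, here is a minimal sketch that times clock_gettime through the libc wrapper (which is normally served by the vDSO on Linux) against the same call forced through a real kernel entry with syscall(2). The helper names now_ns, avg_call_ns and vdso_is_mapped are my own for illustration, and the numbers will vary by machine and kernel configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/auxv.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in nanoseconds via the libc wrapper. */
uint64_t now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost in nanoseconds of one clock_gettime call.
 * use_raw_syscall != 0 bypasses libc and enters the kernel every time;
 * otherwise the libc wrapper is used, which usually goes through the vDSO. */
uint64_t avg_call_ns(int use_raw_syscall, int iterations) {
    struct timespec ts;
    uint64_t start = now_ns();
    for (int i = 0; i < iterations; i++) {
        if (use_raw_syscall)
            syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts);
        else
            clock_gettime(CLOCK_MONOTONIC, &ts);
    }
    return (now_ns() - start) / (uint64_t)iterations;
}

/* Nonzero if the kernel mapped a vDSO into this process
 * (AT_SYSINFO_EHDR holds its base address). */
int vdso_is_mapped(void) {
    return getauxval(AT_SYSINFO_EHDR) != 0;
}
```

Calling avg_call_ns(0, 1000000) and avg_call_ns(1, 1000000) from a main function and printing both values gives a rough per-call comparison. If vdso_is_mapped() returns 0, both paths enter the kernel and the gap should disappear.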
I was reading an article from lwn.net about implementing virtual system calls and it contained the following sentence:
Clearly, only “read-only” system calls are valid candidates for this type of emulation because user-space processes are not allowed to write into the kernel address space.
Which caused me to ask “why?”. If I only care about performance and have no regard for security, what’s stopping me from rewriting the kernel to allow me to do this?
I turned 18 just over two years ago and I would like to use my freedom as an adult to hack the kernel.
I have lots of directions I want to go with exploring the kernel but today I decided to write a kernel module and benchmark the speed improvements I get by performing syscalls while in kernel mode. This short project was spurred by a crazy idea of “there can’t be any cost of moving into kernel mode if we are already in kernel mode to begin with”.
While learning how to do the things I wanted to do, I encountered many warnings that certain operations are usually inadvisable. For example, one of the questions I had was “how do I open files in kernel mode?” and the most common answer was “don’t”.
God bless curiosity
I decided that I want to benchmark writing to a file in user space against writing to a file in kernel space.
My C code for writing in user space is pretty basic. It looks like this:
#include <fcntl.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    unsigned long long int total = 0;
    /* Repeat the benchmark 128 times and report the average */
    for (int it = 0; it < 128; it++) {
        int fd = open("/home/sam/writespeed", O_CREAT | O_WRONLY, 0777);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        char *buffer = calloc(1024, 1);
        struct timeval start, end;
        gettimeofday(&start, NULL);
        /* 100 * 1024 writes of 1024 bytes each: 100 MiB per run */
        for (int i = 0; i < 100 * 1024; i++) {
            write(fd, buffer, 1024);
        }
        gettimeofday(&end, NULL);
        close(fd);
        int diff = (end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec;
        total += diff;
        free(buffer); /* the buffer is reallocated on every run, so release it */
    }
    printf("microseconds: %llu\n", total / 128);
    return 0;
}
My buffer size is 1 KiB, which is pretty small, but I’ll use the same buffer size when writing in kernel mode.
The Linux kernel is open source, meaning I have all the documentation I need in the torvalds/linux git repository. Inside linux/fs.h I found filp_open, which returns a pointer to a file struct whose definition looks a bit like this.
struct file {
    union {
        struct callback_head f_task_work;
        struct llist_node f_llist;
        unsigned int f_iocb_flags;
    };
    spinlock_t f_lock;
    fmode_t f_mode;
    atomic_long_t f_count;
    struct mutex f_pos_lock;
    loff_t f_pos;
    unsigned int f_flags;
    struct fown_struct f_owner;
    const struct cred *f_cred;
    struct file_ra_state f_ra;
    struct path f_path;
    struct inode *f_inode; /* cached value */
    const struct file_operations *f_op;
    u64 f_version;
    void *private_data;
    /* Used by fs/eventpoll.c to link all the hooks to this file */
    struct hlist_head *f_ep;
    struct address_space *f_mapping;
    errseq_t f_wb_err;
    errseq_t f_sb_err; /* for syncfs */
};
It contains some kernel data structures like the spinlock, mutex and inode. The structure definition also contains some data related to the epoll syscall.
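One detail worth knowing about filp_open is that it does not return NULL on failure. Like many kernel functions, it returns a negative errno value encoded in the pointer itself, which callers test with IS_ERR and decode with PTR_ERR. The sketch below is a user-space imitation of that convention so it can be compiled and run anywhere; err_ptr, is_err, ptr_err and MAX_ERRNO_SKETCH are stand-in names of mine, not the kernel's own definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* The kernel reserves the top 4095 addresses for encoded error codes. */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value (for example -ENOENT) as a pointer. */
void *err_ptr(long error) {
    assert(error < 0 && error >= -MAX_ERRNO_SKETCH);
    return (void *)error;
}

/* Decode the errno value back out of an error pointer. */
long ptr_err(const void *ptr) {
    return (long)ptr;
}

/* True if the pointer falls in the reserved error range at the top of
 * the address space, which no valid object can occupy. */
int is_err(const void *ptr) {
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}
```

In a module, the equivalent check would be IS_ERR(filp) right after filp_open, returning PTR_ERR(filp) if it is set, before the pointer is ever passed to a function like kernel_write.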
I found the kernel_write function and I use it to write to the file. My kernel module looks like this:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/fs.h>
#include <linux/proc_fs.h>
#include <linux/fcntl.h>
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/slab.h>  /* kzalloc, kfree */
#include <linux/ktime.h> /* ktime_get_ns */
#include <linux/err.h>   /* IS_ERR, PTR_ERR */
#include <asm/uaccess.h>

MODULE_AUTHOR("Sam Ezeh");
MODULE_DESCRIPTION("Hello world driver");
MODULE_LICENSE("GPL");

static int __init custom_init(void) {
    printk(KERN_INFO "Loaded write module\n");
    unsigned long long int total = 0;
    for (int it = 0; it < 128; it++) {
        struct file *filp = filp_open("/home/sam/writespeed", O_CREAT | O_WRONLY, 0777);
        /* filp_open reports failure as an error pointer, not NULL */
        if (IS_ERR(filp))
            return PTR_ERR(filp);
        /* Module init runs in process context and may sleep, so GFP_KERNEL is enough */
        void *buffer = kzalloc(1024, GFP_KERNEL);
        if (!buffer) {
            filp_close(filp, NULL);
            return -ENOMEM;
        }
        u64 start = ktime_get_ns();
        loff_t pos = 0;
        for (int i = 0; i < 100 * 1024; i++) {
            kernel_write(filp, buffer, 1024, &pos);
        }
        u64 end = ktime_get_ns();
        u64 diff = end - start; /* u64: an int overflows after about 2.1 s of nanoseconds */
        total += diff;
        kfree(buffer);
        filp_close(filp, NULL);
    }
    /* Convert nanoseconds to microseconds and average over the 128 runs */
    printk(KERN_INFO "microseconds: %llu\n", (total / 1000) / 128);
    return 0;
}

static void __exit custom_exit(void) {
    printk(KERN_INFO "Exiting write module\n");
}

module_init(custom_init);
module_exit(custom_exit);
Something that I found interesting about writing kernel code was that I couldn’t use libc. For example, gettimeofday didn’t exist while doing kernel development and neither did malloc. I had to use ktime_get_ns to get the time in order to benchmark performance and kzalloc to allocate memory.
After running both of the benchmarks I found that I got about a 1.2x speedup from writing my code in kernel space.
Cool!