I enjoy solving problems
20 February 2023
Recently I created a startup-esque minimum viable product, and along the way I ran into problems I needed to solve and technical challenges I needed to overcome. The challenges I faced while building this product were different from a lot of the problems I’ve set myself in the past because these problems were real. I have a real need to make sure my server doesn’t run out of memory and can complete its jobs in time.
To set the scene, I’m repeatedly searching through text, representing both words and documents as embeddings in a vector space. I’d like the user to enter a search term and be presented with the most relevant documents. The naive solution would be to loop through every single item in the corpus, compute a relevancy score for each document, sort the documents by this relevancy score, then select the most relevant documents. I didn’t actually try this because I knew it would be slow and that there’s a better solution. The problem here is that my code is running on a dinky little machine with a small amount of memory and computational power. I had to increase the RAM on the machine because it could barely fit the Machine Learning models.
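For illustration, here’s a minimal sketch of what that brute-force loop could look like, assuming cosine similarity as the relevancy score; `naive_search` and its parameters are names I’ve made up for the example, not code from the actual product.

```python
import numpy as np

def cosine_similarity(a, b):
    # Relevancy score between two embedding vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def naive_search(query_embedding, document_embeddings, k=10):
    # Score every document, sort all of them, keep the top k indices.
    scores = [cosine_similarity(query_embedding, doc) for doc in document_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Sorting the entire corpus just to keep a handful of results is the wasteful part, which is what the next two approaches avoid.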
A better solution is maintaining a min-heap of the top results: I iterate through the documents and insert an element into the min-heap whenever it has a better score than the current worst element. Asymptotically, this is really good and it worked well when I implemented it. The speed was good, but you could notice a delay in receiving results. This solution was good enough, so I continued to build the rest of the product and decided to return to this later.
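A rough sketch of that top-k heap using Python’s `heapq`; again, the function name and the cosine-similarity scoring are illustrative assumptions rather than the real implementation.

```python
import heapq
import numpy as np

def heap_search(query_embedding, document_embeddings, k=10):
    # Maintain a min-heap of (score, index) pairs; the root is always the
    # worst of the current top-k results.
    heap = []
    for i, doc in enumerate(document_embeddings):
        # Cosine similarity as the relevancy score.
        score = np.dot(query_embedding, doc) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc)
        )
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            # This document beats the current worst result, so swap it in.
            heapq.heapreplace(heap, (score, i))
    # Return document indices ordered from best to worst score.
    return [i for _, i in sorted(heap, reverse=True)]
```

Only \(k\) results are ever kept in memory, so the cost is roughly \(O(n \log k)\) instead of sorting the whole corpus.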
There were sysadmin tasks that I found quite fun to resolve. There was a mysterious issue where the server would periodically become unresponsive, and solving it involved setting up nginx on my machine. Some of the small tasks were also really fun to carry out. I noticed that I was aware of details such as the start-up time, response time, and memory usage of my web server because they mattered; the server used to run out of memory on startup and it wouldn’t be able to complete requests.
I love vectors
So, there’s a much better solution for returning the best results and it involves a bit of linear algebra. In essence, I pre-compute the embeddings for the documents and combine them to create a tensor. Calculating every relevancy score can now be done in one go as a single vectorised tensor operation using Einstein summation in numpy. I can then partition the results in \(O(n)\) to get the top results. The best part here is that I can now invoke highly-optimised C code, making my code even faster than before.
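Something along these lines, assuming the document embeddings are stacked into one matrix and the dot product serves as the relevancy score (e.g. with normalised embeddings); `vectorised_search` and its arguments are placeholder names for the sketch.

```python
import numpy as np

def vectorised_search(query_embedding, document_matrix, k=10):
    # document_matrix has shape (num_documents, embedding_dim), pre-computed once.
    # One einsum call computes the dot product of the query against every
    # document row as a single vectorised operation.
    scores = np.einsum("d,nd->n", query_embedding, document_matrix)
    # np.argpartition finds the k highest scores in O(n) without a full sort.
    top_k = np.argpartition(scores, -k)[-k:]
    # Only the k winners need sorting, which is cheap.
    return top_k[np.argsort(scores[top_k])[::-1]]
```

The scoring loop never touches Python-level iteration: the einsum and the partition both run inside numpy’s compiled routines, which is where the speed-up comes from.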