Kernel vs. User-Level Networking: Don't Throw Out the Stack with the Interrupts

Peter Cai and Martin Karsten

This paper reviews the performance characteristics of network stack processing for communication-heavy server applications. Recent literature often describes kernel-bypass and user-level networking as a silver bullet to attain substantial performance improvements, but without providing a comprehensive understanding of how exactly these improvements come about. We identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQs) as a major source of overhead. While IRQs and their handling have a substantial impact on the effectiveness of the processor pipeline and thereby the overall processing efficiency, their overhead is difficult to measure directly when serving demanding workloads. This paper presents an indirect methodology to assess IRQ overhead by constructing preliminary approaches to reduce the impact of IRQs. While these approaches are not suitable for general deployment, their corresponding performance observations indirectly confirm the conjecture that IRQs are a major source of overhead. Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% higher throughput without compromising tail latency. For server applications, such as web servers or Memcached, the resulting performance is comparable to using kernel-bypass and user-level networking when using stacks with similar functionality and flexibility.

ACM SIGMETRICS 2024
Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 7, Issue 3, December 2023

Preprint
Supplementary Material

Notice

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version is published in ACM POMACS, https://doi.org/10.1145/3379483.