I recently read Jérôme Petazzoni’s blog post about a tool called AppSwitch, which made some waves on Twitter and the busy interwebz. I was intrigued, and it turned out to be something I was already familiar with. When I met Dinesh Subhraveti back in 2015 at Linux Plumbers in Seattle, he presented me with a grand vision: applications need to be free of any networking constraints and configurations, and a uniform mechanism should evolve that makes such configurations transparent (I’d rather say opaque now).
There are layers over layers of network-related abstractions. Consider a simple network call made by a Java application. It goes through multiple layers in userspace (through the various libs, all the way to native calls from the JVM and eventually syscalls) and then multiple layers in kernel space (syscall handlers to network subsystems, then to driver layers and over to the hardware). Virtualization adds 4x more layers. Each point in this chain does have a justifiable, unique configuration point. Fair point. But from an application’s perspective, it feels like fiddling with the knobs all the time.
For example, we have of course grown around iptables and custom in-kernel and out-of-kernel load balancers, and even enhanced some of them to exceptional performance (such as XDP-based load balancing). But when it comes to data path processing, doing nothing at all is much better than doing something very efficiently. Apps don’t really have to care about all these myriad layers anyway. So why not add another dimension to this and let this configuration be done at the app level itself? Interesting.. 🤔
I casually asked Dinesh how far the idea had progressed, and he ended up giving me a single binary and telling me that’s it! It seems AppSwitch had finally been baked in the oven.
So there is a single static binary named ax, which runs as an executable as well as in a daemon mode. It seems AppSwitch is distributed as a docker image as well, though. I don’t see any kernel module (unlike what Jérôme tested). This is definitely the userspace version of the same tech.
I used the ax docker image. ax was both installed and running with one docker-run command.
$ docker run -d --pid=host --net=none -v /usr/bin:/usr/bin -v /var/run/appswitch:/var/run/appswitch --privileged docker.io/appswitch/ax
Based on the documentation, this little binary seems to do a lot — service discovery, load balancing, network segmentation etc. But I just tried the basic features in a single-node configuration.
Let’s run a Java webserver under ax.
# ax run --ip 184.108.40.206 -- java -jar SimpleWebServer.jar
This starts the webserver and assigns the IP 220.127.116.11 to it. It’s like overlaying the server’s own IP configuration through ax, so that all requests are then redirected through 18.104.22.168. While idling, I didn’t see any resource consumption in the ax daemon. If it was monitoring system calls with auditd or something, I’d have noticed some CPU activity. Well, the server didn’t break, and when accessed via a client run through ax, it starts serving just fine.
# ax run --ip 22.214.171.124 -- curl -I 126.96.36.199
HTTP/1.0 500 OK
Date: Wed Mar 28 00:19:25 PDT 2018
Server: JibbleWebServer/1.0
Content-Type: text/html
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Content-Length: 58
Last-modified: Wed Mar 28 00:19:25 PDT 2018
Naaaice! 🙂 Why not try connecting with Firefox? Ok, wow, this works too!
I tried this with a Golang http server (Caddy) that is statically linked. If ax was doing something like LD_PRELOAD, that would trip it up, since a statically linked binary never loads preloaded shared libraries.
This time I tried passing a name rather than the IP and ran it as a regular user with a built-in --user option.
# ax run --myserver --user suchakra -- caddy -port 80
# ax run --user suchakra -- curl -I myserver
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 0
Content-Type: text/html; charset=utf-8
Etag: "p6f4lv0"
Last-Modified: Fri, 30 Mar 2018 19:25:07 GMT
Server: Caddy
Date: Sat, 31 Mar 2018 01:52:28 GMT
So no kernel module tricks, it seems. I guess this explains why Jérôme called it “Network Stack from the future”. The “future” part here is applications: with containerized deployments now predominant, the problems of microservices networking have really shifted closer to the apps.
We need to get rid of the overhead caused by networking layers and the frequent context switches that happen as one containerized app communicates with another. AppSwitch could potentially eliminate this altogether, and the communication would then resemble traditional socket-based IPC mechanisms, with the advantage of zero-overhead reads and writes once the connection is established. I think I would want to test this out thoroughly sometime in the future if I get some time off from my bike trips 🙂
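As a point of reference for what “traditional socket-based IPC” looks like, here is a tiny Python sketch. It is only an illustration and says nothing about how AppSwitch is implemented; the function name `ipc_roundtrip` is made up for this example. It passes bytes between the two ends of an already-connected local socket pair, so once the pair exists each message is just a write on one end and a read on the other.

```python
import socket


def ipc_roundtrip(payload: bytes) -> bytes:
    """Send payload across a connected local socket pair and read it back."""
    # socketpair() returns two sockets that are already connected to each
    # other (AF_UNIX on Linux), so there is no per-message connection setup.
    left, right = socket.socketpair()
    with left, right:
        # Fine for small payloads; a very large payload would need a second
        # thread or process draining the other end while we send.
        left.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = right.recv(4096)
            if not chunk:
                break
            received += chunk
        return received


if __name__ == "__main__":
    print(ipc_roundtrip(b"hello from the other end"))
```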
How does it work?
Frankly, I don’t know in depth, but I can guess. All applications, containerized or not, are just a bunch of executables linked to libs (or built statically) running over the OS. When they need the OS’s help, they ask. To understand an application’s behavior or to morph it, the OS can help us understand what is going on and provide interfaces to modify its behavior. Auditd, for example, when configured, can allow us to monitor every syscall from a given process. Programmable LSMs can be used to set per-resource policies through the kernel’s help. For performance observability, tracing tools have traditionally allowed an insight into what goes on underneath. In the world of networking, we again take the OS’s help: routing and filtering strategies are still defined through iptables, with some advances happening in BPF-XDP. However, in the case of networking, calls such as accept() could be intercepted purely in userspace as well. But doing so robustly and efficiently, without application or kernel changes, has been a hard academic problem for decades. There must be some other smart things at work underneath in ax to keep this robust enough for all kinds of apps.
With the interception problem solved, this would allow ax to create a map and actually perform the ‘switching’ part (which I suppose justifies the AppSwitch name). I have tested it so far on a Java, a Go and a Python server. With network syscall interception seemingly working fine, the data then flows like a hot knife through butter. There may be some more features and techniques that I have missed, though. Going through ax --help, it seems there are some options for egress, WAN etc., but I haven’t played with it that much.
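To make my guess about the ‘map’ a bit more concrete, here is a minimal Python sketch of the switching idea. It is purely my own illustration and not how ax works: the names `SWITCH_TABLE`, `register`, `resolve` and `switched_connect` are hypothetical, and the lookup is called explicitly by the client instead of being injected by intercepting syscalls. It only shows that once connection setup can be rewritten through a table, a name like myserver on port 80 can land on whatever real endpoint the server is actually listening on.

```python
import socket
import threading

# Hypothetical switching table: (virtual host, virtual port) -> (real host, real port).
SWITCH_TABLE = {}


def register(virtual_host, virtual_port, real_host, real_port):
    """Record where a virtual endpoint really lives."""
    SWITCH_TABLE[(virtual_host, virtual_port)] = (real_host, real_port)


def resolve(host, port):
    """Return the real endpoint for a virtual one, or the input unchanged."""
    return SWITCH_TABLE.get((host, port), (host, port))


def switched_connect(host, port, timeout=5.0):
    """Connect to wherever the table says (host, port) actually is."""
    return socket.create_connection(resolve(host, port), timeout=timeout)


def start_echo_server():
    """Start a one-shot echo server on an ephemeral loopback port; return the port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def demo():
    """Reach the echo server through the virtual name 'myserver' on port 80."""
    real_port = start_echo_server()
    register("myserver", 80, "127.0.0.1", real_port)
    with switched_connect("myserver", 80) as client:
        client.sendall(b"ping")
        return client.recv(1024)


if __name__ == "__main__":
    print(demo())
```

The rewrite happens only at connection setup; after that the client and server exchange data over an ordinary socket, which is roughly the property I was describing above.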
 Practical analysis of stripped binary code [link]
 Analyzing Dynamic Binary Instrumentation Overhead [link]
Original version of this post was here