AppSwitch

Kubernetes, Istio and Network Function Devirtualization with AppSwitch

Root causing the networking complexity of Kubernetes
Authored by Dinesh Subhraveti

“Kubernetes Networking is Hard”

Kubernetes networking is considered complex. It introduces several new constructs and concepts in order to provide basic connectivity to containerized applications on the cluster. Several years on, that perception of complexity persists. Kubernetes also places specific, sometimes hard-to-meet, requirements on the underlying network to ensure mutual reachability among applications running as pods, and between pods and entities outside the cluster. At a high level, the approach involves making pods look like independent hosts on the network. There are a few ways of achieving this, but they require the architecture of the backend network to be reconsidered to accommodate the new set of constraints. What normally required at most a dozen IP addresses may now require a couple hundred. Or new pieces of infrastructure, such as network overlay controllers and NAT gateways, may be required, with their attendant complexity and operational burden. Public cloud vendors offering a Kubernetes service have made substantial investments to accommodate these requirements, but typical on-prem Kubernetes environments may not be able to absorb that level of change.

First Principles

If we step back and peel the layers of Kubernetes and containers all the way back to the bare application, all of this networking complexity exists to address two new application-level problems introduced by distributed applications running on a shared cluster.

  1. Port conflicts: Two services running on a node cannot bind to the same port
  2. IP preservation: When a service is rescheduled to a different node, it becomes unreachable at its former IP address

(Refer to this and this for a deeper analysis of the problems associated with identifiers in general, beyond ports and IP addresses.)
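To make the first problem concrete, here is a tiny Go snippet, purely illustrative and unrelated to AppSwitch itself, in which a second service on the same host tries to bind to a port that is already taken and fails with an "address already in use" error:

package main

import (
	"fmt"
	"net"
)

func main() {
	// The first service binds to port 9080 (the application port used by
	// bookinfo later in this post) and succeeds.
	first, err := net.Listen("tcp", ":9080")
	if err != nil {
		panic(err)
	}
	defer first.Close()

	// A second service on the same host attempts to bind to the same port
	// and fails with "bind: address already in use".
	if _, err := net.Listen("tcp", ":9080"); err != nil {
		fmt.Println("second service cannot start:", err)
	}
}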

Implementation details such as pods and service objects aside, the key aspect of Kubernetes’ networking solution is that applications are identified by IP addresses on the network, qualifying them as individual network hosts even though they are simply processes running on the OS. That does solve the two problems above. Since each application exposes its service over its own IP address, there can be no port conflicts with other applications. And since the application owns its IP address, it is no longer tied to the IP address of the node; when it moves to a different node, it remains reachable at that same address.

However, there is a fundamental problem with using IP addresses as application identifiers: IP addresses are not designed to be application identifiers; they are designed to identify network hosts. Applications have very different properties and requirements than network hosts. Identifying applications with IP addresses, by turning every application into a network host, inevitably imposes an extraordinary burden on the underlying network. There are too many applications, and their rate of churn is too high.

Problem Originates with the Network Namespace

If IP addresses are not the right identifiers for applications, then why did Kubernetes choose them to identify applications and pods? The truth is that Kubernetes did not make that choice; it was forced by the design of container networking itself, and in particular by the way network namespaces are designed. Network namespaces fundamentally depart from the container model. In other namespaces, the underlying physical resource is mapped to an application-level abstraction: the mount namespace, for example, exposes storage as a file system rather than a block device, and the PID namespace effectively represents CPU resources as processes and threads. The network namespace, in contrast, is essentially a namespace of network devices rather than of application-level abstractions like sockets.

Figure: Network namespace. Containers don’t abstract network resources to the application level.

The design of network namespaces not only forces applications to be identified by IP addresses; the device abstraction it adopts also drags the heaviness of physical networking artifacts all the way up into the application layers. A process running on the host has ready access to the network, but once placed in a network namespace it suddenly loses all connectivity. Elaborate mechanisms and tooling are then required just to plumb the network that already exists on the host into the network namespace. A great deal of time has been spent debating and defining the right interface for external plugins to perform the steps required to connect a container’s network namespace to the host’s network. If there were a way for the containerized application to safely and directly access the host’s network, all of that complexity would not be needed.
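The loss of connectivity is easy to reproduce. The short Go program below is a hedged illustration, not AppSwitch code; it assumes a Linux host with outbound connectivity and the privileges needed to create namespaces (CAP_SYS_ADMIN). It unshares the network namespace and shows that the very same dial that worked a moment earlier now fails, because the new namespace contains nothing but a downed loopback device:

package main

import (
	"fmt"
	"net"
	"runtime"
	"time"

	"golang.org/x/sys/unix"
)

// dial tests reachability of an arbitrary public address.
func dial() {
	_, err := net.DialTimeout("tcp", "1.1.1.1:80", 3*time.Second)
	fmt.Println("dial result:", err)
}

func main() {
	// Namespaces apply to the calling OS thread, so pin the goroutine to it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// On a host with outbound connectivity this first dial succeeds.
	dial()

	// Move this thread into a brand new, empty network namespace.
	if err := unix.Unshare(unix.CLONE_NEWNET); err != nil {
		panic(err) // requires CAP_SYS_ADMIN (typically root)
	}

	// The same dial now fails: the new namespace has only a downed loopback
	// device, no other interfaces and no routes, until something explicitly
	// plumbs the host's network into it.
	dial()
}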

Approach to the Solution

Coming back to the two main problems to be solved: if IP addresses are not good identifiers for applications, then what are? Ports. Ports are designed to distinguish multiple application services sharing the same host.

Let us reconsider the port conflict problem. The problem goes away if a conflicting second service somehow binds to a different available port and all its clients somehow know to reach the server at that new port. Likewise, if a server moves to a different node and all its clients automatically redirect themselves to the server’s new location, the service remains reachable. AppSwitch provides a simple, efficient and transparent mechanism to do exactly that.

AppSwitch

AppSwitch transparently tracks the network system calls made by applications and facilitates mutual discovery and connectivity among them. It does so through a logically centralized data structure called the service table, which maintains a record of all services currently running across the cluster. The table is updated automatically as services come and go; no change to applications is required. When an application calls the listen() system call (thereby becoming a service), AppSwitch records the emergence of the new service in the service table, along with a set of system- and user-defined attributes that includes a virtual IP address. Clients can then discover and reference the services in the table using those virtual IP addresses, independent of the IP addresses and ports where the services are actually running. That happens transparently as well: when a client calls the connect() system call to reach a service, AppSwitch directs the connection to the right service based on access control and traffic management policies.
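The service table is easiest to picture as a cluster-wide map from virtual service addresses to wherever a server actually bound when it called listen(). The sketch below is a drastically simplified, single-process rendition of that idea; the names and types are invented for illustration, and the real AppSwitch keeps the table consistent across nodes by intercepting system calls rather than by being called explicitly:

package main

import "fmt"

// ServiceEntry records where a service actually landed when it called listen().
type ServiceEntry struct {
	VirtualIP string            // stable address clients use to reference the service
	NodeIP    string            // node the service is currently running on
	Port      int               // real port the service bound to
	Labels    map[string]string // system- and user-defined attributes
}

// ServiceTable is a toy stand-in for AppSwitch's logically centralized table.
type ServiceTable struct {
	entries map[string]ServiceEntry // keyed by virtual IP
}

// RecordListen is what conceptually happens when an application calls listen():
// the new service is registered under its virtual IP.
func (t *ServiceTable) RecordListen(virtualIP, nodeIP string, port int, labels map[string]string) {
	t.entries[virtualIP] = ServiceEntry{VirtualIP: virtualIP, NodeIP: nodeIP, Port: port, Labels: labels}
}

// ResolveConnect is what conceptually happens when a client calls connect()
// against a virtual IP: the connection is steered to wherever the service
// actually runs right now.
func (t *ServiceTable) ResolveConnect(virtualIP string) (string, int, bool) {
	e, ok := t.entries[virtualIP]
	return e.NodeIP, e.Port, ok
}

func main() {
	table := &ServiceTable{entries: map[string]ServiceEntry{}}
	table.RecordListen("10.103.13.137", "10.0.0.145", 42137, map[string]string{"zone": "default"})
	if node, port, ok := table.ResolveConnect("10.103.13.137"); ok {
		fmt.Printf("connect to %s:%d\n", node, port)
	}
}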

Figure: AppSwitch and Kubernetes. AppSwitch removes Kubernetes’ dependence on the backend network.

The ability to transparently tap into the API between applications and the network is extremely powerful. In addition to the automated service discovery described above, AppSwitch can provide a variety of network functions and capabilities, such as access control and traffic management, all at the same API boundary between applications and the underlying network. In essence, just as containers removed the need for compute virtualization, AppSwitch devirtualizes applications from a network perspective by allowing them to run directly on the host. It provides a simple and effective mechanism to deliver an entire gamut of network functions without the cost and complexity of the layers of network virtualization otherwise required.

Demo Time

AppSwitch is integrated with Kubernetes as a DaemonSet. It is also integrated with Istio, serving as its data plane through an agent that consumes the Pilot (xDS) API and conveys traffic management policies to AppSwitch. The following walkthrough demonstrates how all of this works in a Kubernetes environment.

The environment shown below consists of two nodes (a master and a minion) with the AppSwitch CNI plugin and the AppSwitch DaemonSet. No other AppSwitch-related components are installed, either on this Kubernetes cluster or outside it as part of the infrastructure. No requirements are placed on the underlying network, except that the nodes can reach each other. You can see the AppSwitch DaemonSet and the Pilot agent along with the standard Istio components.

[root@ax-istio-test-new-1 ~]# kubectl get pods  --all-namespaces
NAMESPACE      NAME                                      READY     STATUS    RESTARTS   AGE
default        ax-pilot-agent-65f49855f8-xqrmb           1/1       Running   12         3d
istio-system   istio-ca-59f6dcb7d9-v9gt2                 1/1       Running   11         38d
istio-system   istio-ingress-779649ff5b-pdk6x            1/1       Running   12         38d
istio-system   istio-mixer-7f4fd7dff-lvqn9               3/3       Running   33         38d
istio-system   istio-pilot-5f5f76ddc8-7wdjw              2/2       Running   22         38d
kube-system    etcd-209.205.217.151                      1/1       Running   6          39d
kube-system    kube-apiserver-209.205.217.151            1/1       Running   6          39d
kube-system    kube-appswitch-55286                      1/1       Running   13         10d
kube-system    kube-appswitch-kfdfb                      1/1       Running   7          10d
kube-system    kube-controller-manager-209.205.217.151   1/1       Running   7          39d
kube-system    kube-dns-6f4fd4bdf-nghqn                  3/3       Running   18         39d
kube-system    kube-flannel-ds-752x4                     1/1       Running   6          39d
kube-system    kube-flannel-ds-pv6nh                     1/1       Running   13         39d
kube-system    kube-proxy-w87t7                          1/1       Running   11         39d
kube-system    kube-proxy-zmwcz                          1/1       Running   6          39d
kube-system    kube-scheduler-209.205.217.151            1/1       Running   7          39d
kube-system    kubernetes-dashboard-545f866c5-7ssb5      1/1       Running   6          39d

AppSwitch daemons, running one per node, form a cluster, and information about that cluster can be obtained by querying AppSwitch’s REST API. The AppSwitch daemon and the client used to access its REST API are integrated into a single static binary called ‘ax’, which supports a variety of options. The following command shows the state of the AppSwitch cluster.

[root@ax-istio-test-new-1 ~]# ax get nodes
          NAME               DATACENTER      IP      EXTERNALIP    ROLE     APPCOUNT  
----------------------------------------------------------------------------------------
  ax-istio-test-new-1.novalocal  appswitch   10.0.0.144              [compute]  0         
  ax-istio-test-new-2.novalocal  appswitch   10.0.0.145              [compute]  0         

The following command shows that there are no applications currently running under AppSwitch.

[root@ax-istio-test-new-1 ~]# ax get apps
[root@ax-istio-test-new-1 ~]#

Let’s deploy the standard bookinfo application. Normally, deploying it with Istio requires injecting the Envoy sidecar container into each pod of the application. With AppSwitch, however, there are no sidecars and so no injection is required; AppSwitch performs traffic management natively, without being in the data path. The picture below illustrates how AppSwitch integrates with Istio:

Figure: How AppSwitch integrates with Istio.

[root@ax-istio-test-new-1 ~]# kubectl create -f bookinfo.yaml 
service "details" created
deployment "details-v1" created
service "ratings" created
deployment "ratings-v1" created
service "reviews" created
deployment "reviews-v1" created
deployment "reviews-v2" created
deployment "reviews-v3" created
service "productpage" created
deployment "productpage-v1" created
ingress "gateway" created
[root@ax-istio-test-new-1 ~]# 

After several seconds, all the pods should be up and running.

[root@ax-istio-test-new-1 ~]# kubectl get pods  -o wide
NAME                              READY     STATUS    RESTARTS   AGE       IP              NODE
ax-pilot-agent-65f49855f8-xqrmb   1/1       Running   12         3d        10.244.0.34     209.205.217.151
details-v1-6767686b4c-w472z       1/1       Running   0          32s       10.33.73.2      209.205.217.153
productpage-v1-74fdf76df-q9x6g    1/1       Running   0          31s       10.72.124.102   209.205.217.153
ratings-v1-677bb48699-7t5kr       1/1       Running   0          32s       10.0.158.149    209.205.217.153
reviews-v1-6b8d75888-mj259        1/1       Running   0          32s       10.96.253.245   209.205.217.153
reviews-v2-6fc48bc48b-wtgg4       1/1       Running   0          31s       10.132.244.56   209.205.217.151
reviews-v3-947778468-gd2n8        1/1       Running   0          31s       10.154.111.67   209.205.217.153

The IP addresses seen for these pods are assigned by the AppSwitch CNI plugin. Note that those IP addresses are merely references to the applications; there are no real interfaces backing them, unlike the regular pod IP addresses provisioned by other CNI plugins. Those IP addresses are nevertheless “reachable” from other pods.

The following command shows AppSwitch’s view of the applications running under it at this point.

[root@ax-istio-test-new-1 ~]# ax get apps
           NAME                        APPID                   NODEID              DATACENTER      APPIP      DRIVER     LABELS          ZONES       
-----------------------------------------------------------------------------------------------------------------------------------------------------------
  default-productpage-v1-74fdf76df-q9x6g  abfcdb77c1223a80  ax-istio-test-new-2.novalocal  appswitch   10.72.124.102          zone=default  [zone==default]  
  default-reviews-v2-6fc48bc48b-wtgg4     abfccec7c44e3a80  ax-istio-test-new-1.novalocal  appswitch   10.132.244.56          zone=default  [zone==default]  
  default-ratings-v1-677bb48699-7t5kr     effcc711783bce00  ax-istio-test-new-2.novalocal  appswitch   10.0.158.149           zone=default  [zone==default]  
  default-details-v1-6767686b4c-w472z     67fcc712143ba700  ax-istio-test-new-2.novalocal  appswitch   10.33.73.2             zone=default  [zone==default]  
  default-reviews-v3-947778468-gd2n8      23fcc712623b9380  ax-istio-test-new-2.novalocal  appswitch   10.154.111.67          zone=default  [zone==default]  
  default-reviews-v1-6b8d75888-mj259      effcdb7773224e00  ax-istio-test-new-2.novalocal  appswitch   10.96.253.245          zone=default  [zone==default]  

The bookinfo application contains a Kubernetes service spec for the reviews service. The definition of that service is picked up by Istio Pilot, which in turn is read by the AppSwitch Pilot agent and conveyed to AppSwitch. AppSwitch then creates an internal object called a vservice that maps the service IP address to a set of pod IP addresses. Please check out the AppSwitch documentation for the details of the vservice API. The following command shows the vservice objects currently in place. In particular, it shows the load balancing strategy applied to each one, which is RoundRobin by default.

[root@ax-istio-test-new-1 ~]# ax get vservices
    VSNAME       VSTYPE        VSIP                VSBACKENDIPS           
------------------------------------------------------------------------
  details      RoundRobin  10.101.244.16  [ 10.33.73.2]                   
  ratings      RoundRobin  10.106.159.18  [ 10.0.158.149]                 
  productpage  RoundRobin  10.99.73.39    [ 10.72.124.102]                
  reviews      RoundRobin  10.103.13.137  [ 10.132.244.56 10.154.111.67   
                      10.96.253.245]                  

Now let’s make this application externally accessible by creating an external vservice. That essentially exposes the specified application IP and port on the specified external port on every node in the cluster (similar to a Kubernetes NodePort). It can be done with the ‘ax create vservice’ CLI or by curling AppSwitch’s REST endpoint directly, as follows:

[root@ax-istio-test-new-1 ~]# curl -X POST -H 'Content-Type: application/json' -d '{"name":"external","ip": "5.5.5.5","type":"Random","backendips":["10.72.124.102"],"extports":[{"extport":31111, "appport":9080}]}' http://localhost:6664/appswitch/v1/oper/createvirtualservice
{
  "name": "external",
  "result": true,
  "details": "Success",
  "portmap": null
}
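The same step can also be scripted. The Go sketch below simply posts the identical payload to the REST endpoint used in the curl above; the field names are taken verbatim from that call, and everything else (error handling, output) is kept minimal for illustration:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Payload mirrors the curl example above: expose the productpage backend
	// on port 31111 of every node, load balancing randomly.
	payload := map[string]interface{}{
		"name":       "external",
		"ip":         "5.5.5.5",
		"type":       "Random",
		"backendips": []string{"10.72.124.102"},
		"extports":   []map[string]int{{"extport": 31111, "appport": 9080}},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}

	resp, err := http.Post(
		"http://localhost:6664/appswitch/v1/oper/createvirtualservice",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Expect a JSON response like {"name":"external","result":true,...}.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}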

Now the app should be reachable on the requested port (31111 in this case) on all of the nodes in the cluster. Doing a curl under watch shows that the backend pods of the reviews service are hit in round-robin order, as per the default policy.

[root@ax-istio-test-new-1 ~]# watch -n 0.5 'curl -s http://10.0.0.145:31111/productpage | grep -A 10 Reviewer1'

The load balancer type can be changed by modifying Pilot’s policy spec. Here’s how the new policy spec would look with that change.

apiVersion: config.istio.io/v1alpha2
kind: DestinationPolicy
metadata:
  name: reviews-random
spec:
  destination:
    name: reviews
  loadBalancing:
    name: RANDOM

The policy spec can be applied with istioctl.

[root@ax-istio-test-new-1 ~]# istio/istio-0.6.0/bin/istioctl create -f destination-policy-reviews-random.yaml 
Created config destination-policy//reviews-random at revision 4754401

The new policy can be seen to be in effect by querying the AppSwitch vservices. The load balancer type now shows up as Random for the reviews service, and also for the external vservice through which it is exposed.

[root@ax-istio-test-new-1 ~]# ax get virtualservices
    VSNAME       VSTYPE        VSIP                VSBACKENDIPS           
------------------------------------------------------------------------
  ratings      RoundRobin  10.106.159.18  [ 10.0.158.149]                 
  productpage  RoundRobin  10.99.73.39    [ 10.72.124.102]                
  reviews      Random      10.103.13.137  [ 10.132.244.56 10.154.111.67   
                      10.96.253.245]                  
  external     Random      5.5.5.5        [10.72.124.102]                 
  details      RoundRobin  10.101.244.16  [ 10.33.73.2]                   

The same can also be verified by looking at the Kubernetes CRD directly.

[root@ax-istio-test-new-1 ~]# 
[root@ax-istio-test-new-1 ~]# kubectl get destinationpolicy
NAME             AGE
reviews-random   11s

With that change in effect, doing a curl under watch would now show that the pods running the reviews service are hit in random order.

[root@ax-istio-test-new-1 ~]# watch -n 0.5 'curl -s http://10.0.0.145:31111/productpage | grep -A 10 Reviewer1'
