11 August 2014

Some challenging questions… about SDN

The CAP (Brewer’s) theorem states that of the three desirable properties of a distributed computing system (Consistency, Availability and Partition tolerance), at most two can be guaranteed simultaneously. Given that routing and forwarding packets in a network is a computational problem, a network can be seen, from this perspective, as a distributed computing system...

Now imagine that future SDN scenarios offer programmable and scalable nodes (e.g., routers with computing and storage capabilities), built from commodity hardware, that allow multiple parties (from Service Providers to end Users) to program, install and execute their services just like network applications. Ideally these nodes would be deployed at the edge, since core nodes require far higher performance.

One may imagine this programmable edge node hosting multiple execution environment instances and exposing an Application Programming Interface (API) for developing network services and applications. One may also guess that there should be a sort of hypervisor capable of receiving each packet, extracting the most relevant parameters (e.g., MAC/IP addresses, TCP/UDP ports, etc.) and delivering the packet to the execution environment that has to process it to execute some service. Of course, some other services (e.g., the control plane) could be executed in the Cloud.
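As a rough illustration of the packet-steering role described above, the following Python sketch parses an Ethernet II / IPv4 / TCP-or-UDP frame, extracts the addresses and ports, and hands the frame to a registered execution environment. The names `FlowKey`, `extract_key` and `Dispatcher`, and the choice to steer on (protocol, destination port), are assumptions made purely for illustration; they do not come from any specific SDN platform, and a real node would do this in a fast path rather than in Python.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

ETH_HDR_LEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


@dataclass(frozen=True)
class FlowKey:
    """Parameters the (hypothetical) hypervisor extracts from each packet."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def _mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def _ip(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def extract_key(frame: bytes) -> Optional[FlowKey]:
    """Return the flow key of an IPv4 TCP/UDP frame, or None for anything else."""
    if len(frame) < ETH_HDR_LEN + 20:
        return None
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:ETH_HDR_LEN])
    if ethertype != ETHERTYPE_IPV4:
        return None
    ihl = (frame[ETH_HDR_LEN] & 0x0F) * 4      # IPv4 header length in bytes
    l4 = ETH_HDR_LEN + ihl                      # offset of the TCP/UDP header
    if ihl < 20 or len(frame) < l4 + 4:
        return None
    proto = frame[ETH_HDR_LEN + 9]
    if proto not in (PROTO_TCP, PROTO_UDP):
        return None
    src_ip = frame[ETH_HDR_LEN + 12:ETH_HDR_LEN + 16]
    dst_ip = frame[ETH_HDR_LEN + 16:ETH_HDR_LEN + 20]
    # TCP and UDP both start with source port then destination port.
    src_port, dst_port = struct.unpack("!HH", frame[l4:l4 + 4])
    return FlowKey(_mac(src_mac), _mac(dst_mac), _ip(src_ip), _ip(dst_ip),
                   proto, src_port, dst_port)


# An execution environment is modelled as a callable taking (key, frame).
Environment = Callable[[Optional[FlowKey], bytes], Any]


class Dispatcher:
    """Steers each frame to the environment registered for (proto, dst_port)."""

    def __init__(self, default: Environment) -> None:
        self._table: Dict[Tuple[int, int], Environment] = {}
        self._default = default

    def register(self, proto: int, dst_port: int, env: Environment) -> None:
        self._table[(proto, dst_port)] = env

    def dispatch(self, frame: bytes) -> Any:
        key = extract_key(frame)
        if key is None:
            return self._default(None, frame)
        env = self._table.get((key.proto, key.dst_port), self._default)
        return env(key, frame)
```

A service owner would call `register(PROTO_UDP, 53, some_environment)` once, after which every matching frame passed to `dispatch` reaches that environment; unparseable or unmatched traffic falls through to the default one.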

In principle, all of this can be done with commodity hardware, and therefore at very low cost. But one point needs attention if we want truly end-to-end ultra-low latency: execution performance, especially for data plane applications (those applications, if any, that make sense to execute on the data plane). I/O operations might limit the expected performance. In fact, data plane packet processing involves moving data from an I/O device to system memory, classifying the data and then moving it to a destination I/O device. General purpose hardware has been engineered mainly not for that, but for instruction-bound processing (dominated by computing instructions rather than I/O), which is mostly local.
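To give a feel for how tight the per-packet budget can be, here is a small back-of-the-envelope calculation in Python. The link speed (10 Gb/s), the frame size (minimum-size 64-byte Ethernet frames) and the 3 GHz core clock are assumptions chosen for illustration, not figures from a particular node; the function name `packet_budget` is likewise hypothetical.

```python
def packet_budget(link_bps: float, frame_bytes: int, cpu_hz: float):
    """Return (packets per second, ns per packet, CPU cycles per packet).

    Each Ethernet frame occupies 20 extra bytes on the wire:
    8 bytes of preamble + start-of-frame delimiter and a 12-byte inter-frame gap.
    """
    wire_bits = (frame_bytes + 20) * 8
    pps = link_bps / wire_bits           # packets arriving per second at line rate
    ns_per_packet = 1e9 / pps            # time available to handle one packet
    cycles_per_packet = cpu_hz / pps     # cycles a single core can spend on it
    return pps, ns_per_packet, cycles_per_packet


if __name__ == "__main__":
    pps, ns, cycles = packet_budget(link_bps=10e9, frame_bytes=64, cpu_hz=3e9)
    print(f"{pps / 1e6:.2f} Mpps, {ns:.1f} ns/packet, {cycles:.0f} cycles/packet")
```

Under these assumptions a single core has roughly 67 ns, or about 200 clock cycles, to move, classify and forward each packet, which is on the order of a single main-memory access; this is why the I/O path, rather than the computation itself, tends to dominate.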

If the goal is an ultra-low latency network-service infrastructure, there are still unsolved, challenging and little-investigated questions. Where and how should network services and applications be executed (on the data plane, logically distributed on blades, centralized in the Cloud)? What type of hardware (general purpose, specialized, hybrid) is required to optimize costs (including energy consumption) for a given level of performance? How should the CAP theorem limitations be handled?