Clouds Need Clouds

At RIFT.io we talk about NFV in terms of cloud: how cloud is ubiquitous, limitless, and available. Cloud services are the same whether you are in Peoria or Timbuktu. They are there when you need them, whether you are one of the crowd on the evening commute or an insomniac at 4 a.m. Cloud services don’t go away. Nobody shuts YouTube or Facebook down for a two-hour maintenance window.

While it may take some time for NFV to become a true cloud service like Facebook or YouTube, it is important to lay the proper groundwork now. Much of the focus of NFV today is on VNFs and network services. However, for NFV to become a true cloud service, the NFV support infrastructure (management, orchestration, and test) must also be delivered as cloud services.

Consider a VNF that consists of hundreds or even thousands of VMs (if you dare). Now consider a service chain consisting of ten or more of these VNFs. Would monolithic, centralized controllers be able to scale and manage VNFs and services of such size? And what happens if such controllers fail?

As the variety of SDN controller scaling efforts shows (e.g., ONOS, ONIX, Logical xBar, Kandoo), it is important to move away from monolithic controller architectures and replace them with distributed architectures that offer superior scalability and fault tolerance.

Testing of cloud services is another issue. Many virtualized testing solutions today are built for compliance and functional testing. They leave performance and load testing to appliance-based solutions. Yet cloud resources are plentiful and should be able to support load testing. The dilemma is that load testing requires a lot of VMs, and it is difficult to instantiate, scale, and manage large numbers of VMs in a coordinated fashion. This forces service providers to maintain two solutions and two different ways of testing when one should do.

Now imagine, if you will, a world in which network operators can:

  • instantly bring up several large-scale VNFs in a massive-scale service chain,
  • test that end-to-end service in place using a virtualized test harness that can be easily chained into the network service,
  • perform functional, compliance, load, and SLA testing with a single test harness,
  • and automatically switch live traffic over to the new service chain.
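
As a rough illustration of the workflow in the list above, the following Python sketch gates the traffic switchover on a single set of test results. Every name here (ServiceChain, run_tests, promote_if_healthy, and the test labels) is hypothetical and chosen for illustration; none of it is a RIFT.ware API, and a real harness would drive actual VNFs rather than in-memory objects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceChain:
    """Hypothetical stand-in for a deployed chain of VNFs."""
    name: str
    vnfs: List[str]
    live: bool = False  # True once live traffic has been switched over


# A test is any callable that inspects the chain and reports pass/fail.
Test = Callable[[ServiceChain], bool]


def run_tests(chain: ServiceChain, tests: Dict[str, Test]) -> Dict[str, bool]:
    """Run every test (functional, compliance, load, SLA, ...) in one harness."""
    return {label: check(chain) for label, check in tests.items()}


def promote_if_healthy(chain: ServiceChain, tests: Dict[str, Test]) -> Dict[str, bool]:
    """Switch traffic to the chain only if at least one test ran and all passed."""
    results = run_tests(chain, tests)
    if results and all(results.values()):
        chain.live = True  # assumption: stands in for the real traffic cutover
    return results
```

The point of the sketch is the single decision step: one harness produces all the results, and the cutover happens automatically only when every result is a pass.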

With this capability, operators can turn out new services more quickly and in a more automated fashion. But a fundamental requirement is a scalable, extensible foundation.

For this reason, RIFT.ware™ was designed to scale not only the VNFs themselves but also the cloud support systems. The elements of the RIFT.ware hyperscale engine, such as distributed management and the distributed control plane, are used not only by VNFs that are integrated with RIFT.ware but also by the lifecycle management functions. The lifecycle management function itself is architected as a distributed cloud service with support for hierarchical, parallel, and distributed instantiation and configuration of thousands of VMs simultaneously.
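
To make the idea of hierarchical, parallel instantiation concrete, here is a minimal Python sketch. It is not RIFT.ware code: the function names, the FANOUT value, and the use of a thread pool are all assumptions made for illustration, and boot_vm is a placeholder for whatever call actually starts a VM. Each "manager" either boots a small batch directly or splits its batch among sub-managers that run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

FANOUT = 4  # hypothetical: how many children each manager handles (must be >= 2)


def boot_vm(vm_id: int) -> int:
    """Placeholder for a real VM boot call; returns the id once 'started'."""
    return vm_id


def instantiate(vm_ids: list[int], fanout: int = FANOUT) -> list[int]:
    """Start all VMs by recursively dividing the batch among sub-managers."""
    if len(vm_ids) <= fanout:
        # Leaf manager: the batch is small enough to boot directly, in parallel.
        with ThreadPoolExecutor(max_workers=fanout) as pool:
            return list(pool.map(boot_vm, vm_ids))

    # Interior manager: split the batch into at most `fanout` chunks and
    # delegate each chunk to a sub-manager running in its own thread.
    chunk = -(-len(vm_ids) // fanout)  # ceiling division
    parts = [vm_ids[i:i + chunk] for i in range(0, len(vm_ids), chunk)]
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        results = pool.map(lambda part: instantiate(part, fanout), parts)
        return [vm for part in results for vm in part]
```

No single manager in this sketch ever coordinates more than `fanout` children, so the depth of the tree grows only logarithmically with the number of VMs. In a real deployment the sub-managers would presumably be separate processes or hosts rather than threads.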

[Figure: RIFT.ware layer-cake design]

Such divide-and-conquer techniques are common in cloud services (examples include YouTube’s video transcoding and Hadoop MapReduce) but less so in network functions. Perhaps this is a legacy of networks that evolved from massive, centrally controlled architectures. However, there is nothing to prevent massive, distributed systems from being controlled, configured, and managed as a single entity. Granted, this is difficult to do, which is why technologies like Google Spanner create such a buzz when they are presented. This is where RIFT.ware comes in.

RIFT.ware is designed to provide the infrastructure necessary to allow network functions to scale. Picture what happens with a chassis-based service when a chassis runs out of capacity: a new chassis needs to be deployed, new network connections need to be configured, and all neighboring systems need to be reconfigured to know about the new chassis. If there are load balancers in between, those need to be reconfigured as well. This is a deployment nightmare. Clearly this method of operation cannot carry over into the virtualized world. Instead, we must get into the business of creating highly scalable, singly managed virtual services.

RIFT.ware’s hyperscale capabilities are designed to provide the necessary infrastructure to achieve this goal. RIFT.ware borrows from the hierarchical, distributed architectures found in web-scale services today. This division-of-labor approach can be used by VNFs as well as by NFV orchestrators and managers. Laying the foundation well with a hyperscale architecture means service providers can “get it right” the first time and not worry about having to re-architect for scalability later. It also enables new possibilities such as VNFs that span geographies and cloud testing at scale on “in place” (live) network service chains instead of in a controlled lab setting.

Without this foundation, network operators will not be able to deploy, manage, and test web-scale virtual network services end-to-end, which is a prerequisite for creating agile network services. This is why at RIFT.io we say that clouds need clouds.