## Segment Routing at 100G Using FPGA Smart NIC and P4

Viktor Puš, Petr Kaštovský<sup>1</sup>, Michal Kekely; {pus, kastovsky, kekely}@netcope.com; Netcope Technologies, a.s., Sochorova 3232, 616 00 Brno, Czech Republic

Pavel Benáček; benacek@cesnet.cz; CESNET, a.l.e., Zikova 4, 160 00 Prague 6, Czech Republic

## Abstract:

Service Function Chaining (SFC) is the process of passing network traffic through a sequence of individual, typically virtualized, network functions in NFV and SDN infrastructures. A typical service chain is a series of IDS/IPS, firewalls, WAN optimizers and load balancers that the traffic must traverse on its way from client to server and back.

One approach to supporting SFC in a network function virtualization infrastructure (NFVI) is Segment Routing, in particular IPv6-based Segment Routing (SRv6). This technology is attracting growing interest and is being deployed in large networks, as demonstrated by SoftBank's recent joint announcement with Cisco and by other deployments. It first caught our attention at the P4.org workshop in May 2017, where teams from Bell Canada, Cisco Systems and Barefoot Networks presented the concept of "The Extensible Network - Evolution in Protocol and Data Plane Agility" and explained the benefits of SRv6 for SFC.

In our demo we show how quickly SRv6 acceleration can be developed using an FPGA-based hardware accelerator and the P4 programming language. Other applications can be accelerated with the same approach; the processing nodes of Vector Packet Processing (VPP) are good candidates.

To perform segment routing, an SRv6 router goes through the Segment List in the Segment Routing Header (SRH) and uses the Segments Left field as the index of the active segment, which is copied into the Destination Address of the IPv6 header. We accelerate these data plane operations to demonstrate the productivity and flexibility of the P4 language combined with FPGAs. We use the Netcope P4 Cloud compiler service to generate the FPGA firmware bitstream from the P4 description. The compiler was initially built for a Xilinx Virtex-7 based Netcope board, but the cloud service now also supports (or will support in the near future) newer UltraScale+ and Arria 10 based boards from Netcope and other hardware vendors. For this demo we use the NFB-2002QL card, which has two 100 Gbps Ethernet ports and is equipped with a powerful Xilinx UltraScale+ VU7P FPGA. After compilation, the firmware is tested for both correctness and performance. Wireshark is used to compare packets before and after processing (see Fig. 1). As for performance, a simple command-line script shows a throughput of almost 100 Gbps, about 94.7 Gbps on each interface (Fig. 2).
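The Segment List lookup described above can be sketched in a few lines of Python. This is only a software model for illustration, not the P4 program used in the demo: the `SRv6Packet` class, the `process_segment` function and the addresses are hypothetical names, and the order of the steps (check, decrement Segments Left, copy the indexed segment) follows the general shape of the SRH processing rules in RFC 8754 as we understand them.

```python
from dataclasses import dataclass
from ipaddress import IPv6Address
from typing import List


@dataclass
class SRv6Packet:
    """Hypothetical model holding only the fields this sketch needs."""
    dst: IPv6Address                 # IPv6 header: Destination Address
    hop_limit: int                   # IPv6 header: Hop Limit
    segment_list: List[IPv6Address]  # SRH Segment List, index 0 = last segment
    segments_left: int               # SRH Segments Left


def process_segment(pkt: SRv6Packet) -> bool:
    """Activate the next segment in place.

    Returns False when Segments Left is already 0, i.e. the packet has
    reached its final segment and nothing is rewritten.
    """
    if pkt.segments_left == 0:
        return False
    if pkt.segments_left > len(pkt.segment_list):
        raise ValueError("Segments Left points outside the Segment List")
    if pkt.hop_limit <= 1:
        raise ValueError("Hop Limit exhausted")

    pkt.segments_left -= 1
    # Segments Left is the index of the active segment, which becomes
    # the new Destination Address.
    pkt.dst = pkt.segment_list[pkt.segments_left]
    pkt.hop_limit -= 1
    return True


# Illustrative chain: firewall -> IDS -> server (documentation-prefix addresses).
firewall = IPv6Address("2001:db8::f")
ids = IPv6Address("2001:db8::1d")
server = IPv6Address("2001:db8::5")

# The Segment List is stored in reverse order of traversal.
packet = SRv6Packet(dst=firewall, hop_limit=64,
                    segment_list=[server, ids, firewall], segments_left=2)

while process_segment(packet):
    print(f"Segments Left={packet.segments_left} -> DA={packet.dst}")
```

In this model each call corresponds to one SRv6 node along the chain; the hardware version performs the same field rewrite per packet in the FPGA pipeline.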

## Technical requirements:

large table; at least two power plugs; LCD screen with VGA connector; possibly a poster stand

<sup>1</sup> Presenting author



Figure 1: Comparing packets before and after SRv6 node processing.

Every 1.0s: cat /bin/stats.txt

| P4 performance |              |
|----------------|--------------|
| INTERFACE      |              |
| Frames         | 881181391    |
| Bytes          | 312819393805 |
| Data rate      | 94.712 Gbps  |
| INTERFACE      |              |
| Frames         | 881182465    |
| Bytes          | 312819775075 |
| Data rate      | 94.713 Gbps  |

Figure 2: SRv6 processing performance measurement.
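As a rough cross-check of the counters in Fig. 2, the short sketch below derives the average frame size and the implied frame rate for the first interface block. It assumes that the reported data rate is computed over exactly the bytes counted by the Bytes counter, which the output itself does not state; the variable names are illustrative.

```python
# Counter values copied from the first INTERFACE block in Fig. 2.
frames = 881_181_391
bytes_total = 312_819_393_805
data_rate_bps = 94.712e9  # "Data rate" line, converted to bits per second

# Average frame size as seen by the counters.
avg_frame_bytes, remainder = divmod(bytes_total, frames)

# Implied frame rate, ASSUMING the data rate covers exactly the counted bytes.
frames_per_second = data_rate_bps / (avg_frame_bytes * 8)

print(f"average frame size: {avg_frame_bytes} B (remainder {remainder})")
print(f"implied frame rate: {frames_per_second / 1e6:.2f} million frames/s")
```

The division comes out exact at 355 bytes per frame, which suggests that the test traffic consisted of fixed-size frames; under the stated assumption this corresponds to roughly 33 million frames per second on each interface.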