Introduction

- Network bandwidth keeps increasing, but latency has improved far less.
- Latency is particularly important in data center networks.
- In-network computing moves computation into the network, closer to where it is used.
- We develop P4DNS using P4→NetFPGA.
  - 52x throughput improvement and 100x latency reduction over NSD, a software DNS server.
  - We identify areas where P4 is ill-suited for developing traditional applications on an FPGA.
Architecture

Data Plane (P4) + Control Plane (Python)

Data plane paths:

Packet Checks → Non DNS Request → Switch → Output Packet

Packet Checks → DNS Request → DNS → Switch → Output Packet

Packet Checks → DNS Response or Recursive Request → Control Plane

Control Plane → Host Processor

Parser:

Ethernet → IP → UDP → DNS → Accept

DNS Request (64B)

DNS Request (65B)
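
The three packet paths on this slide can be sketched in Python as a small classifier. This is only an illustration, not the P4 data plane itself: the `Packet` fields, the port-53 check, and the rule that a request for an uncached name counts as a recursive request are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    SWITCH = "switch"                # plain forwarding
    DNS = "dns"                      # answered in the data plane, then forwarded
    CONTROL_PLANE = "control_plane"  # handed to the host processor


@dataclass
class Packet:
    # Hypothetical, already-parsed view of a packet.
    is_udp: bool
    dst_port: int
    src_port: int
    is_response: bool = False  # DNS QR flag; only meaningful for DNS packets
    name: str = ""             # queried name; only meaningful for DNS packets


DNS_PORT = 53  # assumption: DNS traffic is recognized by UDP port 53


def classify(pkt: Packet, cached_names: set) -> Path:
    """Pick one of the three paths listed on the architecture slide."""
    is_dns = pkt.is_udp and DNS_PORT in (pkt.src_port, pkt.dst_port)
    if not is_dns:
        return Path.SWITCH
    if pkt.is_response:
        return Path.CONTROL_PLANE
    if pkt.name in cached_names:
        return Path.DNS
    # Assumption: a request the data plane cannot answer needs recursion.
    return Path.CONTROL_PLANE
```

For example, `classify(Packet(is_udp=True, dst_port=53, src_port=4000, name="example.com"), {"example.com"})` selects the DNS path, while the same request with an empty cache is sent to the control plane.
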
Architecture: Control Plane

- Functionality:
  - Recursive requests
  - Cache updates
  - TTL updates

- Multi-threaded Python running on a CPU
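
A minimal sketch of how a multi-threaded control plane might keep its cache and TTL state is shown below. The class name `DnsCache`, its methods, and the injectable clock are hypothetical and chosen for this example; recursive resolution is left out, and the sketch does not claim to match the P4DNS implementation.

```python
import threading
import time


class DnsCache:
    """Thread-safe name -> (address, expiry time) table (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so TTL handling can be tested
        self._lock = threading.Lock()  # several worker threads share the table
        self._entries = {}

    def update(self, name, address, ttl_seconds):
        """Cache update: store or refresh an answer together with its TTL."""
        with self._lock:
            self._entries[name] = (address, self._clock() + ttl_seconds)

    def lookup(self, name):
        """Return the cached address, or None if it is missing or expired."""
        with self._lock:
            entry = self._entries.get(name)
            if entry is None:
                return None
            address, expiry = entry
            if self._clock() >= expiry:
                del self._entries[name]
                return None
            return address

    def expire(self):
        """TTL update: drop entries whose TTL has run out; return their names."""
        now = self._clock()
        with self._lock:
            dead = [n for n, (_, exp) in self._entries.items() if now >= exp]
            for n in dead:
                del self._entries[n]
        return dead
```

In such a design, one thread could call `expire()` periodically while others call `update()` as responses arrive; each change would then also have to be pushed to the data plane tables, which is where the control plane becomes a bottleneck.
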
Design Lessons: Hardware for Traditional Protocols

- Control plane is a bottleneck:
  - Protocols with mutable state tax this bottleneck.
- Existing protocols are designed for software:
  - DNS names end with a zero byte, like C-style strings.
    - The length of a name is not known until its last byte has been read.
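
The point about name length can be made concrete with a short Python sketch. In the DNS wire format a name is a sequence of length-prefixed labels closed by a zero byte, so the only way to learn its total length is to walk it label by label. The helper names `encode_name` and `qname_length` are made up for this example, and compression pointers are deliberately not handled.

```python
def encode_name(name: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels plus a final zero byte."""
    out = bytearray()
    for label in name.strip(".").split("."):
        raw = label.encode("ascii")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length label marks the end of the name
    return bytes(out)


def qname_length(message: bytes, offset: int = 0) -> int:
    """Count the bytes an encoded name occupies, including the final zero byte.

    The loop must visit every label: nothing at the start of the name
    says how long the whole name is.
    """
    pos = offset
    while True:
        if pos >= len(message):
            raise ValueError("name runs past the end of the message")
        label_len = message[pos]
        if label_len == 0:
            return pos + 1 - offset
        if label_len > 63:
            raise ValueError("compression pointers are not handled in this sketch")
        pos += 1 + label_len
```

Software handles this loop easily, but a hardware parser has to unroll it into states, which is one reason a protocol designed for software maps poorly onto an FPGA pipeline.
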

But partial implementations can work:
- P4DNS achieves a 52x throughput improvement and a 100x latency reduction over NSD.
P4 on Hardware Limitations

- Field length limitations: 384 bits.
- Complex parsing state machines used excessive hardware resources on FPGAs.

- For many applications, a simple bitstream is enough.
- FPGAs remove some of the advantages of state machines, such as recursion.
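
As a rough illustration of the bitstream idea, the sketch below treats an encoded DNS name as one opaque fixed-width value instead of parsing it label by label. The 384-bit width comes from the slide; using the padded name as an exact-match key, and the function name `name_to_key`, are assumptions made for this example rather than a description of P4DNS.

```python
FIELD_BITS = 384               # field length limit quoted on the slide
FIELD_BYTES = FIELD_BITS // 8  # 48 bytes


def name_to_key(encoded_name: bytes) -> int:
    """Zero-pad an encoded name to the field width and read it as one integer."""
    if len(encoded_name) > FIELD_BYTES:
        raise ValueError("name does not fit in a %d-bit field" % FIELD_BITS)
    padded = encoded_name.ljust(FIELD_BYTES, b"\x00")
    return int.from_bytes(padded, "big")


# Hypothetical exact-match table keyed on the raw bits of the name.
table = {name_to_key(b"\x07example\x03com\x00"): "192.0.2.1"}
answer = table.get(name_to_key(b"\x07example\x03com\x00"))
```

The trade-off is visible in the length check: any name whose encoding exceeds 48 bytes cannot be matched this way and would need another path.
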
Conclusion

We implemented P4DNS, a DNS accelerator integrated into a P4 switch using P4→NetFPGA.

We demonstrated potential for large performance improvement without changing existing protocols.

But P4 is not without limitations for hardware targets.