RoCEv2 Configuration and Verification
RDMA over Converged Ethernet version 2 (RoCEv2) enables direct memory access between servers while bypassing the host CPU on the data path, delivering the low latency and high throughput required for GPU cluster interconnects and high-performance storage fabrics.
Supermicro Enterprise Advanced SONiC provides a streamlined RoCEv2 enablement command that automatically configures lossless buffer allocation, Priority Flow Control (PFC), WRED/ECN marking, and QoS scheduling policies optimized for RDMA traffic. For advanced configurations, refer to the Supermicro Enterprise Advanced SONiC User Guide.
Important: Enabling or disabling RoCE requires a switch restart. You will be prompted with a warning; after you enter Y, the configuration is saved and the switch reloads.
- Configure the switch to enable RoCEv2 with the default RoCEv2/iSCSI lossless buffer settings, together with the default WRED/ECN, scheduling, and QoS map configurations defined for the switch.
Leaf1(config)# roce enable {force-defaults}
- (VXLAN only) If you are configuring RoCEv2 together with VXLAN, you must set the QoS mode of the VXLAN VTEP interface to uniform so that the DSCP value is copied from the inner header to the outer VXLAN header.
Leaf1(config)# interface vxlan vtep[name]
Leaf1(config-if-vtep1)# qos-mode uniform
- (VXLAN only) Similarly, to ensure that packets are ECN-marked as expected in a VXLAN topology, you must configure a WRED policy and associate it with the VXLAN VTEP interface.
Leaf1(config)# qos wred-policy <wred-policy-name>
Leaf1(config-wred-wred-green)# green minimum-threshold <minimum-threshold-value> maximum-threshold <maximum-threshold-value> drop-probability <drop-probability-value>
Leaf1(config-wred-wred-green)# ecn green
!
Leaf1(config)# interface vxlan <vxlan-interface-name>
Leaf1(config-if-vtep1)# queue [0-7] wred-policy <wred-policy-name>
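To illustrate what the minimum threshold, maximum threshold, and drop probability in a WRED policy mean, the following Python sketch models the textbook WRED ramp: no marking below the minimum threshold, a linear rise up to the configured probability at the maximum threshold, and marking of every packet above it. The function name and the example values are hypothetical and are not switch defaults; the exact behavior of the switch ASIC may differ.

```python
def wred_mark_probability(queue_depth, min_threshold, max_threshold, max_probability_pct):
    """Return the probability (0.0 to 1.0) that a packet is ECN-marked.

    Models the classic WRED ramp. This is an illustrative assumption,
    not the switch's documented algorithm.
    """
    if min_threshold >= max_threshold:
        raise ValueError("min_threshold must be lower than max_threshold")
    if queue_depth <= min_threshold:
        return 0.0  # queue is shallow: never mark
    if queue_depth >= max_threshold:
        return 1.0  # queue is past the maximum threshold: mark every packet
    # Linear ramp between the two thresholds, scaled to the configured maximum.
    ramp = (queue_depth - min_threshold) / (max_threshold - min_threshold)
    return ramp * max_probability_pct / 100.0


# Hypothetical values: thresholds of 1000 and 2000, drop probability 10 percent.
for depth in (500, 1500, 2500):
    print(depth, wred_mark_probability(depth, 1000, 2000, 10))
```

With ECN enabled for green packets, a packet selected by this ramp is marked rather than dropped when the sender has signaled ECN capability.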
- After the switch reboots and the system status is ready, you can review and verify the default RoCE QoS policy and behavior.
Leaf1# show qos map dscp-tc
DSCP-TC-MAP: ROCE
----------------
DSCP    TC
----------------
0       0
1       0
2       0
3       0
4       4
5       0
<Snipped>
23      0
24      3
25      0
26      3
27      0
28      0
<Snipped>
47      0
48      6
49      0
<Snipped>
62      0
63      0
!
Leaf1# show qos map dot1p-tc
DOT1P-TC-MAP: ROCE
----------------
DOT1P   TC
----------------
0       0
1       0
2       0
3       3
4       4
5       0
6       0
7       0
!
! These ingress traffic classes are assigned to RoCEv2 Priority Groups
!
Leaf1# show qos map tc-pg
Traffic-Class-Priority-Group-MAP: ROCE
----------------
TC      PG
----------------
0       7
1       7
2       7
3       3
4       4
5       7
6       7
7       7
----------------
! Ingress traffic classes are assigned to egress queues 0, 3, 4, and 6. On front-panel ports, no traffic class is mapped to egress queues 1, 2, 5, or 7.
! Traffic generated by switch CPU is sent using queue 7
!
Leaf1# show qos map tc-queue
Traffic-Class-Queue-MAP: ROCE
----------------
TC      Queue
----------------
0       0
1       0
2       0
3       3
4       4
5       0
6       6
7       0
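The maps shown so far chain together: a packet's DSCP value selects a traffic class, and the traffic class selects both an ingress priority group and an egress queue. The Python sketch below reproduces that lookup using only the values visible in the output above; DSCP values inside the <Snipped> ranges are deliberately left out rather than guessed. The dictionary and function names are hypothetical and exist only for illustration.

```python
# Entries copied from the show output; DSCP values hidden by <Snipped> are omitted.
DSCP_TO_TC = {
    0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 0,
    23: 0, 24: 3, 25: 0, 26: 3, 27: 0, 28: 0,
    47: 0, 48: 6, 49: 0,
    62: 0, 63: 0,
}
TC_TO_PG = {0: 7, 1: 7, 2: 7, 3: 3, 4: 4, 5: 7, 6: 7, 7: 7}
TC_TO_QUEUE = {0: 0, 1: 0, 2: 0, 3: 3, 4: 4, 5: 0, 6: 6, 7: 0}


def classify(dscp):
    """Follow DSCP -> TC -> (priority group, egress queue).

    Raises KeyError for DSCP values that are not visible in the output.
    """
    tc = DSCP_TO_TC[dscp]
    return {"tc": tc, "pg": TC_TO_PG[tc], "queue": TC_TO_QUEUE[tc]}


for dscp in (0, 4, 26, 48):
    print(dscp, classify(dscp))
```

For example, DSCP 26 resolves to traffic class 3, priority group 3, and egress queue 3, while DSCP 48 resolves to traffic class 6, priority group 7, and egress queue 6.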
!
! The PFC priority traffic is assigned to RoCEv2 PFC priority queues
!
Leaf1# show qos map pfc-priority-queue
PFC-Priority-Queue-MAP: ROCE
----------------------
PFC Priority    Queue
----------------------
0               0
1               1
2               2
3               3
4               4
5               5
6               6
7               7
!
! The default scheduler policy for WRED/ECN configures the PFC priority queues for RoCEv2 traffic
!
Leaf1# show qos scheduler-policy
Scheduler Policy: ROCE
  Queue: 0
    type: dwrr
    weight: 50
  Queue: 3
    type: dwrr
    weight: 50
  Queue: 4
    type: dwrr
    weight: 50
  Queue: 6
    type: strict
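To make the scheduler output easier to interpret, the Python sketch below works out how the DWRR weights divide bandwidth. It assumes the usual scheduler semantics: a strict-priority queue is served first, and the DWRR queues share whatever bandwidth remains in proportion to their weights. The policy dictionary mirrors the output above; the function name is hypothetical.

```python
# Queue -> (type, weight), copied from the scheduler policy output.
SCHEDULER_POLICY = {
    0: ("dwrr", 50),
    3: ("dwrr", 50),
    4: ("dwrr", 50),
    6: ("strict", None),
}


def dwrr_shares(policy):
    """Return each DWRR queue's fraction of the bandwidth left over
    after strict-priority queues have been served (assumed semantics)."""
    weights = {q: w for q, (kind, w) in policy.items() if kind == "dwrr"}
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


for queue, share in dwrr_shares(SCHEDULER_POLICY).items():
    print(f"queue {queue}: {share:.1%} of remaining bandwidth")
```

Because queues 0, 3, and 4 carry equal weights, each receives one third of the remaining bandwidth when all three are congested.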
!
! The default WRED policy has a minimum and maximum threshold value, drop rate, and ECN traffic filter configured
Note: The output will vary depending on your switch platform.
Leaf1# show buffer pool
egress_lossless_pool:
    size                  : 31617024
    type                  : egress
    mode                  : static
egress_lossy_pool:
    size                  : 24320512
    type                  : egress
    mode                  : dynamic
ingress_lossless_pool:
    size                  : 32157184
    type                  : ingress
    shared-headroom-size  : 2621440
    mode                  : dynamic
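The pool sizes are easier to compare when converted to larger units. The short Python sketch below does this for the values shown above, assuming the sizes are reported in bytes (an assumption; confirm the unit for your platform). The names are hypothetical.

```python
# Values copied from the buffer pool output; unit assumed to be bytes.
BUFFER_SIZES = {
    "egress_lossless_pool": 31617024,
    "egress_lossy_pool": 24320512,
    "ingress_lossless_pool": 32157184,
    "ingress_lossless_pool shared headroom": 2621440,
}


def bytes_to_mib(size_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024)


for name, size in BUFFER_SIZES.items():
    print(f"{name}: {bytes_to_mib(size):.2f} MiB")
```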
!
! Various buffer profiles are created and associated with an ingress or egress buffer pool
! This specifies reserved memory, static/dynamic thresholds, and optional pause/resume thresholds
!
! By default, all switch interfaces are assigned to PFC priority groups with ingress buffer profiles
!
Leaf1# show buffer interface Ethernet all priority-group
Interface    priority-group    Profile
Ethernet0    3-4               pg_lossless_25000_40m_profile
Ethernet0    7                 ingress_lossy_profile
Ethernet1    3-4               pg_lossless_25000_40m_profile
Ethernet1    7                 ingress_lossy_profile
Ethernet2    3-4               pg_lossless_25000_40m_profile
Ethernet2    7                 ingress_lossy_profile
<Snipped>
!
! By default, all switch interfaces are assigned to egress queues with egress buffer profiles
!
Leaf1# show buffer interface Ethernet all queue
Interface    queue       Profile
CPU          0-47        egress_lossy_cpu_profile
Ethernet0    0-2,5-19    egress_lossy_profile
Ethernet0    3-4         egress_lossless_profile
Ethernet1    0-2,5-19    egress_lossy_profile
Ethernet1    3-4         egress_lossless_profile
Ethernet2    0-2,5-19    egress_lossy_profile
Ethernet2    3-4         egress_lossless_profile
<Snipped>
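When checking this output in a script, the queue column has to be expanded from range notation such as 0-2,5-19 into individual queue numbers. The Python sketch below shows one way to do that and to look up the buffer profile for a given queue, using the Ethernet0 rows from the output above. The helper names are hypothetical and are not part of any switch tooling.

```python
def expand_queue_ranges(spec):
    """Expand a range string such as '0-2,5-19' into a list of queue numbers."""
    queues = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            queues.extend(range(int(start), int(end) + 1))
        else:
            queues.append(int(part))
    return queues


# Rows for Ethernet0, copied from the show output.
ETHERNET0_QUEUE_PROFILES = {
    "0-2,5-19": "egress_lossy_profile",
    "3-4": "egress_lossless_profile",
}


def profile_for_queue(queue, profiles):
    """Return the buffer profile whose range covers the queue, or None."""
    for spec, profile in profiles.items():
        if queue in expand_queue_ranges(spec):
            return profile
    return None


for queue in (0, 3, 4, 6):
    print(queue, profile_for_queue(queue, ETHERNET0_QUEUE_PROFILES))
```

This confirms that queues 3 and 4 use the lossless egress profile, while queues 0 and 6 fall under the lossy profile.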