RoCEv2 Configuration and Verification

RDMA over Converged Ethernet version 2 (RoCEv2) lets servers access each other's memory directly with minimal CPU involvement, delivering the low latency and high throughput required for GPU cluster interconnects and high-performance storage fabrics.

Supermicro Enterprise Advanced SONiC provides a streamlined RoCEv2 enablement command that automatically configures lossless buffer allocation, Priority Flow Control (PFC), WRED/ECN marking, and QoS scheduling policies optimized for RDMA traffic. For advanced configurations, refer to the Supermicro Enterprise Advanced SONiC User Guide.

Important: Enabling or disabling RoCE requires a switch restart. The CLI prompts you with a warning; after you enter Y, the configuration is saved and the switch reloads.

  1. Enable RoCEv2 on the switch with the default RoCEv2/iSCSI lossless buffer settings and the default WRED/ECN, scheduling, and QoS map configurations defined for the switch.

    Leaf1(config)# roce enable {force-defaults}

  2. (VXLAN only) If you are configuring RoCEv2 in combination with VXLAN, set the QoS mode of the VXLAN VTEP interface to uniform so that the DSCP value is copied from the inner header to the outer VXLAN header.

    Leaf1(config)# interface vxlan <vtep-name>

    Leaf1(config-if-vtep1)# qos-mode uniform

  3. (VXLAN only) Similarly, to ensure that packets are ECN-marked as expected in a VXLAN topology, configure a WRED policy and associate it with the VXLAN VTEP interface.

    Leaf1(config)# qos wred-policy <wred-policy-name>

    Leaf1(config-wred-wred-green)# green minimum-threshold <minimum-threshold-value> maximum-threshold <maximum-threshold-value> drop-probability <drop-probability-value>

    Leaf1(config-wred-wred-green)# ecn green

    !

    Leaf1(config)# interface vxlan <vxlan-interface-name>

    Leaf1(config-if-vtep1)# queue [0-7] wred-policy <wred-policy-name>

  4. After the switch reboots and the system status is ready, you can review and verify the default RoCE QoS policy and behavior.

    Leaf1# show qos map dscp-tc
    DSCP-TC-MAP: ROCE
    ----------
    DSCP    TC
    ----------
    0       0
    1       0
    2       0
    3       0
    4       4
    5       0
    <Snipped>
    23      0
    24      3
    25      0
    26      3
    27      0
    28      0
    <Snipped>
    47      0
    48      6
    49      0
    <Snipped>
    62      0
    63      0

    !

    Leaf1# show qos map dot1p-tc
    DOT1P-TC-MAP: ROCE
    -----------
    DOT1P    TC
    -----------
    0        0
    1        0
    2        0
    3        3
    4        4
    5        0
    6        0
    7        0

    !

    ! These ingress traffic classes are assigned to RoCEv2 Priority Groups

    !

    Leaf1# show qos map tc-pg
    Traffic-Class-Priority-Group-MAP: ROCE
    --------
    TC    PG
    --------
    0     7
    1     7
    2     7
    3     3
    4     4
    5     7
    6     7
    7     7
    --------

    ! Ingress traffic classes are assigned to egress queues 0, 3, 4, and 6. No front-panel ports are mapped to egress queues 1, 2, 5, and 7.

    ! Traffic generated by the switch CPU is sent using queue 7.

    !

    Leaf1# show qos map tc-queue
    Traffic-Class-Queue-MAP: ROCE
    -----------
    TC    Queue
    -----------
    0     0
    1     0
    2     0
    3     3
    4     4
    5     0
    6     6
    7     0

    !

    ! The PFC priority traffic is assigned to RoCEv2 PFC priority queues

    !

    Leaf1# show qos map pfc-priority-queue
    PFC-Priority-Queue-MAP: ROCE
    ---------------------
    PFC Priority    Queue
    ---------------------
    0               0
    1               1
    2               2
    3               3
    4               4
    5               5
    6               6
    7               7

    !

    ! The default scheduler policy for WRED/ECN configures the PFC priority queues for RoCEv2 traffic

    !

    Leaf1# show qos scheduler-policy
    Scheduler Policy: ROCE
      Queue: 0
        type: dwrr
        weight: 50
      Queue: 3
        type: dwrr
        weight: 50
      Queue: 4
        type: dwrr
        weight: 50
      Queue: 6
        type: strict

    !

    ! The default WRED policy has a minimum and maximum threshold value, drop rate, and ECN traffic filter configured

    Note: The output will vary depending on your switch platform.

    Leaf1# show buffer pool
    egress_lossless_pool:
      size                 : 31617024
      type                 : egress
      mode                 : static
    egress_lossy_pool:
      size                 : 24320512
      type                 : egress
      mode                 : dynamic
    ingress_lossless_pool:
      size                 : 32157184
      type                 : ingress
      shared-headroom-size : 2621440
      mode                 : dynamic

    !

    ! Various buffer profiles are created and associated with an ingress or egress buffer pool

    ! This specifies reserved memory, static/dynamic thresholds, and optional pause/resume thresholds

    !

    ! By default, all switch interfaces are assigned to PFC priority groups with ingress buffer profiles

    !

    Leaf1# show buffer interface Ethernet all priority-group
    Interface    priority-group    Profile
    Ethernet0    3-4               pg_lossless_25000_40m_profile
    Ethernet0    7                 ingress_lossy_profile
    Ethernet1    3-4               pg_lossless_25000_40m_profile
    Ethernet1    7                 ingress_lossy_profile
    Ethernet2    3-4               pg_lossless_25000_40m_profile
    Ethernet2    7                 ingress_lossy_profile
    <Snipped>

    !

    ! By default, all switch interfaces are assigned to egress queues with egress buffer profiles

    !

    Leaf1# show buffer interface Ethernet all queue
    Interface    queue       Profile
    CPU          0-47        egress_lossy_cpu_profile
    Ethernet0    0-2,5-19    egress_lossy_profile
    Ethernet0    3-4         egress_lossless_profile
    Ethernet1    0-2,5-19    egress_lossy_profile
    Ethernet1    3-4         egress_lossless_profile
    Ethernet2    0-2,5-19    egress_lossy_profile
    Ethernet2    3-4         egress_lossless_profile
    <Snipped>
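
The verification outputs above form a chain: a packet's DSCP value selects a traffic class (TC), and the TC selects both an ingress priority group (PG) and an egress queue. As a reading aid, the chain can be traced with a short Python sketch. This is not a Supermicro tool; the names `classify` and `Treatment` are hypothetical, and the tables are hard-coded from the sample output only. DSCP rows that the sample output snips are deliberately left out rather than guessed, and "lossless" is inferred from the sample buffer output, where PGs 3-4 use pg_lossless_25000_40m_profile and queues 3-4 use egress_lossless_profile.

```python
from typing import NamedTuple, Optional

# DSCP -> TC rows visible in the sample "show qos map dscp-tc" output.
# Rows hidden behind <Snipped> are intentionally absent (not guessed).
DSCP_TO_TC = {
    0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 0,
    23: 0, 24: 3, 25: 0, 26: 3, 27: 0, 28: 0,
    47: 0, 48: 6, 49: 0,
    62: 0, 63: 0,
}

# From the sample "show qos map tc-pg" and "show qos map tc-queue" outputs.
TC_TO_PG = {0: 7, 1: 7, 2: 7, 3: 3, 4: 4, 5: 7, 6: 7, 7: 7}
TC_TO_QUEUE = {0: 0, 1: 0, 2: 0, 3: 3, 4: 4, 5: 0, 6: 6, 7: 0}

# Inferred from the sample buffer outputs (assumption, platform dependent).
LOSSLESS_PGS = {3, 4}
LOSSLESS_QUEUES = {3, 4}


class Treatment(NamedTuple):
    """How the sample ROCE policy handles one DSCP value (hypothetical type)."""
    tc: int
    pg: int
    queue: int
    lossless: bool


def classify(dscp: int) -> Optional[Treatment]:
    """Trace DSCP -> TC -> (PG, queue); return None for rows not in the sample."""
    tc = DSCP_TO_TC.get(dscp)
    if tc is None:
        return None
    pg = TC_TO_PG[tc]
    queue = TC_TO_QUEUE[tc]
    lossless = pg in LOSSLESS_PGS and queue in LOSSLESS_QUEUES
    return Treatment(tc, pg, queue, lossless)


if __name__ == "__main__":
    for dscp in (0, 4, 24, 26, 48):
        print(dscp, classify(dscp))
```

With the sample values, DSCP 24 and 26 resolve to TC 3, PG 3, and queue 3, and DSCP 4 resolves to TC 4, PG 4, and queue 4; both paths use the lossless profiles. DSCP 48 resolves to TC 6 and queue 6, which the scheduler policy serves with strict priority but which is not lossless. Because output varies by platform, the tables should be replaced with values from your own switch before relying on the result.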