Seeing dhcp packet drops in latest master

Chinnapaiyan, Nagamani Nagamani.Chinnapaiyan at viasat.com
Thu Aug 13 15:11:10 CEST 2020


We are seeing packet drops frequently in latest 4.x version(#8c2e4122572a12f43d4ec294d3258a2f0da07b1d). We are not doing scale test, just ~10 DORAs. If I run some 10 testcases(10 DORAs each test case), it fails randomly at any one of the tests. The same testcases were passing around one month before.



We have Client -> Freeradius dhcp Relay agent(Load Balancer) -> freeradius dhcp server.



I could not capture issue clearly with debug mode as it is slower. So I enabled some of the debug logs in the code.

----------------

diff --git a/src/lib/io/network.c b/src/lib/io/network.c

index 7276606..3850725 100644

--- a/src/lib/io/network.c

+++ b/src/lib/io/network.c

@@ -684,6 +684,7 @@ static void fr_network_read(UNUSED fr_event_list_t *el, int sockfd, UNUSED int f

                                s->listen->fd, sockfd)) return;

        DEBUG3("Reading data from FD %u", sockfd);

+       ERROR("DEBLOGJK: Reading data from FD %u", sockfd);

        if (!s->cd) {

                cd = (fr_channel_data_t *) fr_message_reserve(s->ms, s->listen->default_message_size);

@@ -751,6 +752,7 @@ next_message:

        s->cd = NULL;

        DEBUG3("Read %zd byte(s) from FD %u", data_size, sockfd);

+       ERROR("DEBLOGJK: Read %zd byte(s) from FD %u", data_size, sockfd);

        nr->stats.in++;

        s->stats.in++;

diff --git a/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c b/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c

index 89e7ac9..38736fe 100644

--- a/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c

+++ b/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c

@@ -159,6 +159,7 @@ static ssize_t mod_read(fr_listen_t *li, void **packet_ctx, fr_time_t *recv_time

                             &address->src_ipaddr, &address->src_port,

                             &address->dst_ipaddr, &address->dst_port,

                             &address->if_index, recv_time_p);

+  ERROR("DEBLOGJK: proto_dhvpv4_udp");

        if (data_size < 0) {

                DEBUG2("proto_dhvpv4_udp got read error %zd: %s", data_size, fr_strerror());

                return data_size;

@@ -203,6 +204,9 @@ static ssize_t mod_read(fr_listen_t *li, void **packet_ctx, fr_time_t *recv_time

        DEBUG2("proto_dhcpv4_udp - Received %s XID %08x length %d %s",

               dhcp_message_types[message_type], xid,

               (int) packet_len, thread->name);

+       ERROR("DEBLOGJK: proto_dhcpv4_udp - Received %s XID %08x length %d %s",

+              dhcp_message_types[message_type], xid,

+              (int) packet_len, thread->name);

        return packet_len;

}

-------------------



Below are tcpdump and radiusd logs in success case and failure cases.



normal time (success case):

Client side:(tcpdump)

12:12:36.039499 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311

12:12:36.106401 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768

12:12:36.107826 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 323

12:12:36.113856 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768

12:12:36.439471 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311

12:12:36.442614 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328

12:12:36.442895 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 323

12:12:36.445647 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328



Relay Agent side:(tcpdump)

12:12:36.041608 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311

12:12:36.041851 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312

12:12:36.109916 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 323

12:12:36.110206 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 324

12:12:36.441579 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311

12:12:36.441894 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312

12:12:36.444965 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 323

12:12:36.445130 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 324



DHCP server side: (tcpdump + enabled radiusd logs)

12:12:36.041788 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312  <<tcpdump line

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51  <<< added error logs

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000000 length 312 dhcpv4_udp server 10.43.18.48 port 67

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-DISCOVER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a 12= 50= 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=82.1  <<< logs from our module's config

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (77)          ldap - Closing connection (3): Hit idle_timeout, was idle for 207.534324944s

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: rlm_ldap (ldap) - Directory type: OpenLDAP

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-LDAP_FOUND LDAP option43 string found for MAC 00:a0:bc:6c:7d:3a.

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (77)      redis_ippool - Closing connection (5): Hit idle_timeout, was idle for 212.217780777s

12:12:36.108074 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-OFFER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a yiaddr=172.21.0.181 1=255.255.0.0 3=172.21.0.1 6=72.173.31.41,72.173.31.42 43=0x010300a0bc0209555400000000000007030e55543200000000000307030a0009041155545f332e372e332e31302e392e62696e05040a4fff0f1444010002010311050044004407004300430101020103060701bd01bd0101020107008700880101020106c0a8640018070016001701010201031106c0a86400180700a100a11901012642687474703a2f2f6163732e6465762e636d742e7669617361742e696f3a393637352f6c6976652f4350454d616e616765722f435045732f67656e657269635452363927020005280207d02a0e55542e352e322e322e31303530332b040a4fff0f2c0e55542e352e322e322e3130333933330c5554320000000000000029083601ff3b0e55543200000000000307030a00093c125554325f332e372e332e31302e392e62696e3d040a4fff0f3e3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303530332e7461722e677a3f3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303339332e7461722e677a400100 51=120 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

12:12:36.110060 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 324

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Request XID 00000000 length 324 dhcpv4_udp server 10.43.18.48 port 67

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 324 byte(s) from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-REQUEST giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a ciaddr=10.43.18.32 12= 50=172.21.0.181 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-LDAP_FOUND LDAP option43 string found for MAC 00:a0:bc:6c:7d:3a.

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (78)      Closing connection (7): Hit idle_timeout, was idle for 207.588781775s

12:12:36.115554 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-ACK giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a yiaddr=172.21.0.181 1=255.255.0.0 3=172.21.0.1 6=72.173.31.41,72.173.31.42 12= 43=0x010300a0bc0209555400000000000007030e55543200000000000307030a0009041155545f332e372e332e31302e392e62696e05040a4fff0f1444010002010311050044004407004300430101020103060701bd01bd0101020107008700880101020106c0a8640018070016001701010201031106c0a86400180700a100a11901012642687474703a2f2f6163732e6465762e636d742e7669617361742e696f3a393637352f6c6976652f4350454d616e616765722f435045732f67656e657269635452363927020005280207d02a0e55542e352e322e322e31303530332b040a4fff0f2c0e55542e352e322e322e3130333933330c5554320000000000000029083601ff3b0e55543200000000000307030a00093c125554325f332e372e332e31302e392e62696e3d040a4fff0f3e3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303530332e7461722e677a3f3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303339332e7461722e677a400100 51=300 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

12:12:36.441821 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000001 length 312 dhcpv4_udp server 10.43.18.48 port 67

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-DISCOVER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b 12= 50= 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

12:12:36.444307 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-OFFER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b yiaddr=172.18.16.112 1=255.0.0.0 3=172.16.0.1 6=72.173.31.41,72.173.31.42 43= 51=92 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

12:12:36.444982 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 324

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Request XID 00000001 length 324 dhcpv4_udp server 10.43.18.48 port 67

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 324 byte(s) from FD 51

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-REQUEST giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b ciaddr=10.43.18.32 12= 50=172.18.16.112 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=

12:12:36.447348 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328

Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-ACK giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b yiaddr=172.18.16.112 1=255.0.0.0 3=172.16.0.1 6=72.173.31.41,72.173.31.42 12= 43= 51=300 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=









Packet drop:

Client side:

12:41:16.748120 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311

12:41:17.148104 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311



Relay agent side:

12:41:16.750013 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311

12:41:16.750285 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312

12:41:17.150006 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311

12:41:17.150258 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312



DHCP server side:

12:41:16.750292 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312

Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000000 length 312 dhcpv4_udp server 10.43.18.48 port 67

Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51

12:41:17.150273 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000001 length 312 dhcpv4_udp server 10.43.18.48 port 67

12:41:17.350309 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3c, length 312

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp

Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000002 length 312 dhcpv4_udp server 10.43.18.48 port 67





seems packet is delivered to DHCP server, but radiusd is not responding and there are no other error logs in that time.

I tried my best to corner out the problem. But, could not tackle the exact use case as it is happening randomly.

I suspected these two logs,

ldap - Closing connection (3): Hit idle_timeout, was idle for 207.534324944s

redis_ippool - Closing connection (5): Hit idle_timeout, was idle for 212.217780777s

I tried tweaking that parameter from 60, to 600 then to infinite(0) in redis, redis_ippool and ldap. That didnt help.



Please help.




Regards,
Nagamani Chinnapaiyan



More information about the Freeradius-Users mailing list