Seeing dhcp packet drops in latest master
Chinnapaiyan, Nagamani
Nagamani.Chinnapaiyan at viasat.com
Thu Aug 13 15:11:10 CEST 2020
We are seeing packet drops frequently in latest 4.x version(#8c2e4122572a12f43d4ec294d3258a2f0da07b1d). We are not doing scale test, just ~10 DORAs. If I run some 10 testcases(10 DORAs each test case), it fails randomly at any one of the tests. The same testcases were passing around one month before.
We have Client -> Freeradius dhcp Relay agent(Load Balancer) -> freeradius dhcp server.
I could not capture issue clearly with debug mode as it is slower. So I enabled some of the debug logs in the code.
----------------
diff --git a/src/lib/io/network.c b/src/lib/io/network.c
index 7276606..3850725 100644
--- a/src/lib/io/network.c
+++ b/src/lib/io/network.c
@@ -684,6 +684,7 @@ static void fr_network_read(UNUSED fr_event_list_t *el, int sockfd, UNUSED int f
s->listen->fd, sockfd)) return;
DEBUG3("Reading data from FD %u", sockfd);
+ ERROR("DEBLOGJK: Reading data from FD %u", sockfd);
if (!s->cd) {
cd = (fr_channel_data_t *) fr_message_reserve(s->ms, s->listen->default_message_size);
@@ -751,6 +752,7 @@ next_message:
s->cd = NULL;
DEBUG3("Read %zd byte(s) from FD %u", data_size, sockfd);
+ ERROR("DEBLOGJK: Read %zd byte(s) from FD %u", data_size, sockfd);
nr->stats.in++;
s->stats.in++;
diff --git a/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c b/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c
index 89e7ac9..38736fe 100644
--- a/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c
+++ b/src/modules/proto_dhcpv4/proto_dhcpv4_udp.c
@@ -159,6 +159,7 @@ static ssize_t mod_read(fr_listen_t *li, void **packet_ctx, fr_time_t *recv_time
&address->src_ipaddr, &address->src_port,
&address->dst_ipaddr, &address->dst_port,
&address->if_index, recv_time_p);
+ ERROR("DEBLOGJK: proto_dhvpv4_udp");
if (data_size < 0) {
DEBUG2("proto_dhvpv4_udp got read error %zd: %s", data_size, fr_strerror());
return data_size;
@@ -203,6 +204,9 @@ static ssize_t mod_read(fr_listen_t *li, void **packet_ctx, fr_time_t *recv_time
DEBUG2("proto_dhcpv4_udp - Received %s XID %08x length %d %s",
dhcp_message_types[message_type], xid,
(int) packet_len, thread->name);
+ ERROR("DEBLOGJK: proto_dhcpv4_udp - Received %s XID %08x length %d %s",
+ dhcp_message_types[message_type], xid,
+ (int) packet_len, thread->name);
return packet_len;
}
-------------------
Below are tcpdump and radiusd logs in success case and failure cases.
normal time (success case):
Client side:(tcpdump)
12:12:36.039499 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311
12:12:36.106401 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768
12:12:36.107826 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 323
12:12:36.113856 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768
12:12:36.439471 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311
12:12:36.442614 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328
12:12:36.442895 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 323
12:12:36.445647 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328
Relay Agent side:(tcpdump)
12:12:36.041608 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311
12:12:36.041851 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312
12:12:36.109916 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 323
12:12:36.110206 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 324
12:12:36.441579 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311
12:12:36.441894 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312
12:12:36.444965 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 323
12:12:36.445130 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 324
DHCP server side: (tcpdump + enabled radiusd logs)
12:12:36.041788 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312 <<tcpdump line
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51 <<< added error logs
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000000 length 312 dhcpv4_udp server 10.43.18.48 port 67
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-DISCOVER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a 12= 50= 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=82.1 <<< logs from our module's config
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (77) ldap - Closing connection (3): Hit idle_timeout, was idle for 207.534324944s
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: rlm_ldap (ldap) - Directory type: OpenLDAP
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-LDAP_FOUND LDAP option43 string found for MAC 00:a0:bc:6c:7d:3a.
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (77) redis_ippool - Closing connection (5): Hit idle_timeout, was idle for 212.217780777s
12:12:36.108074 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-OFFER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a yiaddr=172.21.0.181 1=255.255.0.0 3=172.21.0.1 6=72.173.31.41,72.173.31.42 43=0x010300a0bc0209555400000000000007030e55543200000000000307030a0009041155545f332e372e332e31302e392e62696e05040a4fff0f1444010002010311050044004407004300430101020103060701bd01bd0101020107008700880101020106c0a8640018070016001701010201031106c0a86400180700a100a11901012642687474703a2f2f6163732e6465762e636d742e7669617361742e696f3a393637352f6c6976652f4350454d616e616765722f435045732f67656e657269635452363927020005280207d02a0e55542e352e322e322e31303530332b040a4fff0f2c0e55542e352e322e322e3130333933330c5554320000000000000029083601ff3b0e55543200000000000307030a00093c125554325f332e372e332e31302e392e62696e3d040a4fff0f3e3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303530332e7461722e677a3f3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303339332e7461722e677a400100 51=120 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
12:12:36.110060 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 324
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Request XID 00000000 length 324 dhcpv4_udp server 10.43.18.48 port 67
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 324 byte(s) from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-REQUEST giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a ciaddr=10.43.18.32 12= 50=172.21.0.181 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-LDAP_FOUND LDAP option43 string found for MAC 00:a0:bc:6c:7d:3a.
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: (78) Closing connection (7): Hit idle_timeout, was idle for 207.588781775s
12:12:36.115554 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 768
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-ACK giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3a yiaddr=172.21.0.181 1=255.255.0.0 3=172.21.0.1 6=72.173.31.41,72.173.31.42 12= 43=0x010300a0bc0209555400000000000007030e55543200000000000307030a0009041155545f332e372e332e31302e392e62696e05040a4fff0f1444010002010311050044004407004300430101020103060701bd01bd0101020107008700880101020106c0a8640018070016001701010201031106c0a86400180700a100a11901012642687474703a2f2f6163732e6465762e636d742e7669617361742e696f3a393637352f6c6976652f4350454d616e616765722f435045732f67656e657269635452363927020005280207d02a0e55542e352e322e322e31303530332b040a4fff0f2c0e55542e352e322e322e3130333933330c5554320000000000000029083601ff3b0e55543200000000000307030a00093c125554325f332e372e332e31302e392e62696e3d040a4fff0f3e3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303530332e7461722e677a3f3c687474703a2f2f31302e37392e3235352e31362f7075622f7677615f696d616765732f7677615f55542e352e322e322e31303339332e7461722e677a400100 51=300 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
12:12:36.441821 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000001 length 312 dhcpv4_udp server 10.43.18.48 port 67
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-DISCOVER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b 12= 50= 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
12:12:36.444307 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-OFFER giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b yiaddr=172.18.16.112 1=255.0.0.0 3=172.16.0.1 6=72.173.31.41,72.173.31.42 43= 51=92 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
12:12:36.444982 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 324
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Request XID 00000001 length 324 dhcpv4_udp server 10.43.18.48 port 67
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 324 byte(s) from FD 51
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-REQUEST giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b ciaddr=10.43.18.32 12= 50=172.18.16.112 55=DHCP-Subnet-Mask,DHCP-Broadcast-Address,DHCP-Time-Offset,DHCP-Router-Address,DHCP-Domain-Name,DHCP-Domain-Name-Server,DHCP-Hostname 60= 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
12:12:36.447348 IP 10.43.18.48.67 > 10.43.18.32.67: BOOTP/DHCP, Reply, length 328
Aug 13 12:12:36 citests01-dhcp-01-alpha radiusd[30937]: DHCP-ACK giaddr=10.43.18.32 chaddr=00:a0:bc:6c:7d:3b yiaddr=172.18.16.112 1=255.0.0.0 3=172.16.0.1 6=72.173.31.41,72.173.31.42 12= 43= 51=300 61=0x01afac00000000 82.1=0x0000804bff66 82.2=0x00a0bc6c7d3a 82.6=00a0bc000001 at perf.test.viasat 82.100=
Packet drop:
Client side:
12:41:16.748120 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311
12:41:17.148104 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311
Relay agent side:
12:41:16.750013 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 311
12:41:16.750285 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312
12:41:17.150006 IP 10.43.18.32.67 > 10.43.18.31.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 311
12:41:17.150258 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312
DHCP server side:
12:41:16.750292 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3a, length 312
Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000000 length 312 dhcpv4_udp server 10.43.18.48 port 67
Aug 13 12:41:16 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Read 312 byte(s) from FD 51
12:41:17.150273 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3b, length 312
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000001 length 312 dhcpv4_udp server 10.43.18.48 port 67
12:41:17.350309 IP 10.43.18.31.67 > 10.43.18.48.67: BOOTP/DHCP, Request from 00:a0:bc:6c:7d:3c, length 312
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: Network 0 - DEBLOGJK: Reading data from FD 51
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhvpv4_udp
Aug 13 12:41:17 citests01-dhcp-01-alpha radiusd[30931]: DEBLOGJK: proto_dhcpv4_udp - Received DHCP-Discover XID 00000002 length 312 dhcpv4_udp server 10.43.18.48 port 67
seems packet is delivered to DHCP server, but radiusd is not responding and there are no other error logs in that time.
I tried my best to corner out the problem. But, could not tackle the exact use case as it is happening randomly.
I suspected these two logs,
ldap - Closing connection (3): Hit idle_timeout, was idle for 207.534324944s
redis_ippool - Closing connection (5): Hit idle_timeout, was idle for 212.217780777s
I tried tweaking that parameter from 60, to 600 then to infinite(0) in redis, redis_ippool and ldap. That didnt help.
Please help.
Regards,
Nagamani Chinnapaiyan
More information about the Freeradius-Users
mailing list