System frequently stops responding...

Mohamed Lrhazi Mohamed.Lrhazi at georgetown.edu
Tue Jul 28 16:28:52 CEST 2015


Here is the similar trace, with -T, for logging the time each syscall
took.. and apparently none actually took the missing 30 seconds!!


18916 08:38:42.908889 write(3, "Tue Jul 28 08:38:42 2015 : Debug"..., 92) =
92 <0.000010>
18916 08:38:42.908937 write(3, "Tue Jul 28 08:38:42 2015 : Debug"..., 80) =
80 <0.000009>
18916 08:38:42.908982 write(3, "Tue Jul 28 08:38:42 2015 : Debug"..., 122)
= 122 <0.000009>
18916 08:38:42.909023 sendmsg(26, {msg_name(16)={sa_family=AF_INET,
sin_port=htons(32770), sin_addr=inet_addr("10.128.254.50")},
msg_iov(1)=[{"\3\37\0\24\3558.7c\261yC\325\344+I\257\271\234\215", 20}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...},
msg_flags=0}, 0) = 20 <0.000056>
18916 08:38:42.909134 write(3, "Tue Jul 28 08:38:42 2015 : Debug"..., 59) =
59 <0.000011>
18916 *08:38:42.909176* wait4(-1, NULL, WNOHANG, NULL) = -1 ECHILD (No
child processes) <*0.000019*>
18916 *08:39:12.500099 *write(3, "Tue Jul 28 08:39:12 2015 : Debug"..., 87)
= 87 <0.000017>
18916 08:39:12.500222 write(3, "Tue Jul 28 08:39:12 2015 : Proxy"..., 85) =
85 <0.000011>
18916 08:39:12.500274 write(3, "Tue Jul 28 08:39:12 2015 : Debug"..., 85) =
85 <0.000010>


On Mon, Jul 27, 2015 at 11:07 PM, Mohamed Lrhazi <
Mohamed.Lrhazi at georgetown.edu> wrote:

> so, I run radiusd -Xx under strace... and it looks like I caught the
> event... not sure what to make of strace though :
>
> This is from the radiusd log:
>
> Mon Jul 27 21:54:39 2015 : Debug: (31451)   } # Post-Auth-Type REJECT =
> updated
> Mon Jul 27 21:54:39 2015 : Debug: (31451) Sent Access-Reject Id 250 from
> 10.212.11.14:1812 to 10.128.254.52:32770 length 0
> *Mon Jul 27 21:54:39 2015 :* Debug: (31451) Finished request
> *Mon Jul 27 21:55:09 2015 :* Debug: PING: Zombie period is over for home
> server eduroam1
> Mon Jul 27 21:55:09 2015 : Proxy: Marking home server 64.57.22.74 port
> 1812 as dead.
> Mon Jul 27 21:55:09 2015 : Debug: PING: Reviving home server eduroam1 in
> 300 seconds
> Mon Jul 27 21:55:09 2015 : Debug: (31664) <done>: Cleaning up request
> packet ID 71 with timestamp +4030
> Mon Jul 27 21:55:09 2015 : Debug: (31665) <done>: Cleaning up request
> packet ID 72 with timestamp +4030
> Mon Jul 27 21:55:09 2015 : Debug: (31666) <done>: Cleaning up request
> packet ID 73 with timestamp +4030
> Mon Jul 27 21:55:09 2015 : Debug: (31667) <done>: Cleaning up request
> packet ID 74 with timestamp +4030
> Mon Jul 27 21:55:09 2015 : Debug: (31759) <delay>: Sending delayed response
> Mon Jul 27 21:55:09 2015 : Debug: (31759) <delay>: Sent Access-Reject Id
> 208 from 10.212.11.14:1812 to 10.128.254.50:32770 length 20
>
>
> and this from the strace:
>
> 21:54:39.637272 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 64) = 64
> 21:54:39.637307 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 64) = 64
> 21:54:39.637343 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 92) = 92
> 21:54:39.637379 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 80) = 80
> 21:54:39.637426 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 123) = 123
> 21:54:39.637476 sendmsg(26, {msg_name(16)={sa_family=AF_INET,
> sin_port=htons(32770), sin_addr=inet_addr("10.128.254.52")},
> msg_iov(1)=[{"\3\372\0\24\331\36sKR)\216\23?\263.\17\356#pR", 20}],
> msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...},
> msg_flags=0}, 0) = 20
> 21:54:39.637593 write(3, "Mon Jul 27 21:54:39 2015 : Debug"..., 59) = 59
> *21:54:39.637630 *wait4(-1, NULL, WNOHANG, NULL) = -1 ECHILD (No child
> processes)
> *21:55:09.500134* write(3, "Mon Jul 27 21:55:09 2015 : Debug"..., 87) = 87
> 21:55:09.500311 write(3, "Mon Jul 27 21:55:09 2015 : Proxy"..., 85) = 85
> 21:55:09.500369 write(3, "Mon Jul 27 21:55:09 2015 : Debug"..., 85) = 85
> 21:55:09.500431 write(3, "Mon Jul 27 21:55:09 2015 : Debug"..., 104) = 104
>
>
> and it does look like each time this "freeze" happens, it takes exactly 30
> seconds?
>
>
> Thanks,
> Mohamed.
>
> On Sun, Jul 26, 2015 at 6:57 PM, Mohamed Lrhazi <
> Mohamed.Lrhazi at georgetown.edu> wrote:
>
>> PsPageFaults spikes for radiusd process, coinciding with the issue.
>>
>> On Sun, Jul 26, 2015 at 6:52 PM, Mohamed Lrhazi <
>> Mohamed.Lrhazi at georgetown.edu> wrote:
>>
>>> I installed collected on this system, and the one metric that seems to
>>> spike at the same time as the issue, is PsPageFaults for radiusd process.
>>>
>>> On Sun, Jul 26, 2015 at 6:41 PM, Mohamed Lrhazi <
>>> Mohamed.Lrhazi at georgetown.edu> wrote:
>>>
>>>> Oh, Alan. B suggested strace... I'll see how I can grab that... I am
>>>> sure I could not possibly run radiusd under tracing the whole time right?
>>>>
>>>> On Sun, Jul 26, 2015 at 6:36 PM, Mohamed Lrhazi <
>>>> Mohamed.Lrhazi at georgetown.edu> wrote:
>>>>
>>>>>
>>>>> I can observe this while running in -Xx mode (via a bash script that
>>>>>> kills it evey 1 minute, and starts it anew, also starts annd stops tcpdump
>>>>>> captures)
>>>>>
>>>>>
>>>>>
>>>>> No, every 3600 seconds, actually. Sorry.
>>>>>
>>>>>
>>>>
>>>
>>
>


More information about the Freeradius-Users mailing list