FR 2.1.7 Exits for no reason

McNutt, Justin M. McNuttJ at missouri.edu
Wed Mar 9 05:12:22 CET 2011


You must realize that "gdb" by itself is an answer that is of very little use.  While I am aware that gdb is the GNU Debugger, you have no way of knowing that I am, and you gave no context or other information that would help me use gdb to gather anything.

So let me be more clear:

What EXACTLY do I need to do to get more information about this phenomenon, and under what circumstances do I need to do it, and once I have some output, what should I be looking for in it?  Running production RADIUS servers with "strace radiusd -X" is probably impractical (and highly insecure), and may even alter the runtime environment such that the fatal event never occurs.  I've never observed the failure in either of the two test servers I run, and their configurations are identical, so I must assume that radiusd dies after receiving some sort of improper/unexpected data, or when it gets into some weird state, or other such thing.

But it can't be fixed if I can't figure out how to reproduce it.  It'll happen eventually, but a server that is no longer running doesn't tell me much either.  How is gdb going to help me figure out why something isn't working any more?

--J

________________________________
From: freeradius-users-bounces+mcnuttj=missouri.edu at lists.freeradius.org [mailto:freeradius-users-bounces+mcnuttj=missouri.edu at lists.freeradius.org] On Behalf Of Gary Gatten
Sent: Tuesday, March 08, 2011 5:06 PM
To: 'freeradius-users at lists.freeradius.org'
Subject: Re: FR 2.1.7 Exits for no reason

Gdb

From: McNutt, Justin M. [mailto:McNuttJ at missouri.edu]
Sent: Tuesday, March 08, 2011 04:59 PM
To: freeradius-users at lists.freeradius.org <freeradius-users at lists.freeradius.org>
Subject: FR 2.1.7 Exits for no reason

Hey all,

So the host-based auth stuff is working well now, but we've discovered another problem.

We have four FR 2.1.7 servers running on RHEL 5 (fully patched).  Every now and then, for no apparent reason, radiusd just stops.  It exits with "Exiting normally." to syslog.  They don't all exit at the same time.  Since there are four of them behind a load balancer, it usually doesn't result in a service outage, and we've been lucky so far that only a couple of them have been down at once.  But it's still disconcerting.
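Since the only evidence left behind is that one syslog line, a first step might be to pull out whatever was logged just before each "Exiting normally." message on each server and compare. The following is only a rough sketch in Python: the function name, the marker constant and the 20-line window are illustrative choices, not anything FreeRADIUS provides.

```python
from collections import deque

# The message radiusd leaves in syslog, as quoted above.
EXIT_MARKER = "Exiting normally."


def context_before_exits(lines, before=20, marker=EXIT_MARKER):
    """Return a list of (exit_line, preceding_lines) tuples.

    `lines` is any iterable of log lines (an open file works).
    `before` is how many earlier lines to keep as context; 20 is an
    arbitrary default, not a recommendation.
    """
    recent = deque(maxlen=before)  # rolling window of the latest lines
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            # Snapshot the window as it stood just before the exit line.
            hits.append((line, list(recent)))
        recent.append(line)
    return hits
```

It could be fed an open log file, e.g. `context_before_exits(open("/var/log/messages", errors="replace"))`; that path is an assumption about where syslog writes on these hosts. Whether anything useful shows up in the preceding lines is an open question, but it is cheap to check on all four servers.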

The servers are all started within a minute of each other, since I make changes on Server #1, then use an rsync script to replicate /etc/raddb to the other servers and restart them.  This week, Server #3 stopped about 8 hours after being started (it ran from 1130 to 1930).  Server #1 failed last week at 2330.  Server #4 hasn't failed yet.  It's very odd.

Any ideas on how I can troubleshoot this?

Thanks!

Justin McNutt
Network Systems Analyst - Ninja
DNPS, Mizzou Telecom
(573) 882-5183

"Do you have a concussion?"

Ping is NOT a service.  You don't need it.  Use a real test.


"This email is intended to be reviewed by only the intended recipient and may contain information that is privileged and/or confidential. If you are not the intended recipient, you are hereby notified that any review, use, dissemination, disclosure or copying of this email and its attachments, if any, is strictly prohibited. If you have received this email in error, please immediately notify the sender by return email and delete this email from your system."