1.1.7 rlm_detail locking is broken
Nicolas Baradakis
nbk at sitadelle.com
Tue Dec 4 01:40:56 CET 2007
Brian De Wolf wrote:
> Here's the patch, as promised.
Many thanks.
> Now, instead of radrelay's previous behavior, radrelay will attempt to
> open and lock <detail>.work. If it can, it will then read/transmit all
> of the entries and delete <detail>.work. If it can't lock, it will
> simply try locking later.
The new algorith doesn't require a non-blocking lock in a loop. After the
file is renamed rlm_detail will not try to lock it anymore.
> If it doesn't exist, it will check if <detail> has entries and if so
> it will move <detail> to <detail>.work (and make a new <detail>).
rlm_detail will create the new detail file. I think the function
detail_move() is not needed anymore. Just use a normal rename().
> It seems to move faster, but that's because I removed some of the sleeps to
> make sure writing had completed since that should be handled by the locks
> now. Luckily, my relay destinations aren't production yet, so I can keep
> testing with a decent load.
That's good news.
> --- src/main/radrelay.c.orig 2007-03-16 06:22:03.000000000 -0700
> +++ src/main/radrelay.c 2007-12-03 12:24:46.000000000 -0800
>
> [...]
>
> @@ -537,11 +514,13 @@
> if (rename(from, to) < 0)
> return -1;
>
> +
> oldmask = umask(0);
> if ((n = open(from, O_CREAT|O_RDWR, st.st_mode)) >= 0)
> close(n);
> umask(oldmask);
>
> +
> return 0;
> }
This is only cosmetics.
> @@ -607,19 +586,47 @@
> if (got_sigterm) state = STATE_SHUTDOWN;
>
> /*
> - * Open detail file - if needed, and if we can.
> + * Try to open our work file first. If not, check if the
> + * actual detail file has stuff in it. If so, open it,
> + * move it to the work file, and replace it with an empty
> + * file. Then keep the file handle for locking/reading.
> */
Maybe re-use the comments from Alan in src/main/detail.c:
/*
* Open detail.work first, so we don't lose
* accounting packets. It's probably better to
* duplicate them than to lose them.
*
* Note that we're not writing to the file, but
* we've got to open it for writing in order to
* establish the lock, to prevent rlm_detail from
* writing to it.
*/
> + if ((fp = fopen(work, "r+")) == NULL) {
/*
* Try reading the detail file. If it
* doesn't exist, we can't do anything.
*
* Doing the stat will tell us if the file
* exists, even if we don't have permissions
* to read it.
*/
> + if(!stat(r_args->detail, &st)) {
> + if(st.st_size > 0) {
> + detail_move(r_args->detail, work);
As said above, I think you could just use rename() and then you don't
need to check st_size > 0.
> + continue;
I have a question about this "continue": please check that radrelay will
not use 100% cpu when there is no detail file to read. (as many of the
sleep() are gone)
> + }
> + }
> }
>
> + /*
> + * Try to lock the detail-file.
> + * If lockf is used we want to lock the _whole_ file, hence the
Please no line of more than 80 characters in comments.
> + * fseek to the start of the file.
> + *
> + * If we can't lock the detail file, we close it and
> + * will try it again on the next iteration. Note that
> + * we will never unlock the file, on purpose, to stop
> + * rlm_detail from reading it.
> + */
The last part is confusing. rlm_detail must not try to acquire the lock
anymore after the file is renamed to detail.work.
> +
> + if(fp) {
> + fseek(fp, 0L, SEEK_SET);
> + do {
> + x = rad_lockfd_nonblock(fileno(fp), 0);
> + if (x == -1)
> + ms_sleep(100);
> + } while (x == -1 && i++ < 20);
> +
> + if (x == -1) {
> + fclose(fp);
> + fp = NULL;
> + } else {
> + state = STATE_BACKLOG;
> + }
No need to lock in a loop, so you can simplify that part. The lock is
only there to catch a corner case where the rename() happened while
the current rlm_detail was still writing in the file. (but new rlm_detail
execution will create and use a new file)
--
Nicolas Baradakis
More information about the Freeradius-Devel
mailing list