Discussion:
deferring operation: pending operations
Ben Beuchler
2007-04-07 13:07:55 UTC
Permalink
I'm getting a ton of these errors in a row, all on the same connection:

Apr 5 11:32:48 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:51 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:51 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:52 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations

What does "deferring operation: pending operations" actually indicate?

Thanks!
Howard Chu
2007-04-07 15:06:57 UTC
Permalink
Where does it say there's an error here?
Post by Ben Beuchler
Apr 5 11:32:48 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:51 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:51 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
Apr 5 11:32:52 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
What does "deferring operation: pending operations" actually indicate?
It means that an operation that was just received on that connection has
been deferred, because there are other operations ahead of it that are
still pending. I.e., there's a bunch of operations on that connection
waiting to execute, and the newcomer is last in line.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Ben Beuchler
2007-04-08 17:39:58 UTC
Permalink
Post by Howard Chu
Where does it say there's an error here?
Fair enough. s/errors/messages/g
Post by Howard Chu
Post by Ben Beuchler
Apr 5 11:32:52 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
What does "deferring operation: pending operations" actually indicate?
It means that an operation that was just received on that connection has
been deferred, because there are other operations ahead of it that are
still pending. I.e., there's a bunch of operations on that connection
waiting to execute, and the newcomer is last in line.
Hmmm. That particular connection rarely makes more than a couple
search or bind requests per second. Well within what OpenLDAP should
be able to handle on a nice, fast server like this. Is there a class
of mistakes that the client could be making to cause operations to
queue up like this?

-Ben
Howard Chu
2007-04-09 01:54:19 UTC
Permalink
Post by Ben Beuchler
Post by Howard Chu
Post by Ben Beuchler
Apr 5 11:32:52 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
What does "deferring operation: pending operations" actually indicate?
It means that an operation that was just received on that connection has
been deferred, because there are other operations ahead of it that are
still pending. I.e., there's a bunch of operations on that connection
waiting to execute, and the newcomer is last in line.
Hmmm. That particular connection rarely makes more than a couple
search or bind requests per second. Well within what OpenLDAP should
be able to handle on a nice, fast server like this. Is there a class
of mistakes that the client could be making to cause operations to
queue up like this?
The only time I've seen this occur on a frequent basis is with poorly
written clients like Sendmail, which sends off a stream of a hundred or
so search requests on a connection before it checks for any replies. The
problem there is that the search replies build up in the network and
socket buffers and once the socket buffer gets full we can't send
anything else. At that point we stop executing operations on the
connection; we just wait for the client to drain it.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Pierangelo Masarati
2007-04-10 22:02:34 UTC
Permalink
Post by Ben Beuchler
Apr 5 11:32:48 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
I've upgraded the client that was triggering the above failures. That
client is still intermittently hanging on its auth attempts. The
slapd frequently logs this message on connections owned by this
Apr 10 13:14:39 swizzle slapd[5944]: connection_input: conn=1696
deferring operation: binding
The client is making auth binds (anonymous search for dn, followed by
binding with that dn) for numerous users. It does all of this over a
single LDAP session that is kept open more or less until either the
client or slapd is restarted. It rarely makes more than 3 - 5
requests per second, so it seems unlikely that OpenLDAP is actually
unable to keep up.
What type of errors could the client make that would result in
repeated deferrals?
I'd note that if your client is performing search + bind on the same
connection in an asynchronous manner, then the server behavior as per
RFC4511 is undefined, since a server is not supposed to be willing to
handle operations intermixed with binds until a bind is concluded. That
is, any operation can be multiplexed on a single LDAP connection
__except__ binds, which must be serialized. A more appropriate client
for that type of operations would need to use at least 2 connections,
one for searches and one for binds, and the one for binds should take
care of either performing synchronous binds, or in any case to wait for
response to a bind before performing another. In case of failure (e.g.
timeout) while waiting for bind response, the connection should be
trashed (see ldap_unbind_ext(3)) and recreated. This is what is done,
for example, by slapd-ldap(5).

p.



Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
---------------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: ***@sys-net.it
---------------------------------------
Ben Beuchler
2007-04-10 20:47:04 UTC
Permalink
Post by Ben Beuchler
Apr 5 11:32:48 swizzle slapd[1626]: connection_input: conn=6232680
deferring operation: pending operations
I've upgraded the client that was triggering the above failures. That
client is still intermittently hanging on its auth attempts. The
slapd frequently logs this message on connections owned by this
client:

Apr 10 13:14:39 swizzle slapd[5944]: connection_input: conn=1696
deferring operation: binding

The client is making auth binds (anonymous search for dn, followed by
binding with that dn) for numerous users. It does all of this over a
single LDAP session that is kept open more or less until either the
client or slapd is restarted. It rarely makes more than 3 - 5
requests per second, so it seems unlikely that OpenLDAP is actually
unable to keep up.

What type of errors could the client make that would result in
repeated deferrals?

Thanks!
Hallvard B Furuseth
2007-04-10 21:50:56 UTC
Permalink
Apr 10 13:14:39 swizzle slapd[5944]: connection_input: conn=1696
deferring operation: binding
Include 256 in your loglevel, if you haven't already, and look for the
'<date> swizzle slapd[<pid>]: conn=<num> ...' lines it produces.
In particular, look for 'conn=<num> op=<num> RESULT' lines _after_
the 'deferring operation' line, where the matched received operation
(same conn and op) was _before_ that line. Notice what the operation
was, and what the result code was (RESULT ... err=). You'll find the
result code descriptions in RFC 4511.

Note that logging is a bit sloppy - for requests with syntax errors or
some other problems, only the response is logged. But since these
requests fail without processing they shouldn't take time either.
--
Regards,
Hallvard
Ben Beuchler
2007-04-12 00:59:48 UTC
Permalink
Post by Hallvard B Furuseth
Include 256 in your loglevel, if you haven't already, and look for the
'<date> swizzle slapd[<pid>]: conn=<num> ...' lines it produces.
In particular, look for 'conn=<num> op=<num> RESULT' lines _after_
the 'deferring operation' line, where the matched received operation
(same conn and op) was _before_ that line. Notice what the operation
was, and what the result code was (RESULT ... err=). You'll find the
result code descriptions in RFC 4511.
The examples I could find all look like this:

Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND anonymous
mech=implicit ssf=0
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND dn="" method=128
Apr 11 15:06:43 cliff slapd[15355]: connection_input: conn=1 deferring
operation: pending operations
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 RESULT tag=97 err=0 text=

I can't find a reference to '97' anywhere in the above mentioned RFC.
Is there somewhere else I should look?

-Ben
Ben Beuchler
2007-04-12 01:03:16 UTC
Permalink
Post by Ben Beuchler
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND anonymous
mech=implicit ssf=0
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND dn="" method=128
Apr 11 15:06:43 cliff slapd[15355]: connection_input: conn=1 deferring
operation: pending operations
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 RESULT tag=97 err=0 text=
I can't find a reference to '97' anywhere in the above mentioned RFC.
Is there somewhere else I should look?
My apologies; I neglected to include an example of the "deferring
operation: binding" log entries:

Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND anonymous
mech=implicit ssf=0
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND
dn="uid=bbrogan,dc=users,dc=accounts,dc=ldap,dc=mcad,dc=edu"
method=128
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND
dn="uid=bbrogan,dc=users,dc=accounts,dc=ldap,dc=mcad,dc=edu"
mech=SIMPLE ssf=0
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 RESULT tag=97 err=0 text=
Apr 11 15:14:50 cliff slapd[15355]: connection_input: conn=1 deferring
operation: binding
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 BIND anonymous
mech=implicit ssf=0
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 BIND dn="" method=128
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 RESULT tag=97 err=0 text=

It appears to my untrained eye that the operation both before and
after the deferral completed successfully. Any idea what type of
client behavior could trigger messages like this?

-Ben
Howard Chu
2007-04-12 02:13:14 UTC
Permalink
Post by Ben Beuchler
My apologies; I neglected to include an example of the "deferring
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND anonymous
mech=implicit ssf=0
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND
dn="uid=bbrogan,dc=users,dc=accounts,dc=ldap,dc=mcad,dc=edu"
method=128
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 BIND
dn="uid=bbrogan,dc=users,dc=accounts,dc=ldap,dc=mcad,dc=edu"
mech=SIMPLE ssf=0
Apr 11 15:14:49 cliff slapd[15355]: conn=1 op=36640 RESULT tag=97 err=0 text=
Apr 11 15:14:50 cliff slapd[15355]: connection_input: conn=1 deferring
operation: binding
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 BIND anonymous
mech=implicit ssf=0
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 BIND dn="" method=128
Apr 11 15:14:50 cliff slapd[15355]: conn=1 op=36641 RESULT tag=97 err=0 text=
It appears to my untrained eye that the operation both before and
after the deferral completed successfully. Any idea what type of
client behavior could trigger messages like this?
In this particular case it looks like the client has sent two Bind
requests back to back. Possibly without waiting for the result from the
first one to arrive. As Ando already explained in a previous response,
Binds are required to be fully serialized - when a Bind is active on a
connection, no other operations are allowed to run. Furthermore, if
non-Bind operations are active on a connection and a Bind is received,
all of those active operations are required to be immediately abandoned
before the Bind proceeds.

One thing you never mentioned - what version of slapd are you running?

I still don't see any particular failures occurring here. If your client
isn't having any problems I wouldn't waste any more time worrying about
it. If you feel like rewriting your client, you can follow Ando's advice
and use a dedicated connection for Binds and a separate connection for
the normal operations, but you'll probably need to use proxy
authorization in that case.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Ben Beuchler
2007-04-12 06:07:26 UTC
Permalink
Post by Howard Chu
One thing you never mentioned - what version of slapd are you running?
2.3.32
Post by Howard Chu
I still don't see any particular failures occurring here. If your client
isn't having any problems I wouldn't waste any more time worrying about
it.
It's entirely possible that I'm chasing a red herring. My primary
goal in posting here was to find out what, exactly, the deferral
messages indicated and, from that, whether or not they could be
responsible for the problems I'm seeing.

Thanks to the very clear, helpful responses from you, Pierangelo,
Hallvard, and others I'm fairly confident I'll be able to rule the
deferrals either in or out as relevant to my problem with a few more
troubleshooting steps.

Thanks!

-Ben
Howard Chu
2007-04-12 01:34:12 UTC
Permalink
Post by Ben Beuchler
Post by Hallvard B Furuseth
Include 256 in your loglevel, if you haven't already, and look for the
'<date> swizzle slapd[<pid>]: conn=<num> ...' lines it produces.
In particular, look for 'conn=<num> op=<num> RESULT' lines _after_
the 'deferring operation' line, where the matched received operation
(same conn and op) was _before_ that line. Notice what the operation
was, and what the result code was (RESULT ... err=). You'll find the
result code descriptions in RFC 4511.
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND anonymous
mech=implicit ssf=0
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND dn="" method=128
Apr 11 15:06:43 cliff slapd[15355]: connection_input: conn=1 deferring
operation: pending operations
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 RESULT tag=97 err=0 text=
I can't find a reference to '97' anywhere in the above mentioned RFC.
Is there somewhere else I should look?
Is there an actual problem showing up on your client? That looks like a
pretty normal sequence, the received operations are still getting
executed and returning success.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Ben Beuchler
2007-04-12 06:00:12 UTC
Permalink
Post by Howard Chu
Is there an actual problem showing up on your client? That looks like a
pretty normal sequence, the received operations are still getting
executed and returning success.
There is an actual problem (intermittent hanging of the authentication
process under load) but I have nothing more than anecdote to tie it to
the deferrals. At this point they are the only lead I have. I'm
hoping that whatever the client's doing to cause the deferrals is,
eventually, producing the auth failures.

-Ben
Aaron Richton
2007-04-12 01:39:15 UTC
Permalink
It looks like that client is trying a SASL bind, then issuing a simple
bind prior to receiving the SASL bind response. I don't think that's
permitted per RFC; hence, slapd defers the simple bind?
Post by Ben Beuchler
Post by Hallvard B Furuseth
Include 256 in your loglevel, if you haven't already, and look for the
'<date> swizzle slapd[<pid>]: conn=<num> ...' lines it produces.
In particular, look for 'conn=<num> op=<num> RESULT' lines _after_
the 'deferring operation' line, where the matched received operation
(same conn and op) was _before_ that line. Notice what the operation
was, and what the result code was (RESULT ... err=). You'll find the
result code descriptions in RFC 4511.
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND anonymous
mech=implicit ssf=0
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND dn="" method=128
Apr 11 15:06:43 cliff slapd[15355]: connection_input: conn=1 deferring
operation: pending operations
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 RESULT tag=97 err=0 text=
I can't find a reference to '97' anywhere in the above mentioned RFC.
Is there somewhere else I should look?
-Ben
Hallvard B Furuseth
2007-04-12 17:34:16 UTC
Permalink
Post by Ben Beuchler
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND anonymous
mech=implicit ssf=0
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 BIND dn="" method=128
Apr 11 15:06:43 cliff slapd[15355]: connection_input: conn=1 deferring
operation: pending operations
Apr 11 15:06:43 cliff slapd[15355]: conn=1 op=36051 RESULT tag=97 err=0 text=
I can't find a reference to '97' anywhere in the above mentioned RFC.
Is there somewhere else I should look?
97 is BindResponse. But it's easier to take the conn=1 op=36051 in the
RESULT, and find the matching conn=1 op=36051 request. Each request on
a connection gets its own op number. Don't be confused by the several
BINDs with the same conn+op - that means several things are happening
with that request or slapd wanted to log more than one line's worth of
data. In this case, the mech=implicit Bind is because any Bind first
reverts the connection to anonymous if it is not already anonymous.

BTW, I notice the requests/responses you've posted were less than 2 or
often 1 seconds apart - is that how long clients hang, or are there some
really long hangs?

On another note, maybe you can dispatch the Binds to e.g. 4 processes
and thus 4 connections, thus reducing much of the sequencing.
--
Regards,
Hallvard
Ben Beuchler
2007-04-12 19:19:01 UTC
Permalink
Post by Hallvard B Furuseth
BTW, I notice the requests/responses you've posted were less than 2 or
often 1 seconds apart - is that how long clients hang, or are there some
really long hangs?
Really long hangs, until I restart the client. Based on feedback from
this list, the author found a bug in his auth bind code, which fixed
the sequencing on the binds and cleared up the deferrals. So far it
appears to have fixed the prolonged hangs, too. It's possible,
though, that when I re-linked the daemon against 2.3.32 that fixed
something.
Post by Hallvard B Furuseth
On another note, maybe you can dispatch the Binds to e.g. 4 processes
and thus 4 connections, thus reducing much of the sequencing.
That's an interesting idea; I will propose it to the author.

Thanks!

-Ben

Continue reading on narkive:
Loading...