Discussion:
syncrepl failure over time
Aaron Bennett
2010-03-10 21:06:33 UTC
Hi,

openldap-2.3.43
db4-4.3.29
CentOS 5.3

We're observing that syncrepl with refreshandpersist is failing at some
point after an ldap restart. I've written a script to try to figure out
exactly when, but it's sometime within 24 hours.

A slapd restart on the consumer always picks up any pending changes.

Here's our configuration:

syncrepl provider:
database bdb
overlay ppolicy
ppolicy_default "cn=default,ou=policies,dc=clarku,dc=edu"
#readonly on
overlay syncprov
suffix "dc=clarku,dc=edu"
rootdn "cn=Manager,ou=Services,dc=clarku,dc=edu"
rootpw --snip--
directory /var/lib/ldap
checkpoint 5120 60
cachesize 50000
idlcachesize 150000
dbconfig set_cachesize 0 524288000 1
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
dbconfig set_flags DB_LOG_AUTOREMOVE

sizelimit 30
conn_max_pending 400
concurrency 25
threads 25
limits dn="cn=Replicator,ou=Services,dc=clarku,dc=edu" size=none

syncrepl consumer:
database bdb
overlay ppolicy
ppolicy_default "cn=default,ou=policies,dc=clarku,dc=edu"
#readonly on
suffix "dc=clarku,dc=edu"
rootdn "cn=Manager,ou=Services,dc=clarku,dc=edu"
rootpw --snip--
directory /var/lib/ldap
checkpoint 5120 60
cachesize 50000
idlcachesize 150000
dbconfig set_cachesize 0 524288000 1
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
dbconfig set_flags DB_LOG_AUTOREMOVE

syncrepl rid=001
provider=ldap://nyx.clarku.edu
type=refreshandpersist
searchbase="dc=clarku,dc=edu"
scope=sub
retry= 30 10 120 +
bindmethod=simple
binddn="cn=Replicator,ou=Services,dc=clarku,dc=edu"
credentials=--snip--


best,

Aaron Bennett
Clark University ITS
Dieter Kluenter
2010-03-12 06:54:52 UTC
Post by Aaron Bennett
Hi,
openldap-2.3.43
db4-4.3.29
CentOS 5.3
[...]

I'm surprised you got it to compile at all; db-4.3 is not supported.

-Dieter
--
Dieter Klünter | Systemberatung
http://dkluenter.de
GPG Key ID:8EF7B6C6
53°37'09,95"N
10°08'02,42"E
Quanah Gibson-Mount
2010-03-12 08:36:04 UTC
--On Wednesday, March 10, 2010 4:06 PM -0500 Aaron Bennett
Post by Aaron Bennett
[...]
retry= 30 10 120 +
This retry line should be quoted, I'd think: retry="30 10 120 +"
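
For readers unfamiliar with the retry syntax: per slapd.conf(5), the value is a list of "<interval> <count>" pairs, and a count of "+" means retry indefinitely. The following is only an illustrative sketch of the schedule that the spec in this thread describes, not slapd's actual implementation; the function name retry_delays is made up for this example.

```python
from itertools import islice
from typing import Iterator


def retry_delays(spec: str) -> Iterator[int]:
    """Yield the wait (in seconds) before each reconnect attempt.

    spec is the value of retry=, a list of "<interval> <count>" pairs;
    a count of "+" means "keep retrying forever at this interval".
    (Hypothetical helper for illustration, not part of OpenLDAP.)
    """
    tokens = spec.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("retry spec must be <interval> <count> pairs")
    for interval, count in zip(tokens[0::2], tokens[1::2]):
        seconds = int(interval)
        if count == "+":
            while True:  # indefinite retries at this interval
                yield seconds
        for _ in range(int(count)):
            yield seconds


# First 12 delays for the spec used in this thread.
first = list(islice(retry_delays("30 10 120 +"), 12))
print(first)
```

So "30 10 120 +" reads as: retry every 30 seconds for the first 10 attempts, then every 120 seconds from then on.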

In any case, use delta-syncrepl with 2.3 if you want it to work.

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Buchan Milne
2010-03-12 08:09:44 UTC
Post by Aaron Bennett
Hi,
openldap-2.3.43
db4-4.3.29
The db4-utils package shipped with RHEL5/CentOS 5 has *nothing* to do with the
database library the openldap package uses. Please check the output of:

ldd /usr/sbin/slapd|grep db

You should get something like this:
$ ldd /usr/sbin/slapd|grep db
libslapd_db-4.4.so => /usr/lib64/tls/libslapd_db-4.4.so (0x00002b7dd6dde000)
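
If the check needs to run on several hosts, the version can be pulled out of the ldd output programmatically. This is a minimal sketch, assuming the library follows the libdb-X.Y.so or libslapd_db-X.Y.so naming seen above; the function name bdb_version and the regular expression are illustrative only.

```python
import re
from typing import Optional, Tuple

# Assumed naming: a system library such as libdb-4.3.so, or the private
# copy seen in the ldd output above, libslapd_db-4.4.so.
_DB_LIB = re.compile(r"lib(?:slapd_)?db-(\d+)\.(\d+)\.so")


def bdb_version(ldd_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the Berkeley DB library named in ldd output.

    Returns None when no matching library line is found.
    (Hypothetical helper for illustration.)
    """
    match = _DB_LIB.search(ldd_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The sample line from this post.
sample = "libslapd_db-4.4.so => /usr/lib64/tls/libslapd_db-4.4.so (0x00002b7dd6dde000)"
print(bdb_version(sample))
```

Feed it the text produced by the ldd command above; a result of (4, 4) would mean slapd is linked against 4.4, whatever version db4-utils reports.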
Post by Aaron Bennett
CentOS 5.3
We're observing that syncrepl with refreshandpersist is failing at some
point after an ldap restart.
There are some syncrepl monitoring scripts. I provide one for Xymon, which
also does some performance trending, and there are some for Nagios.
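
The core of such a check is usually a comparison of the contextCSN attribute on the suffix entry of the provider and the consumer. The sketch below is not the Xymon or Nagios script mentioned above, just a rough illustration of the idea; it assumes the two contextCSN values have already been read (for example with a base-scope search on dc=clarku,dc=edu), and the sample values and function names are hypothetical.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Extract the UTC timestamp from a contextCSN value.

    Accepts stamps with or without a fractional-seconds part,
    e.g. 20100310210633Z#... or 20100310210633.000000Z#...
    """
    stamp = csn.split("#", 1)[0]
    fmt = "%Y%m%d%H%M%S.%fZ" if "." in stamp else "%Y%m%d%H%M%SZ"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def replication_lag(provider_csn: str, consumer_csn: str) -> float:
    """Seconds by which the consumer's contextCSN trails the provider's."""
    return (csn_time(provider_csn) - csn_time(consumer_csn)).total_seconds()


# Hypothetical values standing in for what each server would return.
provider = "20100310210633Z#000000#00#000000"
consumer = "20100310200633Z#000000#00#000000"
print(replication_lag(provider, consumer))
```

A lag that stays above zero across several polls, while the provider keeps taking writes, suggests the consumer has stopped receiving updates. Logging the time at which that first happens would narrow down the "sometime within 24 hours" window.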
Post by Aaron Bennett
I've written a script to try to figure out
exactly when, but it's sometime within 24 hours.
A slapd restart on the consumer always picks up any pending changes.
You haven't said anything about your topology. For example, is there a firewall
between the provider and the consumer? syncrepl in 2.3 doesn't recover from a
stale connection.

Regards,
Buchan
Aaron Bennett
2010-03-15 20:18:23 UTC
Post by Buchan Milne
You haven't said anything about your topology. For example, is there a firewall
between the provider and the consumer? syncrepl in 2.3 doesn't recover from a
stale connection.
Interesting. We do have iptables running on each host, but that's it.
Would iptables itself cause the problem?
