syncrepl: large datasets and expediting consumer's initialization
Paul Fardy
2010-04-05 16:58:49 UTC
I'm having trouble getting the consumer synced in reasonable time. My
tests were with fewer than 20 entries in the datastore and I saw no
problems.

But we have 260,000 inetOrgPersons (with only a few attributes for
each user: uid cn sn givenName mail userPassword).

I've set up syncrepl:

PROVIDER
# Indices to maintain for this database
index objectclass,entryCSN,entryUUID eq
index ou,cn,mail,surname,givenname eq,sub
index uidNumber,gidNumber,loginShell eq
index uid,memberUid eq,sub
index nisMapName,nisMapEntry eq,sub
overlay syncprov
syncprov-checkpoint 100 1
syncprov-sessionlog 100
limits dn.children="ou=replicators,dc=service,dc=utoronto,dc=ca"
    size=unlimited time=unlimited
(I index attributes I'm not currently using. I presume that's not the
problem.)

CONSUMER
syncrepl rid=123
    provider=ldap://PROVIDER:389
    type=refreshAndPersist
    interval=00:00:10:00
    retry="60 10 300 +"
    searchbase="dc=service,dc=utoronto,dc=ca"
    filter="(objectClass=*)"
    scope=sub
    schemachecking=off
    starttls=critical
    bindmethod=simple
    binddn="uid=replicator,ou=replicators,dc=service,dc=utoronto,dc=ca"
I've tried with and without slapcat/slapadd to initialize the
consumer. On our slower system, slapadd took 98 minutes to rebuild the
database; the faster was 35 minutes (and I have only one consumer
right now).
# < /var/log/daemon egrep 'bdb_add: added id=' | cut -b1-50 | uniq -c | tail -10
10 Apr 1 10:57:31 ldap2 slapd[3126]: bdb_add: added
11 Apr 1 10:57:32 ldap2 slapd[3126]: bdb_add: added
9 Apr 1 10:57:33 ldap2 slapd[3126]: bdb_add: added
9 Apr 1 10:57:34 ldap2 slapd[3126]: bdb_add: added
10 Apr 1 10:57:35 ldap2 slapd[3126]: bdb_add: added
9 Apr 1 10:57:36 ldap2 slapd[3126]: bdb_add: added
10 Apr 1 10:57:37 ldap2 slapd[3126]: bdb_add: added
10 Apr 1 10:57:38 ldap2 slapd[3126]: bdb_add: added
10 Apr 1 10:57:39 ldap2 slapd[3126]: bdb_add: added
5 Apr 1 10:57:40 ldap2 slapd[3126]: bdb_add: added
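
The per-second tally above can also be reproduced without the shell pipeline. The following is a minimal sketch, assuming traditional syslog lines whose first 15 characters hold the timestamp; the function name adds_per_second and the sample lines are made up for illustration and are not taken from the real log.

```python
from collections import Counter


def adds_per_second(lines, marker="bdb_add: added id="):
    """Count log lines containing `marker`, grouped by syslog timestamp."""
    counts = Counter()
    for line in lines:
        if marker in line:
            # A traditional syslog timestamp such as "Apr  1 10:57:31"
            # occupies the first 15 characters of the line.
            counts[line[:15]] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    'Apr  1 10:57:31 ldap2 slapd[3126]: bdb_add: added id=00000001 dn="uid=a,dc=example"',
    'Apr  1 10:57:31 ldap2 slapd[3126]: bdb_add: added id=00000002 dn="uid=b,dc=example"',
    'Apr  1 10:57:32 ldap2 slapd[3126]: bdb_add: added id=00000003 dn="uid=c,dc=example"',
    'Apr  1 10:57:32 ldap2 slapd[3126]: conn=1 op=2 SRCH base="dc=example"',
]

for stamp, n in adds_per_second(sample).items():
    print(n, stamp)
```

Grouping on the timestamp alone plays the same role as cut -b1-50 followed by uniq -c: lines are bucketed by the second in which they were logged.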
With the consumer using the slapadded initial database, syncrepl seems
# tail -10000 /var/log/daemon | egrep 'entry unchanged' | cut -b1-83 | uniq -c
8 Apr 1 11:30:57 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
15 Apr 1 11:30:58 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
14 Apr 1 11:30:59 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
15 Apr 1 11:31:00 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
15 Apr 1 11:31:01 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
15 Apr 1 11:31:02 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
14 Apr 1 11:31:03 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
14 Apr 1 11:31:04 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
15 Apr 1 11:31:05 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
2 Apr 1 11:31:06 ldap2 slapd[3782]: syncrepl_entry: rid=123 entry unchanged, ignored
I can only hope it'll be done in 5 hours. The datastore isn't active,
so the consumer is up to date, for now, but this needless work is time
consuming.
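
The "5 hours" figure follows from the rates visible in the logs. As a rough sketch, assuming the consumer keeps processing between 10 and 15 entries per second for all 260,000 entries (the function name estimated_hours is illustrative only):

```python
def estimated_hours(entries, entries_per_second):
    """Hours needed to process `entries` at a steady rate."""
    return entries / entries_per_second / 3600


# 260,000 entries at the two rates seen in the log excerpts.
for rate in (10, 15):
    print(f"{rate} entries/s -> {estimated_hours(260_000, rate):.1f} h")
# 10 entries/s -> 7.2 h
# 15 entries/s -> 4.8 h
```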

Is this normal?
What happens when I restart the consumer? Why should I expect it to be
faster restarting?

I changed one entry (adding displayName to my own entry) after the
slapcat, so the consumer did not have the change when the consumer
started 90 minutes ago. That update has not yet propagated.

Can I prime the consumer's syncrepl cookie (if that's an appropriate
term)? Is that a solution? And how would I do that?

Thanks for your time,

Paul
Quanah Gibson-Mount
2010-04-05 18:20:58 UTC
--On Monday, April 05, 2010 2:28 PM -0230 Paul Fardy
Post by Paul Fardy
I'm having trouble getting the consumer synced in reasonable time. My
tests were with fewer than 20 entries in the datastore and I saw no
problems.
Have you configured a DB_CONFIG file? Did you use the -q flag with
slapadd? Your load times seem abnormally long. On a correctly tuned
system I can load a very large LDIF file of 3 million entries, with quite
a number of indices, in about 2 hours.

--Quanah



--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Paul Fardy
2010-04-06 00:16:53 UTC
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
The filesystem is ext3 on RHEL5.
-q  enable quick (fewer integrity checks) mode. Does fewer
    consistency checks on the input data, and no consistency checks
    when writing the database. Improves the load time but if any
    errors or interruptions occur the resulting database will be
    unusable.
That last bit was enough for me not to use the -q, but it did reduce
load time to 17 minutes.

The performance of slapadd is significant, but what about syncrepl?
Why is the consumer reviewing every object? Reviewing "-q", I discovered
-w  write syncrepl context information. After all entries are
    added, the contextCSN will be updated with the greatest CSN in
    the database.
And that looks like an option that would prime my syncrepl info. So

slapadd -q -w -l SLAPCAT.LDIF

took 14 minutes to build and then 3 minutes to close the databases.
This consumer has the same hardware as the provider that took 35
minutes to rebuild the database.

That "slapadd -w" looks like the fix. Would someone confirm or reject
that?

The provider's log file still shows it's reviewing many records. I
guess it's not returning them. Will the log file show the DNs of
results (as opposed to visited)?

I restarted the provider with less logging; the logs of full syncrepl
scans are sucking up disk space. Only 5 or 6 records would have changed.

Is it normal for the provider to visit many (all?) objects even when
the consumer would have a very current CSN?

Thanks for your help,

Paul
m***@aero.polimi.it
2010-04-06 05:14:21 UTC
Post by Paul Fardy
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
The filesystem is ext3 on RHEL5.
-q  enable quick (fewer integrity checks) mode. Does fewer
    consistency checks on the input data, and no consistency checks
    when writing the database. Improves the load time but if any
    errors or interruptions occur the resulting database will be
    unusable.
That last bit was enough for me not to use the -q, but it did reduce
load time to 17 minutes.
The performance of slapadd is significant, but what about syncrepl?
Why is the consumer reviewing every object? Reviewing "-q", I discovered
-w  write syncrepl context information. After all entries are
    added, the contextCSN will be updated with the greatest CSN in
    the database.
And that looks like an option that would prime my syncrepl info. So
slapadd -q -w -l SLAPCAT.LDIF
took 14 minutes to build and then 3 minutes to close the databases.
This consumer has the same hardware as the provider that took 35
minutes to rebuild the database.
That "slapadd -w" looks like the fix. Would someone confirm or reject
that?
The provider's log file still shows it's reviewing many records. I
guess it's not returning them. Will the log file show the DNs of
results (as opposed to visited)?
I restarted the provider with less logging; the logs of full syncrepl
scans are sucking up disk space. Only 5 or 6 records would have changed.
Is it normal for the provider to visit many (all?) objects even when
the consumer would have a very current CSN?
If you slapadd to the consumer the output of slapcat from the producer,
the CSNs will be consistent, and no refresh will occur. Did you by chance
slapadd to the consumer a fresh LDIF, with no UUID/CSN information? What
-w does is simply to set the contextCSN to the latest entryCSN found in
the database. If you slapcat from the producer, the suffix entry will
have a valid contextCSN and -w is not needed.
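
To illustrate what "the latest entryCSN found in the database" means, here is a minimal sketch that scans LDIF text for the greatest entryCSN. It assumes unfolded, non-base64 entryCSN lines and CSNs that all share the same fixed-width format, so that plain string comparison orders them by time; the function name greatest_entry_csn and the sample entries are hypothetical. This is not how slapadd itself is implemented.

```python
def greatest_entry_csn(ldif_text):
    """Return the largest entryCSN value in the LDIF text, or None."""
    csns = [
        line.split(":", 1)[1].strip()
        for line in ldif_text.splitlines()
        if line.lower().startswith("entrycsn: ")
    ]
    # Assumption: same-format CSNs start with a fixed-width timestamp,
    # so the lexicographic maximum is also the most recent CSN.
    return max(csns) if csns else None


# Hypothetical two-entry LDIF, for illustration only.
sample_ldif = """\
dn: uid=a,dc=example
uid: a
entryCSN: 20100401135731.000000Z#000000#000#000000

dn: uid=b,dc=example
uid: b
entryCSN: 20100405184900.000000Z#000000#000#000000
"""

print(greatest_entry_csn(sample_ldif))
```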

p.
Quanah Gibson-Mount
2010-04-06 05:48:28 UTC
--On Monday, April 05, 2010 9:46 PM -0230 Paul Fardy
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
Are you sure this is sufficient?

That's only 256MB of cache. What does du -c -h *.bdb report in the
database directory?
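
The comparison being asked for can be sketched as follows. The set_cachesize arguments are gigabytes, bytes and number of cache regions, so "0 268435456 1" is 256MB in a single region. The helper names cachesize_bytes and bdb_total_bytes are made up for illustration, and whether the cache must hold all of the *.bdb files is a tuning judgment this sketch does not settle.

```python
from pathlib import Path


def cachesize_bytes(db_config_text):
    """Total cache size from a 'set_cachesize <gbytes> <bytes> <ncache>' line."""
    for line in db_config_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "set_cachesize":
            gbytes, nbytes = int(parts[1]), int(parts[2])
            return gbytes * 1024**3 + nbytes
    return None


def bdb_total_bytes(directory):
    """Sum of the sizes of all *.bdb files directly inside `directory`."""
    return sum(p.stat().st_size for p in Path(directory).glob("*.bdb"))


db_config = """\
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_lg_dir logs
"""

print(cachesize_bytes(db_config) // 1024**2, "MB of cache")
```

Calling bdb_total_bytes on the database directory gives the figure to compare against the configured cache.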

--Quanah
Paul Fardy
2010-04-06 05:44:02 UTC
Post by m***@aero.polimi.it
If you slapadd to the consumer the output of slapcat from the producer,
the CSNs will be consistent, and no refresh will occur. Did you by chance
slapadd to the consumer a fresh LDIF, with no UUID/CSN information? What
-w does is simply to set the contextCSN to the latest entryCSN found in
the database. If you slapcat from the producer, the suffix entry will
have a valid contextCSN and -w is not needed.
I'm setting up a highly available LDAP. I ran slapcat on the active
LDAP server and used that as the source of slapadd for the new
producer and its consumers. Every entry in the LDIF has entryUUID,
creatorsName, createTimestamp, entryCSN, modifiersName, and
modifyTimestamp. I expect entryUUID and entryCSN to be sufficient.

The entryCSN is eq-indexed on the producer, so for syncrepl a simple filter

entryCSN >= consumer.contextCSN

should efficiently find only new/modified entries.
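
The selection this filter expresses can be sketched in a few lines. This only illustrates the comparison, assuming same-format CSN strings that sort by time; it says nothing about how the syncprov overlay actually evaluates the search or uses its indices. The function name changed_since and the sample data are hypothetical.

```python
def changed_since(entry_csns, context_csn):
    """DNs whose entryCSN is at or after the consumer's contextCSN."""
    return [dn for dn, csn in entry_csns.items() if csn >= context_csn]


# Hypothetical provider contents: DN -> entryCSN.
provider = {
    "uid=a,dc=example": "20100401135731.000000Z#000000#000#000000",
    "uid=b,dc=example": "20100405184900.000000Z#000000#000#000000",
    "uid=c,dc=example": "20100405190512.000000Z#000000#000#000000",
}
consumer_context_csn = "20100405184900.000000Z#000000#000#000000"

print(changed_since(provider, consumer_context_csn))
```

With >= the entry that carries the consumer's own contextCSN is matched again; a strict > would exclude it.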

After "slapadd -w", it looks like the syncrepl works quickly, but the
producer's log file suggests that syncrepl (since I see the
replicator's DN) is visiting a lot of entries that have not changed.

How do I determine which entries are actually returned to the syncrepl
client?

Thanks again,

Paul

Dieter Kluenter
2010-04-05 19:34:46 UTC
On Mon, 5 Apr 2010 14:28:49 -0230
Post by Paul Fardy
I'm having trouble getting the consumer synced in reasonable time.
My tests were with fewer than 20 entries in the datastore and I saw
no problems.
But we have 260,000 inetOrgPersons (with only a few attributes for
each user: uid cn sn givenName mail userPassword).
PROVIDER
# Indices to maintain for this database
index objectclass,entryCSN,entryUUID eq
index ou,cn,mail,surname,givenname eq,sub
index uidNumber,gidNumber,loginShell eq
index uid,memberUid eq,sub
index nisMapName,nisMapEntry eq,sub
overlay syncprov
syncprov-checkpoint 100 1
syncprov-sessionlog 100
limits dn.children="ou=replicators,dc=service,dc=utoronto,dc=ca"
size=unlimited time=unlimited
(I index attributes I'm not currently using. I presume that's not
the problem.)
CONSUMER
syncrepl rid=123
provider=ldap://PROVIDER:389
type=refreshAndPersist
interval=00:00:10:00
retry="60 10 300 +"
searchbase="dc=service,dc=utoronto,dc=ca"
filter="(objectClass=*)"
scope=sub
schemachecking=off
starttls=critical
bindmethod=simple
binddn="uid=replicator,ou=replicators,dc=service,dc=utoronto,dc=ca"
I've tried with and without slapcat/slapadd to initialize the
consumer. On our slower system, slapadd took 98 minutes to rebuild
the database; the faster was 35 minutes (and I have only one
consumer right now).
[...]

What filesystem are you running?

-Dieter
--
Dieter Klünter | Systemberatung
sip: +49.40.20932173
http://www.dpunkt.de/buecher/2104.html
GPG Key ID:8EF7B6C6