Discussion:
Memory Usage Spikes
Mathew Rowley
2010-03-18 16:46:05 UTC
I have been running into some huge memory spikes, and was wondering if it's
normal, or if anyone has seen it before.

Architecture:
OpenLDAP 2.4.21 Running on RH4
Ulimit open files upped to 4096

Masters:
auth01.cmc, auth01.inflow
Running n-way multimaster

Slaves:
rsa01.inflow, rsa02.inflow, rsa03.cmc, rsa04.cmc
Syncrepl running off both masters, refreshAndPersist with retry="60 +"

Here is a graph of the spikes we are seeing on auth01.cmc and auth01.inflow


It also looks like the slave servers are only connecting to auth01; when
tcp 0 0 auth01.inflow:ldap rsa04.cmc:46851 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61648 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61686 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61683 ESTABLISHED
tcp 0 0 auth01.inflow:48882 rsa02.inflow.:ldap ESTABLISHED
tcp 0 109500 auth01.inflow:ldap rsa03.cmc:45798 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa02.inflow.:8773 ESTABLISHED
tcp 0 0 auth01.cmc:ldap rsa02.inflow:8885 ESTABLISHED
tcp 1 0 auth01.cmc:24310 rsa03.cmc:ldap CLOSE_WAIT
tcp 0 0 auth01.cmc:ldap rsa01.inflow:61657 ESTABLISHED


With refreshAndPersist, shouldn't each slave be connected to each host
configured in syncRepl, and keep that connection?
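
One way to answer that question from the netstat output is to group the inbound LDAP connections by master. The following is only a minimal sketch: the sample lines are transcribed from the output above, and the function name `inbound_ldap_peers` is made up for illustration.

```python
from collections import defaultdict

# Sample lines transcribed from the netstat output in this post.
NETSTAT = """\
tcp 0 0 auth01.inflow:ldap rsa04.cmc:46851 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61648 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61686 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61683 ESTABLISHED
tcp 0 0 auth01.inflow:48882 rsa02.inflow.:ldap ESTABLISHED
tcp 0 109500 auth01.inflow:ldap rsa03.cmc:45798 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa02.inflow.:8773 ESTABLISHED
tcp 0 0 auth01.cmc:ldap rsa02.inflow:8885 ESTABLISHED
tcp 1 0 auth01.cmc:24310 rsa03.cmc:ldap CLOSE_WAIT
tcp 0 0 auth01.cmc:ldap rsa01.inflow:61657 ESTABLISHED
"""

def inbound_ldap_peers(netstat_text):
    """Map each local host serving :ldap to the set of remote hosts connected to it."""
    peers = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only consider well-formed, established connections.
        if len(fields) != 6 or fields[5] != "ESTABLISHED":
            continue
        local_host, _, local_port = fields[3].rpartition(":")
        remote_host, _, _ = fields[4].rpartition(":")
        # Inbound means the local side is the LDAP port.
        if local_port == "ldap":
            peers[local_host.rstrip(".")].add(remote_host.rstrip("."))
    return dict(peers)

peers = inbound_ldap_peers(NETSTAT)
for master, slaves in sorted(peers.items()):
    print(master, "<-", ", ".join(sorted(slaves)))
```

In this sample, all four slaves show up on auth01.inflow, while only rsa01.inflow and rsa02.inflow have established connections to auth01.cmc.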


This is a pretty big issue - today we had a master crash; got:

Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 16422:
Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 32000:
Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: Cannot
allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: fatal
region error detected; run recovery
Mar 18 14:39:43 auth01 slapd[17498]: slap_graduate_commit_csn: removing
0x9b5d3ec8 20100318143943.437380Z#000000#001#000000
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com):
uniqueMember.bdb: write failed for page 35805
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com):
uniqueMember.bdb: unable to flush page: 35805


These boxes have 8 GB of RAM; I am trying to figure out if this is normal and I
just need to add more RAM.

Thanks for any help in advance.
Quanah Gibson-Mount
2010-03-18 19:22:32 UTC
--On Thursday, March 18, 2010 10:46 AM -0600 Mathew Rowley
Post by Mathew Rowley
I have been running into some huge memory spikes, and was wondering if
its normal, or if anyone has seen it before.
How many entries are in your database? What is the size of your database
on disk? What are your cachesize settings for BDB? What are your cachesize
settings for OpenLDAP (cachesize, idlcachesize, dncachesize)?

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount
2010-03-18 19:56:22 UTC
--On Thursday, March 18, 2010 3:47 PM -0400 "Beuerlein, Edward"
Hi Quanah,
Thanks for replying! Please let us know what other info you might need
to help us figure out these large increases in memory usage.
I did a dump of the database and then ran this on it (please let me know
# cat auth01.031610.ldif |grep dn |wc -l
60753
So you have 60,753 entries. Just to be sure, I would have used grep ^dn:
to be exact. ;)
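
To illustrate why the anchored match matters, here is a small sketch comparing the two counts on a made-up LDIF sample. The entries and the function names are hypothetical and are not taken from the actual dump.

```python
def count_entries(ldif_lines):
    """Count LDIF records by lines that start with 'dn:' (like grep ^dn:)."""
    return sum(1 for line in ldif_lines if line.startswith("dn:"))

def count_substring(ldif_lines):
    """Count lines containing 'dn' anywhere (like an unanchored grep dn)."""
    return sum(1 for line in ldif_lines if "dn" in line)

# Hypothetical sample: two entries, one attribute value that happens to contain "dn".
lines = [
    "dn: uid=jdoe,ou=people,dc=example,dc=com",
    "objectClass: inetOrgPerson",
    "uid: jdoe",
    "description: owns the dnsmasq host",
    "",
    "dn: cn=admins,ou=groups,dc=example,dc=com",
    "objectClass: groupOfUniqueNames",
    "uniqueMember: uid=jdoe,ou=people,dc=example,dc=com",
]

print(count_entries(lines), count_substring(lines))
```

The unanchored count is inflated by any attribute value that contains "dn", so it can only overestimate the number of entries.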
set_cachesize 0 268435456 1
You have 256MB allocated to BDB
69M cn.bdb
26M dn2id.bdb
3.7M entryCSN.bdb
2.2M entryUUID.bdb
160K gidNumber.bdb
2.1G id2entry.bdb
840K ipHostNumber.bdb
44K memberNisNetgroup.bdb
304K memberUid.bdb
3.1M nisNetgroupTriple.bdb
2.9M objectClass.bdb
8.0K ou.bdb
8.0K sudoUser.bdb
3.0M uid.bdb
364K uidNumber.bdb
792M uniqueMember.bdb
2.9G total
Your BDB database is 2.9GB.
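
As a cross-check of that total, the per-file sizes can be summed. This is a rough sketch that assumes the listing came from something like `du -h` with 1024-based units; the helper names are made up for illustration.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

# Per-file sizes copied from the listing above (the "total" line is left out).
DU_OUTPUT = """\
69M cn.bdb
26M dn2id.bdb
3.7M entryCSN.bdb
2.2M entryUUID.bdb
160K gidNumber.bdb
2.1G id2entry.bdb
840K ipHostNumber.bdb
44K memberNisNetgroup.bdb
304K memberUid.bdb
3.1M nisNetgroupTriple.bdb
2.9M objectClass.bdb
8.0K ou.bdb
8.0K sudoUser.bdb
3.0M uid.bdb
364K uidNumber.bdb
792M uniqueMember.bdb
"""

def parse_size(text):
    """Convert a human-readable size such as '3.7M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]

def total_bytes(du_text):
    """Sum the first column of every non-empty line."""
    return sum(parse_size(line.split()[0])
               for line in du_text.splitlines() if line.strip())

total = total_bytes(DU_OUTPUT)
print(f"{total / 1024**3:.2f} GiB")
```

Because each listed size is already rounded, the sum only approximates the reported total, but it lands close to 3 GiB, with id2entry.bdb and uniqueMember.bdb accounting for nearly all of it.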
olcDbCacheSize 1000
olcDbCacheFree 1
olcDbDNcacheSize 0
olcDBIDLcacheSize 1000
You're only allowing the first 1000 entries to be cached in OpenLDAP, and
your IdlCache is quite small as well. dncachesize is fine (unlimited).

While none of this explains your memory spikes, your server is definitely
poorly tuned. It may be that the massive discrepancy between your BDB cache
allocation and the actual BDB size is causing the problem (judging by
the bdb error messages in your log).

I would highly advise the following changes.

For DB_CONFIG

set_cachesize 4 0 1

This will increase the BDB cache to 4GB (which can hold your 2.9GB DB
easily).
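
The arithmetic behind the two set_cachesize lines can be sketched as follows. The directive takes gigabytes, additional bytes, and a number of cache segments; the function name and the 2.9GB figure converted to bytes are illustrative assumptions.

```python
def cache_bytes(directive):
    """Total cache size in bytes for a 'set_cachesize <gbytes> <bytes> <ncache>' line."""
    name, gbytes, extra_bytes, ncache = directive.split()
    if name != "set_cachesize":
        raise ValueError(f"not a set_cachesize directive: {directive!r}")
    # ncache only controls how the cache is split into segments, not its total size.
    return int(gbytes) * 1024**3 + int(extra_bytes)

current = cache_bytes("set_cachesize 0 268435456 1")   # the existing setting
proposed = cache_bytes("set_cachesize 4 0 1")          # the suggested setting
db_size = int(2.9 * 1024**3)                           # approximate on-disk size

print(f"current:  {current // 1024**2} MiB, holds whole DB: {current >= db_size}")
print(f"proposed: {proposed // 1024**2} MiB, holds whole DB: {proposed >= db_size}")
```

The existing 256MB cache covers less than a tenth of the database, while the 4GB setting leaves headroom above the 2.9GB on disk.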

For cn=config:

olcDbCacheSize 80000
olcDbCacheFree 1000
olcDBIDLcacheSize 240000

This will allow all entries to be cached in OpenLDAP, free up a reasonable
number if you exceed the cache, and allow a decent IDL size.
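
As a sketch of how these three values could be turned into an LDIF change for ldapmodify, the snippet below builds the modify record and checks the entry cache against the 60,753 entries counted earlier. The database DN `olcDatabase={1}bdb,cn=config` is a placeholder assumption; the real index and backend name depend on your configuration.

```python
def build_modify_ldif(dn, settings):
    """Build an LDIF modify record that replaces each attribute with a new value."""
    lines = [f"dn: {dn}", "changetype: modify"]
    for i, (attr, value) in enumerate(settings.items()):
        if i:
            lines.append("-")  # separator between modifications in one record
        lines.append(f"replace: {attr}")
        lines.append(f"{attr}: {value}")
    return "\n".join(lines) + "\n"

ENTRY_COUNT = 60753  # from the grep count earlier in this message

settings = {
    "olcDbCacheSize": 80000,
    "olcDbCacheFree": 1000,
    "olcDBIDLcacheSize": 240000,
}

# The entry cache should be able to hold every entry in the database.
fits = settings["olcDbCacheSize"] >= ENTRY_COUNT

# Hypothetical DN: check your own cn=config tree for the real one.
ldif = build_modify_ldif("olcDatabase={1}bdb,cn=config", settings)
print(ldif)
```

The output could be saved to a file and passed to ldapmodify against cn=config, after substituting the correct database DN.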

I forgot to ask which slapd backend you are using (hdb or bdb) and which
version of BDB you are using (and whether it is fully patched); that is also
useful information.

You may also wish to read over
<http://wiki.zimbra.com/index.php?title=OpenLDAP_Performance_Tuning_6.0>, I
think you guys have some experience with Zimbra... ;)

--Quanah

Mathew Rowley
2010-03-18 20:02:55 UTC
Thanks for the suggestions...

bdb, BerkeleyDB 4.7.25, yes, fully patched.

MAT