Mathew Rowley
2010-03-18 16:46:05 UTC
I have been running into some huge memory spikes, and was wondering if it's
normal, or if anyone has seen it before.
Architecture:
OpenLDAP 2.4.21 Running on RH4
Ulimit open files upped to 4096
Masters:
auth01.cmc, auth01.inflow
Running n-way multimaster
Slaves:
rsa01.inflow, rsa02.inflow, rsa03.cmc, rsa04.cmc
Syncrepl running off both masters refreshAndPersist with retry="60 +"
Here is a graph of the spikes we are seeing on auth01.cmc and auth01.inflow
It also looks like the slave servers are only connecting to auth01; when
tcp 0 0 auth01.inflow:ldap rsa04.cmc:46851 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61648 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61686 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61683 ESTABLISHED
tcp 0 0 auth01.inflow:48882 rsa02.inflow.:ldap ESTABLISHED
tcp 0 109500 auth01.inflow:ldap rsa03.cmc:45798 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa02.inflow.:8773 ESTABLISHED
tcp 0 0 auth01.cmc:ldap rsa02.inflow:8885 ESTABLISHED
tcp 1 0 auth01.cmc:24310 rsa03.cmc:ldap CLOSE_WAIT
tcp 0 0 auth01.cmc:ldap rsa01.inflow:61657 ESTABLISHED
With refreshAndPersist, shouldn't each slave be connected to each host
configured in syncRepl, and keep that connection?
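To make the question concrete, here is a rough sketch of how the netstat lines could be tallied per master. It assumes the six-column format shown above with service names resolved (port shown as "ldap"); the function name tally_consumers and the SAMPLE variable are made up for illustration and are not part of any OpenLDAP tooling.

```python
from collections import defaultdict

# A subset of the netstat lines quoted above.
SAMPLE = """\
tcp 0 0 auth01.inflow:ldap rsa04.cmc:46851 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa01.inflow:61648 ESTABLISHED
tcp 0 0 auth01.inflow:48882 rsa02.inflow.:ldap ESTABLISHED
tcp 0 109500 auth01.inflow:ldap rsa03.cmc:45798 ESTABLISHED
tcp 0 0 auth01.inflow:ldap rsa02.inflow.:8773 ESTABLISHED
tcp 0 0 auth01.cmc:ldap rsa02.inflow:8885 ESTABLISHED
tcp 1 0 auth01.cmc:24310 rsa03.cmc:ldap CLOSE_WAIT
tcp 0 0 auth01.cmc:ldap rsa01.inflow:61657 ESTABLISHED
"""


def tally_consumers(netstat_text):
    """Map each master to the sorted list of remote hosts that hold an
    ESTABLISHED connection to its local ldap port."""
    consumers = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local, remote, state
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        local_host, _, local_port = local.rpartition(":")
        remote_host, _, _ = remote.rpartition(":")
        # Only inbound connections to the master's own ldap port count.
        if local_port != "ldap" or state != "ESTABLISHED":
            continue
        # netstat sometimes prints a trailing dot on host names.
        consumers[local_host.rstrip(".")].add(remote_host.rstrip("."))
    return {master: sorted(hosts) for master, hosts in consumers.items()}


for master, hosts in tally_consumers(SAMPLE).items():
    print(master, hosts)
```

If every slave held a persistent connection to both masters, each master would list all four rsa hosts; any master with a shorter list is the one the slaves are skipping.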
This is a pretty big issue: today we had a master crash, and got:
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 16422:
Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): malloc: 32000:
Cannot allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: Cannot
allocate memory
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com): PANIC: fatal
region error detected; run recovery
Mar 18 14:39:43 auth01 slapd[17498]: slap_graduate_commit_csn: removing
0x9b5d3ec8 20100318143943.437380Z#000000#001#000000
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com):
uniqueMember.bdb: write failed for page 35805
Mar 18 14:39:43 auth01 slapd[17498]: bdb(dc=comcast,dc=com):
uniqueMember.bdb: unable to flush page: 35805
These boxes have 8 GB of RAM; I am trying to figure out if this is normal and I
just need to add more RAM.
Thanks for any help in advance.