Isolating a chronic rpmdb corruption problem.
triss at dreamlibrarian.com
Tue Feb 11 23:50:08 UTC 2014
Let me make it more degenerate; maybe you can help me make sense of whether
this is a side issue.
I sat down and made a tool to run db_verify against all my systems via
mcollective. Large suite of systems, I'm getting a noticeable number of
broken systems even just a few days after I've run a full patch via
spacewalk's direct-rpm method.
Cleaning them up, I noticed _more_ systems showing up as broken.
Then noticed that db_verify's man page says it doesn't do proper locking.
So I turn on auditctl watching for anything touching /var/lib/rpm/*, run
db_verify in a while-1 loop against all tables to see what happens, and
after about a minute (so, 120 calls or so to db_verify later), it starts
smoking, segfaulting while opening __db.002.
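
A minimal sketch of that kind of stress loop, in Python 3, for anyone who wants to reproduce it on a throwaway box. This is an illustration under stated assumptions, not the exact script that was run: the function name `hammer`, the `max_runs` cap, and the idea of passing the command as a list are all hypothetical choices made for the example.

```python
import subprocess


def hammer(command, paths, max_runs=1000):
    """Run `command + [path]` for every path, over and over.

    Returns (pass_number, path, returncode) for the first call that
    fails, or None if max_runs full passes complete without a failure.
    `command` is a list such as ["db_verify"]; nothing here is specific
    to Berkeley DB.
    """
    for run in range(1, max_runs + 1):
        for path in paths:
            result = subprocess.run(
                command + [path],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            # On POSIX, a negative return code means the child was killed
            # by a signal; -11 would be SIGSEGV.
            if result.returncode != 0:
                return run, path, result.returncode
    return None
```

Calling it as `hammer(["db_verify"], tables)`, with `tables` being the list of table files under /var/lib/rpm, would approximate the loop described above. Since that loop is exactly what seemed to break things, it only belongs on a host you can afford to rebuild.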
This whole story is giving me the proper horrors of someone who's seeing
how the sausage is made for the first time.
It does raise two more questions:
1) Should I expect that db_verify does the opposite of what it's supposed
to do now and again - that is, should it, all by its lonesome, occasionally
ruin the __db files?
2) Is there a proper way that doesn't have this no-lock-ruins-everything
risk for me to check on the current health of an RPM database?
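
One possible approach to (2), offered as an untested sketch rather than a known-good answer: never point db_verify at the live directory, and instead check a private copy of the table files. The assumptions are loud ones: Python 3 is available, the `__db.*` files are environment region files that can be left out of the copy, and the helper names `snapshot_tables` and `verify_copy` are hypothetical. A copy taken while rpm is mid-write could itself be inconsistent, so a failure here would be a prompt to re-check, not proof of corruption.

```python
import os
import shutil
import subprocess
import tempfile


def snapshot_tables(dbdir, dest):
    """Copy regular table files from dbdir into dest.

    Skips __db.* region files, dotfiles, and anything that is not a
    regular file. Returns the sorted list of names that were copied.
    """
    copied = []
    for name in sorted(os.listdir(dbdir)):
        src = os.path.join(dbdir, name)
        if name.startswith("__db.") or name.startswith("."):
            continue
        if not os.path.isfile(src):
            continue
        shutil.copy2(src, os.path.join(dest, name))
        copied.append(name)
    return copied


def verify_copy(dbdir="/var/lib/rpm", verifier=("db_verify",)):
    """Run the verifier on a temporary copy of each table.

    Returns a dict mapping table name to the verifier's return code.
    The live directory is only ever read, never opened by the verifier.
    """
    # Drop DB_HOME so the verifier cannot be steered back to a live
    # environment by an inherited variable.
    env = dict(os.environ)
    env.pop("DB_HOME", None)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in snapshot_tables(dbdir, tmp):
            proc = subprocess.run(
                list(verifier) + [name],
                cwd=tmp,
                env=env,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results[name] = proc.returncode
    return results
```

Any non-zero value in the returned dict would flag a table worth a second look. The design choice is simply to trade a bit of disk and copy time for not having an unlocked reader inside the live environment.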
On Mon, Feb 10, 2014 at 2:35 PM, Tristan Smith <triss at dreamlibrarian.com> wrote:
> Hiya, folks.
> I'm having a bit of a time in my CentOS 6 environment with what I'm
> guessing is some kind of knuckleheaded behavior in one or more of my
> We have Spacewalk and Puppet working in general harmony, but I have a
> chronic issue with a significant percentage (call it... 10%) of my hosts
> turning up with rpmdb problems on a regular basis. Not the same hosts
> every time, either. There's some correlation I'm drawing to relatively idle
> systems, but it may be BS.
> When yum tries to install on a borked system, I get error 12s; db_verify
> comes up with 'Cannot allocate memory' for Basenames (or sometimes just
> Packages). rpm --rebuilddb almost universally makes them okayish again,
> but not entirely; I'm enjoying lost dependencies here and there (yum check
> dependencies crying into its beer a lot, and I've got an xargs nightmare to
> re-install the missing packages)
> Basically, I've got a handle on an ever lengthening list of mitigation
> methods, but what I can't seem to isolate is whodunit. I have no idea
> what's reaching into the DB hamfisted and making a mess quite so often.
> Does anyone have suggestions as to what in hell I should be doing to
> narrow down causes?