[Rpm-maint] Fingerprinting skipDir() brokenness ponderings

Panu Matilainen pmatilai at redhat.com
Wed Jun 13 08:33:48 UTC 2007


I suppose pretty much everybody here knows the issue from the subject line 
already, but if not, see the following bugs (and their duplicates) for 
full description:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140055
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=209306

I personally think this is something we *must* address in 4.4.2.1 somehow. 
It's not like there aren't options, several of them. It's just that I'm not 
too happy with any of them. List follows (in no particular order) with 
some pros and cons pointed out:

1) Remove the "temporary" skipDir() hack dating back to 2002 completely.
+ Is really the responsible and right thing to do.
+ Fixes the shared files problems. 
- Memory consumption goes sky-high and performance degrades badly. This
   might not be much of a problem on most modern systems, but for e.g.
   OLPC it is likely to be a showstopper.

2) Further band-aid around skipDir(): disable it on multilib systems
+ Only causes the memory + performance hit on modern systems that are
   likely to survive it
+ Fixes the shared files removal problems where it hits the worst
- Ugly as sin
- Leaves non-multilib systems affected by the problems in some cases,
   like the one described in rhbz#140055

3) Apply the findfpexclude patch plus a hack that uses it to not skipDir() 
on erase. The pros and cons are largely the same as in 1) with a twist: 
memory use isn't terrible on install but is on erase, so it'd still be 
problematic for low-end systems.

4) Apply findfpexclude + taggedfileindex patches, remove skipDir() hack.
+ Performs extremely well, both from wallclock and memory consumption POV
   in all cases
+ Fixes the shared files removal problems
- Breaks fingerprinting semantics, other concerns raised by jbj in
   https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140055#c11

5) Fix the fingerprinting algorithm to not consume ungodly amounts of 
memory.
+ The right thing to do
- If it was that easy, it'd probably have been done already (it's only like
   9 years old...)

6) Get rid of fingerprinting completely.
- Not the kind of change one wants to do for a maintenance release

7) Do nothing.
- "If it ain't broken..." doesn't apply, so this is not really an option at
   all: it doesn't address the issue.

8) Variant of 1-2: make skipDirs runtime configurable, defaulting to empty
+ Default behavior is correct
+ Lets vendors and users tweak it as necessary without rebuilding
- Default behavior performs hideously
- Doesn't really fix the problem, only pushes responsibility elsewhere
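
For anyone who hasn't followed the bugs, here is a rough toy model of what 
the skip list does to shared-file detection on erase, and of what an empty 
default as in 8) would change. This is not rpm code: the function names, the 
skip list contents and the package layout are all made up for illustration, 
and the fingerprint is simplified to the directory's (device, inode) plus 
the base name.

```python
import os
import tempfile


def fingerprint(root, path):
    """Key a file by its directory's (device, inode) plus its base name.

    Different spellings of the same directory (e.g. through a symlink)
    yield the same key. Only the directory is stat()ed, not the file.
    """
    dirname, basename = os.path.split(path)
    st = os.stat(os.path.join(root, dirname.lstrip("/")))
    return (st.st_dev, st.st_ino, basename)


def skipped(path, skip_dirs):
    """True if path lies under a directory excluded from fingerprinting."""
    return any(path.startswith(d.rstrip("/") + "/") for d in skip_dirs)


def removable_files(erased, installed, root, skip_dirs=()):
    """Return the files in `erased` that no package in `installed` owns.

    Files under skip_dirs are never fingerprinted, so sharing among them
    goes unnoticed and they are always reported as removable.
    """
    owned = set()
    for files in installed.values():
        for path in files:
            if not skipped(path, skip_dirs):
                owned.add(fingerprint(root, path))
    return [p for p in erased
            if skipped(p, skip_dirs) or fingerprint(root, p) not in owned]


def demo():
    """Erase a hypothetical foo.i386 while foo.x86_64 stays installed."""
    with tempfile.TemporaryDirectory() as root:
        for d in ("usr/share/doc/foo", "usr/lib", "usr/lib64"):
            os.makedirs(os.path.join(root, d))
        doc = "/usr/share/doc/foo/README"
        erased = [doc, "/usr/lib/libfoo.so"]
        installed = {"foo.x86_64": [doc, "/usr/lib64/libfoo.so"]}
        # Hypothetical skip list; the empty default corresponds to option 8.
        with_skip = removable_files(erased, installed, root,
                                    ["/usr/share/doc"])
        without_skip = removable_files(erased, installed, root)
        return with_skip, without_skip
```

With the skip list in place the shared README is reported as removable even 
though the other arch still owns it; with an empty list only libfoo.so under 
/usr/lib is. The price of the empty list in this model is that every file 
gets fingerprinted, which is the memory and performance hit from 1).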

Having done a bit of fingerprinting torture-testing, I have to say 4) 
looks very attractive, but deliberately breaking fingerprinting semantics 
on a maintenance release is ... um, not nice. OTOH, the semantics are 
totally broken already because of the skipDir() kludgery! So it'd be trading 
very broken behavior for more correct (not 100% correct) behavior in the 
typical cases while improving performance a lot. That doesn't seem like a 
bad tradeoff at all, but it still doesn't feel quite right. And I 
wonder about jbj's concerns, like the > 65K files in a package case in 
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140055#c11 - what 
exactly happens then with the tagged fileindexes?

Thoughts / opinions? At the moment, only consider 4.4.2.1; we'll probably 
want to revisit this issue afterwards regardless of the decision taken 
now.

 	- Panu -



