[Rpm-maint] File fingerprinting
Florian Festi
ffesti at redhat.com
Mon Apr 28 14:39:43 UTC 2008
Hi!
I had some look into the file fingerprinting code. The purpose of this
code is to find file conflicts. The main problem it has to solve is the
fact that a file can be visible at different place in the directory tree
if there are symlinks pointing to any of the parent directories. To
solve this it does check for identical fingerprints of all files with
the same basename.
A finger print consists of the device and inode number of the parent
directory and if this directory doesn't exist yet the finger print of
that down most directory that does exist and the path from that
directory to the file.
Finger prints are used for two things:
1. Check all files in the transaction against each other
To do that the finger prints of all files in the transaction are put
into a big hash table.
2. Check the files in the transaction against those in the rpmdb
For each pkg:
For each file:
search the db for files with the same basename and create the
fingerprints for them
Finger printing actually doesn't work as soon as there are some symlinks
involved that are not installed yet. I attached 3 spec files that show that.
Install the README rpm that then install FOO and FOO-DOC at once. As you
can see /usr/share/README-1/README gets over writen without warning.
If README and FOO are already installed installing FOO-DOC raises an
error as expected.
The packages contain:
README: /usr/share/README-1/README
FOO: /usr/share/FOO-1 -> README-1
FOO-DOC: /usr/share/FOO-1/README
The 2. part is also quite inefficient as it checks the same basename
over and over again for each pkg containing a file with the same
basename (Good candidates are README, Copying, ...). This leads to a
kind of quadratic runtime behavior (installed files * to be installed
files with the same basename). To get rid of this the file finger prints
need to be checked for the whole transaction and not only per pkg. This
requires changing the data structures involved (pointing back from the
installed files to the "to be installed files") and reversing the way
the files are traversed.
This will require that we keep all matching files in memory at once. As
they can be represented (pkg num, file num) + (*pkg, file num) of the to
be installed pkgs this should be possible in a few dozen bytes per file
which means a few MB for some 100,000 files.
Florian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: README.spec
Type: text/x-rpm-spec
Size: 689 bytes
Desc: not available
Url : http://lists.rpm.org/pipermail/rpm-maint/attachments/20080428/ad79b17b/attachment-0003.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FOO.spec
Type: text/x-rpm-spec
Size: 508 bytes
Desc: not available
Url : http://lists.rpm.org/pipermail/rpm-maint/attachments/20080428/ad79b17b/attachment-0004.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FOO-DOC.spec
Type: text/x-rpm-spec
Size: 565 bytes
Desc: not available
Url : http://lists.rpm.org/pipermail/rpm-maint/attachments/20080428/ad79b17b/attachment-0005.bin
More information about the Rpm-maint
mailing list