<p>Let me preface this by saying that we are doing something unorthodox: we are running RPM 4.12.90 on macOS 10.12.</p>
<p>It turns out that on Linux, querying and writing to the database at the same time can cause corruption. On macOS, just querying in parallel is enough to cause it. We can reproduce it by running <code>for i in {1..30}; do /bin/rpm -qa &amp; done</code>. I have some information about how and why this happens: using <code>sandbox-exec</code>, I was able to trace what <code>rpm -qa</code> does and what <code>rpm --rebuilddb</code> does to fix the corruption.</p>
<p>BDB <code>mmap</code>s regions of the db to increase performance, but then backs those regions with files on the filesystem. I'm not sure why it does this, as I would expect <code>mmap</code> to already take care of flushing changes back to the db. Perhaps the db regions are "decompressed" and more performant? Source: <a href="https://web.stanford.edu/class/cs276a/projects/docs/berkeleydb/ref/env/region.html">https://web.stanford.edu/class/cs276a/projects/docs/berkeleydb/ref/env/region.html</a><br>
What is happening is that <code>rpm -qa</code>, which should only be reading, is actually writing to the files that back these mmaped regions:</p>
<pre><code>[root@redacted ~]# grep write /tmp/trace/trace_output.sb
(allow file-write-data (path "/dev/dtracehelper"))
(allow sysctl-write (sysctl-name "kern.procname"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/.dbenv.lock"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/__db.001"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/__db.001"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/__db.002"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/__db.003"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/__db.004"))
(allow file-write-data (path "/opt/yum/var/lib/rpm/.dbenv.lock"))
</code></pre>
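<p>The tracing profile itself is not shown here. A minimal sketch of one that could produce output in this format follows. It assumes the <code>(trace ...)</code> directive is available in this macOS release's sandbox profile language, and the file name <code>rpm-trace.sb</code> is hypothetical:</p>
<pre><code>;; rpm-trace.sb (hypothetical file name; a sketch, not the profile actually used)
;; Assumption: the (trace ...) directive is supported by this sandbox version.
(version 1)
;; Record the operations the process attempts as (allow ...) rules in this file.
(trace "/tmp/trace/trace_output.sb")
</code></pre>
<p>Under those assumptions it would be run as <code>sandbox-exec -f rpm-trace.sb /bin/rpm -qa</code>, after which the output file can be searched with <code>grep write</code> as above.</p>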
<p>The way <code>rpm --rebuilddb</code> fixes this is by unlinking the regions:</p>
<pre><code>(allow file-write-unlink (path "/opt/yum/var/lib/rpm/__db.001"))
(allow file-write-unlink (path "/opt/yum/var/lib/rpm/__db.002"))
(allow file-write-unlink (path "/opt/yum/var/lib/rpm/__db.003"))
(allow file-write-unlink (path "/opt/yum/var/lib/rpm/__db.004"))
</code></pre>
<p>It turns out that unlinking them by hand also fixes the corruption. What I haven't figured out is why the corrupted regions don't flush their changes back to the real db and corrupt that as well.</p>
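<p>I've written a sandbox profile that disallows writes to the files backing the mmaped regions. This means that we can call <code>sandbox-exec -f $sandbox_profile rpm -qa</code> to read safely, since the query can no longer write to the region files and so cannot corrupt them. Note that the size and modification time of the region files stay the same across two consecutive queries:</p>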
<pre><code>[root@redacted ~]# sandbox-exec -f rpm-query-nowrite.sb -- /bin/rpm -qa &>/dev/null
[root@redacted ~]# ls -la /var/lib/rpm/__db.00*
-rw-r--r--  1 root  root    24576 Jun  7 10:19 /var/lib/rpm/__db.001
-rw-r--r--  1 root  root   507904 Jun  7 10:19 /var/lib/rpm/__db.002
-rw-r--r--  1 root  root  1318912 Jun  7 10:19 /var/lib/rpm/__db.003
-rw-r--r--  1 root  root   811008 Jun  7 10:19 /var/lib/rpm/__db.004
[root@redacted ~]# sandbox-exec -f rpm-query-nowrite.sb -- /bin/rpm -qa &>/dev/null
[root@redacted ~]# ls -la /var/lib/rpm/__db.00*
-rw-r--r--  1 root  root    24576 Jun  7 10:19 /var/lib/rpm/__db.001
-rw-r--r--  1 root  root   507904 Jun  7 10:19 /var/lib/rpm/__db.002
-rw-r--r--  1 root  root  1318912 Jun  7 10:19 /var/lib/rpm/__db.003
-rw-r--r--  1 root  root   811008 Jun  7 10:19 /var/lib/rpm/__db.004
</code></pre>
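<p>The contents of <code>rpm-query-nowrite.sb</code> are not reproduced above. A minimal sketch of what such a profile could look like follows. It assumes the region files live under <code>/opt/yum/var/lib/rpm</code> as in the trace (adjust the path to wherever the db actually lives), and the rules are an illustration, not necessarily the ones in the real profile:</p>
<pre><code>;; Sketch of a no-write query profile (illustrative; not the actual rpm-query-nowrite.sb)
(version 1)
;; Allow everything by default, so rpm can otherwise run unrestricted.
(allow default)
;; Then deny every kind of write to the BDB region files __db.001, __db.002, ...
;; Assumption: the db directory is /opt/yum/var/lib/rpm, as seen in the trace.
(deny file-write* (regex #"^/opt/yum/var/lib/rpm/__db\.[0-9]+$"))
</code></pre>
<p>Denying <code>file-write*</code> covers both <code>file-write-data</code> and <code>file-write-unlink</code>, so a profile like this is only suitable for queries, not for <code>rpm --rebuilddb</code>.</p>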
<p>Is it possible there is a bug in the way you file-back your mmap'ed regions?</p>
