Monday, February 2, 2009

Puzzling over 'du' (disk usage)

Here is something about 'du' that had me puzzled for a while:

I was looking at how much space was taken up by my MacPorts installation (on OS X 10.5.6).
I used the following alias to give me a breakdown by directory:
% alias sudodiskspace='sudo du -x -h -d 1'

% sudodiskspace /opt/local
9.1M /opt/local/bin
284K /opt/local/etc
12M /opt/local/include
83M /opt/local/lib
32K /opt/local/libexec
69M /opt/local/Library
68K /opt/local/sbin
107M /opt/local/share
175M /opt/local/var
455M /opt/local


% sudodiskspace /opt/local/var
1.0M /opt/local/var/cache
447M /opt/local/var/macports
448M /opt/local/var


That seemed strange: the first run reported that the /opt/local/var directory takes 175 MB, but the second run reports that it takes 448 MB.
In other words, 'du' seemed to be misreporting the space taken by sub-directories.

I thought something must be wrong and even resorted to looking at the source code for 'du'. I was considering building a version of 'du' that I could add debugging statements to.
But I decided to first look closer at what was in the /opt/local/var directory and so I did:
% ls -lR /opt/local/var | more
Scanning through the output, I started to notice that most of the files were listed as having 2 hard links. Hmmm, that's unusual.
Sure enough, the files under /opt/local/var also occur (via hard links) under /opt/local/include, /opt/local/lib, /opt/local/share, ...
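
A quicker way to spot this than scanning 'ls -lR' output is to ask 'find' for files with more than one link. This is only a sketch: the function name count_multilinked is made up, and it assumes a 'find' that supports the POSIX -links test.

```shell
# Count the regular files under a directory that have more than one hard link.
# (count_multilinked is a hypothetical helper name, not a standard command.)
count_multilinked() {
    # -links +1 matches files whose link count is greater than 1;
    # tr strips the padding that some versions of wc add.
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

For example, 'count_multilinked /opt/local/var' would print how many files there are shared with some other directory, and 'ls -li' on any two suspects shows whether they have the same inode number.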
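So 'du' is giving me an accurate breakdown after all. Within a single run it counts each hard-linked file only once, under the first directory in which it encounters the file. In the first run, /opt/local/var was traversed last, so the files it shares with include, lib, share, etc. had already been counted there and only 175 MB was left to attribute to var. In the second run, var was the only directory examined, so it was charged for everything it contains. This points out that the breakdown from 'du' will be somewhat arbitrarily distributed across the sub-directories if they are sharing files via hard links. (It depends on the order of traversal of the directories.)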
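
The effect is easy to reproduce in a scratch directory. The following is a minimal sketch, assuming a 'du' that accepts -s and -k (both the BSD and GNU versions do). The names a, b and data are made up for the demo.

```shell
#!/bin/sh
# Scratch directory; the names "a", "b" and "data" are made up for this demo.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"

# Put a 1 MB file in a/ and hard-link it into b/ (same inode, no extra space).
dd if=/dev/urandom of="$demo/a/data" bs=1024 count=1024 2>/dev/null
ln "$demo/a/data" "$demo/b/data"
sync  # let the filesystem settle its block counts before measuring

# Measured in separate runs, each directory is charged for the whole file.
a_alone=$(du -sk "$demo/a" | cut -f1)
b_alone=$(du -sk "$demo/b" | cut -f1)

# Measured in one run, the shared file is counted only once.
total=$(du -sk "$demo" | cut -f1)

echo "a alone: ${a_alone}K, b alone: ${b_alone}K, both in one run: ${total}K"

rm -rf "$demo"
```

The two separate figures should each come to roughly 1 MB, while the combined figure should be well short of their sum. That is the same discrepancy as between 175M and 448M above.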