This is the BSDA Study Guide Book written via a wiki collaboration.
This is a work in progress. You may contribute to or discuss this specific page at http://bsdwiki.reedmedia.net/wiki/Determine_disk_capacity_and_which_files_are_consuming_the_most_disk_space.html.
Determine disk capacity and which files are consuming the most disk space
Concept
- Be able to combine common Unix command line utilities to quickly determine which files are consuming the most disk space.
Introduction
As disk sizes have increased over the years, so has the amount of data that we seem to want to keep on them. At one time or another, you may be faced with the "too much data/not enough space" problem. How can you quickly find the "disk hogs"?
Use the tools!
The BSD systems are full of tools that can assist with this problem, including:
df(1) - "disk free"
du(1) - "disk usage"
find(1) - "walk a file hierarchy"
If you're using NetBSD, you can also get a df-type reading from systat(1). And with any BSD variant, a little common "Unix-fu" (in particular, find and shell pipes) lets these commands quickly produce useful information about disk usage.
df and du
For a quick summary of disk space, simply call df. Using "-c" with df provides an "overall total"; using "-h" with either df or du produces "human readable" output: that is, sizes scaled to K, M, G (kilobytes, megabytes, gigabytes), etc., instead of counts of "blocks" whose size is set by the environment variable $BLOCKSIZE.
Unlike df, you probably don't want to simply call du. Without arguments, du reports the size of every subdirectory (and its subdirectories, ad infinitum) below your CWD, in the order it walks the tree; add "-a" to list individual files as well. If you happen to be in "/", you'd be a long time reading the output of du. Usually it's better to use du with "-s", possibly even with a specific file or file "glob" argument, or with "-h" and maybe "-c", and pipe the output through sort(1); a rather convoluted (yet effective) example appears in the Examples section.
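The "-s plus sort" advice above can be sketched as a small shell function. This is only an illustration: biggest_entries is a made-up name, not a standard command, and it uses "-k" instead of "-h" so that sort(1) compares plain numbers.

```shell
# biggest_entries DIR [COUNT]: hypothetical helper, not a standard command.
# Prints the COUNT largest entries directly inside DIR, smallest first,
# with sizes in kilobytes so that sort -n compares like with like.
biggest_entries() {
    dir=${1:-.}
    count=${2:-10}
    # -s: one summary line per argument; -k: report 1024-byte blocks.
    du -sk "$dir"/* 2>/dev/null | sort -n | tail -n "$count"
}
```

Called as "biggest_entries /usr/src 5", it would print the five largest files or directories directly under /usr/src. Names that begin with a dot are skipped, because the "*" glob does not match them.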
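du takes the files to measure as command-line arguments rather than reading names from standard input, but find can supply those arguments through its "-exec" primary or through xargs(1), which makes find a fairly useful "frontend" to du on occasion (but see the section on find below before you scratch your head too hard on this).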
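One way to use find as a front end to du is to let find pass the file names to du as arguments. The following is a minimal sketch, assuming a find that supports the POSIX "-exec ... {} +" form; du_matching is a hypothetical name chosen for this example.

```shell
# du_matching DIR PATTERN: hypothetical helper for illustration.
# Lists the regular files under DIR whose names match PATTERN,
# with sizes in kilobytes, smallest first.
du_matching() {
    # "-exec ... {} +" hands many file names to each du invocation
    # and copes with spaces in names (unlike a plain "find | xargs").
    find "$1" -type f -name "$2" -exec du -k {} + | sort -n
}
```

For instance, du_matching "$HOME" '*.mp3' would list every mp3 file below the home directory, with the largest last. Quote the pattern so that the shell passes it to find unexpanded.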
Note: under certain conditions, df and du may disagree somewhat about the amount of space in use on a filesystem. Generally, this occurs when a program is holding an open file descriptor to a file that has been unlinked; in such a case, du can't count the file's size because the file no longer has a name, but its blocks are still allocated and so unavailable as "free blocks" (df = "disk free", remember?). In such cases, you can use fstat(1) to see currently open files.
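A rough way to spot such a disagreement is to compare the "Used" figure df reports for a mount point with the total du finds in the same tree. The sketch below assumes the POSIX "df -P" output format (used space in the third column); the three function names are hypothetical.

```shell
# Hypothetical helpers for illustration; both print a size in kilobytes.
fs_used_kb()   { df -P -k "$1" | awk 'NR == 2 { print $3 }'; }
# -x keeps du on one filesystem; -s prints a single summary line.
tree_used_kb() { du -skx "$1" 2>/dev/null | awk '{ print $1 }'; }

# unaccounted_kb MOUNTPOINT: space df counts as used that du cannot see.
unaccounted_kb() {
    echo $(( $(fs_used_kb "$1") - $(tree_used_kb "$1") ))
}
```

Run it against a mount point, for example "unaccounted_kb /var", preferably as root so that du can read every directory. Expect some gap in any case; only a large one is worth chasing with fstat(1).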
find and the "size" primary
The complete use of find is beyond the scope of this section; please see "Find a file with a given set of attributes" for details. However, using the "size" primary with an expression representing a given file size, you can quickly produce a list of "disk hogs". See the Examples below.
Examples
Are any partitions nearing "full"?
$ df
/dev/ad0s1a 1978 977 842 54% /
/dev/ad0s1e 67765 49502 12841 79% /usr
/dev/ad0s1d 3962 2182 1463 60% /var
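Rather than reading the capacity column by eye, the same check can be scripted. This is a sketch only: over_threshold is a made-up name, and it assumes the POSIX "df -P" layout, with capacity in the fifth column and the mount point in the sixth.

```shell
# over_threshold LIMIT: hypothetical filter, reads "df -P" output on stdin
# and prints the mount point and capacity of each filesystem at or above
# LIMIT percent. Assumes mount points contain no spaces.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                 # skip the header line
            pct = $5
            sub(/%/, "", pct)    # "79%" -> "79"
            if (pct + 0 >= limit) print $6, $5
        }'
}

# -P selects the POSIX single-line format; -k reports 1024-byte blocks.
df -P -k | over_threshold 75
```

Fed the three filesystems shown above with a limit of 75, the filter would print only the /usr line.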
Display the sizes of all the *.mp3 files in my homedir, with a total:
$ du -sc $HOME/*.mp3
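List every directory below the current directory, in order of size (almost, because sort -n compares only the leading number and ignores the K/M/G suffix that "-h" adds):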
$ du -h | sort -n | more
Here's a pretty wild set of pipes for "du", showing the largest disk hogs. The grep keeps only the sizes reported in megabytes, so if anything is larger than 999MB, change "M" to "G" in the regular expression. To see the smallest entries, use "head" rather than "tail", or for a complete listing pipe it to $PAGER instead of either. The "-n" option to sort(1) ensures that the sizes are sorted in numeric rather than alphabetical order:
[root@server][/usr/src]
# du -hc * | sort -n | grep "[0-9]M" | tail
26M crypto
27M contrib/binutils
28M release
40M sys/dev
47M contrib/gcc
105M sys
204M contrib
458M total
But this brings us to the relative power of find(1). A similar report could be produced like this ("find all files in the cwd greater than approximately 900MB in size"):
# find . -size +940000000c
The main differences between this command's output and that of the "piped arrangement" above are that find doesn't report the actual sizes and the list isn't sorted. Note that if you're using FreeBSD, you can append a unit suffix from "[kMGTP]" to the size, thus: "find . -size +900M".
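Where the unit suffixes aren't available, the byte count can be computed with shell arithmetic, and the matches handed to du to get sizes and a sorted list back. The following is a sketch under those assumptions; larger_than_mb is a hypothetical name, and one megabyte is taken to be 1048576 bytes.

```shell
# larger_than_mb DIR MB: hypothetical helper for illustration.
# Lists regular files under DIR bigger than MB megabytes (1 MB = 1048576
# bytes here), with their sizes in kilobytes, sorted smallest first.
larger_than_mb() {
    bytes=$(( $2 * 1024 * 1024 ))
    # The "c" suffix makes find count bytes instead of 512-byte blocks.
    find "$1" -type f -size +"${bytes}c" -exec du -k {} + | sort -n
}
```

For example, "larger_than_mb /home 900" approximates the find command above while also showing sizes. Keep in mind that find compares each file's length, whereas du reports the blocks actually allocated, so the two can differ for sparse files.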
Practice Exercises
- Use df to see if your hard drives are nearing "full".
- Use find to find out who in /home/ is the biggest "disk hog". (Optional: Use grep to see if any of these files are "mp3"s).
- Use du along with sort(1) and grep(1) to produce lists of files by size.
More information
du(1), df(1), find(1), sort(1), and, for NetBSD, systat(1)