Stephen D. Cope
2018-03-07
or: using df safely
(but in red, with flashing)
Typical series of steps to resolve a full disk:
ssh root@busyserver
cd /
du | sort -n
cd /var/log
ls -sS | head
gzip wtmp
There is only one correct step shown above. Do not use this pattern.
If you run out of disk space you can't write log files, you can't write spool files, you can't create lock files. Everything stops.
Presuming you do actually want to write any files, which most programs do.
df vs du
df will quickly read a file system and tell you how much space is free.
du will traverse this directory and all sub-directories and count up every file within it: a lot of disk I/O, and slow. It will also miss deleted files that are still held open.
du -x
Use -x to make sure you don't traverse across file systems. We are interested in this file system.
du -xh | sort -hr
Use -h to make it human-readable. The -h option for sort is a relative newcomer.
Add | head because you only want the largest entries.
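If your sort is too old for -h, a rough fallback is to count in kilobytes instead:
du -xk | sort -nr | head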
Not all of a sparse file actually exists on disk, so it doesn't use much space.
ls -l reports the apparent (sparse) size.
ls -s and du -h report the size in use. (This is what matters.)
Hint: wtmp is a sparse file.
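A quick way to see the difference (the path here is just an example):
ls -lhs /var/log/wtmp                  # first column: blocks actually used; size column: apparent size
du -h --apparent-size /var/log/wtmp    # apparent size
du -h /var/log/wtmp                    # space actually in use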
Once deleted, files still retain space until closed.
Open file handles keep the file open: restart the process.
lsof
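One way to spot them directly, assuming a Linux lsof that supports +L:
lsof +L1    # open files with a link count below 1, i.e. deleted but still held open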
If your NFS mount has gone AWOL, don't use df or du on it.
Processes in state D are waiting for I/O. Good luck with that.
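If you want to see who is stuck, something like this works on Linux:
ps -eo pid,stat,comm | awk '$2 ~ /^D/'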
You can use lsof and look for any file names with log in the name.
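For example (the pattern is only a guess at your file naming):
lsof -nP | grep -i '\.log'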
If you roughly know the size of the log files, but don't know where they are, try:
find -xdev -type f -mtime -1 -size +10M
And remember -xdev is so you don't traverse file systems.
Always use find -print0 | xargs -r0 when piping filenames through, so you don't choke on files with spaces in the names.
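Putting the two together (reusing the thresholds from above):
find . -xdev -type f -mtime -1 -size +10M -print0 | xargs -r0 ls -lhS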
You've found the files. What do you do?
Compress them.
Presume the files may not be deleted due to regulatory requirements.
Use another file system:
/dev/shm - limited by RAM, all contents lost if the system restarts
/tmp - other users write there, make sure names don't clash; might also be tmpfs
Don't use a remote file system unless you must. The cost of transferring is roughly equivalent to compressing locally.
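A sketch of compressing onto tmpfs so the output doesn't compete for space on the full file system (the paths are just examples, and remember /dev/shm is limited by RAM):
gzip -c /misplacedlog/huge.log > /dev/shm/huge.log.gz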
gzip is fastest; xz compresses smallest; bzip2 is good but superseded (it uses more memory to compress).
You want to compress faster than the logs are being written, which means gzip!
Don't impact running tasks: use nice, and if you don't use isolcpus, add taskset.
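For example (the core number is just an illustration):
nice -n 19 gzip big.log                  # lowest CPU priority
taskset -c 3 nice -n 19 gzip big.log     # additionally pin to one core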
After a file is compressed, the source file is deleted. Oops! If a process still has the original open, you can find it and read the contents back:
lsof | grep deleted
tail -f /proc/PID/fd/N
Can you send a signal to trigger re-opening of log files?
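Many daemons reopen their logs on a signal; check the daemon's documentation first. As an illustration (rsyslogd and its pid file path are just assumptions):
kill -HUP "$(cat /var/run/rsyslogd.pid)"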
Legacy thing enshrined in policy and monitoring guidelines.
Maybe it affects fragmentation: is it still a problem where typical file size is much smaller than 1% of a disk's capacity? Flash memory doesn't care about file layout.
What about with massive file systems? Wasted money? Throw away 20% of your storage budget.
Sample now, wait, sample again. Congratulations!
List your log directory, wait, check again.
Simple mathematics.
df -k . ; sleep 60 ; df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/tipene--vg 200620295 104453868 85905744 55% /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/tipene--vg 200620295 104978160 85381452 56% /
$ expr 85905744 - 85381452
524292
$ expr 524292 / 60
8738
$ expr 85381452 / 8738
9771
That's 8738 one-kilobyte blocks consumed per second, so 9771 seconds of free space left: in this example, roughly 3 hours.
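A rough sketch that automates the same estimate (assumes GNU df with --output and a 60 second sample window):
a=$(df -k --output=avail . | tail -1) ; sleep 60 ; b=$(df -k --output=avail . | tail -1)
rate=$(( (a - b) / 60 )) ; [ "$rate" -gt 0 ] && echo "roughly $(( b / rate / 3600 )) hours left"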
ssh busyserver # don't use root
cd /
du -xh | sort -hr | head # don't cross filesystems
cd /misplacedlog
ls -ltr | head # oldest logs first
nice gzip oldlog.log # be nice
Nine largest directories on this file system:
du -xh | sort -hr | head
Shift files into an "archived" sub-directory then compress them.
mkdir -p archived \
&& find -maxdepth 1 -type f -mmin +60 -print0 \
| xargs -r0 mv -t archived/ \
&& nice gzip archived/*.log
Copyright (c) 2018 Stephen D. Cope