Autodetect noisy files
There are a couple of categories of files to consider:
- binaries (e.g. pictures / audio / video)
- files written in natural languages not recognized by the dictionary
Consider keeping stats on a per-file / per-line basis...
Look for new filename extensions?
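
A rough sketch of how the two categories and the per-file stats could fit together. Everything here is an assumption for illustration, not the tool's actual behavior: the names (`looks_binary`, `collect_stats`, `is_noisy`), the NUL-byte heuristic for binaries, and the `max_ratio` / `min_words` thresholds are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Runs of Unicode letters; digits, underscores and punctuation split words.
WORD_RE = re.compile(r"[^\W\d_]+")


@dataclass
class FileStats:
    """Per-file counters (hypothetical structure)."""
    lines: int = 0
    words: int = 0
    unrecognized: int = 0

    @property
    def unrecognized_ratio(self) -> float:
        return self.unrecognized / self.words if self.words else 0.0


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    # Assumed heuristic: a NUL byte in the first few KiB suggests a binary.
    return b"\x00" in data[:sample_size]


def collect_stats(text: str, dictionary: set[str]) -> FileStats:
    # Count lines, words, and words missing from the dictionary.
    stats = FileStats()
    for line in text.splitlines():
        stats.lines += 1
        for word in WORD_RE.findall(line):
            stats.words += 1
            if word.lower() not in dictionary:
                stats.unrecognized += 1
    return stats


def is_noisy(data: bytes, dictionary: set[str],
             max_ratio: float = 0.5, min_words: int = 20) -> bool:
    # Thresholds are placeholders; tune them against real repositories.
    if looks_binary(data):
        return True
    stats = collect_stats(data.decode("utf-8", errors="replace"), dictionary)
    # Very short files are never flagged, to avoid tripping on tiny inputs.
    return stats.words >= min_words and stats.unrecognized_ratio > max_ratio
```

The `min_words` floor is there so that a short file with a few unusual words is still scanned; only files that are mostly unrecognized get skipped.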
If this feature trips on files you need scanned, see: