
The power of a good visualization

I just found a program called GrandPerspective that presents your disk usage as an interactive treemap (see pic on right). Helping web nerds save hard drive space isn't finding hidden heart defects or keeping planes in the air, but I was struck by how well this program demonstrates the power of intelligent data exploration tools. Here are the Tufte criteria for information presentation:
Documentary · Comparative · Causal · Explanatory · Quantified · Multivariate · Exploratory · Skeptical
Each box is a file, and each top-level directory takes a contiguous rectangular portion of the view. Scanning a 350GB disk with a /lot/ of tiny files (5+ million for just the far top left corner, the MLB gameday dataset) took under 5 minutes. You can highlight any box in a segment and navigate "down" to make that segment fill the screen, and you can color files by location, depth, name or extension (exploratory, multivariate).
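To make the "each box is a file, each directory is a rectangle" idea concrete, here is a minimal sketch of a slice-and-dice treemap layout. This is the simplest textbook variant and an assumption on my part, not necessarily the layout GrandPerspective itself uses. The directory tree, the names in it and the sizes (in GB) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float


def total_size(node):
    """Size of a file (a number) or a directory (a dict of name -> node)."""
    if isinstance(node, dict):
        return sum(total_size(child) for child in node.values())
    return node


def layout(node, x, y, w, h, name="", horizontal=True):
    """Slice-and-dice layout: one Rect per file, area proportional to size.

    Each directory splits its rectangle among its children, alternating
    between side-by-side and stacked splits at each level of depth.
    """
    if not isinstance(node, dict):
        return [Rect(name, x, y, w, h)]
    total = total_size(node)
    rects = []
    offset = 0.0  # fraction of this rectangle already handed out
    for child_name, child in node.items():
        share = total_size(child) / total if total else 0.0
        path = f"{name}/{child_name}" if name else child_name
        if horizontal:
            rects += layout(child, x + offset * w, y, share * w, h, path, False)
        else:
            rects += layout(child, x, y + offset * h, w, share * h, path, True)
        offset += share
    return rects


# Hypothetical directory tree; sizes in GB are invented for this example.
tree = {
    "scrapes": {"junk.html": 15, "real_data": 15},
    "video": 50,
    "other": 20,
}

rects = layout(tree, 0, 0, 100, 100)
for r in rects:
    print(f"{r.name:20s} area={r.w * r.h:8.1f}")
```

Because area tracks size, a file that takes up half of its directory shows up as half of that directory's rectangle, which is exactly the kind of thing a single total from a directory listing hides.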
The giant orange box in the top left was 15GB of pure junk -- apparently a CGI script generating some page I was screenscraping went crazy and sent me 15GB of junk data, the same line repeated over and over, possibly billions of times. I had /no/ idea it was sitting there. That dataset was supposed to be huge, so I had never drilled into the directory beyond my standard du -sc | sort -n on the containing directory. The picture, however, showed at a glance what a table of numbers had completely failed to show: that the directory consumed twice as much space as it should. The simple metaphor of diskspace=area and the whole-disk view (explanatory, documentary) highlighted something important I'd never noticed.
The giant cluster in the bottom right corner is a huge (~51GB) collection of video ephemera I only kinda cared about. I planned, someday, to sort them -- but for that much effort and 51GB of disk, it was clearly not worth it. By enforcing comparisons, the data display made me reconsider the value vs. resource consumption of that project and make a sounder decision.
In all, I freed up almost 100GB and put a few bucks in the author's tip jar. Joe Bob says check it out. (Similar programs exist for Linux (Baobab) and Windows (WinDirStat) too.)
