Show aging of directories.
The main motivation of this tool is to find out, for a given set of data directories, how much of the storage capacity is still in active use. This information can be used on data processing systems, like High-Performance Computing (HPC) clusters, to motivate users to migrate their unused data to long-term storage facilities.
The file system metadata available for this analysis are the access time and the modification time of the files in the data directory. Using this metadata and a given set of ages, the percentage of the total capacity that has been accessed or modified can be calculated for each of these ages and directories. Repeating this over time yields a usage profile of the data directories. Users can then be shown when their data on the system becomes stale and should be migrated to a storage facility suited for long-term archival.
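The calculation described above can be sketched in Rust roughly as follows. This is a minimal illustration and not the tool's actual implementation: the names FileInfo, Usage, collect and usage are hypothetical, ages are assumed to be given in days, and a file is assumed to count for an age when its timestamp lies at most that many days in the past.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Size and timestamps of one file (hypothetical helper type).
struct FileInfo {
    bytes: u64,
    accessed: SystemTime,
    modified: SystemTime,
}

/// Byte totals for one directory and one age (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Usage {
    total_bytes: u64,
    accessed_bytes: u64,
    modified_bytes: u64,
}

impl Usage {
    fn accessed_percent(&self) -> f64 {
        percent(self.accessed_bytes, self.total_bytes)
    }

    fn modified_percent(&self) -> f64 {
        percent(self.modified_bytes, self.total_bytes)
    }
}

/// Percentage of `part` in `total`; an empty total is reported as 0%.
fn percent(part: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        part as f64 / total as f64 * 100.0
    }
}

/// Recursively collect sizes and timestamps of regular files below `dir`.
fn collect(dir: &Path, out: &mut Vec<FileInfo>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::metadata does not follow symlinks, so symlinks are skipped.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            collect(&entry.path(), out)?;
        } else if meta.is_file() {
            out.push(FileInfo {
                bytes: meta.len(),
                accessed: meta.accessed()?,
                modified: meta.modified()?,
            });
        }
    }
    Ok(())
}

/// Sum up the bytes accessed and modified within the last `days` days.
/// Assumption: the age is measured in days relative to `now`.
fn usage(files: &[FileInfo], days: u64, now: SystemTime) -> Usage {
    let threshold = now - Duration::from_secs(days * 24 * 60 * 60);
    let mut result = Usage { total_bytes: 0, accessed_bytes: 0, modified_bytes: 0 };
    for file in files {
        result.total_bytes += file.bytes;
        if file.accessed >= threshold {
            result.accessed_bytes += file.bytes;
        }
        if file.modified >= threshold {
            result.modified_bytes += file.bytes;
        }
    }
    result
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let mut files = Vec::new();
    collect(Path::new(&dir), &mut files)?;
    let now = SystemTime::now();
    for days in [90, 365] {
        let u = usage(&files, days, now);
        println!(
            "{} {} {} bytes, accessed {:.2}%, modified {:.2}%",
            dir, days, u.total_bytes, u.accessed_percent(), u.modified_percent()
        );
    }
    Ok(())
}
```

The metadata is collected once and then evaluated per age, so adding more ages does not require another directory traversal.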
Show how much of your directories have been lying around unused:
$ stor-age 90 365 -- ~/media/pics ~/projects
Directory              Age  Bytes      Accessed   Percent  Modified   Percent  Files  Accessed  Percent  Modified  Percent
/home/user/projects     90  5.8 GiB    5.7 GiB    97.8%    5.6 GiB    95.12%   72596  71582     98.6%    39914     54.98%
                       365             5.8 GiB    98.85%   5.6 GiB    96.38%          71981     99.15%   48734     67.13%
/home/user/media/pics   90  483.2 MiB  107.0 MiB  22.15%   488.7 kiB  0.1%     2299   328       14.27%   7         0.3%
                       365             219.1 MiB  45.35%   3.0 MiB    0.63%           2119      92.17%   13        0.57%
Note: The two dashes (--) are required because you can supply both multiple ages and multiple directories, and the command-line argument parser needs a way to distinguish these two lists.
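To illustrate why the separator is needed, here is a minimal Rust sketch of splitting one argument list into ages and directories at the first --. It is not the parser that stor-age actually uses; the function name split_args is hypothetical, and treating a missing separator as "ages only, no directories on the command line" is an assumption made for this example.

```rust
/// Split arguments at the first `--`: ages before it, directories after it.
/// Assumption: without a separator, all arguments are ages and the
/// directory list is empty.
fn split_args(args: &[String]) -> Result<(Vec<u64>, Vec<String>), String> {
    let (ages, dirs) = match args.iter().position(|a| a.as_str() == "--") {
        Some(pos) => (&args[..pos], &args[pos + 1..]),
        None => (args, &args[args.len()..]),
    };

    // Every age must be a non-negative integer.
    let ages = ages
        .iter()
        .map(|a| a.parse::<u64>().map_err(|e| format!("invalid age {:?}: {}", a, e)))
        .collect::<Result<Vec<u64>, String>>()?;

    Ok((ages, dirs.to_vec()))
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match split_args(&args) {
        Ok((ages, dirs)) => println!("ages: {:?}, directories: {:?}", ages, dirs),
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(2);
        }
    }
}
```

Without the separator, an argument such as 2023 could be either an age or a directory name, which is the ambiguity the two dashes resolve.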
Iterate over a set of directories with find-like tools:
find /data/ -mindepth 1 -maxdepth 1 -type d |
stor-age --format prometheus 90 365
The output is in valid Prometheus metric exposition format:
# HELP stor_age_bytes_total Total size in bytes.
# TYPE stor_age_bytes_total gauge
stor_age_bytes_total{dir="/data/foo"} 132904506033
stor_age_bytes_total{dir="/data/bar"} 52451763095
stor_age_bytes_total{dir="/data/baz"} 38525158426
# HELP stor_age_bytes_accessed Accessed size in bytes.
# TYPE stor_age_bytes_accessed gauge
stor_age_bytes_accessed{dir="/data/foo",age="90"} 770700907
stor_age_bytes_accessed{dir="/data/foo",age="365"} 8013210318
stor_age_bytes_accessed{dir="/data/bar",age="90"} 1003231299
stor_age_bytes_accessed{dir="/data/bar",age="365"} 27936338982
stor_age_bytes_accessed{dir="/data/baz",age="90"} 4534759665
stor_age_bytes_accessed{dir="/data/baz",age="365"} 38525158426
# HELP stor_age_bytes_modified Modified size in bytes.
# TYPE stor_age_bytes_modified gauge
stor_age_bytes_modified{dir="/data/foo",age="90"} 3309
stor_age_bytes_modified{dir="/data/foo",age="365"} 8013127399
stor_age_bytes_modified{dir="/data/bar",age="90"} 964846566
stor_age_bytes_modified{dir="/data/bar",age="365"} 4738171482
stor_age_bytes_modified{dir="/data/baz",age="90"} 3641814237
stor_age_bytes_modified{dir="/data/baz",age="365"} 13704189585
# HELP stor_age_files_total Total number of files.
# TYPE stor_age_files_total gauge
stor_age_files_total{dir="/data/foo"} 1913
stor_age_files_total{dir="/data/bar"} 1516
stor_age_files_total{dir="/data/baz"} 2023
# HELP stor_age_files_accessed Accessed number of files.
# TYPE stor_age_files_accessed gauge
stor_age_files_accessed{dir="/data/foo",age="90"} 11
stor_age_files_accessed{dir="/data/foo",age="365"} 262
stor_age_files_accessed{dir="/data/bar",age="90"} 553
stor_age_files_accessed{dir="/data/bar",age="365"} 1402
stor_age_files_accessed{dir="/data/baz",age="90"} 711
stor_age_files_accessed{dir="/data/baz",age="365"} 2023
# HELP stor_age_files_modified Modified number of files.
# TYPE stor_age_files_modified gauge
stor_age_files_modified{dir="/data/foo",age="90"} 2
stor_age_files_modified{dir="/data/foo",age="365"} 250
stor_age_files_modified{dir="/data/bar",age="90"} 553
stor_age_files_modified{dir="/data/bar",age="365"} 1339
stor_age_files_modified{dir="/data/baz",age="90"} 558
stor_age_files_modified{dir="/data/baz",age="365"} 1894
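For readers who want to produce or post-process this kind of output, the following Rust sketch shows how one metric family in the format above could be rendered. It is an illustration under assumptions, not stor-age's code: the Sample type and the render and escape_label functions are hypothetical names. The escaping of backslash, double quote and newline in label values follows the Prometheus text exposition format.

```rust
use std::fmt::Write;

/// One value of a metric, labelled with a directory and optionally an age
/// (hypothetical helper type).
struct Sample {
    dir: String,
    age: Option<u64>,
    value: u64,
}

/// Escape a label value: backslash, double quote and newline must be escaped.
fn escape_label(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            _ => out.push(c),
        }
    }
    out
}

/// Render one gauge metric family: HELP line, TYPE line, then all samples.
fn render(name: &str, help: &str, samples: &[Sample]) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so unwrap is safe here.
    writeln!(out, "# HELP {} {}", name, help).unwrap();
    writeln!(out, "# TYPE {} gauge", name).unwrap();
    for s in samples {
        let dir = escape_label(&s.dir);
        match s.age {
            Some(age) => {
                writeln!(out, "{}{{dir=\"{}\",age=\"{}\"}} {}", name, dir, age, s.value).unwrap()
            }
            None => writeln!(out, "{}{{dir=\"{}\"}} {}", name, dir, s.value).unwrap(),
        }
    }
    out
}

fn main() {
    // Values taken from the example output above.
    let samples = vec![
        Sample { dir: "/data/foo".to_string(), age: Some(90), value: 770700907 },
        Sample { dir: "/data/foo".to_string(), age: Some(365), value: 8013210318 },
    ];
    print!("{}", render("stor_age_bytes_accessed", "Accessed size in bytes.", &samples));
}
```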
The following list contains the crate features, which can be enabled via e.g. cargo build --features <FEATURES>. For each feature, the list states whether it is enabled by default and what its rationale is.
- table (default)
  Adds an output format, i.e. --format table, that pretty-prints the report as a table. This is intended for interactive command-line usage. It is the default output format if this feature is enabled.
  If you only need the output formats that are useful to be included as metrics in monitoring systems (e.g. --format prometheus), you can disable this feature to minimize dependencies.
- spectrum-scale
  Adds an optional file system iteration mode specific to IBM Spectrum Scale file systems. This iteration mode uses the mmapplypolicy command instead of universal directory traversal (std::fs::read_dir).
  This can be considerably faster, especially for large directories, because it uses file system internals and can use extensive parallelism. See the respective command-line options in the --help output for more information.
  Note: The policies used with mmapplypolicy write temporary lists of files to plain text files. These can get quite large, scaling with the number of files. Depending on where these lists are kept, you will need either large amounts of memory (tmpfs) or disk space, approximately 150 MiB per million files.
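As a rough aid for sizing the location of these temporary lists, the rule of thumb of approximately 150 MiB per million files can be turned into a small estimate. The Rust sketch below only applies that linear approximation; the function name estimated_list_size_mib is hypothetical, and real sizes will vary, for example with path lengths.

```rust
/// Rule of thumb from the note above: about 150 MiB per million files.
const MIB_PER_MILLION_FILES: f64 = 150.0;

/// Estimate the size of the temporary file lists in MiB for `files` files,
/// assuming the size scales linearly with the number of files.
fn estimated_list_size_mib(files: u64) -> f64 {
    files as f64 / 1_000_000.0 * MIB_PER_MILLION_FILES
}

fn main() {
    // Hypothetical example: a directory tree with 50 million files.
    let files = 50_000_000;
    let mib = estimated_list_size_mib(files);
    println!("{} files: about {:.0} MiB ({:.1} GiB)", files, mib, mib / 1024.0);
}
```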
Install the stor-age AUR package:
pacaur -S stor-age
Install with cargo:
cargo install stor-age
Build from source:
git clone https://github.com/idiv-biodiversity/stor-age.git
cd stor-age
cargo build --release
install -Dm755 target/release/stor-age ~/bin/stor-age