Vaulting things to tape process

When the write_to_tape directory in the dps volume reaches roughly 5TB, the Digital Stewardship Technical Associate submits the vault request and then removes the vaulted content from dps.

What you need:

A Unix-like terminal with SSH capabilities, such as macOS Terminal or the Windows Subsystem for Linux (WSL).

The dps volume mounted in your desktop environment. Otherwise, SSH into the development server to access the mounted share there.

What you do:

Check that the size of dps write_to_tape, in bytes, roughly equals 5TB.

Figure 1 - Windows Subsystem for Linux, Ubuntu Bash

Figure 2 - Windows properties sheet

macOS Finder Get Info window (⌘ Cmd+I)
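One way to take this measurement from the terminal is sketched below. It assumes GNU du, as found in WSL or on the dev server (macOS du has no -b option, so use Get Info there), and treats 5TB as 5 * 10^12 bytes; adjust TARGET_BYTES if your team counts in tebibytes. The helper names dir_bytes and pct_of_target are hypothetical.

```shell
# Sketch: how full is write_to_tape relative to 5TB?
# Hypothetical helper names. Assumptions: GNU du (Linux or WSL) and a
# decimal 5TB, i.e. 5 * 10^12 bytes.
TARGET_BYTES=5000000000000

# Apparent size, in bytes, of everything under a directory.
# -s prints one summary line; -b reports apparent size in bytes (GNU only).
dir_bytes() {
  du -sb -- "$1" | cut -f1
}

# Whole-number percentage of the 5TB target the directory occupies.
pct_of_target() {
  echo $(( $(dir_bytes "$1") * 100 / TARGET_BYTES ))
}

# Usage (substitute your mount path):
#   dir_bytes /dps/write_to_tape
#   pct_of_target /dps/write_to_tape
```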

Send write_to_tape contributors the message ‘closing write to tape’ and ask them to let you know when all in-progress additions have completed.

When contributors confirm they are no longer adding to write_to_tape, check its size in bytes again. If it roughly equals 5TB, rename the directory to vault. If it exceeds 5TB, move the first 5TB of its contents, in ascending order by name, into a new directory named vault.
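When write_to_tape exceeds 5TB, the selection of "the first 5TB in ascending order by name" can be sketched as follows. This is a sketch under assumptions, not an established tool: GNU find and du, byte-order (C locale) sorting of the top-level entries, a decimal 5TB limit, and no newlines in names. The hypothetical select_first_bytes only prints the names it would move, so you can review the list before moving anything.

```shell
# Sketch: pick the first ~5TB of write_to_tape, in ascending name order.
# Hypothetical helper name. Assumptions: GNU find and du; C-locale sorting;
# a decimal 5TB limit; entry names without newlines. Prints names only.
select_first_bytes() {
  local src="$1" limit="${2:-5000000000000}"
  find "$src" -mindepth 1 -maxdepth 1 -printf '%f\n' | LC_ALL=C sort |
  {
    total=0
    while IFS= read -r name; do
      size=$(du -sb -- "$src/$name" | cut -f1)
      # Stop at the first entry that would exceed the limit, so the
      # selection is a contiguous run in name order.
      if [ $((total + size)) -gt "$limit" ]; then
        break
      fi
      total=$((total + size))
      printf '%s\n' "$name"
    done
  }
}

# Usage: if write_to_tape is roughly 5TB, simply rename it:
#   mv /dps/write_to_tape /dps/vault
# Otherwise review the list, then move the selected entries:
#   mkdir /dps/vault
#   select_first_bytes /dps/write_to_tape | while IFS= read -r name; do
#     mv -- "/dps/write_to_tape/$name" /dps/vault/
#   done
```

Assuming both directories sit on the same volume, each mv is a rename rather than a copy, so the move itself should be quick.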

Once vault holds 5TB and the dps volume has available capacity for new additions[1], create a write_to_tape directory if it does not exist, then tell contributors that write_to_tape is open again. If dps does not yet have enough capacity, run this step once it does.
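A minimal sketch of this step, assuming GNU df (for -B1 and --output), a mount point of /dps, and the 10% threshold from footnote [1] as the working definition of "available capacity". has_capacity and reopen_write_to_tape are hypothetical names.

```shell
# Sketch: reopen write_to_tape only when dps has enough free space.
# Hypothetical helper names. Assumptions: GNU df (-B1, --output), volume
# mounted at /dps, and "enough" meaning at least 10% available.
has_capacity() {
  local mount="$1" min_pct="${2:-10}"
  # Print size and available bytes; row 1 is the header, row 2 the data.
  # If df prints no data row, ok stays unset and the check fails.
  df -B1 --output=size,avail -- "$mount" |
    awk -v min="$min_pct" 'NR == 2 { ok = ($2 * 100 >= $1 * min) }
                           END { exit !ok }'
}

reopen_write_to_tape() {
  local mount="${1:-/dps}"
  if has_capacity "$mount"; then
    mkdir -p -- "$mount/write_to_tape"   # does nothing if it already exists
    echo "write_to_tape is open again on $mount"
  else
    echo "not enough free capacity on $mount yet" >&2
    return 1
  fi
}
```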

In your terminal, use nohup to run a background process that finds all files in /dps/vault, runs cksum on each one, and writes the results to a file named dps-vault-manifest.txt in your temp directory. This takes a while and may run overnight, so either SSH into dev to run it, or lock your machine at the end of the day instead of logging out.

Example:

# nohup find /dps/vault -type f -exec cksum {} + > /mnt/c/LOCAL/temp/dps-vault-manifest.txt 2> /mnt/c/LOCAL/temp/dps-vault-manifest.err &

Figure 3 - Substitute paths where appropriate

To check on your long-running process, run top in the terminal and look for the find or cksum command, or run ps -ef | grep cksum.

You can follow the manifest's progress with tail -f on the manifest file, open it in your desktop’s previewer, or watch its file size grow in a Finder or File Explorer window.
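If you would rather not scan top or ps, you can record the job's process ID when you start it and test that ID later. This is a sketch, not part of the established process: start_manifest, manifest_running, and manifest_progress are hypothetical helpers, and the .pid and .err files beside the manifest are an assumed convention. It also keeps find and cksum error messages out of the manifest, because nohup otherwise sends a terminal's stderr to the same place as stdout.

```shell
# Sketch: start the manifest job and remember its process ID.
# Hypothetical helper names; the .pid/.err files are an assumed convention.
start_manifest() {
  local src="$1" out="$2"
  # stderr goes to its own file so error messages cannot end up in the
  # manifest and throw off the line count.
  nohup find "$src" -type f -exec cksum {} + > "$out" 2> "$out.err" &
  echo "$!" > "$out.pid"   # $! is the PID of the most recent background job
}

# Succeeds while the job is still running; signal 0 only tests the PID.
manifest_running() {
  kill -0 "$(cat "$1.pid")" 2> /dev/null
}

# Lines written so far, i.e. files checksummed so far.
manifest_progress() {
  wc -l < "$1"
}

# Usage (substitute your paths):
#   start_manifest /dps/vault /mnt/c/LOCAL/temp/dps-vault-manifest.txt
#   manifest_running /mnt/c/LOCAL/temp/dps-vault-manifest.txt && echo running
#   manifest_progress /mnt/c/LOCAL/temp/dps-vault-manifest.txt
```

A PID can be reused by another process long after the job ends, so treat manifest_running as a quick hint and confirm with ps -ef | grep cksum if in doubt.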

When the process is no longer running, count the lines in the manifest and compare the count with the number of files in the vault directory; they should match. Then total the byte counts in the manifest (the second field of each cksum line); the total should match the number of bytes in the vault directory.
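Both comparisons can be scripted. The sketch assumes GNU find (for -printf), manifest lines in cksum's "CRC SIZE PATH" format, and no newlines in file names; verify_manifest and sum_bytes are hypothetical names. It prints both pairs of numbers and returns success only when both pairs match.

```shell
# Sketch: compare a cksum manifest against the directory it describes.
# Hypothetical helper names. Assumptions: GNU find (-printf); manifest
# lines are "CRC SIZE PATH", one per file; no newlines in file names.
sum_bytes() {
  # Add up one number per line; printf %.0f keeps large totals in plain
  # digits (some awks would print them as 5e+12).
  awk '{ s += $1 } END { printf "%.0f\n", s }'
}

verify_manifest() {
  local manifest="$1" dir="$2" m_files d_files m_bytes d_bytes
  m_files=$(wc -l < "$manifest")
  d_files=$(find "$dir" -type f | wc -l)
  m_bytes=$(awk '{ print $2 }' "$manifest" | sum_bytes)   # field 2 = size
  d_bytes=$(find "$dir" -type f -printf '%s\n' | sum_bytes)
  echo "files: manifest=$m_files directory=$d_files"
  echo "bytes: manifest=$m_bytes directory=$d_bytes"
  [ "$m_files" -eq "$d_files" ] && [ "$m_bytes" -eq "$d_bytes" ]
}

# Usage:
#   verify_manifest /mnt/c/LOCAL/temp/dps-vault-manifest.txt /dps/vault
```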

Find the last tape label used so you can assign the next one. For example, listing all the directories starting with misc in /home/vault/data/dcoll shows that the last label used was misc-set85.

The next tape label would be misc-set86. Use this tape label in your vault request.
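A small sketch for finding the last label, assuming GNU find and that labels look like misc-setNN with an optional suffix such as -2018-04-03. It compares set numbers numerically, because a plain name sort puts misc-set100 before misc-set99. next_tape_label is a hypothetical name.

```shell
# Sketch: work out the next misc-set tape label from existing directories.
# Hypothetical helper name. Assumptions: GNU find; labels look like
# misc-setNN, optionally followed by a suffix such as -2018-04-03.
next_tape_label() {
  local dir="${1:-/home/vault/data/dcoll}"
  # Extract each set number and keep the numeric maximum; awk reads a
  # value like 085 as 85. With no matches, the result is misc-set1.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'misc-set*' -printf '%f\n' |
    sed -n 's/^misc-set\([0-9][0-9]*\).*/\1/p' |
    awk '{ if ($1 + 0 > max) max = $1 + 0 } END { print "misc-set" (max + 1) }'
}

# Usage:
#   next_tape_label /home/vault/data/dcoll
```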

Submit a vault-to-tape request to DevOps with the subject “request vault to tape n files occupying 5TB”, where n is the number of lines (files) in the manifest. Attach the manifest, and in the message body specify the source directory as /dps/vault and give the next misc-set number as the tape label.
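If you want n, and the size, to come straight from the manifest you are attaching, a sketch like the following could build the subject line. It assumes GNU numfmt, whose --to=si option scales by powers of 1000 to match a decimal 5TB, and cksum-format manifest lines; vault_request_subject is a hypothetical name.

```shell
# Sketch: build the vault request subject from the manifest itself.
# Hypothetical helper name. Assumptions: GNU numfmt; manifest lines are
# cksum output ("CRC SIZE PATH"), one per file.
vault_request_subject() {
  local manifest="$1" n bytes
  n=$(wc -l < "$manifest")
  bytes=$(awk '{ s += $2 } END { printf "%.0f\n", s }' "$manifest")
  # --to=si uses powers of 1000, matching a decimal "5TB"
  echo "request vault to tape $n files occupying $(numfmt --to=si "$bytes")B"
}

# Usage:
#   vault_request_subject /mnt/c/LOCAL/temp/dps-vault-manifest.txt
```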

DevOps will rsync from /dps/vault to the vault staging area. If the dps volume is near capacity, ask DevOps to confirm when the first tape copy has been produced. Once the first tape copy exists, it’s ok to remove the /dps/vault directory so that snapshots can start clearing and capacity is freed sooner rather than later.[2]

After you remove 5TB, it can take 10 calendar days for the snapshots to clear and for 5TB of available capacity to appear in the volume. So, after deleting the vault directory, send a team message with the subject “5TB removed from DPS” and include in the message body “in about 10 days we can expect to see 5TB available capacity once the snapshots have cleared.”
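To put a concrete date in the team message, GNU date can do the calendar arithmetic. A sketch, assuming GNU date's -d parsing of strings like "2018-04-03 10 days"; expected_capacity_date is a hypothetical name.

```shell
# Sketch: the date about 10 calendar days after the vault was removed.
# Hypothetical helper name. Assumption: GNU date (-d with relative items).
expected_capacity_date() {
  # With no argument, count from today; %F prints YYYY-MM-DD.
  date -d "${1:-today} 10 days" +%F
}

# Usage:
#   expected_capacity_date            # from today
#   expected_capacity_date 2018-04-03
```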

When Brandon completes a vault request, he updates the tape_inventory spreadsheet with a tape number identifier (e.g., 0236[3]) that corresponds to our descriptive tape label (e.g., misc-set85-2018-04-03).

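In a terminal or text editor, extract the SIP directory names from dps-vault-manifest.txt, sort them and remove duplicates, and optionally write the results to a text file named tape-no-0nnn.txt, where 0nnn is the tape number. The example assumes the manifest paths begin with /dps/vault/, so the SIP name is the fourth /-separated field.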

# cut -d ' ' -f3- dps-vault-manifest.txt | cut -d '/' -f4 | sort -u > tape-no-0236.txt
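The cut -d '/' -f4 in the example depends on how many directories precede each SIP name in the manifest paths. A sketch that instead strips a vault root you name explicitly, assuming cksum-format lines ("CRC SIZE PATH") and SIPs that sit directly under that root; sips_from_manifest is a hypothetical name.

```shell
# Sketch: list unique SIP names from the manifest for a given vault root.
# Hypothetical helper name. Assumptions: lines are "CRC SIZE PATH" as
# written by cksum; each SIP is a directory directly under the root.
sips_from_manifest() {
  local manifest="$1" root="${2%/}/"
  awk -v root="$root" '{
      path = $0
      sub(/^[^ ]+ [^ ]+ /, "", path)     # drop the CRC and size fields
      if (index(path, root) != 1) next   # skip paths outside the root
      rest = substr(path, length(root) + 1)
      n = index(rest, "/")
      if (n > 0) print substr(rest, 1, n - 1)   # first component = SIP
    }' "$manifest" | sort -u
}

# Usage:
#   sips_from_manifest dps-vault-manifest.txt /dps/vault > tape-no-0236.txt
```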

Lastly, with the SIPs listed in your terminal or text editor, update the SIPs spreadsheet with the corresponding tape number. Use ‘fill down’ by dragging the corner of the selected cell to fill all adjacent cells.


[1] We generally want 10% (1.7TB) or more available capacity. A useful Unix command for displaying the capacity of a mounted volume is df -h /dps.

[2] Technically it’s ok to remove /dps/vault once the rsync completes, but only do so in critical capacity situations. In very critical situations, as a last resort, we can request a storage capacity increase through Michael Ackermann.

[3] The _01 or _02 suffix on a tape number is the tape copy number.