Compliance
CSV file inputs
The input CSV file contains the details of the compliance model.
The script should be able to take the following inputs (the order of listing in this document is inconsequential):
- a CSV file containing a list of compliance details
Compliance metadata
Write about the compliance metadata.
Requirements for the Python Script
- The script should be executable from the command line and should take its inputs from the command line.
- The documents should be available in the database with series and/or subseries as the key.
Interface
The script will be a command-line utility that will serve the requirements as outlined above.
Invocations of the script would look like this:
python3 compliance.py [OPTION]... -f CSV
CSV refers to the CSV file that is supplied as the argument to process. The ellipsis stands for any switch-specific arguments that might be needed.
OPTION represents the following set of switches, which can be applied to modify the default behavior of the script:
| Switch | Argument | Description |
|---|---|---|
| -f | Path to CSV file | Path to CSV file for batch processing. |
| -q | (none) | Quiet mode. Disables all informational prints. All exception and error related prints will still be output. |
| -h | (none) | The script displays a help document on the screen and exits. |
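The switches in the table map naturally onto Python's standard argparse module. The following is a minimal sketch under stated assumptions, not the script's actual parser: the function name build_parser and the attribute names csv_path and quiet are hypothetical. argparse supplies -h on its own, and on a usage error it prints a usage message and exits.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the -f, -q and -h switches described above."""
    # argparse registers -h/--help automatically: it prints the help text and exits.
    parser = argparse.ArgumentParser(
        prog="compliance.py",
        description="Attach compliance metadata to documents listed in a CSV file.",
    )
    parser.add_argument(
        "-f",
        dest="csv_path",  # hypothetical attribute name
        metavar="CSV",
        required=True,
        help="Path to CSV file for batch processing.",
    )
    parser.add_argument(
        "-q",
        dest="quiet",  # hypothetical attribute name
        action="store_true",
        help="Quiet mode: disable informational prints; errors are still shown.",
    )
    return parser


# Parse an explicit argument list, equivalent to: python3 compliance.py -q -f records.csv
args = build_parser().parse_args(["-q", "-f", "records.csv"])
if not args.quiet:
    print(f"Processing {args.csv_path}")
```

Marking -f as required means a missing CSV path is reported by argparse itself, which covers the "inform user about errors in the arguments" step without extra code.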
Architecture
- Update the globalvars.py file to add the global variables that are used extensively in the Python script.
- Build the compliance metadatautils file based on the metadata schema. The attributes of the metadata schema are defined in the compliance file under the metadatautils folder.
Behavior and Implementation
The script (compliance.py) will perform the following high-level operations:
- Parse command line arguments
  - set variables in accordance with the arguments
  - inform user about errors in the arguments, print help, and exit
- Read CSV file
  - validate header structure
  - parse 'arrange' information from header for series, sub-series
  - parse 'compliance' information from header
- Read metadata property names from labels.json
  - store all labels within a Python object
- Read controlled vocabulary from vocab.json
  - store the vocabulary as a Python object
- Create a connection to the database
- For each row of series, sub-series combination in the CSV file:
  - create a compliance record
    - read compliance information from CSV file
    - store information from file into an object
    - create a compliance property using this information
  - Query database for all documents specified by the series, sub-series
  - For each document in the list:
    - read the document; if a profile is present, save its information in an object
    - attach compliance property to document
    - add preservation event for successfully updated documents
      - metadata event - include new and old fields
  - create a compliance record
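The CSV-reading and property-attachment steps above can be sketched as follows. This is an illustration under loud assumptions, not the script's implementation: the header convention ("arrange." and "compliance." column prefixes), the property name retention, the dict-based document, and the shape of the metadata event are all hypothetical, since this document does not define them. The database query is omitted for the same reason.

```python
import csv
import io

# Hypothetical header convention: columns are named "<group>.<field>".
ARRANGE_PREFIX = "arrange."
COMPLIANCE_PREFIX = "compliance."


def split_header(fieldnames):
    """Return (arrange_columns, compliance_columns) or raise ValueError."""
    if not fieldnames:
        raise ValueError("CSV file has no header row")
    arrange = [n for n in fieldnames if n.startswith(ARRANGE_PREFIX)]
    compliance = [n for n in fieldnames if n.startswith(COMPLIANCE_PREFIX)]
    if not compliance:
        raise ValueError("header has no compliance columns")
    return arrange, compliance


def read_rows(stream):
    """Yield one (arrangement, compliance) pair of dicts per CSV row."""
    reader = csv.DictReader(stream)
    arrange_cols, compliance_cols = split_header(reader.fieldnames)
    for row in reader:
        # Empty cells are dropped, so a row may name only a series or only a sub-series.
        arrangement = {c[len(ARRANGE_PREFIX):]: row[c] for c in arrange_cols if row[c]}
        compliance = {c[len(COMPLIANCE_PREFIX):]: row[c] for c in compliance_cols if row[c]}
        yield arrangement, compliance


def attach_compliance(document, compliance):
    """Set the compliance property on a document dict and return a metadata event."""
    old = document.get("compliance")
    document["compliance"] = dict(compliance)
    # The event records both the old and the new value of the property.
    return {"type": "metadata", "old": old, "new": document["compliance"]}


sample = io.StringIO(
    "arrange.series,arrange.subseries,compliance.retention\n"
    "Series A,Sub 1,7 years\n"
    "Series B,,10 years\n"
)
rows = list(read_rows(sample))

document = {"series": "Series A"}  # stand-in for a document fetched from the database
event = attach_compliance(document, rows[0][1])
```

Because a file without a header row has its first data row read as the header, the same check that looks for compliance columns also rejects headerless input in most cases.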
Output(s)
When errors are encountered, write helpful information to a CSV file. Errors to be reported include errors in command usage as well as any errors encountered while carrying out property extraction. Error CSV name: "compliance_profile_errors_<timestamp>.csv"
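One way to produce the error file is sketched below. The timestamp format, the column names context and message, and the function names are assumptions chosen for illustration; this document fixes only the file name pattern.

```python
import csv
import io
from datetime import datetime


def error_csv_name(now=None):
    """Build the error file name; the timestamp format here is an assumption."""
    now = now or datetime.now()
    return f"compliance_profile_errors_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def write_errors(errors, stream):
    """Write (context, message) pairs to an open text stream; return the row count."""
    writer = csv.writer(stream)
    writer.writerow(["context", "message"])  # hypothetical column names
    writer.writerows(errors)
    return len(errors)


# Usage with an in-memory buffer; a real run would instead use
# open(error_csv_name(), "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_errors([("row 2", "unknown series")], buffer)
```

Writing to a stream rather than a path keeps the function easy to exercise in tests without touching the file system.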
Test cases / Validation
- A CSV file with no header should cause an error to be reported.
- Update an existing record with compliance metadata by passing both series and subseries information.
- Update an existing record with compliance metadata by passing only series information.
- Update an existing record with compliance metadata by passing only subseries information.
- Update an existing record with compliance metadata by passing neither series nor subseries information.
- Update a record that already has compliance data.