Technical
CSV file inputs
The CSV input file lists the series and sub-series values of the files to be processed.
The script should be able to take the following inputs (the order of listing in this document is inconsequential):
a CSV file containing a list of series and sub-series values
the CSV file should have a header row at the top by default.
prefix the name of each label column with "arrange:", giving "arrange:<label>" (case-sensitive, exclude the quotes)
the text string following "arrange:" (that is, <label>) will be used as a property name to query the database
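The header convention above could be parsed along the following lines. This is a minimal sketch rather than the script's actual code: the helper name `parse_arrange_header`, the sample label names, and the choice to raise `ValueError` for a missing or malformed header are all assumptions made for illustration.

```python
import csv
import io

ARRANGE_PREFIX = "arrange:"  # case-sensitive prefix described above


def parse_arrange_header(header):
    """Map column index -> label for every 'arrange:<label>' header cell.

    Hypothetical helper: raising ValueError on a bad header is an
    assumption, not documented behavior of technical.py.
    """
    labels = {}
    for index, cell in enumerate(header):
        name = cell.strip()
        if name.startswith(ARRANGE_PREFIX):
            label = name[len(ARRANGE_PREFIX):].strip()
            if not label:
                raise ValueError(f"empty label in header column {index}")
            labels[index] = label
    if not labels:
        raise ValueError("no 'arrange:<label>' columns found in header row")
    return labels


# Example with an in-memory CSV; the label names are made up.
sample = io.StringIO("arrange:series,arrange:sub_series\nA,A1\n")
reader = csv.reader(sample)
labels = parse_arrange_header(next(reader))
# Each data row becomes a {property name: value} dict for the database query.
rows = [{labels[i]: value for i, value in enumerate(row) if i in labels}
        for row in reader]
```

A CSV file without a header row would fail here on its first data row, because none of its cells starts with the prefix.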
Technical metadata
The technical metadata schema concentrates on capturing the technical properties of files that have been captured by a camera or scanned. These files contain the details of patients' medical records taken from medical books, patient records, and doctor records.
Requirements for the Python Script
The script should be executable from the command line and should take its inputs from the command line.
The documents should be available in the database with filename as the key.
ImageMagick should be installed on the system.
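The ImageMagick requirement can be checked before any processing starts. The sketch below assumes that finding the `identify` executable on the PATH is a sufficient test; the function name and the injectable `which` parameter (useful for testing) are hypothetical.

```python
import shutil


def imagemagick_available(which=shutil.which):
    """Return True if ImageMagick's `identify` executable is on the PATH.

    `which` defaults to shutil.which; passing a stand-in makes the check
    testable on machines without ImageMagick (an illustrative choice).
    """
    return which("identify") is not None
```

The script could call this once at start-up and print a message and exit when it returns False.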
Interface
The script will be a command-line utility that will serve the requirements as outlined above.
Invocations of the script would look like this:
python3 technical.py [OPTION]... -f CSV -e EXTENSION
CSV refers to the CSV file supplied as the argument to process, and EXTENSION to the extension of the files to process. The ellipsis indicates that several switches may be given, together with any switch-specific arguments they need.
OPTION represents the following set of switches that can be applied to modify the default behavior of the script:
| Switch | Argument | Description |
|---|---|---|
| -f | Path to CSV file | Path to CSV file for batch processing. |
| -e | define file extension | Specify file EXTENSION for files that need to be migrated. |
| -q | (none) | Quiet mode. Disables all informational prints. All exception and error related prints will still be output. |
| -h | (none) | The script displays a help document on the screen and exits. |
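The switches in the table map naturally onto Python's `argparse` module. The following is a sketch under assumptions, not the script's real parser: the `dest` names, the help texts, and treating `-f` and `-e` as required (inferred from the invocation line) are illustrative. `argparse` adds `-h` automatically.

```python
import argparse


def build_parser():
    """Build a parser for the switches listed in the table (sketch)."""
    parser = argparse.ArgumentParser(prog="technical.py")
    parser.add_argument("-f", dest="csv_path", metavar="CSV", required=True,
                        help="path to CSV file for batch processing")
    parser.add_argument("-e", dest="extension", metavar="EXTENSION",
                        required=True,
                        help="extension of the files that need to be migrated")
    parser.add_argument("-q", dest="quiet", action="store_true",
                        help="quiet mode: disable informational prints")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-f", "series.csv", "-e", "tif"])
```

With this setup, a missing `-f` or `-e` makes `argparse` print a usage message and exit, which covers the "inform user about errors in the arguments" step without extra code.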
Architecture
The globalvars.py file is changed to incorporate the global variables that are used extensively in the Python script.
Build the technical metadatautils file based on the metadata schema. The attributes of the metadata schema are defined in the technical file under the metadatautils folder; these are the attributes captured from the scanned files and stored in the database.
Behavior and Implementation
The script (technical.py) will perform the following high-level operations:
Parse command line arguments
set variables in accordance with the arguments
inform user about errors in the arguments, print help, and exit
print a message if ImageMagick is not installed.
Read input CSV file
validate header structure
parse 'arrange' information from header
Read metadata property names from labels.json
store all labels within a Python object
Read controlled vocabulary from vocab.json
store the vocabulary as a Python object
Create a connection to the database
For each row in CSV file:
extract the 'arrange' info
Query the database for all the records with the given series and sub-series values in the admin profile.
write a message to the error CSV with the series and sub-series values if no record is found.
read all the files in the filepath that have the extension passed on the command line (for example, tif).
For each document in the record list,
if the document is found, check if the technical profile is present
if the profile is present, move on to the next file
if the profile is not present then,
find and store the value of the filename in premis->eventlist->event{eventType = fileNameChange}
strip the extension from the filename.
create a technical property object for the document.
extract image properties by executing the command "identify -verbose <filename>"
populate the technical property object with the extracted properties.
update the document in the database with the technical properties.
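The extension stripping and property extraction steps above can be sketched as follows. This is a simplified illustration under assumptions: the function names are hypothetical, and the parser flattens the indented `identify -verbose` output into a single dictionary, keeping the first occurrence of each key and skipping section headings that carry no value. The real technical property object may need a more structured mapping.

```python
import os
import subprocess


def strip_extension(filename):
    """Return the filename without its final extension."""
    return os.path.splitext(filename)[0]


def run_identify(path):
    """Run `identify -verbose <path>` and return its text output.

    Requires ImageMagick; raises CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(["identify", "-verbose", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_identify_output(text):
    """Flatten 'Key: value' lines into a dict (first occurrence wins)."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        key, value = key.strip(), value.strip()
        # Lines such as "Channel depth:" have no value and are skipped.
        if sep and key and value and key not in props:
            props[key] = value
    return props


# Example on a hand-written excerpt shaped like identify's output.
sample_output = (
    "Image: scan_001.tif\n"
    "  Format: TIFF (Tagged Image File Format)\n"
    "  Geometry: 2480x3508+0+0\n"
    "  Resolution: 300x300\n"
    "  Channel depth:\n"
    "    red: 8-bit\n"
)
props = parse_identify_output(sample_output)
```

Keeping the parsing separate from the subprocess call lets the parser be tested without ImageMagick installed.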
Output(s)
Write helpful information to a CSV file when errors are encountered. Errors to be reported include errors in command usage as well as any errors encountered while carrying out property extraction. Error CSV name: "technical_profile_errors_<timestamp>.csv"
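One way to produce the error file is sketched below. The timestamp format (`YYYYMMDD_HHMMSS`), the column names, and both function names are assumptions; the specification fixes only the `technical_profile_errors_<timestamp>.csv` naming pattern.

```python
import csv
import os
import tempfile
from datetime import datetime


def error_csv_name(now=None):
    """Build the error file name; the timestamp format is an assumption."""
    now = now or datetime.now()
    return f"technical_profile_errors_{now:%Y%m%d_%H%M%S}.csv"


def append_error(path, series, sub_series, message):
    """Append one error row, writing a header first if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["series", "sub_series", "message"])
        writer.writerow([series, sub_series, message])


# Example: write one error into a temporary directory.
out_dir = tempfile.mkdtemp()
error_path = os.path.join(out_dir,
                          error_csv_name(datetime(2024, 1, 2, 3, 4, 5)))
append_error(error_path, "A", "A1", "no record found")
```

Computing the file name once per run and passing the path around keeps all errors from a single invocation in the same file.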
Test cases / Validation
No header in the csv file should print an error.
One entry for series and sub-series value in the csv file.
Multiple entries of series and sub-series values in the csv file.
Document not available in the database should print an error with the filepath and filename.
Technical Profile already present for the document in the database.