Technical

CSV file inputs

The input CSV file lists the series and sub-series values of the files to be processed.

The script should be able to take the following inputs (the order of listing in this document is inconsequential):

  1. a CSV file containing a list of series and sub-series values
    1. the CSV file should have a header row at the top by default.
    2. name each label column "arrange:<label>" (case-sensitive, without the quotes), that is, prefix the label with "arrange:"
    3. the text string following "arrange:" (the <label> part) will be used as a property name to query the database
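
The header convention above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names and the sample column labels ("series", "sub_series") are assumptions made for the example.

```python
import csv
import io

ARRANGE_PREFIX = "arrange:"  # case-sensitive, as required by the header rules


def parse_arrange_columns(header):
    """Map column index -> property name for every 'arrange:<label>' column."""
    columns = {}
    for index, name in enumerate(header):
        name = name.strip()
        if name.startswith(ARRANGE_PREFIX):
            label = name[len(ARRANGE_PREFIX):]
            if label:  # ignore a bare "arrange:" with no label
                columns[index] = label
    return columns


def read_arrange_rows(stream):
    """Return one {property name: value} dict per data row of the CSV."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV file is empty; a header row is required")
    columns = parse_arrange_columns(header)
    if not columns:
        raise ValueError("no 'arrange:<label>' column found in the header row")
    rows = []
    for row in reader:
        rows.append(
            {label: row[i].strip() for i, label in columns.items() if i < len(row)}
        )
    return rows


# Hypothetical input: the labels "series" and "sub_series" are illustrative only.
sample = io.StringIO("arrange:series,arrange:sub_series\nA,1\nB,2\n")
rows = read_arrange_rows(sample)
```

Each resulting dictionary can then serve as the set of property names and values for one database query.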

Technical metadata

The technical metadata schema captures the technical properties of files that were produced by a camera or a scanner. These files hold the patients' medical records taken from medical books, patient records, and doctor records.

Requirements for the Python Script

  1. The script should be executable from the command-line, and should take inputs from the command line.
  2. The documents should be available in the database with filename as the key.
  3. ImageMagick must be installed on the system.
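
One way to verify the ImageMagick requirement before any processing starts is to look for the `identify` executable on the PATH. The sketch below assumes the classic `identify` command name is used; the helper name is hypothetical.

```python
import shutil
import sys


def imagemagick_available(executable="identify"):
    """Return True if the given ImageMagick executable is found on PATH."""
    return shutil.which(executable) is not None


if not imagemagick_available():
    # Assumption: the script reports the problem on stderr and lets the caller decide.
    print("ImageMagick ('identify') was not found on PATH.", file=sys.stderr)
```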

Interface

The script will be a command-line utility that will serve the requirements as outlined above.

Invocations of the script would look like this:

python3 technical.py [OPTION]... -f CSV -e EXTENSION

CSV is the path to the CSV file to process, and EXTENSION is the extension of the files to process. The ellipsis stands for any further switches and the switch-specific arguments they might need.

OPTION represents the following set of switches, which modify the default behavior of the script:

Switch  Argument          Description
-f      path to CSV file  Path to the CSV file for batch processing.
-e      file extension    File EXTENSION of the files that need to be migrated.
-q      (none)            Quiet mode. Disables all informational prints. All exception and error related prints will still be output.
-h      (none)            Displays a help document on the screen and exits.
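
A minimal sketch of how these switches could be declared with the standard `argparse` module is shown below. It assumes that `-f` and `-e` are mandatory, as the invocation line suggests; the `dest` names are illustrative. `argparse` supplies `-h` automatically.

```python
import argparse


def build_parser():
    """Build a command-line parser for the switches listed in the table."""
    parser = argparse.ArgumentParser(
        prog="technical.py",
        description="Extract technical metadata for files listed in a CSV file.",
    )
    parser.add_argument("-f", dest="csv_path", metavar="CSV", required=True,
                        help="path to the CSV file for batch processing")
    parser.add_argument("-e", dest="extension", metavar="EXTENSION", required=True,
                        help="extension of the files to process, e.g. tif")
    parser.add_argument("-q", dest="quiet", action="store_true",
                        help="quiet mode: suppress informational prints")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-f", "series.csv", "-e", "tif", "-q"])
```

On a missing or malformed argument, `argparse` prints a usage message and exits with a non-zero status, which matches the "inform user about errors, print help, and exit" behavior described for the script.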

Architecture

  1. Update the globalvars.py file to add the global variables that are used extensively in the Python script.
  2. Build the technical metadatautils file based on the metadata schema. The attributes of the metadata schema are defined in the technical file under the metadautils folder; they cover the attributes captured from the scanned files and stored in the database.

Behavior and Implementation

The script (technical.py) will perform the following high-level operations:

  1. Parse command line arguments
    1. set variables in accordance with the arguments
    2. inform user about errors in the arguments, print help, and exit
    3. print a message if ImageMagick is not installed.
  2. Read input CSV file
    1. validate header structure
    2. parse 'arrange' information from header
  3. Read metadata property names from labels.json
    1. store all labels within a Python object
  4. Read controlled vocabulary from vocab.json
    1. store the vocabulary as a Python object
  5. Create a connection to the database
  6. For each row in CSV file:
    1. extract the 'arrange' info
    2. query the database for all the records with the given series and sub-series values in the admin profile.
    3. write a message to the error CSV with the series and sub-series values if no record is found.
    4. read all the files in the filepath that have the extension passed on the command line (for example, tif).
    5. For each document in the record list,
      1. if the document is found, check if the technical profile is present
      2. if the profile is present, move on to the next file
      3. if the profile is not present then,
        1. find and store the value of the filename in premis -> eventlist -> event {eventType = fileNameChange}
        2. strip the extension from the filename.
        3. create a technical property object for the document.
        4. extract image properties by executing the command "identify -verbose <filename>"
        5. populate the technical property object with the extracted properties.
        6. update document in the database with technical properties.
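
The extraction steps above (strip the extension, run `identify -verbose`, collect properties) can be sketched as follows. This is an illustration under assumptions: the set of properties kept, the function names, and the sample output are hypothetical, and the exact layout of `identify -verbose` output varies between ImageMagick versions.

```python
import os
import subprocess

# Assumption: a small, illustrative subset of the properties the schema may need.
WANTED = ("Format", "Geometry", "Resolution", "Units", "Colorspace", "Depth")


def strip_extension(filename):
    """Return the filename without its final extension."""
    return os.path.splitext(filename)[0]


def run_identify(path):
    """Run 'identify -verbose <path>' and return its standard output as text."""
    result = subprocess.run(
        ["identify", "-verbose", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_identify_output(text, wanted=WANTED):
    """Collect top-level 'Key: value' lines (two-space indent) into a dict."""
    props = {}
    for line in text.splitlines():
        # Skip the 'Image:' line and anything nested deeper than two spaces.
        if not line.startswith("  ") or line.startswith("   "):
            continue
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value and key in wanted and key not in props:
            props[key] = value
    return props


# Illustrative output only; real output is much longer.
SAMPLE_OUTPUT = """Image: scan_0001.tif
  Format: TIFF (Tagged Image File Format)
  Geometry: 2480x3508+0+0
  Resolution: 300x300
  Units: PixelsPerInch
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
"""

properties = parse_identify_output(SAMPLE_OUTPUT)
base_name = strip_extension("scan_0001.tif")
```

In the real script, the text passed to `parse_identify_output` would come from `run_identify(path)`, and the resulting dictionary would be copied into the technical property object before the database update.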

Output(s)

Write helpful information to a CSV file when errors are encountered. Errors to be reported include errors in command usage as well as any errors encountered during property extraction. Error CSV name: "technical_profile_errors_<timestamp>.csv"
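
A possible way to build the error file name and write the rows is sketched below. The timestamp format (YYYYMMDD_HHMMSS) and the column names are assumptions; the document only fixes the "technical_profile_errors_<timestamp>.csv" pattern.

```python
import csv
import os
import tempfile
from datetime import datetime


def error_csv_name(now=None):
    """Return 'technical_profile_errors_<timestamp>.csv' for the given time."""
    now = now or datetime.now()
    # Assumption: the timestamp format is not specified in the document.
    return "technical_profile_errors_{}.csv".format(now.strftime("%Y%m%d_%H%M%S"))


def write_errors(directory, errors, now=None):
    """Write (series, sub_series, message) rows to the error CSV; return its path."""
    path = os.path.join(directory, error_csv_name(now))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["series", "sub_series", "message"])  # hypothetical columns
        writer.writerows(errors)
    return path


# Example: write one error row into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    fixed_time = datetime(2024, 1, 2, 3, 4, 5)
    error_path = write_errors(tmp_dir, [("A", "1", "no record found")], fixed_time)
    with open(error_path, newline="", encoding="utf-8") as handle:
        written_rows = list(csv.reader(handle))
```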

Test cases / Validation

  1. No header in the csv file should print an error.
  2. One entry for series and sub-series value in the csv file.
  3. Multiple entries of series and sub-series values in the csv file.
  4. Document not available in the database should print an error with the filepath and filename.
  5. Technical Profile already present for the document in the database.