5. Language analysis process/overview

  • Before running Kesha’s pipeline in spaCy, we typically need to extract features via CLAN: CLAN-derived linguistic features

    • Next, strip the samples of any CLAN coding that spaCy and Kesha’s pipeline cannot tolerate or that can negatively affect their output. To do this, follow the steps in CLAN Folder Stripping. That page also details what gets removed.

  • Current dictionary of features from Kesha’s pipeline: MultilingualFeatureDictionary-Editable, Connected_Speech_Feature_Dictionary_Final

  • How to extract acoustic features: 8. Acoustic Derivations Guide

  • How to extract linguistic features and word-level parameters (both are extracted by the linguistic pipeline and checked in the Sanity Check pipeline)

  • How to extract CLIP: Multilingual image-text similarity scores Tutorial Guide

  • Current versions stored on MADRlab Repo/Git:



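The CLAN stripping step above can be illustrated with a minimal sketch. It assumes transcripts follow standard CHAT conventions (`@` header lines, `*SPK:` main tiers, `%` dependent tiers, tab-indented continuation lines). The function name `strip_chat`, the regex set, and the default speaker code `PAR` are hypothetical and chosen only for illustration; the authoritative list of what gets removed is on the CLAN Folder Stripping page and may differ from this.

```python
import re

# Hypothetical rule set for illustration only; the lab's actual rules are
# documented on the CLAN Folder Stripping page and may differ.
TIME_BULLET = re.compile(r"\x15[^\x15]*\x15")  # media time stamps
BRACKET_CODE = re.compile(r"\[[^\]]*\]")       # e.g. [/], [//], [: went]
PAUSE = re.compile(r"\(\.+\)")                 # (.), (..), (...)
AMP_CODE = re.compile(r"&\S+")                 # fillers/fragments such as &-um
SPECIAL_TERMINATOR = re.compile(r"\+\S+")      # e.g. +... or +/.
ANGLE = re.compile(r"[<>]")                    # scope markers around words
PATTERNS = (TIME_BULLET, BRACKET_CODE, PAUSE, AMP_CODE,
            SPECIAL_TERMINATOR, ANGLE)


def strip_chat(text: str, speaker: str = "PAR") -> list[str]:
    """Return plain-text utterances for one speaker from a CHAT transcript."""
    utterances = []
    keep = False  # True while we are inside a main tier of the wanted speaker
    for line in text.splitlines():
        if line.startswith("*"):
            # A new main tier begins; keep it only for the requested speaker.
            keep = line.startswith(f"*{speaker}:")
            if keep:
                utterances.append(line.split(":", 1)[1])
        elif line.startswith("\t") and keep:
            # Tab-indented continuation of the main tier we are keeping.
            utterances[-1] += " " + line
        else:
            # Headers (@), dependent tiers (%), and their continuations.
            keep = False

    cleaned = []
    for utt in utterances:
        for pattern in PATTERNS:
            utt = pattern.sub(" ", utt)
        utt = " ".join(utt.split())  # collapse leftover whitespace
        if utt:
            cleaned.append(utt)
    return cleaned
```

This sketch only deletes the code symbols themselves. Repeated or retraced words stay in the text, so whether that material should also be dropped is a decision to take from the CLAN Folder Stripping page, not from this example.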