The grammatical complexity index (GCI) that we compute with spaCy has already been found to correlate highly with other grammatically relevant linguistic variables, such as subordination measures and clitic usage. However, it is also important for us to know exactly what the GCI is measuring within the Spanish transcripts.
The measure is the ratio of complex dependencies to the number of words; because each word is assigned exactly one dependency, the denominator is also the total number of dependencies. There are two relevant sets of dependency tags for us to be aware of: universal tags and language-specific tags. Both are listed at https://v2.spacy.io/api/annotation#dependency-parsing
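The ratio described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used for the transcripts: the function name `grammatical_complexity_index` is hypothetical, and the set `EXAMPLE_COMPLEX_LABELS` is a placeholder, since the real set of complex tags is the one highlighted in the tag table. The input labels are written by hand rather than taken from a real parse.

```python
from typing import Iterable, Set

# Placeholder only: the actual "complex" tags are the ones highlighted
# in the tag table on this page, not necessarily these three.
EXAMPLE_COMPLEX_LABELS: Set[str] = {"xcomp", "ccomp", "advcl"}


def grammatical_complexity_index(
    dep_labels: Iterable[str],
    complex_labels: Set[str] = EXAMPLE_COMPLEX_LABELS,
) -> float:
    """Return (number of complex dependencies) / (number of words).

    Each word contributes exactly one dependency label, so the number
    of labels is used as the word count. complex_labels is expected to
    contain lowercase tag names.
    """
    # Normalize case so that labels such as "ROOT" and "root" compare equal.
    labels = [label.lower() for label in dep_labels]
    if not labels:
        # Avoid dividing by zero on an empty utterance.
        return 0.0
    complex_count = sum(1 for label in labels if label in complex_labels)
    return complex_count / len(labels)


# Hand-written labels for a five-word utterance (not a real parse).
example_labels = ["nsubj", "ROOT", "xcomp", "det", "obj"]
print(grammatical_complexity_index(example_labels))
```

With a parsed spaCy `Doc`, the labels could be collected as `[token.dep_ for token in doc]`. Whether punctuation tokens should be counted as words is a separate decision that this sketch does not make.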
Notably, only some languages, such as English and German, have additional language-specific tags. The remaining languages use the universal tags, listed here (highlighted cells are those considered complex for the GCI):
Summary: This seems to be working accurately, though the majority of the cases are restricted to one specific use within the xcomp definition (“look”, “seem” / “parecer”).
For example, “parecen las sierras de montserrat” (roughly, “they look like the mountains of Montserrat”).
Missing dependencies from the list Kesha provided: acomp, csubjpass, pobj, complm, infmod, partmod