296 N.C. J.L. & TECH. [VOL. 23: 2
3. Processing Tools, NLP Computations, and Exclusions
All decisions were processed through the Simple Natural Language
Processing (“SiNLP”) software application, which is freely available
for both Mac and Windows operating systems, to measure the length (in
words) of each decision. At this stage, any decisions that contained
fewer than 260 words (including any remaining front- and back-end
text) were excluded from the study (N = 16). These decisions were
excluded because both the SMOG and CAREC-M comprehensive readability
formulae are intended for use with larger text samples. The remaining
decisions were then processed through the Tool for the Automatic
Analysis of Lexical Sophistication (“TAALES”), which is also freely
available for
Scott Crossley et al., Analyzing Discourse Processing Using a Simple
Natural Language Processing Tool (SiNLP), 51 DISCOURSE PROCESSES 511,
520–24 (2014). This application provides seven different simple linguistic
measures, such as number of words, sentences and paragraphs, and average word
and sentence lengths, for all text files processed by the software. See id.
See NLP Tools for the Social Sciences – SiNLP: The Simple Natural
Language Processing Tool, NLP TOOLS FOR THE SOC. SCIS. [hereinafter NLP
Tools for the Social Sciences], https://www.linguisticanalysistools.org/sinlp.html
[https://perma.cc/5MRG-F74N] (last visited Oct. 4, 2021).
From Australia, N=1; from Canada, N=13; from the United States, N=2; and
from both South Africa and the United Kingdom, N=0. The excluded decisions
tended to be ones wherein a lower court’s decision was upheld or overturned by
the apex court in a very short opinion that expressed full agreement with the lower
court (or a judge of that lower court) without further explanation.
SMOG calculations are based on a minimum of thirty sentences of text
(which would equate to approximately 600–900 words of text from a typical
judicial decision). See McLaughlin, supra note 55. CAREC-M calculations are
intended for text samples of more than 200 words. See J.S. Choi & S.A. Crossley,
NLP Tools for the Social Sciences - ARTE: Automatic Readability Tool for
English, NLP TOOLS FOR THE SOC. SCIS.,
https://www.linguisticanalysistools.org/arte.html [https://perma.cc/E5G9-LT9L] (last visited Oct. 4,
2021). Given that front- and back-end matter comprised approximately 60 words
in many decisions within the present study, a minimum threshold of 260 words
was selected as an inclusion criterion.
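
The threshold arithmetic and the exclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual procedure: word counts in the study came from SiNLP, whereas the sketch counts whitespace-separated tokens, and the names `word_count` and `partition_by_length` are hypothetical.

```python
from typing import Dict, Tuple

# Figures taken from the text: CAREC-M is intended for samples of more
# than 200 words, and front- and back-end matter ran to roughly 60 words.
MIN_CAREC_M_WORDS = 200
FRONT_BACK_MATTER_WORDS = 60
MIN_WORDS = MIN_CAREC_M_WORDS + FRONT_BACK_MATTER_WORDS  # 260


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumption; SiNLP's own
    tokenization rules are not reproduced here)."""
    return len(text.split())


def partition_by_length(
    decisions: Dict[str, str], min_words: int = MIN_WORDS
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split decisions into (kept, excluded) by the word-count threshold.

    A decision is excluded when it contains fewer than `min_words` words.
    """
    kept: Dict[str, str] = {}
    excluded: Dict[str, str] = {}
    for name, text in decisions.items():
        target = excluded if word_count(text) < min_words else kept
        target[name] = text
    return kept, excluded
```

Under these assumptions, a decision of exactly 260 words is retained and one of 259 words is excluded.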
Kristopher Kyle, Scott Crossley & Cynthia Berger, The Tool for the
Automatic Analysis of Lexical Sophistication (TAALES): Version 2.0, 50 BEHAV.
RES. METHODS 1030, 1032–37 (2018). This application provides over 250
different linguistic measures, including range and frequency for words and
N-grams from multiple corpora, psycholinguistic properties of words, and many other
related measures, for all text files processed by the software. Id.