Dictionaries

THE REFERENTIAL PROCESS DICTIONARIES

There are three weighted dictionaries currently in use to measure psychological functions of the Referential Process. These are the Weighted Referential Activity Dictionary (WRAD), the Weighted Reflecting/Reorganizing List (WRRL), and the Weighted Arousal List (WRSL), providing measures of the RP functions: Symbolizing/Narrative, Reflecting/Reorganizing and Arousal/Activation, respectively.

These dictionaries were all constructed using different versions of the same overall procedure. A set of texts was chosen, each about 100 to 300 words long, and each was scored by judges for the relevant dimensions of language style based on their reactions to the text. The judges scored each segment on a scale of 0 to 10, with a score of 5 regarded as neutral. This scoring involved discussions among the judges as to the definitions of the relevant dimensions, and computations of reliability. The crucial point here is that the scoring was based on judges’ reactions to text segments as a whole, following descriptions in a manual, rather than on categories of individual words. A version of supervised machine learning was then used to generate the words, with their weights, for each of the relevant dictionaries. As a final step in the construction of each dictionary, the weights associated with the individual words were adjusted, keeping their order intact, so as to lie between 0 and 1, and then adjusted further, again keeping their order intact, so that the mean dictionary score of the segments used to make the dictionary was at the neutral value of 0.5. Details of the construction of the English versions of the three weighted Referential Process dictionaries are presented in the following sections and in the references cited there. Descriptions of the Italian and Spanish versions are given on separate pages.

Construction of the WRAD. Referential Activity (RA) of the segments used to construct the WRAD was rated by trained and reliable judges following a scoring manual (Bucci et al. xxx). Each segment was scored on each of four scales (Concreteness, Clarity, Specificity and Imagery), each ranging from 0 to 10; the overall RA score of the segment was the mean of these four scale scores.

The first computer measure of RA was the Computerized Referential Activity (CRA) of Mergenthaler and Bucci (1999), which used Mergenthaler’s Text Analysis System (TAS). The CRA consisted of two lists of words: one of words occurring very frequently in texts rated as high in RA, the other of words occurring very frequently in texts rated as low in RA. The CRA score of a segment was then computed as the number of words matching the high RA list minus the number of words matching the low RA list (Mergenthaler & Bucci, 1999).
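The CRA scoring rule can be illustrated with a short Python sketch. The function name and the two word lists below are hypothetical and chosen only for illustration; they are not the actual CRA lists, and the sketch assumes the segment has already been split into lowercase tokens.

```python
def cra_score(tokens, high_ra_words, low_ra_words):
    """Number of tokens on the high-RA list minus number on the low-RA list."""
    high = sum(1 for t in tokens if t in high_ra_words)
    low = sum(1 for t in tokens if t in low_ra_words)
    return high - low


# Hypothetical word lists, for illustration only (not the real CRA lists).
HIGH_RA = {"she", "walked", "door"}
LOW_RA = {"perhaps", "generally"}

segment = "she walked to the door and perhaps she waited".split()
score = cra_score(segment, HIGH_RA, LOW_RA)  # 4 high-RA matches, 1 low-RA match
```

Every occurrence of a listed word is counted, so a word repeated within a segment contributes once per occurrence.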

The Weighted Referential Activity Dictionary (WRAD) of Bucci & Maskit (2006) was designed using the Discourse Attributes Analysis Program (DAAP) developed by Maskit, as outlined on this website. DAAP enables the production of weighted dictionaries that model judges’ scoring of segments. The program identifies the extent to which a word is associated with a particular weight or is spread out across the range of possible weights. A word whose occurrences are concentrated in segments with a narrow spread of weights is entered into the dictionary, with the median of the weights of the segments in which it is used as its dictionary value. If there is no such clustering of the weights of the segments in which a word is used, the word is not entered into the dictionary. The final WRAD developed in this way consists of xxx words. See Bucci & Maskit (2005) for details concerning this procedure.

Construction of the WRRL. The weighting program used for construction of the Weighted Reflecting/Reorganizing List (WRRL), like that used for the WRAD, was based on judges’ scoring of text segments. It differed from the procedure used for the WRAD in that neither a precise definition of the dimension nor a scoring manual had previously been developed to guide the judges’ scoring. The construction of the WRRL therefore required simultaneous development of the definition of this dimension based on the underlying theory, development of a scoring manual that represents the varying levels of the psychological dimension, and training of judges to achieve reliability. This development began with a general description of the Reflecting/Reorganizing function as outlined by Bucci (2021), and continued with production of a scoring manual through an iterative process involving scoring of a series of items and discussion of these. The process is described in Zhou et al. (2021), which also contains the final version of the scoring manual.

In broad outline, the procedures for constructing this dictionary were as follows. A set of 296 segments was scored by judges for the relevant dimension on a scale of 0 to 10. This set of segments was divided into two parts: the dictionary set, consisting of 266 segments, and the test set, consisting of 30 segments. A list of the distinct words (types) appearing in the dictionary texts was compiled, and for each word the judges’ scores of the segments in which this word appears, including repetitions, were listed. Two measures of this list of scores were then computed for each word: the median, M, and the spread, S, which is the absolute difference between the first and third quartiles.
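The computation of M and S can be sketched in Python as follows. This is an illustration under stated assumptions, not the DAAP implementation: tokenization by whitespace, the "inclusive" quartile method of the standard library, and the decision to skip words that occur only once are all choices made here for the example, and the function name and sample segments are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles


def word_score_stats(segments):
    """segments: list of (text, judge_score) pairs. Returns {word: (M, S)}."""
    scores_by_word = defaultdict(list)
    for text, score in segments:
        for token in text.lower().split():
            # One entry per occurrence, so repetitions within a segment are kept.
            scores_by_word[token].append(score)

    stats = {}
    for word, scores in scores_by_word.items():
        if len(scores) < 2:
            continue  # assumption: quartiles are not meaningful for one occurrence
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        stats[word] = (median(scores), abs(q3 - q1))
    return stats


# Hypothetical scored segments, for illustration only.
example_segments = [
    ("the dog ran", 8),
    ("the dog sat", 6),
    ("the idea was vague", 2),
    ("the dog", 7),
]
example_stats = word_score_stats(example_segments)
```

In this toy data the word "dog" appears only in segments scored 6 to 8 and so has a small spread, while "the" appears in every segment and has a larger one.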

A set of tentative dictionaries based on the spread parameter S was constructed, where each word in each tentative dictionary received its median M as its weight. Each of these tentative dictionaries was used by DAAP to compute a tentative WRRL score for each of the segments in the dictionary set; then the correlation between these tentative WRRL scores and the judges’ actual scores was computed. The coverage (the proportion of words in the data set that are matched by the tentative dictionary) was also recorded. These figures were examined to find the tentative dictionary that best maximized both the correlation with judges’ scores and the coverage.
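The comparison of tentative dictionaries might be sketched as below. Several details are assumptions made for the example rather than facts from the text: each tentative dictionary is taken to contain the words whose spread S is at most some threshold, a segment score is taken to be the mean weight over all tokens with unmatched tokens counted at the neutral value 5, and Pearson correlation is used. All names and the sample data are hypothetical.

```python
from math import sqrt

NEUTRAL = 5.0  # neutral point of the judges' 0 to 10 scale


def tentative_dictionary(stats, max_spread):
    """Keep words whose spread S is at most max_spread; weight is the median M."""
    return {w: m for w, (m, s) in stats.items() if s <= max_spread}


def scores_and_coverage(segments, dictionary):
    """Return (list of segment scores, proportion of tokens matched)."""
    scores, matched, total = [], 0, 0
    for text, _ in segments:
        tokens = text.lower().split()
        weights = [dictionary.get(t, NEUTRAL) for t in tokens]
        scores.append(sum(weights) / len(weights) if weights else NEUTRAL)
        matched += sum(1 for t in tokens if t in dictionary)
        total += len(tokens)
    return scores, (matched / total if total else 0.0)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0


def evaluate(segments, stats, thresholds):
    """One (threshold, correlation, coverage) row per tentative dictionary."""
    judge_scores = [score for _, score in segments]
    rows = []
    for t in thresholds:
        scores, coverage = scores_and_coverage(segments, tentative_dictionary(stats, t))
        rows.append((t, pearson(scores, judge_scores), coverage))
    return rows


# Hypothetical (M, S) values and scored segments, for illustration only.
toy_stats = {"dog": (7, 1.0), "the": (6.5, 2.25), "idea": (2, 0.5)}
toy_segments = [("the dog ran", 8), ("the dog sat", 6), ("the idea was vague", 2)]
toy_rows = evaluate(toy_segments, toy_stats, thresholds=[1.0, 3.0])
```

Raising the threshold admits more words and so raises coverage, but the added words are less tightly tied to a particular score, which is the trade-off the paragraph above describes.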

After this maximal tentative dictionary was chosen, several inappropriate words, such as proper nouns, were eliminated and the weights were rescaled so that the dictionary contained a word with weight +1, and it contained a word with weight -1.

It was then observed that the dictionary scores of the segments in the dictionary set had a rather narrow range of values. As a final step, the weights were rescaled so that the mean of the scores of the dictionary segments was at the neutral value of 0.5. Then the weights of words above and below this neutral value were independently rescaled so as to maximize the variance of the set of dictionary scores. The mathematical details of these adjustments will appear in the forthcoming DAAP Technical Manual.

As a final check, after these adjustments were made, the correlation between judges’ scores and dictionary scores was computed for the set of test segments (n = 30); the correlation was r = 0.735 and the coverage was 0.812.

Construction of the WRSL. 

The current version of the Weighted Arousal List (WRSL), which contains 242 words or other items, was developed following procedures similar to those used for the WRRL, as briefly described in Tocatly (2021, 2023). The DAAP Technical Manual also contains the details of this construction. It is expected that an enhanced version of the WRSL will be produced in the near future.

UNWEIGHTED DICTIONARIES

The WRRL and WRSL are both relatively new. Previously, the Reflecting/Reorganizing function was measured primarily by the unweighted Reflection Dictionary (REF). The previous measure related to the Arousal/Activation dimension was the Disfluency Measure (DF), which is partially defined by an unweighted dictionary, as described below, and also counts repeated words and broken words. The unweighted dictionaries were constructed using standard procedures in which judges are given lists of individual words and instructed to select words related to particular content or function categories. Words on which a specified number of judges agree are then included in the dictionaries. As noted, this procedure contrasts with the methods based on judgments of language segments used for our weighted dictionaries.

REF words concern how people think and communicate thoughts. This dictionary includes words referring to cognitive or logical functions (e.g., assume, think, plan) or entities (e.g., reason, cause, consequence); problems or failures of cognitive or logical functions (e.g., confuse); complex verbal communicative functions (e.g., comment, convince, argue, obfuscate); and features of mental functioning (e.g., creative, logical). The REF dictionary currently contains 1,436 items.

Disfluency (DF). The DF Dictionary contains the disfluent usages of the words kind, know, like, mean and well. The selection of the disfluent usages of these words is based on an automatic disambiguation operator, the DAAP Disambiguator, which separates the usage of each of these words into several categories. The words kind, know and mean are each separated into disfluent and non-disfluent categories; like is separated into three categories, according to whether it is used as a comparative, a verb or a disfluency; well is separated into four categories, according to usage as an adverb, a comparative, a noun referring to storage of water, or a disfluency. The DF dictionary also includes an item consisting of non-word sounds, such as aah and umh, which are converted into the single item mm. In addition to these items, incomplete or broken words, indicated by a word ending in a hyphen, and repeated words are counted by the DAAP program as indicators of disfluency. (See xxx and the DAAP Technical Manual for details concerning the construction and contents of the DF dictionary.)
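The simpler disfluency indicators (non-word sounds, broken words and repeated words) can be illustrated with a short Python sketch. It does not attempt the disambiguation of kind, know, like, mean and well, which requires the DAAP Disambiguator. The function name is hypothetical, the filler set contains only the two examples named above, and "repeated words" is assumed here to mean a token immediately followed by an identical token; DAAP's actual rules may differ.

```python
FILLERS = {"aah", "umh"}  # only the two examples named in the text


def count_simple_disfluencies(tokens):
    """Count fillers (mapped to 'mm'), broken words and adjacent repetitions."""
    normalized = ["mm" if t in FILLERS else t for t in tokens]
    fillers = normalized.count("mm")
    # A broken word is indicated by a token ending in a hyphen.
    broken = sum(1 for t in normalized if len(t) > 1 and t.endswith("-"))
    # Assumption: a repetition is a token identical to the one just before it.
    repeated = sum(1 for prev, cur in zip(normalized, normalized[1:]) if prev == cur)
    return {"fillers": fillers, "broken": broken, "repeated": repeated}


sample = "i i went to the st- store umh yesterday".split()
counts = count_simple_disfluencies(sample)
```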

There are several other unweighted dictionaries that are included with the DAAP distribution and that appear to be related to the Referential Process. These include the unweighted affect dictionaries: Positive Affect (AFFP), containing 613 items; Negative Affect (AFFN), containing 1,460 items; Neutral Affect (AFFZ), containing 359 items; and Total Affect (AFFS), containing 2,432 items.

The Negation Dictionary (NEG), containing 26 items such as negative, none and nothing, and the Sensory-Somatic Dictionary (SenS), containing 1,882 items such as juicy, mammography and masturbation, are also included in the DAAP distribution.

The Affect dictionaries, REF and SenS were all produced by having at least three judges rate words in a set of texts for inclusion in these dictionaries. A word was included (or excluded) if at least three judges agreed that it should be. If there was no agreement as to inclusion or exclusion among three judges, then a fourth judge was included in the final determination. The 26 items in the Negation dictionary (NEG) were chosen by a single judge.    There are four subdictionaries of AFFN, and five subdictionaries of AFFP. These subdictionaries were chosen by consensus between two judges.

The subdictionaries of AFFN are Depression (AND), containing 76 words such as crying, drab and grief; Fear (ANF), containing 91 words such as frantic, horror and scary; Hostility (ANH), containing 483 words such as injure, mad and obnoxious; and Pain (ANP), containing 20 words such as damage, grumpy and suffer.

The five subdictionaries of AFFP are Admiration (APA), containing 142 words such as angel, charm and sweet; Happiness (APH), containing 147 words, such as elated, fun and smile; Love (APL), containing 126 words, such as amorous, hug and pleased; Success (APS), containing 47 words, such as brave, prize and valiant; and Wonder (APW), containing 44 words, such as amazing, faith and miracle.

The subdictionaries of AFFN and AFFP are available on request from DAAP@daapwrad.org.

MAKING NEW DICTIONARIES

The DicStore folder contains the current versions of the Referential Process dictionaries, both weighted and unweighted. New dictionaries can be added to this collection provided the files and their names follow the formatting rules below.

An unweighted dictionary is just a list of words, where each word is on a separate line. Words should consist of lowercase letters and numbers; they may also contain dashes and underscores, but no spaces. The dictionary file should be saved in text format (.txt) and contain no blank lines. The name of an unweighted dictionary file must consist only of upper and lower case letters, beginning with an upper case letter in the range A – U. (The remaining letters of the alphabet are reserved for weighted and other special dictionaries.) The name should not contain any dots or periods.
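The rules for unweighted dictionaries can be checked mechanically. The following Python sketch is not part of DAAP; the function name is hypothetical, the check is applied to the dictionary name without any extension, and "letters" is assumed to mean the unaccented letters a to z.

```python
import re

WORD_RE = re.compile(r"[a-z0-9_-]+")    # lowercase letters, digits, dashes, underscores
NAME_RE = re.compile(r"[A-U][A-Za-z]*")  # letters only, first letter in the range A to U


def check_unweighted(name, lines):
    """Return a list of rule violations (an empty list means the checks pass)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append(f"name {name!r}: must be letters only, starting with A-U")
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append(f"line {number}: blank line")
        elif not WORD_RE.fullmatch(line):
            problems.append(f"line {number}: invalid word {line!r}")
    return problems


ok = check_unweighted("Mydict", ["think", "self-aware", "plan_2"])
bad = check_unweighted("Wdict", ["Think", "", "two words"])
```

In the second call the name begins with a reserved letter, the first entry contains an uppercase letter, the second is blank, and the third contains a space, so four problems are reported.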

A weighted dictionary is a list of words, each consisting of lowercase letters and numbers and following all the rules given above for words in unweighted dictionaries. Each word is followed by exactly one space and then a decimal number between -1 and +1, inclusive. DAAP assigns the weight of 0 to a word not in the dictionary; it then linearly transforms the weights so that they lie between 0 and 1, instead of between -1 and +1, and so that the neutral value of 0 is transformed to 0.5. The dictionary file should be saved in text format (.txt) and contain no blank lines. The filename of a weighted dictionary file must consist only of upper and lower case letters, beginning with an upper case letter in the range W – Y; the filename ends with the extension “.Wt”. The filename should not otherwise contain any dots or periods.
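The line format and the transformation of weights can be illustrated as follows. A linear map that sends -1 to 0, 0 to 0.5 and +1 to 1 is w' = (w + 1) / 2. The Python sketch below is not DAAP code; the function names and the sample entries are hypothetical, and the accepted number syntax is an assumption.

```python
import re

# word, exactly one space, then a signed decimal number
LINE_RE = re.compile(r"([a-z0-9_-]+) ([+-]?(?:\d+\.?\d*|\.\d+))")


def parse_weighted(lines):
    """Parse 'word weight' lines into {word: weight}, with weights in [-1, +1]."""
    weights = {}
    for number, line in enumerate(lines, start=1):
        match = LINE_RE.fullmatch(line)
        if match is None:
            raise ValueError(f"line {number}: expected 'word weight', got {line!r}")
        weight = float(match.group(2))
        if not -1.0 <= weight <= 1.0:
            raise ValueError(f"line {number}: weight {weight} outside [-1, +1]")
        weights[match.group(1)] = weight
    return weights


def transformed_weight(word, weights):
    """Weight on the 0 to 1 scale; words not in the dictionary get 0, i.e. 0.5."""
    return (weights.get(word, 0.0) + 1.0) / 2.0


# Hypothetical entries, for illustration only.
toy_weights = parse_weighted(["walked 0.5", "perhaps -1", "door 1"])
```

With these entries, walked maps to 0.75, perhaps to 0.0, door to 1.0, and any word not listed maps to the neutral 0.5.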