Editor Introduction: This chapter elaborates the development of the most general polytomous IRT model covered in this book. It is the only model in this book that does not assume ordered polytomous response data and can therefore be used to measure traits and abilities with items that have unordered response categories. It can be used to identify the empirical ordering of response categories where that ordering is unknown a priori but of interest, or it can be used to check whether the expected ordering of response categories is supported in data. The authors present a new parameterization of this model that may serve to expand the model and to facilitate a more widespread use of the model. Also discussed are various derivations of the model and its relationship to other models. The chapter concludes with a special section by Bock, where he elaborates on the background of the nominal model.
The nominal categories model (Bock, 1972, 1997) was originally proposed shortly after Samejima (1969, 1997) described the first general item response theory (IRT) model for polytomous responses. Samejima’s graded models (in normal ogive and logistic form) were designed for item responses that have some a priori order as they relate to the latent variable being measured (θ); the nominal model was designed for responses with no predetermined order.
Samejima (1969) illustrated the use of the graded model with the analysis of data from multiple-choice items measuring academic proficiency. The weakness of using a graded model for that purpose arises from the fact that the scoring order, or relative degree of correctness, of multiple-choice response alternatives can only rarely be known a priori. That was part of the motivation for the development of the nominal model. Bock’s (1972) presentation of the nominal model also used multiple-choice items measuring vocabulary to illustrate its application. Ultimately, neither Samejima’s (1969, 1997) graded model nor Bock’s (1972, 1997) nominal model has seen widespread use as a model for responses to multiple-choice items because, in addition to the aforementioned difficulty of prespecifying an order for multiple-choice alternatives, neither the graded nor the nominal model makes any provision for guessing. Elaborating a suggestion by Samejima (1979), Thissen and Steinberg (1984) described a generalization of the nominal model that does take guessing into account, and that multiple-choice model is preferable if IRT analysis of all of the response alternatives for multiple-choice items is required.
Nevertheless, the nominal model is in widespread use in item analysis and test scoring. The nominal model is used for three purposes: (1) as an item analysis and scoring method for items that elicit purely nominal responses, (2) to provide an empirical check that items expected to yield ordered responses have actually done so (Samejima, 1988, 1996), and (3) to provide a model for the responses to testlets. Testlets are sets of items that are scored as a unit (Wainer & Kiely, 1987); often testlet response categories are the patterns of response to the constituent items, and those patterns are rarely ordered a priori.
Bock’s (1972) original formulation of the nominal model was

T(u = k; θ, a, c) = exp(z_k) / Σ_{h=0}^{m−1} exp(z_h),   (3.1)

in which T, the curve tracing the probability that the item response u is in category k, is a function of the latent variable θ with vector parameters a and c. In what follows we will often shorten the notation for the trace line to T(k), and in this presentation we number the response alternatives k = 0, 1, …, m − 1 for an item with m response categories. The model itself is the so-called multivariate logistic function, with arguments

z_k = a_k θ + c_k,   (3.2)

in which z_k is a response process (value) for category k, which is a (linear) function of θ with slope parameter a_k and intercept c_k. Equations 3.1 and 3.2 can be combined and made more compact as

T(u = k) = exp(a_k θ + c_k) / Σ_{h=0}^{m−1} exp(a_h θ + c_h).   (3.3)
As stated in Equation 3.3, the model is twice not identified: The addition of any constant to all of the a_k s or to all of the c_k s yields different parameter sets but the same values of T(k). As identification constraints, Bock (1972) suggested

Σ_{k=0}^{m−1} a_k = Σ_{k=0}^{m−1} c_k = 0,   (3.4)

implemented by reparameterizing, and estimating the parameter vectors α and γ using

a = Tα and c = Tγ,   (3.5)

in which “deviation” contrasts from the analysis of variance were used, with each of the m − 1 columns of the m-row matrix T summing to zero:

T = [ I_{m−1} over a final row of −1s ].   (3.6)
With the T matrices defined as in Equation 3.6, the vectors (of length m − 1) α and γ may take any value and yield vectors a and c with elements that sum to zero. As is the case in the analysis of variance, other contrast (T) matrices may be used as well (see Thissen and Steinberg (1986) for examples); for reasons that will become clear, in this presentation we will use systems that identify the model with the constraints a_0 = c_0 = 0 instead of the original sum-to-zero identification constraints.
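The effect of the deviation-contrast reparameterization can be sketched numerically. In this sketch the particular layout of T (identity rows over a final row of −1s) and the α values are illustrative assumptions, not the chapter’s printed Equation 3.6; the point is only that any α yields an a vector whose elements sum to zero.

```python
import numpy as np

def deviation_T(m):
    """One common deviation ("effect") contrast matrix: m rows, m - 1 columns.
    Each column sums to zero, so a = T @ alpha always sums to zero."""
    return np.vstack([np.eye(m - 1), -np.ones((1, m - 1))])

m = 4
T = deviation_T(m)
alpha = np.array([0.5, -1.2, 0.3])   # hypothetical contrast parameters
a = T @ alpha                        # a = (0.5, -1.2, 0.3, 0.4)
print(a, a.sum())                    # elements sum to zero
```

The same machinery applies to c = Tγ; choosing a different contrast matrix changes the meaning of the estimated α and γ, not the fitted trace lines.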
Figure 3.1 shows four sets of trace lines that illustrate some of the range of variability of item response functions that can be obtained with the nominal
Figure 3.1 Upper left: Trace lines for an artificially constructed four-alternative item. Upper right: Trace lines for the “Identify” testlet described by Thissen and Steinberg (1988). Lower left: Trace lines for the number correct on questions following a passage on a reading comprehension test, using parameter estimates obtained by Thissen, Steinberg, and Mooney (1989). Lower right: Trace lines for judge-scored constructed-response item M075101 from the 1996 administration of the NAEP mathematics assessment
model. The corresponding values of the parameter vectors a and c are shown in Table 3.1.
The curves in the upper left panel of Figure 3.1 artificially illustrate a maximally ordered, centered set of item responses: As seen in the leftmost two columns of Table 3.1 (for Item 1) the values of a_{k} increase by 1.0 as k increases; as we will see in a subsequent section, that produces an ordered variant of the nominal model. All of the values of c_{k} are identically 0.0, so the trace lines all cross at that value of θ. The upper right panel of Figure 3.1
Table 3.1

Response         Item 1       Item 2       Item 3       Item 4
Category (k)     a     c      a     c      a     c      a      c
0              0.0   0.0    0.0   0.0    0.0   0.0    0.00   0.0
1              1.0   0.0    0.0  −0.9    0.2   0.5    0.95   1.2
2              2.0   0.0    1.1  −0.7    0.7   1.8    1.90   0.2
3              3.0   0.0    2.7   0.7    1.3   3.0    2.85  −1.4
4                                        2.2   3.3    3.80  −2.7
shows trace lines that correspond to parameter estimates (marked Item 2 in Table 3.1) obtained by Thissen and Steinberg (1988) (and subsequently by Hoskens and De Boeck (1997); see Baker and Kim (2004) for the details of maximum marginal likelihood parameter estimation) for a testlet comprising two items from Bergan and Stone’s (1985) data obtained with a test of preschool mathematics proficiency. The two items required the child to identify the numerals 3 and 4; the curves are marked 0 for neither identified, 1 for 3 identified but not 4, 2 for 4 identified but not 3, and 3 for both identified correctly. This is an example of a testlet with semi-ordered responses: The 0 and 1 curves are proportional because their a_k estimates are identical, indicating that, except for an overall difference in probability of endorsement, they have the same relation to proficiency: Both may be taken as incorrect. If a child can identify 4 but not 3 (the 2 curve), that indicates a moderate, possibly developing, degree of mathematical proficiency, and the probability of both correct (the 3 curve) increases as θ increases.
The lower left panel of Figure 3.1 shows trace lines that correspond to parameter estimates (marked Item 3 in Table 3.1) obtained by Thissen, Steinberg, and Mooney (1989) fitting the nominal model to the number-correct score for the questions following each of four passages on a reading comprehension test. Going from left to right, the model indicates that the responses are increasingly ordered for this number-correct scored testlet: Summed scores of 0 and 1 have nearly the same trace lines, because 0 (of 4) and 1 (of 4) are both scores that can be obtained with nearly equal probability by guessing on five-alternative multiple-choice items. After that, the trace lines look increasingly like those of a graded model. The lower right panel of Figure 3.1 is for a set of graded responses: It shows the curves that correspond to the parameter estimates for an extended constructed-response mathematics item administered as part of the National Assessment of Educational Progress (NAEP) (Allen, Carlson, & Zelenak, 1999). The judged scores (from 0 to 4) were fitted with Muraki’s (1992, 1997) generalized partial credit (GPC) model, which is a constrained version of the nominal model. In Table 3.1, the parameters for this item (Item 4 in the two rightmost columns) have been converted into values of a_k and c_k for comparability with the other items’ parameters. The GPC model is an alternative to Samejima’s (1969, 1997) graded model for such ordered responses; the two models generally yield very similar trace lines for the same data. In subsequent sections of this chapter we will discuss the relation between the GPC and nominal models in more detail.
There are several lines of reasoning that lead to Equation 3.3 as an item response model. In this section we describe three kinds of theoretical argument that lead to the nominal model, both because all three exist in the literature and because different lines of reasoning appeal to readers with different backgrounds.
Certainly the simplest development of the nominal model is essentially atheoretical, treating the problem as abstract statistical model creation. To do this, we specify only the most basic facts: that we have categorical item responses in several (>2) categories, that we believe those item responses depend on some latent variable (θ) that varies among respondents, and that the mutual dependence of the item responses on that latent variable explains their observed covariance. Then “simple” mathematical functions are used to complete the model.
First, we assume that the response process (value) for each person, for each item response alternative, is a linear function of θ,

z_k = a_k θ + c_k,   (3.7)

with unknown slope and intercept parameters a_k and c_k. Such a set of straight lines for a five-category item is shown in the left panel of Figure 3.2, using the parameters for Item 3 from Table 3.1.
To change those straight lines (z_k) into a model that yields probabilities (between 0 and 1) for each response, as functions of θ, we use the so-called multivariate logistic link function

T(u = k) = exp(z_k) / Σ_{h=0}^{m−1} exp(z_h).   (3.8)
This function (Equation 3.8) is often used in statistical models to transform a linear model into a probability model for categorical data. It can be characterized as simple mathematical mechanics: Exponentiation of the values of z_{k} makes them all positive, and then division of each of those positive line values by the sum of all of them is guaranteed to transform the straight lines in the left panel of Figure 3.2 into curves such as those shown in the right panel of Figure 3.2. The curves are all between 0 and 1, and sum to 1
Figure 3.2 Left panel: Linear regressions of the response process z_k on θ for five response alternatives. Right panel: Multivariate logistic transformed curves corresponding to the five lines in the left panel
at all values of θ, as required. (The curves in the right panel of Figure 3.2 are those from the lower left panel of Figure 3.1. de Ayala (1992) has presented a similar graphic as his Figure 1.)
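The “mathematical mechanics” just described can be sketched directly, using the Item 3 parameters from Table 3.1; the grid of θ values is arbitrary.

```python
import numpy as np

# Item 3 parameters from Table 3.1 (categories k = 0..4)
a = np.array([0.0, 0.2, 0.7, 1.3, 2.2])
c = np.array([0.0, 0.5, 1.8, 3.0, 3.3])

def nominal_trace(theta, a, c):
    """Multivariate logistic (Equation 3.3): exponentiate the linear
    response processes z_k and divide each by the sum of all of them."""
    z = np.outer(theta, a) + c              # straight lines z_k = a_k*theta + c_k
    z -= z.max(axis=1, keepdims=True)       # subtract max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

theta = np.linspace(-3, 3, 121)
T = nominal_trace(theta, a, c)              # rows: theta values; columns: categories
```

By construction the curves lie between 0 and 1 and sum to 1 at every θ, and the category with the largest a_k (here k = 4) dominates as θ increases.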
For purely statistically trained analysts, with no background in psychological theory development, this is a sufficient line of reasoning to use the nominal model for data analysis. Researchers trained in psychology may desire a more elaborated theoretical rationale, of which two are offered in the two subsequent sections.
However, it is of interest to note at this point that the development in this section, specifically Equation 3.7, invites the questions: Why linear? Why not some higher-order polynomial, such as a quadratic? Indeed, quadratic functions of θ have been suggested or used for special purposes as variants of the nominal model: Upon hearing a description of the multiple-choice model (Thissen & Steinberg, 1984), D. B. Rubin (personal communication, December 15, 1982) suggested that an alternative to that model would be a nominal model with quadratic functions replacing Equation 3.7. Ramsay (1995) used a quadratic term in Equation 3.7 for the correct response alternative of multiple-choice items when the multivariate logistic is used to provide “smooth” information curves for the nonparametric trace lines in the TestGraf system. Sympson (1983) also suggested the use of quadratic, and even higher-order, polynomials in a more complex model that was never implemented or widely used.
Nevertheless, setting aside multiple-choice items, for most uses of the nominal model the linear functions in Equation 3.7 are sufficient.
Relationship to Other Models: The term Thurstone models in polytomous IRT typically refers to models in which response category thresholds characterize all responses above versus below a given threshold, whereas Rasch-type models characterize only responses in adjacent categories. However, the Thurstone case V model, which is related to the development of the nominal categories model, is a very different type of Thurstone model, one without thresholds, highlighting the nominal categories model’s unique place among polytomous IRT models.
The original development of the nominal categories model by Bock (1972) was based on an extension of Thurstone’s (1927) case V model for binary choices, generalized to become a model for the first choice among three or more alternatives. Thurstone’s model for choice made use of the concept of a response process that followed a normal distribution, one value (process in Thurstone’s language) for each object. The idea was that the object or alternative selected was that with the larger value. In practice, a “comparatal” process is computed as the difference between the two response processes, and the first object is selected if the value of the comparatal process is greater than zero.
Bock and Jones (1968) describe many variants and extensions of Thurstone’s models for choice, including generalizations to the first choice from among several objects. The obvious generalization of Thurstone’s binary choice model to create a model for the first choice from among three or more objects would use a multivariate normal distribution of m − 1 comparatal processes for object or alternative j, each representing a comparison of object j with one of the others of m objects. Then the probability of selection of alternative j would be computed as a multiple integral over that (m − 1)-dimensional normal density, computing a value known as an orthant probability. However, multivariate normal orthant probabilities are notoriously difficult to compute, even for simplified special cases. Bock and Jones suggest substitution of the multivariate logistic distribution, showing that the bivariate logistic yields probabilities similar to those obtained from a bivariate normal (these would be used for the first choice of three objects). The substitution of the logistic here is analogous to the substitution of the logistic function for the normal ogive in the two-parameter logistic IRT model (Birnbaum, 1968). Of course, the multivariate logistic distribution function is Equation 3.1.
In the appendix to this chapter, Bock provides an updated and detailed description of the theoretical development of the nominal categories model as an approximation to the multivariate generalization of Thurstone’s model for choice. In addition, the appendix describes the development of the model that is obtained by considering first choices among three or more objects as an “extreme value” problem, citing the extension of Dubey’s (1969) derivation of the logistic distribution to the multivariate case that has been used and studied by Bock (1970), McFadden (1974), and Malik and Abraham (1973). This latter development also ties the nominal categories model to the so-called Bradley-Terry-Luce (BTL) model for choice (Bradley & Terry, 1952; Luce & Suppes, 1965).
Thus, from the point of view of mathematical models for choice, the nominal categories model is both an approximation to Thurstone (normal) models for the choice of one of three or more alternatives, and the multivariate version of the BTL model.
Another derivation of the nominal model involves its implications for the conditional probability of a response in one category (say k) given that the response is in one of two categories (k or k′). This derivation is analogous in some respects to the development of Samejima’s (1969, 1997) graded model, which is built up from the idea that several conventional binary item response models may be concatenated to construct a model for multiple responses. In the case of the graded model, accumulation is used to transform the multiple category model into a series of dichotomous models: The conventional normal ogive or logistic model is used to describe the probability that a response is in category k or higher, and then those cumulative models are subtracted to produce the model for the probability the response is in a particular category. This development of the graded model rests, in turn, on the theoretical development of the normal ogive model as a model for the psychological response process, as articulated by Lord and Novick (1968, pp. 370–373), and then on Birnbaum’s (1968) reiteration for test theory of Berkson’s (1944, 1953) suggestion that the logistic function could usefully be substituted for the normal ogive. (See Thissen and Orlando (2001, pp. 84–89) for a summary of the argument by Lord and Novick and the story behind the logistic substitution.)
The nominal model may be derived in a parallel fashion, assuming that the conditional probability of a response in one category (say k), given that the response is in one of two categories (k or k′), can be modeled with the two-parameter logistic (2PL). The algebra for this derivation “frontwards” (from the 2PL for the conditional responses to the nominal model for all of the responses) is challenging as test theory goes, but it is sufficient to do it “backwards,” and that is what is presented here. (We note in passing that Masters (1982) did this derivation frontwards for the simpler route from the Rasch or one-parameter logistic (1PL) model to the partial credit model.)
If one begins with the nominal model as stated in Equation 3.3 and writes the conditional probability for a response in category k, given that the response is in one of categories k or k′,

T(k | k or k′) = T(k) / [T(k) + T(k′)],

then only a modest amount of algebra (cancel the identical denominators, and then more cancellation to change the two exponential terms into one) is required to show that this conditional probability is, in fact, a two-parameter logistic function:

T(k | k or k′) = 1 / [1 + exp(−(a^c θ + c^c))],

with

a^c = a_k − a_{k′}

and

c^c = c_k − c_{k′}.
Placing interpretation on the algebra, what this means is that the nominal model assumes that if we selected the subsample of respondents who selected either alternative k or k′, setting aside respondents who made other choices, and analyzed the resulting dichotomous item in that subset of the data, we would use the 2PL model for the probability of response k in that subset of the data. This choice, like the choice of the normal ogive or logistic model for the cumulative probabilities in the graded model, then rests on the theoretical development of the normal ogive model as a psychological response process model as articulated by Lord and Novick (1968), and Birnbaum’s (1968) argument for the substitution of the logistic. The difference between the two ways of dividing multiple responses into a series of dichotomies (cumulative vs. conditional) has been discussed by Agresti (2002).
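That algebraic identity can be checked numerically: selecting any pair of categories from a nominal-model item and forming the conditional probability reproduces a 2PL with slope a_k − a_{k′} and intercept c_k − c_{k′}. The item parameters below are those of Item 3 from Table 3.1; the pair (k, k′) is arbitrary.

```python
import numpy as np

# Item 3 parameters from Table 3.1
a = np.array([0.0, 0.2, 0.7, 1.3, 2.2])
c = np.array([0.0, 0.5, 1.8, 3.0, 3.3])
k, kp = 3, 1                                 # any pair of categories
theta = np.linspace(-3, 3, 61)

# Nominal-model trace lines (Equation 3.3)
z = np.outer(theta, a) + c
Tk = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# Conditional probability T(k | k or k') versus the 2PL with
# a^c = a_k - a_k' and c^c = c_k - c_k'
cond = Tk[:, k] / (Tk[:, k] + Tk[:, kp])
two_pl = 1.0 / (1.0 + np.exp(-((a[k] - a[kp]) * theta + (c[k] - c[kp]))))
```

The two arrays agree to machine precision for every θ, which is the “backwards” derivation of the text in executable form.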
An interesting and important feature of the nominal model is obtained by specializing the conditional probability to any pair of adjacent response categories (k or k − 1; adjacent is meaningful if the responses are actually ordered); the same two-parameter logistic is obtained:

T(k | k or k − 1) = 1 / [1 + exp(−(a^c θ + c^c))],

with

a^c = a_k − a_{k−1}

and

c^c = c_k − c_{k−1}.
It is worth noting at this point that the threshold b^c_k for the slope-threshold form of the conditional 2PL curve,

T(k | k or k − 1) = 1 / [1 + exp(−a^c(θ − b^c_k))],

is

b^c_k = −c^c / a^c = −(c_k − c_{k−1}) / (a_k − a_{k−1}),

which is also the crossing point of the trace lines for categories k and k − 1 (de Ayala, 1993; Bock, 1997). These values are featured in some parameterizations of the nominal model for ordered data.
This fact defines the concept of order for nominal response categories: Response k is “higher” than response k − 1 if and only if a_k > a_{k−1}, which means that a^c is positive, so the conditional probability of selecting response k (given that the response is one of the two) increases as θ increases. In practical terms, this means that item analysis with the nominal model tells the data analyst the empirical order of the item responses. We have already made use of this fact in the discussion of order and the a_k parameters in Figure 3.1 and Table 3.1 in the introductory section.
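The crossing-point formula and the ordering criterion can both be verified with the Item 3 parameters from Table 3.1: at θ = b^c_k the adjacent response processes z_k and z_{k−1} are equal (so the trace lines cross), and because the a_k increase strictly in k, every a^c is positive and the categories are empirically ordered.

```python
import numpy as np

# Item 3 parameters from Table 3.1
a = np.array([0.0, 0.2, 0.7, 1.3, 2.2])
c = np.array([0.0, 0.5, 1.8, 3.0, 3.3])

# Crossing points of adjacent trace lines: b^c_k = -(c_k - c_{k-1})/(a_k - a_{k-1})
b_c = -np.diff(c) / np.diff(a)

# At theta = b^c_k, z_k = z_{k-1}, so T(k) = T(k-1): the curves cross there
for k in range(1, len(a)):
    th = b_c[k - 1]
    assert np.isclose(a[k] * th + c[k], a[k - 1] * th + c[k - 1])

# a_k strictly increasing over k  ->  every a^c > 0  ->  ordered categories
assert np.all(np.diff(a) > 0)
print(b_c)
```

For this item the crossing points are not themselves ordered (the first two are nearly equal), which is exactly the near-redundancy of the 0 and 1 categories discussed for this testlet.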
Figure 3.3 Trace lines corresponding to item parameters obtained by Huber (1993) in his analysis of the item “Count down from 20 by 3s” on the Short Portable Mental Status Questionnaire (SPMSQ)
Two additional examples serve to illustrate the use of the nominal model to determine the order of response categories, and the way the model may be used to provide trace lines that can be used to compute IRT scale scores (see Thissen, Nelson, Rosa, and McLeod, 2001) using items with purely nominal response alternatives.
Figure 3.3 shows the trace lines corresponding to item parameters obtained by Huber (1993) in his analysis of the item “Count down from 20 by 3s” on the Short Portable Mental Status Questionnaire (SPMSQ), a brief diagnostic instrument used to detect dementia. For this item, administered to a sample of aging individuals, three response categories were recorded: correct, incorrect (scored positively for this “cognitive dysfunction” scale), and refusal (NA). Common practice in scoring the SPMSQ in clinical and research applications was to score NA as incorrect, based on a belief that respondents who refused to attempt the task probably could not do it. Huber fitted the three response categories with the nominal model and obtained the parameters a′ = [0.0, 1.56, 1.92] and c′ = [0.0, −0.52, 0.85]; the corresponding curves are shown in Figure 3.3. As expected, the a_k parameter for NA is much closer to the a_k parameter for the incorrect response, and the curve for NA is nearly proportional to the curve for the incorrect response in Figure 3.3. This analysis lends a degree of justification to the practice of scoring NA as incorrect. However, if the IRT model is used to compute scale scores, those scale scores reflect the relative evidence of failure provided by the NA response more precisely.
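Huber’s published parameter values make the near-proportionality claim easy to check: two trace lines are proportional exactly when their a_k values are equal, so the log-ratio of the NA and incorrect curves is a line in θ with slope a_NA − a_incorrect = 0.36, small relative to the slopes themselves. (The category order [correct, incorrect, NA] for a′ and c′ is assumed from the text.)

```python
import numpy as np

# Huber's (1993) SPMSQ item "Count down from 20 by 3s";
# categories assumed ordered [correct, incorrect, NA]
a = np.array([0.0, 1.56, 1.92])
c = np.array([0.0, -0.52, 0.85])

theta = np.linspace(-3, 3, 61)
z = np.outer(theta, a) + c
T = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# log[T(NA)/T(incorrect)] = (a_2 - a_1)*theta + (c_2 - c_1): linear in theta
log_ratio = np.log(T[:, 2] / T[:, 1])
slope = (log_ratio[-1] - log_ratio[0]) / (theta[-1] - theta[0])
print(slope)   # a_2 - a_1 = 0.36: nearly proportional curves
```

If the two a_k values were identical, the slope would be zero and scoring NA as incorrect would lose no information at all; the small nonzero slope is the extra precision the text attributes to IRT scale scores.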
The SPMSQ also includes items that many item analysts would expect to be locally dependent. One example involves a pair of questions that require the respondent to state his or her age, and then his or her date of birth. Huber (1993) combined those two items into a testlet with four response categories: both correct (++), age correct and date of birth incorrect (+−), age incorrect and date of birth correct (−+), and both incorrect (−−). Figure 3.4 shows the
Figure 3.4 Nominal model trace lines for the four response categories for Huber’s (1993) SPMSQ testlet scored as reporting both age and date of birth correctly (++), age correctly and date of birth incorrectly (+−), age incorrectly and date of birth correctly (−+), and both incorrectly (−−)
nominal model trace lines for the four response categories for that testlet. While one may confidently expect that the −− response reflects the highest degree of dysfunction and the ++ response the lowest degree of dysfunction, there is a real question about the scoring value of the +− and −+ responses. The nominal model analysis indicates that the trace lines for +− and −+ are almost exactly the same, intermediate between good and poor performance. Thus, after the analysis with the nominal model one may conclude that this testlet yields four response categories that collapse into three ordered scoring categories: ++, [+− or −+], and −−.
Thissen and Steinberg (1986) showed that a number of other item response models may be obtained as versions of the nominal model by imposing constraints on the nominal model’s parameters, and further that the canonical parameters of those other models may be made the αs and γs estimated for the nominal model with appropriate choices of T matrices. Among those other models are Masters’ (1982) partial credit (PC) model (see also Masters and Wright, 1997) and Andrich’s (1978) rating scale (RS) model (see also Andersen (1997) for relations with proposals by Rasch (1961) and Andersen (1977)). Thissen and Steinberg (1986) also mentioned in passing that a version of the nominal model like the PC model, but with discrimination parameters that vary over items, is also within the parameter space of the nominal model. That latter model was independently developed and used in the 1980s by Muraki (1992) and called the generalized partial credit (GPC) model, and by Yen (1993) and called the twoparameter partial credit (2PPC) model.
Notational Difference: Remember that this model was presented slightly differently in Chapter 2.
Muraki (1992, 1997) has used several parameterizations to describe the GPC model, among them

T(u = k) = exp[Σ_{v=0}^{k} a(θ − b_v)] / Σ_{h=0}^{m−1} exp[Σ_{v=0}^{h} a(θ − b_v)],

with the constraint that

a(θ − b_0) ≡ 0,

and alternatively

T(u = k) = exp[Σ_{v=0}^{k} a(θ − b + d_v)] / Σ_{h=0}^{m−1} exp[Σ_{v=0}^{h} a(θ − b + d_v)],

in which

b_v = b − d_v.
Muraki’s parameterization of the GPC model is closely related to Masters’ (1982) specification of the PC model:

T(u = k) = exp[Σ_{v=0}^{k} (θ − δ_v)] / Σ_{h=0}^{m−1} exp[Σ_{v=0}^{h} (θ − δ_v)],
Notational Difference: Here the authors use θ to refer to the latent variable of interest where Masters (see Equations 5.22 and 5.23 in Chapter 5) and Andrich (see Equations 6.24 and 6.25 in Chapter 6) typically refer to the latent variable using β. This θ/β notational difference will be seen in other chapters and is common in IRT literature.
with the constraint that

(θ − δ_0) ≡ 0.
Andrich’s (1978) RS model is

T(u = k) = exp[Σ_{v=0}^{k} (θ − (δ + τ_v))] / Σ_{h=0}^{m−1} exp[Σ_{v=0}^{h} (θ − (δ + τ_v))],

with the constraints

(θ − (δ + τ_0)) ≡ 0

and

Σ_{v=1}^{m−1} τ_v = 0.
Thissen and Steinberg (1986) described the use of alternative T matrices in the formulation of the nominal model. For example, when formulated for marginal estimation following Thissen (1982), Masters’ (1982) PC model and Andrich’s (1978) RS model use a single slope parameter that is the coefficient for a linear basis function:

T_a = [0, 1, 2, …, m − 1]′.
Masters’ (1982) PC model used a parameterization for the threshold parameters that can be duplicated, up to proportionality, with a cumulative T matrix for the cs, so that c_k = −Σ_{v=1}^{k} δ_v; for example, for m = 4,

T_{c(PC)} =
⎡  0   0   0 ⎤
⎢ −1   0   0 ⎥
⎢ −1  −1   0 ⎥
⎣ −1  −1  −1 ⎦.
Terminology Note: The authors use the term threshold here, whereas in other chapters these parameters are sometimes referred to as step or boundary parameters.
Andrich’s RS model separated an overall item location parameter from a set of parameters describing the category boundaries for the item response scale; the latter were constrained equal across items, and may be obtained, again up to proportionality, with
Andrich (1978, 1985) and Thissen and Steinberg (1986) described the use of a polynomial basis for the cs as an alternative to T_{c(RSC)} that “smooths” the category boundaries; the overall item location parameter is the coefficient of the first (linear) column, and the coefficients associated with the other columns describe the response category boundaries:
Polynomial contrasts were used by Thissen et al. (1989) to obtain the trace lines for summed-score testlets for a passage-based reading comprehension test; the trace lines for one of those testlets are shown as the lower left panel of Figure 3.1 and the right panel of Figure 3.2. The polynomial contrast set included only the linear term for the a_k s and the linear and quadratic terms for the c_k s for that testlet; that was found to be a sufficient number of terms to fit the data. This example illustrates the fact that, although the nominal model may appear to have many estimated parameters, in many situations a reduction of rank of the T matrix may result in much more efficient estimation.
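A reduced-rank polynomial basis of the kind used by Thissen et al. (1989) can be sketched as follows; the raw power basis (columns k and k²) and the coefficient values here are illustrative assumptions (published analyses may use orthogonal polynomial contrasts instead), but the parameter-count arithmetic is the point.

```python
import numpy as np

m = 5                               # five response categories, k = 0..4
k = np.arange(m, dtype=float)

# Reduced-rank polynomial bases over the category index:
T_lin = k.reshape(-1, 1)            # linear column only, for the a's
T_quad = np.column_stack([k, k**2]) # linear + quadratic columns, for the c's

alpha = np.array([0.55])            # hypothetical linear coefficient for the a's
gamma = np.array([1.7, -0.25])      # hypothetical linear + quadratic coefficients

a = T_lin @ alpha                   # a_k = 0.55 * k
c = T_quad @ gamma                  # c_k = 1.7*k - 0.25*k**2

# 3 free parameters instead of the full-rank model's 2*(m-1) = 8
print(a, c)
```

Restricting the rank this way smooths the a_k and c_k sequences while leaving the model within the nominal family, which is why the fitted testlet curves in Figure 3.1 look graded even though the model is nominal.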
After three decades of experience with the nominal model and its applications, a revision to the parameterization of the model would serve several purposes: Such a revision could be used first of all to facilitate the extension of the nominal model to become a multidimensional IRT (MIRT) model, a first for purely nominal responses. In addition, a revision could make the model easier to explain. Further, by retaining features that have actually been used in data analysis, and discarding suggestions (such as many alternative T matrices) that have rarely or never been used in practice, the implementation of estimation algorithms for the model in software could become more straightforward.
Thus, while the previous sections of this chapter have described the nominal model as it has been, and as it has been used, this section presents a new parameterization that we expect will be implemented in the next generation of software for IRT parameter estimation. This is a look into the future.
The development of the new parameterization for the nominal model was guided by several goals, combining a new insight with experience gained over the last 30 years of applications of the model:
Obtaining Goals 3 and 4 requires two distinct parameterizations, both expressed as sets of T matrices; Goals 1 and 2 are maintained in both parameterizations.
The new parameterization is

T(u = k) = exp(a* a^s_k θ + c_k) / Σ_{h=0}^{m−1} exp(a* a^s_h θ + c_h),

in which

a_k = a* a^s_k,

and a* is the overall slope parameter, a^s_k is the scoring function for response k, and c_k is the intercept parameter as in the original model. The following restrictions for identification,

a^s_0 = 0, a^s_{m−1} = m − 1, and c_0 = 0,

are implemented by reparameterizing, and estimating the parameters α and γ:

a^s = Tα and c = Tγ.
To accomplish Goals 1 to 3, we use a Fourier basis as the T matrix, augmented with a linear column:

T_F = [k | f_{k,i}], for columns i = 2, …, m − 1,

in which f_{k,i} is

f_{k,i} = sin[π(i − 1)k / (m − 1)],

and α_1 = 1. Figure 3.5 shows graphs of the linear and Fourier functions for four categories (left panel) and six categories (right panel). The Fourier-based terms functionally replace quadratic and higher-order polynomial terms that
Figure 3.5 Graphs of the linear and Fourier basis functions for the new nominal model parameterization, for four categories (left panel) and six categories (right panel); the values of T at integral values on the Response axis are the elements of the T matrix of Equations 3.35 and 3.36
we have often used to smooth sequences of a_{k} and c_{k} parameters with a more numerically stable, symmetrical orthogonal basis.
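The linear-Fourier T matrix can be constructed directly. The sine form sin[π(i − 1)k/(m − 1)] is assumed here from the description of Figure 3.5; its key property is that every Fourier column vanishes at the endpoint categories k = 0 and k = m − 1, so the identification constraints a^s_0 = 0 and a^s_{m−1} = m − 1 hold automatically when α_1 = 1.

```python
import numpy as np

def linear_fourier_T(m):
    """Linear column (k) plus sine (Fourier) columns for i = 2, ..., m-1.
    Each Fourier column is zero at the endpoint categories k = 0 and m-1."""
    k = np.arange(m)
    cols = [k.astype(float)]
    for i in range(2, m):
        cols.append(np.sin(np.pi * (i - 1) * k / (m - 1)))
    return np.column_stack(cols)

T = linear_fourier_T(6)                      # 6 categories: 6 x 5 matrix

# With alpha_1 = 1 and the remaining alphas 0, the scoring function is
# a^s_k = k: the reparameterized GPC case
alpha = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
a_s = T @ alpha
print(a_s)
```

Freeing α_2, …, α_{m−1} moves the scoring function away from the integers 0, …, m − 1, giving the in-between models mentioned in the text.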
The new parameterization, using the Fourier T matrix, provides several useful variants of the nominal model: When a*, {α_2, …, α_{m−1}}, and γ are estimated parameters, this is the full-rank nominal model. If {α_2, …, α_{m−1}} are restricted to be equal to zero, this is a reparameterized version of the GPC model. The Fourier basis provides a way to create models between the GPC and nominal models, as were used by Thissen et al. (1989), Wainer, Thissen, and Sireci (1991), and others.
When the linear-Fourier basis T_F is used for both a^s and c, with α_1 = 1 and α_2 = ⋯ = α_{m−1} = 0, then the parameters of the GPC model (in Muraki’s b and d form) may be computed as

b = −γ_1 / a*

and

d_k = (c_k − c_{k−1} − γ_1) / a*

for k = 1, …, m − 1 (noting that d_0 = 0 and c_0 = 0 as constraints for identification). (Childs and Chen (1999) provided formulae to convert the parameters of the original nominal model into those of the GPC model, but they used the T matrices in the computations, which is not essential in the simpler method given here.)
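One such closed-form conversion can be checked by matching the response processes z_k term by term. The conversion below (b = −γ_1/a*, d_k built from successive differences of the c_k) is a sketch derived from that matching, under the assumed sine form of the Fourier columns, not necessarily the chapter’s printed formulas; the γ values are hypothetical.

```python
import numpy as np

def linear_fourier_T(m):
    k = np.arange(m)
    cols = [k.astype(float)]
    for i in range(2, m):
        cols.append(np.sin(np.pi * (i - 1) * k / (m - 1)))
    return np.column_stack(cols)

m, astar = 5, 1.1
T = linear_fourier_T(m)
gamma = np.array([0.4, 0.9, -0.3, 0.1])   # hypothetical; gamma[0] is the linear term
c = T @ gamma                              # intercepts, with c_0 = 0
a = astar * np.arange(m)                   # GPC case: a_k = a* * k

# Conversion to Muraki's GPC parameters
b = -gamma[0] / astar
d = np.concatenate([[0.0], (np.diff(c) - gamma[0]) / astar])   # d_0 = 0

# Rebuild z_k from the GPC parameterization and compare with the nominal z_k
theta = 0.7
z_gpc = np.cumsum(np.concatenate([[0.0], astar * (theta - b + d[1:])]))
z_nom = a * theta + c
print(np.allclose(z_gpc, z_nom))
```

Because the Fourier columns vanish at both endpoint categories, the d_k from this conversion sum to zero, matching the usual GPC convention for the d parameters.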
Also note that if it is desired to constrain the GPC parameters d_k to be equal across a set of items, that is accomplished by setting the parameter sets γ_2, …, γ_{m−1} equal across those items. This kind of equality constraint really only makes sense if the overall slope parameter a* is also set equal across those items, in which case b = −γ_1/a* reflects the overall difference in difficulty, which still varies over items i. (Another way to put this is that the linear-Fourier basis separates the parameter space into a (first) component for overall difficulty and a remainder that parameterizes the "spacing" among the thresholds or crossover points of the curves.)
The alternative parameterization of the GPC,

z_k = a[T_k(θ − b) + K_k],

in which T_k is the scoring function for response k, simply substitutes K_k parameters that may be computed from the values of d_i as K_k = Σ_{v=1}^{k} d_v. Note that the multiplication of the parameter b by the scoring function T_k provides another explanation of the fact that, with the linear-Fourier basis, γ_1 = −a*b.
To provide translations of the parameters for Rasch family models, some accommodation must be made between two conventions: for more general models, the scale of the latent variable is usually set by specifying that θ is distributed with mean zero and variance one; many implementations of Rasch family models instead specify that some item's difficulty is zero (or that the average difficulty is zero) and that the slope is one, leaving the mean and variance of the θ distribution unspecified, to be estimated.
If we follow the approach taken by Thissen (1982), in which a version of Rasch family models may be obtained with the specification that θ is distributed with mean zero and variance one, estimating a single common slope parameter (a* in this case) for all items, and all items' difficulty parameters, then the δ parameters of Masters' PC model are (in terms of the parameters of Muraki's GPC model)

δ_k = b − d_k,

up to a linear transformation of scale, and the δ and τ parameters of Andrich's RS model are

δ = b and τ_k = d_k,

again up to a linear transformation of scale.
To accomplish Goals 1, 2, and 4, involving equality constraints, we use T matrices for a^s of the form

T =
  [ 0  0  …  0 ]
  [ 1  0  …  0 ]
  [ 0  1  …  0 ]
  [ ⋮         ⋮ ]
  [ 0  0  …  1 ]

so that a^s = [0, α_1, α_2, …, α_{m−1}]′, with the constraint that α_1 = 1. If it is desirable to impose equality constraints in addition on the c's, we use a T matrix of the same form for the intercepts, so that c = [0, γ_1, γ_2, …, γ_{m−1}]′ and equality constraints among the intercepts are imposed by setting the corresponding γ parameters equal.
This arrangement provides for the following variants of the nominal model, among others: When a*, {α_2, …, α_{m−1}}, and γ are estimated parameters, this is again the full-rank nominal model. If α_i = i for {α_2, …, α_{m−1}}, this is a reparameterized version of the generalized partial credit model.
The restriction a^s_3 = 0 is imposed by setting α_2 = 0, and the restriction a^s_m = m − 1 is imposed by setting α_{m−1} = m − 1. For the other values of a^s, equality restrictions of the form a^s_{k+1} = a^s_{k′+1} are imposed by setting α_{k′} = α_k.
Table 3.2 shows the values of the new nominal model parameters for the items with trace lines in Figure 3.1 and the original parameters in Table 3.1. Note that the scoring parameters in a^s for Items 1 and 4 are [0, 1, 2, …, m − 1], indicating that the nominal model for those two items is one for strictly ordered responses. In addition, we observe that the lower discrimination of Item 3 (with trace lines shown in the lower left panel of Figure 3.1) is now clearly indicated by the relatively lower value of a*: the discrimination parameter for Item 3 is only 0.55, relative to values between 0.9 and 1.0 for the other three items.

Table 3.2  New nominal model parameters for the items with trace lines in Figure 3.1

Parameter    Item 1         Item 2         Item 3         Item 4
            a^s     c      a^s     c      a^s     c      a^s     c
a*          1.0            0.9            0.55           0.95
k = 1       0.0    0.0     0.0    0.0     0.00   0.0     0.0    0.0
k = 2       1.0    0.0     0.0   -0.9     0.36   0.5     1.00   1.2
k = 3       2.0    0.0     1.2   -0.7     1.27   1.8     2.00   0.2
k = 4       3.0    0.0     3.0    0.7     2.36   3.0     3.00  -1.4
k = 5                      4.00   3.3                    4.00  -2.7

The values of the c parameters are unchanged from Table 3.1. If the item analyst wishes to convert the parameters for Item 3 in Table 3.2 to those previously used for the GPC model, Equations 3.39 to 3.41 may be used.
The new parameterization of the nominal model is designed to facilitate multidimensional item factor analysis (or MIRT analysis) for items with nominal responses, something that has not heretofore been available (Cai, Bock, & Thissen, in preparation). A MIRT model has a vector-valued θ: two or more dimensions in the latent variable space that are used to explain the covariation among the item responses. Making use of the separation in the new nominal model parameterization of the overall item discrimination parameter (a*) from the scoring functions (in a^s), the multidimensional nominal model has a vector of discrimination parameters a*, one value indicating the slope in each direction of the θ-space. Taken together, this vector of discrimination parameters indicates the direction of highest discrimination of the item, which may be along any of the θ axes or between them.
The parameters in a^s remain unchanged: Those represent the scoring functions of the response categories and are assumed to be the same in all directions in the θ-space. So the model remains nominal in the sense that the scoring functions may be estimated from the data. The intercept parameters c also remain unchanged, taking the place of the usual scalar intercept parameter in a MIRT model.
Assembled in notation, the nominal MIRT model is

T(u = k; θ) = exp(z_k) / Σ_{h=1}^{m} exp(z_h),

modified from Equation 3.31 with vector a* and vector θ, in which

z_k = a^s_k a*′θ + c_k.
This is a nominal response model in the sense that, for any direction in the θ space, a cross section of the trace surfaces may take the variety of shapes provided by the unidimensional nominal model. Software to estimate the parameters of this model is currently under development. When completed, it will permit the empirical determination of response alternative order in the context of multidimensional θ. If an ordered version of the model is used, with scoring functions [0, 1, 2, …, m − 1], this model is equivalent to the multidimensional partial credit model described by Yao and Schwarz (2006).
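In code, the multidimensional version only replaces the scalar product a*θ with an inner product a*′θ. A minimal sketch (our own notation and function name, not from the software under development):

```python
import math

def nominal_mirt_probs(theta, a_star, a_s, c):
    """Category probabilities for the nominal MIRT model.

    theta  : latent-variable values, one per dimension
    a_star : discrimination parameters, one per dimension
    a_s    : scoring-function values, one per category (a_s[0] = 0)
    c      : intercepts, one per category (c[0] = 0)
    """
    slope = sum(a * t for a, t in zip(a_star, theta))  # a*'theta
    z = [a_s[k] * slope + c[k] for k in range(len(c))]
    zmax = max(z)  # subtract the max for numerical stability
    ez = [math.exp(v - zmax) for v in z]
    total = sum(ez)
    return [v / total for v in ez]

# Ordered scoring [0, 1, 2, 3] gives the multidimensional partial credit
# structure; the two a* values set the slope in each theta direction.
p = nominal_mirt_probs([0.5, -0.2], [1.1, 0.6], [0, 1, 2, 3], [0, 0.5, 0.3, -0.4])
print(sum(p))  # probabilities sum to 1
```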
Reasonable questions may be raised about why the new parameterization of the nominal model has been designed as described in the preceding section; we try to answer some of the more obvious of those questions here:
Why is the linear term of the T matrix scaled between zero and m − 1, as opposed to some other norming convention? It is planned that the implementation of estimation for this new version of the nominal model will be in general-purpose computer software that, among other features, can "mix models," for example, for binary and multiple-category models. We also assume that the software can fix parameters to any specified value, or set equal any subset of the parameters. Some users may want to use Rasch family (Masters and Wright, 1984) models, mixing the original Rasch (1960) model for the dichotomous items and the PC or RS models for the polytomous items. To accomplish a close approximation of that in a marginal maximum likelihood estimation system, with a N(0,1) population distribution setting the scale for the latent variable, a common slope (equal across items) must be specified for all items (Thissen, 1982). For the dichotomous items, that slope parameter applies to items scored 0, 1; for the polytomous items, it applies to item scores 0, 1, …, (m − 1). Thus, scaling the linear component of the scoring function with unit steps facilitates the imposition of the equality constraints needed for mixed Rasch family analysis. It also permits meaningful equality constraints between discrimination parameters for different item response models that are not in the Rasch family.
In the MIRT version of the model, the a* parameters may be rescaled after estimation is complete, to obtain values that have the properties of factor loadings, much as has been done for some time for the dichotomous model in the software TESTFACT (du Toit, 2003).
Why does the user need to prespecify both the lowest and highest response categories (to set up the T matrix) for a nominal model? This is not as onerous as it may first appear: When fitting the full-rank nominal model, one does not have to correctly specify the highest and lowest response categories. If the data indicate another order, estimated values of a^s may be less than zero or may exceed m − 1, indicating the empirical scoring order. It is only necessary that the item analyst prespecify two categories that are differently related to θ, such that one is relatively lower and the other relatively higher; even which one is which may be incorrect, and that will appear as a negative value of a^s. Presumably, when fitting a restricted (ordered) version of the model, the user would have already fitted the unrestricted nominal model to determine or check the empirical order of the response categories, or the user would have confidence from some other source of information about the order.
Why not parameterize the model in slope-threshold form, instead of slope-intercept form? Aren't threshold parameters easier to interpret in IRT? While we fully understand the attraction, in terms of interpretability, of threshold-style parameters in IRT models, there are several good reasons to parameterize with intercepts for estimation. The first (oldest historically) reason is that the slope-intercept parameterization is a much more numerically stable arrangement for estimating the parameters of logistic models, due to a closer approximation of the likelihood to normality and less error correlation among the parameters. A second reason is that the threshold parameterization does not generalize to the multidimensional case in any event; there is no way in a MIRT model to "split" the threshold among dimensions, rendering a threshold parameterization more or less meaningless. We note here that, for models for which it makes sense, we can always convert the intercept parameters into the corresponding item location and threshold values for reporting, and in preceding sections we have given formulas for doing so for the GPC model.
Why not use polynomial contrasts to obtain intermediate models, as proposed by Thissen and Steinberg (1986) and implemented in MULTILOG (du Toit, 2003), instead of the Fourier basis? An equally compelling question is to ask: Why polynomials? The purpose of either basis is to provide smooth trends in the a^s or c parameters across a set of response categories. Theory is not sufficient at this time to specify a particular mathematical formulation for smoothness across categories in the nominal model. The Fourier basis accomplishes that goal as well as polynomials do, and it is naturally orthogonal, which (slightly) simplifies the implementation of the estimation algorithm.
In this chapter we have reviewed the development of Bock's (1972) nominal model, described its relation to other commonly used item response models, illustrated some of its unique uses, and provided a revised parameterization for the model that we expect will render it more useful for future applications in item analysis and test scoring. As IRT has come to be used in more varied contexts, expanding its domain of application from its origins in educational measurement into social and personality psychology and the measurement of health outcomes and quality of life, the need to provide item analysis for items with polytomous responses with unknown scoring order has increased. The reparameterized nominal model provides a useful response to that challenge. Combined with the development of multidimensional nominal item analysis (Cai et al., in preparation), the nominal model represents a powerful component among the methods of IRT.
The first step in the direction of the nominal model was an extension of Thurstone's (1927) method of paired comparisons to first choices among three or more objects. The objects can be anything for which subjects could be expected to have preferences: opinions on public issues, competing consumer products, candidates in an election, and so on. The observations for a set of m objects consist of the number of subjects who prefer object j to object k and the number who prefer k to j. Any given subject does not necessarily have to respond to all pairs. Thurstone proposed a statistical model for choice in which differences in the locations of the objects on a hypothetical scale of preference value predict the observed proportions of choice in all m(m − 1)/2 distinct pairs. He assumed that a subject's response to the task of choosing between the objects depended upon a subjective variable for, say, object j,

v_j = μ_j + ε_j,

where, in the population of respondents, ε_j is a random deviation distributed normally with mean 0 and variance σ². He called this variable a response process and assumed that the subject chooses the object with the larger process. Although the distribution of v_j might have different standard deviations for each object and nonzero correlations between objects, this greatly complicates the estimation of differences between the means. Thurstone therefore turned his attention to the case V model, in which the standard deviations were assumed equal and all correlations assumed zero in all comparisons. With this simplification, the so-called comparatal process

v_jk = v_j − v_k

has mean μ_j − μ_k, and the comparatal processes v_jk, v_jl for object j have constant correlation ½. Thurstone's solution to the estimation problem was to convert the response proportions to normal deviates and estimate the location differences by unweighted least squares, which requires only m² additions and m divisions.
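Thurstone's unweighted least squares solution is short enough to sketch in a few lines of Python (the function name is ours). The sketch assumes a complete matrix of choice proportions and a case V scale on which the comparatal variance is normalized to one, so that P(j preferred to k) = Φ(μ_j − μ_k); on that scale, each location estimate is simply a row mean of the normal deviates.

```python
from statistics import NormalDist

def case_v_scale(P):
    """Thurstone case V scale values from a matrix of choice proportions.

    P[j][k] is the proportion of subjects preferring object j to object k.
    The unweighted least squares estimate of each location (centered at
    zero) is the row mean of the normal deviates z_jk = Phi^{-1}(P[j][k]).
    """
    m = len(P)
    inv = NormalDist().inv_cdf
    mu = []
    for j in range(m):
        z = [inv(P[j][k]) for k in range(m) if k != j]
        mu.append(sum(z) / m)  # z_jj = Phi^{-1}(0.5) = 0 is included implicitly
    return mu

# Synthetic check: proportions generated from known, centered locations.
true_mu = [0.8, 0.0, -0.8]
nd = NormalDist()
P = [[nd.cdf(mi - mj) for mj in true_mu] for mi in true_mu]
print(case_v_scale(P))  # recovers approximately [0.8, 0.0, -0.8]
```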
With modern computing machinery, solutions with better properties (e.g., weighted least squares or maximum likelihood) are now accessible (see Bock & Jones, 1968, Section 6.4.1). From the estimated locations, the expected proportions for each comparison are given by the cumulative normal distribution function, Φ(y), at y = μ_j − μ_k. These proportions can be used in chi-square tests of the goodness of fit of the paired comparisons model (Bock, 1956; Bock & Jones, 1968, Section 6.7.1).
The natural extension of the paired comparison case V solution to what might be called the "method of first choices," that is, the choice of one preferred object in a set of m objects, is simply to assume that the m − 1 comparatal processes for object j,

v_jk = v_j − v_k, k ≠ j,

are distributed (m − 1)-variate normal with means μ_j − μ_k, constant variance 2σ², and constant correlation ρ_jk equal to ½ (Bock, 1956; 1975, Section 8.1.3). Expected probabilities of first choice for a given object then correspond to the (m − 1)-fold multiple integral of the (m − 1)-variate normal density function in the orthant from minus infinity up to limits equal to the comparatal means.
For general multivariate normal distributions of high dimensionality, evaluation of orthant probabilities is computationally challenging even with modern equipment. Computing formulae and tables exist for the bivariate case (National Bureau of Standards, 1956) and the trivariate case (Steck, 1958), but beyond that, Monte Carlo approximation of the positive orthant probabilities appears to be the only recourse at the present time. Fortunately, much simpler procedures based upon a multivariate logistic distribution are now available for estimating probabilities of first choice. By way of introduction, the following section gives essential results for the univariate and bivariate logistic distributions.
Applied to the case V paired comparisons model, the univariate logistic distribution function can be expressed either in terms of the comparatal process z = μ_jk:

Ψ(z) = 1 / [1 + exp(−z)],

or in terms of the separate processes z_1 = v_j and z_2 = v_k:

Ψ = exp(z_1) / [exp(z_1) + exp(z_2)],

under the constraint z_1 + z_2 = 0. Then z_1 = −z_2 and

exp(z_1) / [exp(z_1) + exp(z_2)] = 1 / [1 + exp(−2z_1)] = Ψ(2z_1).
In either case the distribution is symmetric with mean z = 0, where Ψ(0) = ½, and variance π²/3. The deviate z is called a logit, and the pair z_1, z_2 could be called a binomial logit.
The corresponding density function can be expressed in terms of the distribution function:

ψ(z) = Ψ(z)[1 − Ψ(z)].

Although ψ(z) is heavier in the tails than φ(z), Φ(z) closely resembles Ψ(1.7z). Using the scale factor 1.7 in place of the variance-matching factor 1.81379 brings the logistic probabilities closer to the normal over the full range of the distribution, with a maximum absolute difference less than 0.01 (Johnson, Kotz, & Balakrishnan, 1995, p. 119).
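The claim about the 1.7 scale factor is easy to verify numerically with nothing beyond the standard library:

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Psi(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))

# Maximum absolute difference between Phi(z) and Psi(1.7 z) over a fine grid.
grid = [i / 1000.0 for i in range(-8000, 8001)]
max_diff = max(abs(Phi(z) - Psi(1.7 * z)) for z in grid)
print(max_diff)  # less than 0.01, as stated in the text
```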
An advantage of the logistic distribution over the normal is that the deviate corresponding to an observed proportion, P, is simply the log odds,

z = log[P / (1 − P)].

For that reason, logit linear functions are frequently used in the analysis of binomially distributed data (see Anscombe, 1956).
Inasmuch as the prediction of first choices may be viewed as an extreme value problem, it is of interest that Dubey (1969) derived the logistic distribution from an extreme value distribution of the double exponential type with mixing variable γ. The cumulative extreme value distribution function, conditional on γ, is

F(x | γ) = exp(−γe^{−x}),

where γ has the exponential density function g(γ) = exp(−γ). The corresponding extreme value density function is

f(x | γ) = γ exp(−x − γe^{−x}).
Integrating the conditional distribution function over the range of γ gives the distribution function of x:

∫_0^∞ exp(−γe^{−x}) exp(−γ) dγ = 1 / (1 + e^{−x}),

which we recognize as the logistic distribution.
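Dubey's mixture representation can be checked by direct numerical integration (a plain midpoint rule over a truncated range, which is more than adequate here):

```python
import math

def logistic_via_mixture(x, n=200000, upper=40.0):
    """Integrate exp(-gamma * e^{-x}) * exp(-gamma) over gamma in (0, inf),
    truncated at `upper`, by the midpoint rule."""
    h = upper / n
    total = 0.0
    for i in range(n):
        g = (i + 0.5) * h
        total += math.exp(-g * math.exp(-x)) * math.exp(-g)
    return total * h

for x in (-2.0, 0.0, 1.5):
    direct = 1.0 / (1.0 + math.exp(-x))
    print(abs(logistic_via_mixture(x) - direct))  # near zero at each x
```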
The natural extension of the logistic distribution to the bivariate case is

Ψ(x_1, x_2) = [1 + e^{−x_1} + e^{−x_2}]^{−1},

with marginal distributions Ψ(x_1) and Ψ(x_2). The density function is

ψ(x_1, x_2) = 2e^{−x_1−x_2}[1 + e^{−x_1} + e^{−x_2}]^{−3},

and the regression equations and corresponding conditional variances are

E(x_2 | x_1) = 1 − log(1 + e^{−x_1}),  Var(x_2 | x_1) = π²/3 − 1,

and symmetrically for x_1 given x_2.
This distribution is the simplest of three bivariate logistic distributions studied in detail by Gumbel (1961). It is similar to the bivariate normal distribution in having univariate logistic distributions as margins, but unlike the normal, the bivariate logistic density is asymmetric and the regression lines are curved (see Figure 3.6). Nevertheless, the distribution function gives probability values reasonably close to bivariate normal values when the 1.7 scale correction is used (see Bock and Jones (1968, Section 9.1.1) for some comparisons of bivariate normal and bivariate logistic probabilities).
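The curvature of those regressions can be exhibited numerically. For Gumbel's bivariate logistic, the conditional mean of x_2 given x_1 works out to 1 − log(1 + e^{−x_1}); that closed form is our own computation from the joint and marginal densities (Gumbel, 1961, gives the conditional moments), and the quadrature sketch below checks it.

```python
import math

def conditional_mean_x2(x1, lo=-40.0, hi=40.0, n=400000):
    """E(x2 | x1) for Gumbel's bivariate logistic, by midpoint quadrature.

    The conditional density is the joint density divided by the logistic
    marginal density of x1.
    """
    a = 1.0 + math.exp(-x1)
    marginal = math.exp(-x1) / a**2  # standard logistic density at x1
    h = (hi - lo) / n
    mean = 0.0
    for i in range(n):
        x2 = lo + (i + 0.5) * h
        joint = 2.0 * math.exp(-x1 - x2) / (a + math.exp(-x2))**3
        mean += x2 * (joint / marginal) * h
    return mean

x1 = 0.7
print(conditional_mean_x2(x1), 1.0 - math.log(1.0 + math.exp(-x1)))
```

Evaluating the same expression at several values of x_1 traces out the curved regression line visible in Figure 3.6.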
Figure 3.6  Contours of the bivariate logistic density. The horizontal and vertical axes are x_1 and x_2, respectively, in Equation 3.64
The natural extension of the bivariate logistic distribution to higher dimensions is

Ψ = exp(z_j) / Σ_{k=1}^{m} exp(z_k),

where the elements of the vector z = [z_1, z_2, …, z_m]′ are constrained to sum to zero. This vector is referred to as a multinomial logit.
Although this extension of the logistic distribution to dimensions greater than two has been applied at least since 1967 (Bock, 1970; McFadden, 1974), its first detailed study was by Malik and Abraham (1973). They derived the m-variate logistic distribution from the m-fold product of independent univariate marginal conditional distributions of the Dubey (1969) extreme value distribution with mixing variable γ. Integrating over γ gives

Ψ(x_1, …, x_m) = [1 + Σ_{k=1}^{m} e^{−x_k}]^{−1}.

The corresponding density function is

ψ(x_1, …, x_m) = m! exp(−Σ_{k=1}^{m} x_k) [1 + Σ_{k=1}^{m} e^{−x_k}]^{−(m+1)}.
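The extreme value connection gives a direct way to simulate first choices: perturb each location value by an independent double exponential (Gumbel) deviate and take the maximum; the resulting choice proportions match the multinomial logit (softmax) probabilities. A Monte Carlo sketch (names are ours):

```python
import math
import random

def softmax(mu):
    """Multinomial logit probabilities for location values mu."""
    ez = [math.exp(m) for m in mu]
    s = sum(ez)
    return [e / s for e in ez]

def first_choice_mc(mu, n=200000, seed=1):
    """Monte Carlo first-choice proportions under Gumbel-perturbed values."""
    rng = random.Random(seed)
    counts = [0] * len(mu)
    for _ in range(n):
        # standard Gumbel deviate: -log(-log(U)); guard against U == 0
        v = [m - math.log(-math.log(rng.random() or 1e-300)) for m in mu]
        counts[v.index(max(v))] += 1
    return [c / n for c in counts]

mu = [0.2, 0.0, -0.5]
print(softmax(mu))
print(first_choice_mc(mu))  # close to the softmax probabilities
```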
McFadden (1974) arrived at the same result by essentially the same method, although he does not cite Dubey (1969). Gumbel's bivariate distribution (above) is included for m = 2, margins of all orders up to m − 1 are multivariate logistic, and all univariate margins have mean zero and variance π²/3. No comparison of probabilities for high-dimensional normal and logistic distributions has as yet been attempted.
If we substitute functions of external variables for normal or logistic deviates, we can study the relationships of these variables to the probabilities of first choice among the objects presented. In the two-category case, we refer to these as binomial response relations, and with more than two categories, as multinomial response relations. The analytical problem becomes one of estimating the coefficients of these functions rather than the logit itself. If the relationship is less than perfect, some goodness of fit will be lost relative to direct estimation of the logit (which is equivalent to estimating the category expected probabilities). The difference in the Pearson or likelihood ratio chi-square provides a test of the statistical significance of the loss. Examples of weighted least squares estimation of binomial response relations in paired comparison data, when the external variables represent a factorial or response surface design on the objects, are shown in Section 7.3 of Bock and Jones (1968). Examples of maximum likelihood estimation of multinomial response relations appear in Bock (1970), McFadden (1974), and Chapter 8 of Bock (1975).
An earlier application of maximum likelihood in estimating binomial response relations appears in Bradley and Terry (1952). They assume the model

P(j preferred to k) = π_j / (π_j + π_k)

for the probability that object j is preferred to object k, but they estimated π_j and π_k directly, rather than exponentiating, in order to avoid introducing a Lagrange multiplier to constrain the estimates to sum to unity.
Luce and Suppes (1965) generalized the Bradley-Terry model to multinomial data,

P(j | 1, 2, …, m) = π_j / Σ_{k=1}^{m} π_k,

but did not make the exponential transformation to the multinomial logit and did not apply the model in estimating multinomial response relations.
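Both models are the multinomial logit in disguise: setting z_j = log π_j turns the Luce-Suppes ratio into the exponential (softmax) form, and the Bradley-Terry pairwise probability into a logistic function of the logit difference. A short numerical check:

```python
import math

pi = [3.0, 1.5, 0.5]                 # illustrative preference values
luce = [p / sum(pi) for p in pi]     # pi_j / sum_k pi_k

z = [math.log(p) for p in pi]        # multinomial logit deviates
ez = [math.exp(v) for v in z]
logit = [e / sum(ez) for e in ez]    # exponential (softmax) form
print(luce, logit)                   # agree to floating-point precision

# The binomial (Bradley-Terry) case: P(object j preferred to object k)
j, k = 0, 1
bt = pi[j] / (pi[j] + pi[k])
logistic = 1.0 / (1.0 + math.exp(-(z[j] - z[k])))  # logistic of logit difference
print(bt, logistic)                  # equal
```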
In item response theory we deal with data arising from two-stage sampling: in the first stage we sample respondents from some identified population, and in the second stage we sample responses of each respondent to some number of items, usually items from some form of psychological or educational test. Thus, there are two sources of random variation in the data: between respondents and between item responses. When the response is scored dichotomously (right/wrong or yes/no, for example), the logistic distribution for binomial data applies. If the scoring is polytomous, as when the respondent is choosing among several alternatives (for instance, in a multiple-choice test with recording of each choice), the logistic distribution for multinomial data applies. If the respondent's level of performance is graded polytomously in ordered categories, the multivariate logistic can still apply, but its parameterization must be specialized to reflect the assumed order of the categories.
In IRT the "external" variable is not an observable quantity, but rather an unobservable latent variable, usually designated θ, that measures the respondent's ability or other propensity. The binomial or multinomial logit is expressed as a linear function of θ containing parameters specific to each item. We refer to the functions that depend on θ as item response models. Item response models now in use (see Bock & Moustaki, 2007) include, for item j, the two-parameter logistic model, based on the binomial logistic distribution,

P(x_j = 1 | θ) = 1 / [1 + exp(−(a_j θ + c_j))],

and the nominal categories model, based on the multinomial logistic distribution,

P(x_j = k | θ) = exp(a_jk θ + c_jk) / Σ_{h=1}^{m} exp(a_jh θ + c_jh),

under the constraints Σ_{k=1}^{m} a_jk = 0 and Σ_{k=1}^{m} c_jk = 0.
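In code, the two models share the same logistic machinery. The sketch below (helper names are ours) imposes Bock's sum-to-zero constraints by centering the parameter vectors, which leaves the probabilities unchanged because adding a constant to every multinomial logit cancels in the ratio.

```python
import math

def two_pl(theta, a, c):
    """Two-parameter logistic trace line in slope-intercept form."""
    return 1.0 / (1.0 + math.exp(-(a * theta + c)))

def nominal_probs(theta, a, c):
    """Bock's nominal model; a and c are centered to sum to zero."""
    a = [v - sum(a) / len(a) for v in a]
    c = [v - sum(c) / len(c) for v in c]
    z = [ak * theta + ck for ak, ck in zip(a, c)]
    ez = [math.exp(v) for v in z]
    s = sum(ez)
    return [e / s for e in ez]

p = nominal_probs(0.3, [0.0, 1.0, 2.0], [0.0, 0.5, 0.2])
print(sum(p))  # probabilities sum to 1

# With two categories, the nominal model reduces to the 2PL trace line.
print(nominal_probs(0.4, [0.0, 1.3], [0.0, -0.2])[1], two_pl(0.4, 1.3, -0.2))
```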
In empirical applications, the parameters of the item response models must be estimated in large samples of the two-stage data. Estimation of these parameters is complicated, however, by the presence of the propensity variable θ, which is random in the first-stage sample. Because there are potentially different values of this variable for every respondent, there is no way to achieve convergence in probability as the number of respondents increases. We therefore proceed in the estimation by integrating over an assumed or empirically derived distribution of the latent variable. If the first-stage sample is large enough to justify treating the parameter estimates so obtained as fixed values, we can then use Bayes or maximum likelihood estimation to locate each respondent on the propensity dimension, with a level of precision dependent on the number of items.
The special merit of the nominal categories item response model is that no assumption about the order or other structure of the categories is required. Given that the propensity variable is one-dimensional, any ordering of the categories implicit in the data is revealed by the order of the coefficients a_jk in the nominal model (see Bock & Moustaki, 2007).