The 39-Item Parkinson's Disease Questionnaire (PDQ-39): Is it a Unidimensional Construct?

Ther Adv Neurol Disord. 2009 Jul; 2(4): 205–214.

doi:10.1177/1756285609103726

PMCID: PMC3002633

PMID: 21179529

Peter Hagell and Maria H. Nilsson

Abstract

The 39-item Parkinson's Disease Questionnaire, and particularly its summary index (PDQ-39SI) is a widely used patient-reported clinical trial endpoint. A basic assumption when summing items into a total score is that they represent a common variable. We therefore assessed the unidimensionality of the PDQ-39SI using Rasch and confirmatory factor analysis. Both analyses showed model misfit. Adjustment for differential item functioning and disordered response category thresholds did not improve model fit, and residual analyses showed deviation from unidimensionality. These data indicate multidimensionality and challenge the interpretation and validity of PDQ-39SI scores. Clinicians and investigators should use and interpret the PDQ-39SI with caution.

Keywords: dimensionality, outcome measurement, quality of life, Parkinson's disease

Introduction

A basic assumption when summing rating scale items into a total score is that the items represent a common underlying construct; that is, they should be unidimensional [Nunnally and Bernstein 1994]. When total scores are not unidimensional, they are technically invalid and their meaning is ambiguous, since it is unclear what the scores represent [Smith 2002]. This cannot be compensated for by trial design and may lead to misleading inferences that influence patient care. Unambiguous interpretation is a prerequisite for scores to be acceptable as clinical trial endpoints [Food and Drug Administration 2006].

One approach to assessing rating scale unidimensionality is Rasch analysis [Wilson 2005; Tennant et al. 2004a; Hobart 2003; Andrich 1988; Rasch 1960]. Since unidimensionality is an explicit Rasch model assumption, adequate model fit has often been taken as support for scale unidimensionality. For example, Jenkinson et al. [2003b] applied Rasch analysis to the 40-item Amyotrophic Lateral Sclerosis Assessment Questionnaire to determine whether it was legitimate to summarize the scale as a single score. The items fit the Rasch model, which was taken as support for unidimensionality and, therefore, also for the validity of constructing an overall summary index [Jenkinson et al. 2003b]. However, multidimensional data have been shown to fit the model; the unidimensionality assumption therefore needs to be tested explicitly [Tennant and Pallant 2006; Smith 2002, 1996].

In Parkinson's disease (PD), the 39-item PD questionnaire (PDQ-39) is the most widely used patient-reported rating scale endpoint in clinical trials. Recent observations challenge the validity and interpretability of most of its eight scales [Hagell and Nygren 2007] and of its eight-item short form, the PDQ-8 [Franchignoni et al. 2008]. However, no studies using methods such as Rasch analysis have assessed the dimensionality of the PDQ-39 summary index (PDQ-39SI), an overall PDQ-39 score [Jenkinson et al. 1997]. Such analyses are relevant because unidimensionality is relative, depending on the level of perspective and conceptualization [Pallant and Tennant 2007; Andrich 1988]. For example, although grouping the items into eight PDQ-39 scales may not have succeeded in defining eight unidimensional variables, all 39 items together could still represent a single variable. We therefore assessed whether the PDQ-39 appears to represent a unidimensional construct.

Methods

Participants

Details have been reported elsewhere [Hagell and Nygren 2007]. Briefly, self-reported PDQ-39 data from a postal survey of 202 people (79% response rate) with neurologist-diagnosed PD [Gibb and Lees 1988] were analyzed (Table 1). The study was approved by the local research ethics committee.

Table 1.

Sample characteristics (n = 202).

Gender (men/women), n (%)	108 (53.5%)/94 (46.5%)
Mean age, years (SD; min–max)	69.8 (10.0; 34-90)
Mean disease duration, years (SD; min–max)	8.7 (6.6; 0.5-28)
Median ‘off’-phase Hoehn and Yahr stage (q1–q3; min–max)^a	III (II-IV; I-V)

^aFrom clinic visits within up to about 9 months of the postal survey. Higher values indicate more severe PD (range, I–V; I, mild unilateral disease; II, bilateral disease without postural impairment; III, bilateral disease with postural impairment, moderate disability; IV, severe disability, still able to walk and stand unassisted; V, confined to bed or wheelchair unless aided) [Hoehn and Yahr 1967]. SD, standard deviation; q1–q3, first and third quartiles.

The PDQ-39 summary index (PDQ-39SI)

The PDQ-39 [Peto et al. 1995] is a PD-specific health status questionnaire comprising 39 items. Respondents are asked to endorse one of five ordered response categories according to how often, due to their PD, they have experienced the problem described by each item. Items are grouped into eight scales, each scored by expressing the summed item scores as a percentage score ranging from 0 to 100 (100 = more health problems). Based on results from exploratory factor analysis, a PDQ-39 summary index (PDQ-39SI) has been proposed [Jenkinson et al. 1997]. The PDQ-39SI is the sum of the eight PDQ-39 scale scores divided by eight (the number of scales), which yields a score between 0 and 100 (100 = more health problems). This is equivalent to expressing the sum of all 39 item responses as a percentage score.
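As an illustration, the PDQ-39SI calculation described above can be sketched in Python. This is a minimal sketch, assuming responses coded 0 ('never') to 4 ('always') and the standard assignment of items to the eight scales, which matches the item order in Table 2 (mobility 1-10, activities of daily living 11-16, emotional well-being 17-22, stigma 23-26, social support 27-29, cognitions 30-33, communication 34-36, bodily discomfort 37-39). The names `PDQ39_SCALES`, `scale_scores` and `pdq39_si` are hypothetical, and missing responses are not handled.

```python
# Hypothetical mapping of 1-based item numbers to the eight PDQ-39 scales,
# assuming the standard grouping (consistent with the item labels in Table 2).
PDQ39_SCALES = {
    "MOB": range(1, 11),   # mobility
    "ADL": range(11, 17),  # activities of daily living
    "EMO": range(17, 23),  # emotional well-being
    "STI": range(23, 27),  # stigma
    "SOC": range(27, 30),  # social support
    "COG": range(30, 34),  # cognitions
    "COM": range(34, 37),  # communication
    "BOD": range(37, 40),  # bodily discomfort
}
MAX_CATEGORY = 4  # responses scored 0 ('never') to 4 ('always')


def scale_scores(responses):
    """Each scale's summed item score expressed as a 0-100 percentage."""
    if len(responses) != 39:
        raise ValueError("expected 39 item responses")
    scores = {}
    for name, items in PDQ39_SCALES.items():
        total = sum(responses[i - 1] for i in items)
        scores[name] = 100 * total / (MAX_CATEGORY * len(items))
    return scores


def pdq39_si(responses):
    """PDQ-39SI: the sum of the eight scale scores divided by eight."""
    scores = scale_scores(responses)
    return sum(scores.values()) / len(scores)
```

For example, `pdq39_si([2] * 39)` returns 50.0, and a respondent answering 'always' to the ten mobility items and 'never' to everything else scores 12.5, because each scale carries one eighth of the index regardless of how many items it contains.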

Analyses

The logic of computing and reporting the PDQ-39SI is based on the assumption that the PDQ-39 represents a single underlying construct [Jenkinson et al. 1997]. This assumption was tested using Rasch analysis and confirmatory factor analysis.

Rasch analysis

The Rasch model [Rasch 1960] mathematically defines what is required of item responses for them to express linear measures rather than mere numbers. It locates persons and items separately on a common logit (log-odds unit) metric that is centered on the mean item location, which is set at zero.
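For ordered response categories, the Rasch model used later in this study (the partial credit model) can be sketched in a few lines of Python. This assumes the threshold parametrization in which each item has a location `delta` and thresholds `taus` expressed relative to that location, as reported in Tables 2 and 3; the function name is hypothetical. Category probabilities depend only on the difference between person and item locations on the logit metric.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Category probabilities P(X = 0), ..., P(X = m) under the partial credit model.

    theta: person location (logits)
    delta: item location (logits)
    taus:  the item's m threshold parameters, relative to delta (logits)
    """
    # Category x has log-numerator sum_{k<=x} (theta - delta - tau_k); category 0 has 0.
    log_numerators = [0.0]
    for tau in taus:
        log_numerators.append(log_numerators[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


# Item 17 ('Depressed'): location from Table 2, thresholds from Table 3.
probs = pcm_probabilities(theta=0.0, delta=-0.465, taus=[-1.828, -0.949, 0.313, 2.464])
```

At a person location equal to `delta + taus[k]`, categories k and k + 1 are equally likely, which is the 50/50 threshold definition used in Table 3. With a single threshold of zero the function reduces to the dichotomous Rasch model, P(X = 1) = exp(θ − δ)/(1 + exp(θ − δ)).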

Fit of data to the Rasch model is assessed by examining the agreement between expected and observed responses across groups of persons with similar locations (class intervals) on the measured construct [Andrich et al. 2004-2005; Andrich 1988]. Overall fit is supported by a nonsignificant item-trait interaction chi-square statistic, and individual item fit is supported by nonsignificant standardized residuals within the range -2.5 to +2.5 [Andrich et al. 2004-2005; Andrich 1988]. Residuals represent the discrepancy between observed and expected item responses. Large positive residuals primarily suggest violation of unidimensionality, whereas large negative residuals signal local dependency (i.e. item responses depend on responses to other items, suggesting item redundancy). Large residuals, whether positive or negative, violate model assumptions and distort measurement.
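To make the residual and chi-square logic concrete, here is a simplified Python sketch under the partial credit model (re-implemented so the block stands alone). It computes a standardized residual for a single response, (observed − expected)/√variance, and a class-interval chi-square that compares summed observed and expected item scores within each class interval. These are textbook forms assumed for illustration; RUMM2020's item fit residual and chi-square involve further aggregation and transformation, so the numbers will not reproduce Table 2. All function names are hypothetical.

```python
import math


def pcm_probabilities(theta, delta, taus):
    """Partial credit model category probabilities for one item."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(theta, delta, taus):
    """Model-expected item score and its variance for a person at theta."""
    probs = pcm_probabilities(theta, delta, taus)
    expected = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - expected) ** 2 * p for x, p in enumerate(probs))
    return expected, variance


def standardized_residual(observed, theta, delta, taus):
    """(observed - expected) / sqrt(variance) for one person-item response."""
    expected, variance = expected_and_variance(theta, delta, taus)
    return (observed - expected) / math.sqrt(variance)


def class_interval_chi_square(class_intervals, delta, taus):
    """Sum over class intervals of (sum observed - sum expected)^2 / sum variance.

    class_intervals: list of groups, each a list of (observed_score, theta) pairs.
    """
    chi_square = 0.0
    for group in class_intervals:
        observed = sum(score for score, _ in group)
        expected = variance = 0.0
        for _, theta in group:
            e, v = expected_and_variance(theta, delta, taus)
            expected += e
            variance += v
        chi_square += (observed - expected) ** 2 / variance
    return chi_square
```

Responses far from the model expectation produce large residuals of either sign, which is what the ±2.5 criterion screens for at the item level.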

However, fit statistics can be somewhat insensitive in detecting multidimensionality [Tennant and Pallant 2006; Smith 2002, 1996]. Smith [2002] therefore proposed a combined approach to dimensionality testing. First, a principal component analysis (PCA; a form of factor analysis) of the residuals is used to identify potential subdimensions in the scale. A series of independent t-tests is then conducted to assess whether subsets of items yield different person measures. If the violation of unidimensionality is trivial, the number of persons whose locations differ significantly between the two item sets is small. This approach assesses whether scales are sufficiently unidimensional to be treated as such in practice [Tennant and Pallant 2006; Smith 2002].
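The person-level comparison in Smith's approach can be sketched as follows, assuming each person has two location estimates with standard errors, one from each item subset. The function names are hypothetical, and the fixed critical value of 1.96 (a two-sided 5% level under a normal approximation) is an assumption; software may refer the statistic to a t distribution instead.

```python
import math


def person_t(theta_a, se_a, theta_b, se_b):
    """t statistic for the difference between one person's two location estimates."""
    return (theta_a - theta_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def proportion_significant(estimates, critical=1.96):
    """Share of persons whose two estimates differ beyond the critical value.

    estimates: list of (theta_a, se_a, theta_b, se_b) tuples, one per person,
    where a and b are the positively and negatively loading item subsets.
    """
    significant = sum(1 for e in estimates if abs(person_t(*e)) > critical)
    return significant / len(estimates)
```

For instance, `proportion_significant([(1.0, 0.3, 0.0, 0.4), (0.1, 0.3, 0.0, 0.4)])` returns 0.5: the first person's estimates differ with t = 2.0, the second with t = 0.2.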

Differential item functioning (DIF) is an additional aspect of fit to the Rasch model; it may result from, for example, multidimensionality and can bias scale scores [Borsboom 2006; Holland and Wainer 1993; Andrich 1988]. DIF analyses assess whether subgroups of people with similar levels on the measured construct respond systematically differently to items [Andrich et al. 2004-2005; Hagquist and Andrich 2004; Tennant et al. 2004]. When DIF is uniform (i.e. item responses differ uniformly between groups across the measured construct), it can be adjusted for by splitting the item into two new items, one for each subgroup [Hagquist and Andrich 2004; Tennant et al. 2004b].
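Splitting an item for uniform DIF amounts to restructuring the data: the item's column is replaced by one column per subgroup, with responses from people outside that subgroup treated as missing, so that each new item receives its own location estimate. A minimal Python sketch, with a hypothetical function name and `None` standing in for missing data:

```python
def split_item_for_dif(rows, item, groups):
    """Replace column `item` (0-based) with one column per subgroup.

    rows:   per-person lists of item responses
    groups: per-person subgroup labels, e.g. 'men' / 'women'
    Returns the restructured rows and the subgroup order of the new columns.
    """
    labels = sorted(set(groups))
    new_rows = []
    for row, group in zip(rows, groups):
        kept = row[:item] + row[item + 1:]
        # The response goes in its own subgroup's column; other columns are missing.
        split = [row[item] if group == label else None for label in labels]
        new_rows.append(kept + split)
    return new_rows, labels
```

For example, `split_item_for_dif([[1, 2, 3], [0, 4, 1]], 1, ["men", "women"])` returns `([[1, 3, 2, None], [0, 1, None, 4]], ["men", "women"])`.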

When ordered response categories are used, as with the PDQ-39, Rasch analysis can assess whether the response categories work as assumed; that is, whether they reflect an increasing amount of the measured variable [Andrich et al. 2004-2005; Hagquist and Andrich 2004; Hagquist 2001]. If thresholds between adjacent response categories (i.e. the points where there are 50/50 probabilities of scoring in either of two adjacent categories, e.g. 2 or 3) are disordered, these categories do not work as intended. This indicates problems such as too many response categories or overlapping category labels, or may be due to multidimensionality [Andrich et al. 2004-2005; Hagquist and Andrich 2004; Hagquist 2001].
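Checking threshold order is a simple comparison of successive threshold locations. The sketch below (hypothetical function name) flags each adjacent pair in which a higher threshold is not located above the one before it; the example values are the estimates for items 12, 3 and 17 in Table 3.

```python
def disordered_pairs(thresholds):
    """Indices k where threshold k+1 is not located above threshold k."""
    return [k for k in range(len(thresholds) - 1) if thresholds[k + 1] <= thresholds[k]]


# Threshold locations (0->1, 1->2, 2->3, 3->4) from Table 3.
item_12 = [-0.345, -1.151, 0.454, 1.042]
item_3 = [-0.001, -0.633, 0.520, 0.114]
item_17 = [-1.828, -0.949, 0.313, 2.464]

for name, taus in [("12", item_12), ("3", item_3), ("17", item_17)]:
    print(name, disordered_pairs(taus))
```

This prints `12 [0]`, `3 [0, 2]` and `17 []`. Index 0 marks the 0→1/1→2 pair and index 2 the 2→3/3→4 pair, matching the patterns described for panels A, B and C of Figure 2.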

Confirmatory factor analysis

To account for the procedure of creating PDQ-39SI scores from the eight suggested PDQ-39 scale scores we also used confirmatory factor analysis. Confirmatory factor analysis assesses statistically whether and to what extent empirical data fit a predefined hypothesized structure. Confirmatory factor analysis is therefore generally recommended over exploratory factor analysis when there is an a priori hypothesis regarding dimensionality [Floyd and Widaman 1995]. The extent to which empirical data accord with the hypothesized structure is assessed by a chi-square statistic that is expected to be nonsignificant when data fit the model. Because this statistic is sensitive to sample size, goodness-of-fit is also assessed by various descriptive fit indices [Schermelleh-Engel and Moosbrugger 2003].
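Two of the descriptive fit indices used here have simple closed forms based on chi-square statistics. The sketch below uses common textbook definitions (RMSEA with N − 1 in the denominator; CFI relative to an independence baseline model); individual programs, including AMOS, may differ in detail, and the function names and example values are hypothetical rather than results from this study. GFI and AGFI require the fitted and observed covariance matrices and are omitted.

```python
import math


def rmsea(chi_square, df, n):
    """Root mean square error of approximation for a model with df > 0."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def cfi(chi_square_model, df_model, chi_square_baseline, df_baseline):
    """Comparative Fit Index relative to a baseline (independence) model."""
    misfit_model = max(chi_square_model - df_model, 0.0)
    misfit_baseline = max(chi_square_baseline - df_baseline, misfit_model)
    if misfit_baseline == 0.0:
        return 1.0
    return 1.0 - misfit_model / misfit_baseline
```

With hypothetical values, `rmsea(60.0, 20, 201)` gives 0.1 and `cfi(60.0, 20, 220.0, 20)` gives 0.8; a chi-square no larger than its degrees of freedom yields RMSEA = 0 and CFI = 1.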

Analysis plan

All 39 items were analyzed regarding fit to the unrestricted (partial credit) Rasch model for ordered response categories. Unidimensionality was further scrutinized by PCA of the residuals followed by independent t-tests. Two estimated locations for each person were compared; one from the items with the strongest positive and one from the items with the strongest negative residual loadings (> ±0.3, respectively) on the first principal component (factor) [Tennant and Pallant 2006]. Unidimensionality was considered statistically supported if the proportion of significant individual t-tests, or the lower bound of the associated 95% binomial confidence interval (CI), did not exceed 0.05 [Tennant and Pallant 2006].
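The decision rule for the t-tests can be written out as below. A normal-approximation (Wald) binomial interval is assumed here for simplicity; exact or other interval methods give slightly different bounds, and the helper names are hypothetical.

```python
import math


def proportion_with_ci(n_significant, n_tests, z=1.96):
    """Proportion of significant t-tests with a Wald 95% confidence interval."""
    p = n_significant / n_tests
    half_width = z * math.sqrt(p * (1 - p) / n_tests)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def unidimensionality_supported(n_significant, n_tests):
    """Supported if the proportion, or its lower 95% bound, does not exceed 0.05."""
    p, lower, _ = proportion_with_ci(n_significant, n_tests)
    return p <= 0.05 or lower <= 0.05
```

With 200 persons tested, 14 significant t-tests (7%) would still count as support because the lower bound falls below 0.05, whereas 20 (10%) would not.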

Next, the hypothesized scales-to-summary index structure of the PDQ-39SI was assessed by confirmatory factor analysis. The a priori hypothesis that the eight PDQ-39 scales represent a single underlying construct was tested by means of chi-square statistics and four descriptive fit indices: the Goodness-of-Fit Index (GFI), the Adjusted Goodness-of-Fit Index (AGFI), the Comparative Fit Index (CFI), and the Root Mean Square Error of Approximation (RMSEA) [Schermelleh-Engel and Moosbrugger 2003].

If there were signs of multidimensionality, two potential sources were explored. First, we examined the presence of DIF between genders and between age groups (as defined by the median: ≤72 versus >72 years old). When DIF was detected, this was adjusted for by splitting items into subgroup-specific items [Andrich et al. 2004-2005; Hagquist and Andrich 2004]. Secondly, we assessed whether increasing health problems (as defined by PDQ-39SI scores) were reflected by increasing probabilities of endorsing response categories 0 (‘never’) through 4 (‘always’) by examining the thresholds between categories [Andrich et al. 2004-2005; Hagquist and Andrich 2004]. When disordered thresholds were found, we explored whether collapsing adjacent response categories improved model fit and unidimensionality. Analyses were performed using SPSS 14 (SPSS Inc., Chicago, IL), RUMM2020 (Rumm Laboratory Pty Ltd., Perth) and AMOS 5 (SmallWaters Corp., Chicago, IL) for Windows.
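Collapsing response categories is a rescoring step. The Python sketch below builds a rescoring map from groups of adjacent categories and applies it; the grouping shown (merging 'seldom' with 'sometimes') is a hypothetical example, not the collapsing applied to any particular PDQ-39 item. A small helper for the median age split used in the DIF analysis is included; all names are hypothetical.

```python
def collapse_map(groups):
    """Map original categories 0..m to new categories from adjacent groups.

    groups: ordered list of lists, e.g. [[0], [1, 2], [3], [4]].
    """
    flat = [category for group in groups for category in group]
    if flat != list(range(len(flat))):
        raise ValueError("groups must cover 0..m in order, without gaps")
    return {category: new for new, group in enumerate(groups) for category in group}


def rescore(responses, mapping):
    """Apply a rescoring map, leaving missing responses (None) missing."""
    return [None if x is None else mapping[x] for x in responses]


def age_group(age, median=72):
    """Median split for DIF analysis: at or below the median versus above it."""
    return f"<={median}" if age <= median else f">{median}"


# Hypothetical example: merge 'seldom' (1) and 'sometimes' (2) into one category.
mapping = collapse_map([[0], [1, 2], [3], [4]])
```

Here `rescore([0, 1, 2, 3, 4], mapping)` returns `[0, 1, 1, 2, 3]`, giving four categories in place of five.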

Results

Rasch analysis yielded a significant item-trait interaction chi-square statistic (χ², 300.064; p < 0.0001), indicating a lack of overall model fit. Reliability was 0.96. Inspection of individual item fit suggested that 12 items did not fit the model (Table 2). Among these, eight items (23, 25, 30, 32, 33, 37, 38, and 39) displayed large positive residuals, indicating departure from unidimensionality. PCA followed by independent t-tests showed that the proportion of significantly different person measures, based on items with strong positive and negative loadings on the first principal component, was 0.36 (95% CI, 0.33–0.39). Similarly, confirmatory factor analysis of the proposed scales-to-summary index structure showed inadequate goodness-of-fit (Figure 1).

Table 2.

Rasch item and fit statistics for the PDQ-39^a.

Item		Item statistics^b		Fit statistics^b
No.	Problem area (abridged)	Location^c	SE^c	Residual^d	Chi-square^e,f	F-statistic^e,g
1	Leisure activities	-0.798	0.078	-0.640	3.127	1.850
2	Looking after home	-0.673	0.073	-2.075	6.417	5.069
3	Carry shopping bags	-0.715	0.065	-1.111	0.317	0.504
4	Walking half a mile	-0.756	0.064	0.360	3.657	1.267
5	Walking 100 yards	-0.138	0.069	-0.988	4.139	3.062
6	Getting around the house	-0.436	0.075	-2.578	9.394	8.041
7	Getting around in public	-0.636	0.073	-3.405	17.341	16.852
8	Need company when going out	-0.421	0.064	-0.361	0.179	0.008
9	Worry falling in public	-0.160	0.072	-1.868	6.454	5.152
10	Confined to the house	-0.235	0.077	-3.086	23.375	20.782
11	Washing	0.266	0.075	-2.388	16.187	13.519
12	Dressing	-0.115	0.073	-1.617	5.841	4.039
13	Do buttons or shoe laces	-0.615	0.075	-0.447	3.409	1.998
14	Writing clearly	-0.881	0.078	0.891	2.185	1.103
15	Cutting food	-0.316	0.075	1.789	5.861	2.363
16	Hold a drink without spilling	0.067	0.076	1.804	0.240	0.022
17	Depressed	-0.465	0.087	-0.309	1.882	1.238
18	Isolated and lonely	0.259	0.082	-1.449	3.077	2.466
19	Weepy or tearful	0.673	0.087	0.911	0.594	0.189
20	Angry or bitter	0.382	0.088	0.480	2.827	1.278
21	Anxious	-0.166	0.091	-0.236	4.527	2.430
22	Worried about the future	-0.191	0.086	0.036	8.044	4.104
23	Felt need to conceal PD	0.142	0.073	5.981	43.628	12.956
24	Avoid eating/drinking in public	0.145	0.078	-0.775	4.739	2.950
25	Embarrassed due to PD	0.183	0.077	2.657	7.193	3.080
26	Worried people's reactions	0.584	0.086	1.353	0.271	0.182
27	Close relationships	1.154	0.096	-0.223	0.167	0.087
28	Support from partner	0.386	0.094	1.393	12.761	4.689
29	Support from family or friends	0.936	0.093	0.664	2.956	1.234
30	Unexpectedly fallen asleep	0.868	0.078	2.685	9.241	4.131
31	Concentration	0.050	0.082	-0.886	2.690	1.798
32	Poor memory	0.025	0.083	3.030	10.769	4.382
33	Dreams or hallucinations	0.867	0.08	2.832	16.184	6.183
34	Speech	-0.037	0.076	0.385	0.279	0.092
35	Unable communicate properly	0.056	0.078	-1.223	1.843	1.574
36	Felt ignored	1.158	0.098	-1.671	6.436	4.839
37	Painful cramps or spasms	0.172	0.072	3.410	15.122	5.405
38	Pain in joints or body	-0.437	0.077	4.644	17.064	6.556
39	Unpleasantly hot or cold	-0.184	0.077	4.044	19.646	7.493

^aPerformed with the sample divided into three class intervals according to person locations on the measured construct.

^bRounded to three decimals.

^cExpressed in linear log-odds units (logits). Mean item location is zero, with positive values representing more health problems.

^dResiduals summarize the deviation of observed from expected responses. Deviations from the recommended range of -2.5 to +2.5, indicating item misfit, are shown in bold.

^eBonferroni-corrected statistically significant deviations across class intervals, indicating item misfit, are shown in bold.

^fChi-square values summarize the deviation of observed from expected responses across the three class intervals of people. Higher values represent larger deviations.

^gF-statistics from one-way ANOVAs of deviations from model expectation across the three class intervals of people.

SE, standard error.

Figure 1.

Hypothesized relationships between PDQ-39 scales and the PDQ-39SI assessed for fit with data by confirmatory factor analysis. Arrows indicate hypothesized relationships according to the measurement model, and coefficients above each arrow are estimated standardized regression weights. Squares and circles represent observed and latent variables, respectively. The box summarizes model fit and accompanying criteria for acceptable fit [Schermelleh-Engel and Moosbrugger 2003]. MOB, mobility; ADL, activities of daily living; EMO, emotional well-being; STI, stigma; SOC, social support; COG, cognitions; COM, communication; BOD, bodily discomfort; e, error term; res, residual covariance; GFI, Goodness-of-Fit Index; AGFI, Adjusted Goodness-of-Fit Index; CFI, Comparative Fit Index; RMSEA, root mean square error of approximation; CI, confidence interval.

Next, we examined the presence of DIF by gender and age. Four items (19, 24, 34, and 35) displayed significant DIF by gender, and item 10 showed DIF by age. These items were then split into gender- and age-specific items, respectively. The item-trait interaction remained significant (χ², 235.358; p < 0.0001), indicating continued overall misfit, and misfit was found for the same items as before. Reliability was unchanged at 0.96.

We then assessed whether the five response categories worked as assumed. We found disordered response category thresholds in 24 items (Table 3). Threshold disordering typically involved category 1 (‘seldom’), although disordering occurred at all thresholds. Figure 2 exemplifies these observations by displaying items with (Figure 2A, B) and without (Figure 2C) threshold disordering. Response categories were then collapsed into four (16 items: 1, 6, 7, 9-14, 24, 29, 31, 33, 34, 38 and 39) and three (eight items: 3-5, 8, 23, 28, 30 and 37) categories to obtain a functioning response scale. This did not improve overall model fit (item-trait interaction χ², 236.136; p < 0.0001). At the item level, misfit was resolved for items 29, 37, 38 and 39. However, two additional items (15 and 16) now displayed signs of misfit (fit residual values of 3.65 and 3.23, respectively). Independent t-tests of the DIF-adjusted scale with collapsed response categories showed that the proportion of significantly different person measures was 0.35 (95% CI, 0.32–0.38). Reliability was unchanged at 0.96.

Table 3.

Response category thresholds of the PDQ-39.

Response category threshold location^a,b
Item	0→1	1→2	2→3	3→4
1	-0.842	-1.081	0.635	1.288
2	-0.925	-0.456	0.498	0.883
3	-0.001	-0.633	0.520	0.114
4	-0.099	-0.303	0.515	-0.113
5	-0.135	-0.436	0.297	0.274
6	-0.553	-1.104	0.442	1.214
7	-0.305	-1.121	0.263	1.163
8	0.074	-0.379	0.130	0.175
9	-0.405	-0.869	0.436	0.838
10	-0.912	-1.103	-0.021	2.036
11	-0.275	-1.031	0.233	1.073
12	-0.345	-1.151	0.454	1.042
13	-0.804	-0.853	0.266	1.390
14	-0.766	-1.110	0.395	1.482
15	-1.050	-0.569	0.138	1.481
16	-0.985	-0.182	-0.062	1.229
17	-1.828	-0.949	0.313	2.464
18	-1.186	-1.104	0.428	1.862
19	-1.766	-1.540	0.322	2.984
20	-1.772	-0.995	0.600	2.168
21	-2.222	-1.055	0.345	2.931
22	-1.745	-1.234	-0.103	3.082
23	-0.199	-0.770	0.303	0.666
24	-0.908	-0.917	0.147	1.678
25	-0.864	-0.587	-0.013	1.464
26	-1.344	-0.989	0.413	1.920
27	-1.337	-0.952	0.285	2.003
28	0.366	-0.873	1.761	-1.254
29	-0.760	-1.045	0.474	1.332
30	-1.043	-2.039	-0.535	3.617
31	-1.196	-1.269	0.365	2.101
32	-1.382	-1.218	0.265	2.336
33	-1.358	-1.524	-0.515	3.397
34	-0.710	-1.039	0.288	1.460
35	-0.958	-0.905	0.399	1.464
36	-1.301	-0.855	0.490	1.666
37	-0.514	-1.050	-0.710	2.275
38	-0.852	-0.926	-0.321	2.099
39	-0.817	-1.129	0.112	1.834

^aExpressed in linear log-odds units (logits), rounded to three decimals and centered at a mean of zero for each item. Threshold locations are the points where there are 50/50 probabilities of endorsing adjacent categories; they should be ordered from less to more health problems. Disordered thresholds are shown in bold.

^bResponse categories are ‘never’ (0), ‘seldom’ (1), ‘sometimes’ (2), ‘often’ (3), and ‘always, or cannot do at all’ (4).

Figure 2.

Example category probability curves from the PDQ-39. Location on the measured construct is indicated on the x-axis (with threshold locations centered at zero; negative values = fewer problems), and the y-axis represents the probability of affirming response options 0 (‘never’), 1 (‘seldom’), 2 (‘sometimes’), 3 (‘often’), and 4 (‘always’). Category probability curves show the probability of observing each category relative to the location on the measured construct (x-axis). Vertical arrows indicate the logit locations of the respective thresholds. Panel A shows an item (no. 12) representing the typical pattern, with disordered thresholds between response options 0-to-1 and 1-to-2, whereas panel B displays an item (no. 3) with multiple disordering (thresholds 0-to-1/1-to-2 and thresholds 2-to-3/3-to-4). For comparison, panel C illustrates an item (no. 17) with ordered thresholds.

Discussion

This study tested whether the PDQ-39 represents a unidimensional construct. Such assessments are essential as legitimate use of total scores assumes unidimensionality, and violation thereof challenges the meaning and validity of scores. Both Rasch and confirmatory factor analyses gave similar results in that neither approach found support for the unidimensionality of the PDQ-39. This challenges the validity and, consequently, the interpretability of the PDQ-39SI.

There are at least three related reasons why unidimensionality is important to consider [Smith 2002; Stout 1987]. Firstly, unidimensionality is a basic assumption for valid calculation of total scores. Secondly, unambiguous interpretation requires scores to represent a single defined attribute. That is, scores on a scale that is used to measure one variable should not be appreciably influenced by varying levels on one or more other variables. Thirdly, if scores do not represent a common line of inquiry it is unclear if two individuals with the same score can be considered comparable. Similarly, the interpretation of any differences between individuals will be ambiguous since it is unknown how they actually differ. This hampers understanding of clinical trial outcomes, which in turn has consequences for selecting interventions for individual patients.

We found evidence that the PDQ-39 does not represent a unidimensional construct. These observations are in accordance with the ambiguities observed regarding the dimensionality of the eight PDQ-39 scales [Hagell and Nygren 2007] and the PDQ-8 [Franchignoni et al. 2008]. Previous studies addressing the dimensionality of the PDQ-39SI have conducted exploratory factor analyses of the eight PDQ-39 scale scores [Jenkinson and Fitzpatrick 2007; Luo et al. 2005; Tan et al. 2004; Jenkinson et al. 2003a]. Like the initial derivation of the PDQ-39SI [Jenkinson et al. 1997], these studies suggested that the eight PDQ-39 scales represent a common construct by showing that all eight PDQ-39 scale scores loaded on a single factor according to the eigenvalue >1 criterion. However, this approach is generally discouraged because it typically yields erroneous results [Gorsuch 1983]. Firstly, it tends to identify too many or too few factors (dimensions) and, secondly, the number of factors identified tends to relate to the number of variables included in the analysis, regardless of the actual number of dimensions in the data. With eight variables, as with the PDQ-39 scales, identification of a single dimension is therefore not surprising [Hair et al. 2006]. Furthermore, in the majority of studies using exploratory factor analysis [Jenkinson and Fitzpatrick 2007; Luo et al. 2005; Tan et al. 2004; Jenkinson et al. 2003a], the single identified factor explained less than 50% of the total variance, and in no instance did it exceed the recommended 60% [Hair et al. 2006]. That is, more than half of the information contained in the eight PDQ-39 scales was typically not accounted for.
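The variance argument can be illustrated numerically. For a correlation matrix of p variables the eigenvalues sum to p, so the first component's share of total variance is λ₁/p. The Python sketch below uses power iteration to find λ₁ for a hypothetical 8 × 8 correlation matrix in which every pair of scales correlates at 0.4 (an invented value, not data from this or the cited studies). For such a matrix the eigenvalues are 1 + 7 × 0.4 = 3.8 and, seven times, 1 − 0.4 = 0.6, so the eigenvalue >1 criterion retains a single component even though it explains less than half of the variance. This is a principal component calculation, used here as a simplification of factor extraction; the function names are hypothetical.

```python
import math


def first_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix via power iteration."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Av for the unit-length vector v.
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, av))


def equicorrelation(p, r):
    """Hypothetical p x p correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]


p = 8
lambda_1 = first_eigenvalue(equicorrelation(p, 0.4))
share = lambda_1 / p  # the trace of a correlation matrix equals p
print(round(lambda_1, 3), round(share, 3))  # 3.8 0.475
```

A single dominant component thus coexists with 52.5% of the variance left unexplained, which is the situation the eigenvalue criterion alone does not reveal.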

It may be obvious from the nature of scales such as the PDQ-39 that they reflect illness-related aspects as perceived and interpreted by patients. On the one hand, it may therefore be argued that exactly what such scores represent is of less concern. However, this would be analogous to relying on an overall score that represents unknown aspects of neurological impairment and using it to understand outcomes in clinical trials. The impact of diseases such as PD is vast and involves a variety of aspects. To understand these and to offer interventions that improve patient wellbeing, they need to be measured unambiguously. This is not to say that valid overall measurement of the impact of disease from the patient's perspective cannot be obtained. A prerequisite, however, is that such instruments are based on, and developed according to, well-defined theories [Doward et al. 2004].

We found some evidence for the presence of DIF by age and gender. However, this does not appear to be a main source of violations to unidimensionality in the PDQ-39SI since adjustment for DIF did not improve dimensionality. Similarly, while problems with the rating scale response categories may relate to multidimensionality, it appears unlikely that this would be a major explanation here since explorative post hoc combination of response categories did not improve model fit. Instead, the dimensionality problem appears to be a conceptual one where items do not work in harmony to define a common variable.

The observed disordering among the PDQ-39 response category thresholds shows that the response scale does not work as intended. This may be due to, for example, unclear distinctions between categories or difficulty making fine-tuned ratings [Andrich et al. 2004–2005; Hagquist and Andrich 2004; Hagquist 2001]. Referring back to Figure 2B, threshold disordering means that the location at which people are equally likely to respond ‘often’ or ‘always’ represents fewer health problems than the location at which they are equally likely to respond ‘sometimes’ or ‘often’. When this occurs, something has gone wrong in the interaction between respondents, items and response options, and the clinical meaning of the response scale is unclear.

As previously observed in the US and UK [Paterson et al. 2005; Bushnell and Martin 1999], respondents to the original Swedish PDQ-39 found the distinction between ‘occasionally’ and ‘sometimes’ ambiguous [Hagell and McKenna 2003]. As in the US PDQ-39 [Bushnell and Martin 1999], the Swedish PDQ-39 used in this study therefore replaced ‘occasionally’ with ‘seldom’ [Kim et al. 2006]. Such a modification is supported by studies specifically addressing people's interpretation of response category labels in other populations [Skevington and Tucker 1999; Szabo et al. 1996]. However, while the change from ‘occasionally’ to ‘seldom’ reduced this apparent ambiguity, a significant proportion of respondents still found the PDQ-39 response options difficult to use [Kim et al. 2006]. Furthermore, the observed problems with the PDQ-39 response categories do not appear to be specific to the Swedish version of the scale, as similar observations have been reported with other language versions [Franchignoni et al. 2008].

Despite displaying misfit to a unidimensional measurement model, PDQ-39 items may still prove useful for measurement, provided that one or more subsets of items can be shown to represent a clearly defined variable. However, the aim of this study was to assess whether the present version of the full PDQ-39 represents a unidimensional construct. Additional studies are needed to explore whether reduction and/or regrouping of its items can produce a more valid and interpretable outcome measure.

This is the first independent study to assess the dimensionality of the PDQ-39SI using contemporary methods. We found clear indications of multidimensionality that cannot be explained by technical aspects of the scale but probably relate to conceptual problems. This argues against its usefulness as a clinical trial endpoint [Food and Drug Administration 2006]. More independent studies regarding the dimensionality of the PDQ-39 are needed to confirm or falsify these observations. Meanwhile, clinicians and investigators should use and interpret the PDQ-39SI with caution.

Acknowledgements

The authors wish to thank all participating patients for their cooperation and Jan Reimer for assistance with data collection. The study was supported by the Swedish Research Council, the Swedish Parkinson Academy, the Swedish Parkinson Foundation, the Skane County Council Research and Development Foundation, and the Faculty of Medicine, Lund University.

Conflict of interest statement

The authors have no conflicts of interest.

Contributor Information

Peter Hagell, Department of Health Sciences, Lund University and Department of Neurology, Lund University Hospital, Lund, Sweden. Peter.Hagell@med.lu.se

Maria H. Nilsson, Department of Health Sciences, Lund University and Department of Neurosurgery, University Hospital, Lund, Sweden.

References

Andrich D. (1988) Rasch Models for Measurement. Beverly Hills: SAGE Publications.
Andrich D., Sheridan B., Luo G. (2004-2005) Interpreting Rumm. Perth: RUMM Laboratory Pty.
Borsboom D. (2006) When does measurement invariance matter? Med Care 44: S176–181.
Bushnell D.M., Martin M.L. (1999) Quality of life and Parkinson's disease: translation and validation of the US Parkinson's Disease Questionnaire (PDQ-39). Qual Life Res 8: 345–350.
Doward L.C., Meads D.M., Thorsen H. (2004) Requirements for quality of life instruments in clinical research. Value Health 7(Suppl. 1): S13–S16.
Floyd F.J., Widaman K.F. (1995) Factor analysis in the development and refinement of clinical assessment instruments. Psychol Assess 7: 286–299.
Food and Drug Administration (2006) Patient-reported outcome measures: use in medicinal product development to support labelling claims. Federal Register 71: 5862–5863.
Franchignoni F., Giordano A., Ferriero G. (2008) Rasch analysis of the short form 8-item Parkinson's Disease Questionnaire (PDQ-8). Qual Life Res 17: 541–548.
Gibb W.R., Lees A.J. (1988) The relevance of the Lewy body to the pathogenesis of idiopathic Parkinson's disease. J Neurol Neurosurg Psychiatry 51: 745–752.
Gorsuch R.L. (1983) Factor Analysis. 2nd edition. Mahwah: Lawrence Erlbaum Associates.
Hagell P., McKenna S.P. (2003) International use of health status questionnaires in Parkinson's disease: translation is not enough. Parkinsonism Relat Disord 10: 89–92.
Hagell P., Nygren C. (2007) The 39 item Parkinson's Disease Questionnaire (PDQ-39) revisited: implications for evidence based medicine. J Neurol Neurosurg Psychiatry 78: 1191–1198.
Hagquist C. (2001) Evaluating composite health measures using Rasch modelling: an illustrative example. Soz Praventivmed 46: 369–378.
Hagquist C., Andrich D. (2004) Is the sense of coherence instrument applicable on adolescents? A latent trait analysis using Rasch modelling. Pers Individ Diff 36: 955–968.
Hair J.F., Black B., Babin B., Anderson R.E., Tatham R.L. (2006) Multivariate Data Analysis. 6th edition. Upper Saddle River: Prentice Hall.
Hobart J. (2003) Rating scales for neurologists. J Neurol Neurosurg Psychiatry 74(Suppl. 4): iv22–iv26.
Hoehn M.M., Yahr M.D. (1967) Parkinsonism: onset, progression and mortality. Neurology 17: 427–442.
Holland P.W., Wainer H. (1993) Differential Item Functioning. Mahwah: Lawrence Erlbaum Associates.
Jenkinson C., Fitzpatrick R. (2007) Cross-cultural evaluation of the short form 8-Item Parkinson's Disease Questionnaire (PDQ-8): results from America, Canada, Japan, Italy and Spain. Parkinsonism Relat Disord 13: 22–28.
Jenkinson C., Fitzpatrick R., Norquist J., Findley L., Hughes K. (2003a) Cross-cultural evaluation of the Parkinson's Disease Questionnaire: tests of data quality, score reliability, response rate, and scaling assumptions in the United States, Canada, Japan, Italy, and Spain. J Clin Epidemiol 56: 843–847.
Jenkinson C., Fitzpatrick R., Peto V., Greenhall R., Hyman N. (1997) The Parkinson's Disease Questionnaire (PDQ-39): development and validation of a Parkinson's disease Summary Index Score. Age Ageing 26: 353–357.
Jenkinson C., Norquist J.M., Fitzpatrick R. (2003b) Deriving summary indices of health status from the Amyotrophic Lateral Sclerosis Assessment Questionnaires (ALSAQ-40 and ALSAQ-5). J Neurol Neurosurg Psychiatry 74: 242–245.
Kim M.Y., Dahlberg A., Hagell P. (2006) Respondent burden and patient-perceived validity of the PDQ-39. Acta Neurol Scand 113: 132–137.
Luo N., Tan L.C., Li S.C., Soh L.K., Thumboo J. (2005) Validity and reliability of the Chinese (Singapore) version of the Parkinson's Disease Questionnaire (PDQ-39). Qual Life Res 14: 273–279.
Nunnally J.C., Bernstein I.H. (1994) Psychometric Theory, 3rd edition New York: McGraw-Hill [Google Scholar]
Pallant J.F., Tennant A. (2007) An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol 46: 1–18 [PubMed] [Google Scholar]
Paterson C., Allen J.A., Browning M., Barlow G., Ewings P. (2005) A pilot study of therapeutic massage for people with Parkinson's disease: the added value of user involvement. Complement Ther Clin Pract 11: 161–171 [PubMed] [Google Scholar]
Peto V., Jenkinson C., Fitzpatrick R., Greenhall R. (1995) The development and validation of a short measure of functioning and well being for individuals with Parkinson's disease. Qual Life Res 4: 241–248 [PubMed] [Google Scholar]
Rasch G. (1960) Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danmarks Paedagogiske Institut [Google Scholar]
Schermelleh-Engel K., Moosbrugger H. (2003) Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods Psychol Res Online 8: 23–74 [Google Scholar]
Skevington S.M., Tucker C. (1999) Designing response scales for cross-cultural use in health care: data from the development of the UK WHOQOL. Br J Med Psychol 72(Pt 1): 51–61 [PubMed] [Google Scholar]
Smith E.V., Jr (2002) Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. J Appl Meas 3: 205–231 [PubMed] [Google Scholar]
Smith R.M. (1996) A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling 3: 25–40 [Google Scholar]
Stout W. (1987) A nonparametric approach for assessing latent trait unidimensionality. Psychometrika 52: 589–617 [Google Scholar]
Szabo S. et al. (on behalf of the WHOQOL Group) (1996) The World Health Organization Quality of Life (WHOQOL) assessment instrument, in Spilker B. (ed.), Quality of Life and Pharmacoeconomics in Clinical Trials, 2nd edition Philadelphia: Lippincott-Raven Publishers
Tan L.C., Luo N., Nazri M., Li S.C., Thumboo J. (2004) Validity and reliability of the PDQ-39 and the PDQ-8 in English-speaking Parkinson's disease patients in Singapore. Parkinsonism Relat Disord
Tennant A., McKenna S.P., Hagell P. (2004a) Application of Rasch analysis in the development and application of quality of life instruments. Value Health 7(Suppl. 1): S22–26
Tennant A., Pallant J. (2006) Unidimensionality matters. Rasch Meas Trans 20: 1048–1051
Tennant A., Penta M., Tesio L., Grimby G., Thonnard J.L., Slade A. et al. (2004b) Assessing and adjusting for cross-cultural validity of impairment and activity limitation scales through differential item functioning within the framework of the Rasch model: the Pro-Esor project. Med Care 42: I37–I48
Wilson M. (2005) Constructing Measures: An Item Response Modelling Approach. Mahwah: Lawrence Erlbaum Associates

Articles from Therapeutic Advances in Neurological Disorders are provided here courtesy of SAGE Publications