


We examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. We evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality, but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed.

Scale reliability is commonly said to limit validity (John & Soto, 2007); in principle, more reliable scales should yield more valid assessments (although of course reliability is not sufficient to guarantee validity). For a given set of scales, such as the 30 facets of the NEO Inventories (McCrae & Costa, in press), there is differential reliability: Some facets are more reliable than others. That fact makes it possible to test the maxim that reliability limits validity, provided that criteria of validity are chosen that are comparable across all facet scales: More reliable facets ought to be more valid.
