## Discrepancy comparisons and confidence intervals for psychometric test results

The site contains several tools for analyzing psychometric test results, such as calculating confidence intervals and comparing discrepancies.

#### 1. Confidence intervals of psychometric test results

Psychometric test results are not absolutely accurate. Confidence intervals are estimates of the range in which the true value lies with a certain probability. The following calculation determines the confidence interval based on the standard error of estimation. The result can also be corrected for the effect of regression to the mean. In this case, the estimated value is also displayed.

Inputs: type of score (T score, IQ score, z score, or percentile), score, reliability, confidence level (99%, 95%, 90%, 80%, or 68%), and optional correction for regression to the mean. Outputs: estimated value and confidence interval.

The calculation uses the following formula (regression to the mean is corrected using the estimated value $z_{\mathrm{predicted}} = z_{\mathrm{score}} \cdot \mathrm{rel}$; for the standard error of estimation, see Krum, Schmidt-Atzert & Amelang, 2022, p. 149):

$\mathrm{conf}=\mathrm{score}\pm \mathrm{SD}\cdot {z}_{1-\frac{\alpha }{2}}\cdot \sqrt{\mathrm{rel}\cdot \left(1-\mathrm{rel}\right)}$

Percentiles are first converted to z-scores, and the interval limits are then converted back to percentiles. Note that the standard error of estimation is only appropriate for reliabilities > .5. For lower values, the standard error of measurement is used instead.
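The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the calculator: the function names, the `SCALES` table, and the assumed norms (T: M = 50, SD = 10; IQ: M = 100, SD = 15) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

# Assumed mean and SD of the norm scales (not taken from the calculator).
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}

def confidence_interval(score, rel, level=0.95, scale="IQ", regression=True):
    """Return (estimate, lower, upper) on the given norm scale."""
    mean, sd = SCALES[scale]
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{1 - alpha/2}
    if rel > 0.5:
        # Standard error of estimation.
        half_width = sd * z_crit * sqrt(rel * (1 - rel))
    else:
        # Fallback for low reliabilities: standard error of measurement.
        half_width = sd * z_crit * sqrt(1 - rel)
    # Regression to the mean: z_predicted = z_score * rel.
    estimate = mean + (score - mean) * rel if regression else score
    return estimate, estimate - half_width, estimate + half_width

def percentile_interval(pct, rel, level=0.95, regression=True):
    """Convert a percentile to z, compute the interval, convert back."""
    nd = NormalDist()
    z = nd.inv_cdf(pct / 100)
    return tuple(100 * nd.cdf(v)
                 for v in confidence_interval(z, rel, level, "z", regression))

est, lo, hi = confidence_interval(120, rel=0.9, level=0.95, scale="IQ")
print(f"estimate {est:.1f}, interval {lo:.1f} to {hi:.1f}")
```

For an IQ of 120 and an assumed reliability of .9, the estimated value is 118 and the 95% interval runs from roughly 109.2 to 126.8.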

#### 2. Test for difference from fixed value

To test against a fixed value, a one-sided confidence interval with $z_{1-\alpha}$ is sufficient (Krum et al., 2022, p. 151 f.). It is still necessary to specify the direction in which the result is to be tested, and this depends on the exact formulation of the hypothesis. Because it makes the calculation more accurate, the correction for regression to the mean should be applied here as well.

For example, if a person scores an IQ of 135, one can investigate whether the result is significantly higher than 130, which is considered the threshold for giftedness. In this case, the score would have to be significantly higher than 130 (the direction must be "... is higher than ..."). Another question may be whether giftedness can be ruled out, for example, if the result is 122. In this case, one would test whether 122 is significantly lower than 130 (direction "... is lower than ..."). A non-significant result would then mean that giftedness cannot be ruled out.

Inputs: type of score (T score, IQ score, z score, or percentile), result, direction of test ("... is higher than ..." or "... is lower than ..."), cutoff, reliability, significance level (.01, .05, .10, or .20), and optional correction for regression to the mean. Output: result of the hypothesis test.

The test is one-tailed and uses the standard error of measurement in the following formula (regression to the mean is corrected using the estimated value $z_{\mathrm{predicted}} = z_{\mathrm{score}} \cdot \mathrm{rel}$):

$\mathrm{cutoff}=\mathrm{score}\pm \mathrm{SD}\cdot {z}_{1-\alpha }\cdot \sqrt{1-\mathrm{rel}}$

In the case of percentiles, these are first converted to z-scores and then the interval limits are recalculated.
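The one-tailed test can be sketched as follows. This is an illustrative reading of the formula, not the calculator's own code: the function name, the `direction` strings, the default IQ norms, and the decision rule (the one-sided confidence limit must lie beyond the cutoff) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

def differs_from_cutoff(score, cutoff, rel, alpha=0.05, direction="higher",
                        mean=100.0, sd=15.0, regression=True):
    """One-tailed test of a score against a fixed cutoff.

    Returns (bound, significant), where bound is the one-sided
    confidence limit that is compared with the cutoff.
    """
    if direction not in ("higher", "lower"):
        raise ValueError("direction must be 'higher' or 'lower'")
    z_crit = NormalDist().inv_cdf(1 - alpha)   # z_{1 - alpha}
    margin = sd * z_crit * sqrt(1 - rel)       # standard error of measurement
    # Regression to the mean: z_predicted = z_score * rel.
    estimate = mean + (score - mean) * rel if regression else score
    if direction == "higher":
        bound = estimate - margin              # lower limit must exceed cutoff
        return bound, bound > cutoff
    bound = estimate + margin                  # upper limit must stay below cutoff
    return bound, bound < cutoff

# IQ 135 against the giftedness threshold of 130; rel = .9 is an assumed value.
bound, significant = differs_from_cutoff(135, 130, rel=0.9, direction="higher")
print(f"one-sided limit {bound:.1f}, significant: {significant}")
```

The outcome depends strongly on the reliability entered, so the value of .9 used here should be replaced by the reliability reported for the test in question.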

#### 3. Test for differences of repeated measurement (Reliable Change)


When a test is repeated on an individual, the so-called Reliable Change Index (RCI; Jacobson & Truax, 1991; see also Krum et al., 2022, p. 153) can be determined. The RCI can be interpreted as a test variable in a z-test. It can be used to express whether there are significant differences between two test scores, e.g. whether an intervention has led to a significant change in characteristics.

Inputs: type of score (T score, IQ score, z score, or percentile), result 1, result 2, and reliability. Outputs: test value and interpretation.

As in calculators 1 and 2, percentiles are converted to z-scores using the inverse cumulative normal distribution before the calculation. The formula for the RCI follows Jacobson and Truax (1991; see Krum et al., 2022, p. 153):

$\mathrm{RCI}=\frac{{x}_{1}-{x}_{2}}{\mathrm{SD}\cdot \sqrt{2\cdot \left(1-\mathrm{rel}\right)}}$
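As a minimal sketch, the RCI can be computed directly from the formula. The function names are hypothetical, and the two-sided p-value is an added assumption that follows from reading the RCI as the test statistic of a z-test.

```python
from math import sqrt
from statistics import NormalDist

def reliable_change_index(x1, x2, rel, sd):
    """RCI = (x1 - x2) / (SD * sqrt(2 * (1 - rel)))."""
    return (x1 - x2) / (sd * sqrt(2 * (1 - rel)))

def two_sided_p(z):
    """Two-sided p-value of a z statistic (assumed interpretation)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical example: T scores of 60 and 50, reliability .9, SD = 10.
rci = reliable_change_index(60, 50, rel=0.9, sd=10)
print(f"RCI = {rci:.3f}, p = {two_sided_p(rci):.4f}")
```

In this hypothetical case the RCI is about 2.24, which exceeds the conventional two-sided critical value of 1.96.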

#### 4. Discrepancy of two test results of a person


When a person is tested with different tests or scales of a test, it can be interesting to compare the results. For example, one might want to investigate whether logical reasoning is better developed than verbal comprehension, if intelligence tests do not already provide such analysis options. Or one might want to clarify whether the stress levels of different clinical symptoms differ.

Inputs: type of score (T score, IQ score, z score, or percentile), result 1, result 2, reliability 1, and reliability 2. Outputs: test value and interpretation.

As with the previous calculators, percentiles are converted to z-scores using the inverse cumulative normal distribution before the calculation. In general, the procedure is also suitable for raw scores, provided that the population mean is known, which is always the case when norm scores are used. For each of the two test results, a value $Y_i$ must first be calculated, and the test statistic z can then be determined. The formulas for comparing the test results are (Krum et al., 2022, p. 154 f.):

${Y}_{i}=\frac{{X}_{i}}{\sqrt{{\mathrm{rel}}_{i}}}+M\cdot \left(1-\frac{1}{\sqrt{{\mathrm{rel}}_{i}}}\right)$

and

${z}_{\mathrm{diff}}=\frac{{Y}_{1}-{Y}_{2}}{\mathrm{SD}\cdot \sqrt{\frac{1-{\mathrm{rel}}_{1}}{{\mathrm{rel}}_{1}}+\frac{1-{\mathrm{rel}}_{2}}{{\mathrm{rel}}_{2}}}}$
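The two formulas can be combined into one small function. The sketch below is only an illustration: the function name and the example values (two IQ scores with reliabilities of .9 and .8) are hypothetical.

```python
from math import sqrt

def discrepancy_z(x1, x2, rel1, rel2, mean, sd):
    """Test statistic z_diff for two results of one person.

    Both results must be expressed on the same norm scale (mean, sd).
    """
    def adjusted(x, rel):
        # Y_i = X_i / sqrt(rel_i) + M * (1 - 1 / sqrt(rel_i))
        return x / sqrt(rel) + mean * (1 - 1 / sqrt(rel))

    y1, y2 = adjusted(x1, rel1), adjusted(x2, rel2)
    se = sd * sqrt((1 - rel1) / rel1 + (1 - rel2) / rel2)
    return (y1 - y2) / se

# Hypothetical example: IQ 115 (rel = .9) versus IQ 100 (rel = .8).
z = discrepancy_z(115, 100, rel1=0.9, rel2=0.8, mean=100, sd=15)
print(f"z_diff = {z:.3f}")
```

Note that a score equal to the population mean is left unchanged by the adjustment, so in the example only the first result is shifted.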

#### 5. Profile analysis: test for identity, structure and profile height of two profiles

Profiles of psychometric results can be analyzed regarding equality (= profile identity), structure (= profile shape), or magnitude (= profile height) (cf. Huber 1973, chap. 10). Such an approach can be applied, for example, to intelligence profiles or the clinical stress spectrum before and after therapy. Huber (1973, p.) gives as an example the result of a 35-year-old person in the Intelligence Structure Test (IST; Amthauer, 1953) with the following results:

| Subtest | Testing 1 | Testing 2 |
| --- | --- | --- |
| Sentence completion | 92 | 113 |
| Vocabulary | 103 | 96 |
| Analogy | 93 | 113 |
| Similarities | 100 | 116 |
| Memory tasks | 94 | 103 |
| Calculation tasks | 102 | 104 |
| Cube tasks | 109 | 98 |

Results are reported as standard scores (M = 100, SD = 10), and subtests have a uniform reliability of r = .9. Results can now be tested to determine whether the two profiles are identical and, if not, whether they differ in height, structure, or both.
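Before running the hypothesis tests, it can help to look at the example data descriptively. The sketch below is not Huber's profile test: it only computes the height (mean) of each profile and, as a rough per-subtest check, applies the RCI formula from section 3 with SD = 10 and rel = .9. The variable names are assumptions made for the example, and the per-subtest values are not corrected for multiple comparisons.

```python
from math import sqrt

# Example data: subtest -> (testing 1, testing 2), standard scores.
PROFILE = {
    "Sentence completion": (92, 113),
    "Vocabulary": (103, 96),
    "Analogy": (93, 113),
    "Similarities": (100, 116),
    "Memory tasks": (94, 103),
    "Calculation tasks": (102, 104),
    "Cube tasks": (109, 98),
}
SD, REL = 10.0, 0.9

# Profile height: mean score across subtests for each testing.
height_1 = sum(a for a, _ in PROFILE.values()) / len(PROFILE)
height_2 = sum(b for _, b in PROFILE.values()) / len(PROFILE)

# Per-subtest RCI (descriptive only, see section 3 for the formula).
rci = {name: (a - b) / (SD * sqrt(2 * (1 - REL)))
       for name, (a, b) in PROFILE.items()}

print(f"profile height: {height_1:.1f} vs. {height_2:.1f}")
for name, value in rci.items():
    print(f"{name:20s} RCI = {value:6.2f}")
```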

Inputs: hypothesis test, norm scale, number of scales, reliability, total reliability, and the test results, entered per scale (Scale 1 to Scale 9) as Score 1, Score 2, and Rel (reliability).

#### References

The χ² tests are computed with jStat.

• Amthauer, R. (1953). IST-Intelligenz-Struktur-Test. Hogrefe.
• Huber, H. (1973). Psychometrische Einzelfalldiagnostik [Psychometric diagnosis for single cases]. Beltz.
• Jacobson, N. S., & Truax, P. (1991). Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
• Krum, S., Schmidt-Atzert, L., & Amelang, M. (2022). Grundlagen diagnostischer Verfahren [Basics of diagnostic procedures]. In L. Schmidt-Atzert, S. Krum & M. Amelang (Eds.), Psychologische Diagnostik (6th ed.) (pp. 39-209). Springer. doi:10.1007/978-3-662-61643-7

Citeable source:
Lenhard, W. & Lenhard, A. (2023). Confidence intervals, test of discrepancy and profile analysis for psychometric results. available: https://www.psychometrica.de/discrepancy.html. Psychometrica.