Online calculator for hypothesis testing, profile analysis, discrepancy comparisons, and confidence intervals for psychometric test results: Psychometrica

Discrepancy comparisons and confidence intervals for psychometric test results

The site contains several tools for analyzing psychometric test results, such as calculating confidence intervals and comparing discrepancies.

Confidence intervals of psychometric test results
Test for difference from fixed value
Test for differences in repeated measurements (Reliable Change)
Discrepancy of two test results of one person
Profile analysis: test for identity, structure and profile height of two profiles
References

1. Confidence intervals of psychometric test results

Psychometric test results are never perfectly accurate. A confidence interval estimates the range in which the true value lies with a given probability. The following calculation determines the confidence interval based on the standard error of estimation. The result can optionally be corrected for regression to the mean; in that case, the estimated value is displayed as well.

Calculator fields: type of score, score, reliability, confidence, regression to the mean, estimated value, confidence interval.

The calculation uses the following formula (the correction for regression to the mean uses the estimated value z_predicted = z_score * rel; for the standard error of estimation, see Krum, Schmidt-Atzert & Amelang, 2022, p. 149):

conf = score \pm SD \cdot z_{1 - \frac{\alpha}{2}} \cdot \sqrt{rel \cdot (1 - rel)}

In the case of percentiles, these are first converted to z-scores, and the interval limits are then converted back to percentiles. Note that the standard error of estimation is only appropriate for reliabilities > .5. For lower reliabilities, the standard error of measurement, SD \cdot \sqrt{1 - rel}, is used instead.
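
The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual code: the function name `confidence_interval`, its parameters, and the IQ scale defaults (M = 100, SD = 15) are assumptions made for the example, and percentile input is not handled.

```python
from statistics import NormalDist


def confidence_interval(score, rel, mean=100.0, sd=15.0,
                        confidence=0.95, regress=True):
    """Two-sided confidence interval for a single norm score.

    Hypothetical helper; defaults assume an IQ scale (M = 100, SD = 15).
    Returns (center, lower, upper).
    """
    # Critical value z_{1 - alpha/2} of the standard normal distribution
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Regression to the mean: z_predicted = z_score * rel
    center = mean + rel * (score - mean) if regress else score

    if rel > 0.5:
        # Standard error of estimation
        se = sd * (rel * (1 - rel)) ** 0.5
    else:
        # Standard error of measurement for low reliabilities
        se = sd * (1 - rel) ** 0.5

    half_width = z_crit * se
    return center, center - half_width, center + half_width


center, lower, upper = confidence_interval(130, rel=0.9)
print(f"estimated value {center:.1f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With an IQ of 130 and a reliability of .9, the estimated value is 127 and the interval runs from roughly 118.2 to 135.8.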

2. Test for difference from fixed value

To test against a fixed value, it is sufficient to use a one-sided confidence interval and test with z_{1-α} (Krum et al., 2022, p. 151 f.). It is still necessary to specify the direction of the test, which depends on the exact formulation of the hypothesis. Because it makes the calculation more accurate, the correction for regression to the mean should be applied here as well.

For example, if a person obtains an IQ of 135, one can test whether the result is significantly higher than 130, which is considered the threshold for giftedness (the direction must be "... is higher than ..."). Another question may be whether giftedness can be ruled out, for example for a result of 122. In this case, one would test against the upper bound (direction "... is lower than ..."): a significant result means that the score lies reliably below 130, whereas a non-significant result means that giftedness cannot be ruled out.

Calculator fields: type of score, result, direction of test, cutoff, reliability, significance level, regression to the mean, result of the hypothesis test.

The test is one-tailed and uses the standard error of measurement in the following formula (the correction for regression to the mean again uses the estimated value z_predicted = z_score * rel):

cutoff = score \pm SD \cdot z_{1 - \alpha} \cdot \sqrt{1 - rel}

In the case of percentiles, these are first converted to z-scores and then the interval limits are recalculated.
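
The one-tailed test against a fixed value can be sketched along the same lines. The function name `differs_from_cutoff`, the `direction` strings, the IQ scale defaults, and the reliabilities used in the calls are assumptions chosen for illustration. The source example does not state a reliability.

```python
from statistics import NormalDist


def differs_from_cutoff(score, cutoff, rel, direction, mean=100.0, sd=15.0,
                        alpha=0.05, regress=True):
    """One-tailed test of a score against a fixed cutoff.

    Hypothetical helper; direction is "higher" or "lower".
    Returns True if the score differs significantly from the cutoff.
    """
    # Critical value z_{1 - alpha} for a one-sided test
    z_crit = NormalDist().inv_cdf(1 - alpha)

    # Regression to the mean: z_predicted = z_score * rel
    center = mean + rel * (score - mean) if regress else score

    # One-sided margin based on the standard error of measurement
    margin = sd * z_crit * (1 - rel) ** 0.5

    if direction == "higher":
        return center - margin > cutoff
    if direction == "lower":
        return center + margin < cutoff
    raise ValueError("direction must be 'higher' or 'lower'")


# Is an IQ of 135 significantly higher than 130? (assumed reliability .95)
print(differs_from_cutoff(135, 130, rel=0.95, direction="higher"))
# Is an IQ of 122 significantly lower than 130? (assumed reliability .95)
print(differs_from_cutoff(122, 130, rel=0.95, direction="lower"))
```

Under these assumed reliabilities, the score of 135 is not significantly higher than 130, while the score of 122 is significantly lower than 130.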

3. Test for differences in repeated measurements (Reliable Change)

When an individual takes the same test repeatedly, the so-called Reliable Change Index (RCI; Jacobson & Truax, 1991; see also Krum et al., 2022, p. 153) can be determined. The RCI can be interpreted as the test statistic of a z-test. It indicates whether two test scores differ significantly, e.g. whether an intervention has led to a significant change in the measured characteristic.

Calculator fields: type of score, result 1, result 2, reliability, test value, interpretation.

As in calculators 1 and 2, percentiles are converted to z-scores using the inverse cumulative normal distribution before the calculation. The formula for the RCI follows Jacobson and Truax (1991; see Krum et al., 2022, p. 153):

RCI = \frac{x_{1} - x_{2}}{SD \cdot \sqrt{2 \cdot (1 - rel)}}
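
A short Python sketch of the RCI formula follows. The function name `reliable_change`, the T-score example values, and the two-sided p-value are assumptions for illustration. The calculator's own interpretation output may be worded or computed differently.

```python
from statistics import NormalDist


def reliable_change(x1, x2, rel, sd):
    """Reliable Change Index and a two-sided p-value from the z-test.

    Hypothetical helper; x1 and x2 are norm scores on a scale with
    standard deviation sd, rel is the reliability of the test.
    """
    # RCI = (x1 - x2) / (SD * sqrt(2 * (1 - rel)))
    rci = (x1 - x2) / (sd * (2 * (1 - rel)) ** 0.5)
    # Two-sided p-value under the standard normal distribution (assumption)
    p_value = 2 * (1 - NormalDist().cdf(abs(rci)))
    return rci, p_value


# Example with T-scores (SD = 10): 65 at the first and 55 at the second testing
rci, p = reliable_change(65, 55, rel=0.9, sd=10.0)
print(f"RCI = {rci:.3f}, p = {p:.4f}")
```

In this example the RCI is about 2.24, which exceeds the two-sided critical value of 1.96 at a significance level of .05.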

4. Discrepancy of two test results of one person

When a person is tested with different tests or scales of a test, it can be interesting to compare the results. For example, one might want to investigate whether logical reasoning is better developed than verbal comprehension, if intelligence tests do not already provide such analysis options. Or one might want to clarify whether the stress levels of different clinical symptoms differ.

Calculator fields: type of score, result 1, result 2, reliability 1, reliability 2, test value, interpretation.

As with the previous calculators, percentiles are converted to z-scores using the inverse cumulative normal distribution before the calculation. In general, the procedure is also suitable for raw scores, provided that the population mean is known; this is always the case when norm scores are used. For each of the two test results, a value Y_i is calculated first, from which the test statistic z_diff is then determined. The formulas are (Krum et al., 2022, p. 154 f.):

Y_{i} = \frac{X_{i}}{\sqrt{{rel}_{i}}} + M \cdot (1 - \frac{1}{\sqrt{{rel}_{i}}})

and

z_{diff} = \frac{Y_{1} - Y_{2}}{SD \cdot \sqrt{\frac{1 - {rel}_{1}}{{rel}_{1}} + \frac{1 - {rel}_{2}}{{rel}_{2}}}}
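
The two formulas can be combined into one small function. The sketch below is illustrative only: the function name `discrepancy_z`, the IQ scale defaults, and the example scores and reliabilities are assumptions, and the two-sided p-value is one possible way of reading the test statistic.

```python
from statistics import NormalDist


def discrepancy_z(x1, x2, rel1, rel2, mean=100.0, sd=15.0):
    """Test statistic z_diff for two results of one person.

    Hypothetical helper; x1 and x2 must be on the same norm scale
    (mean and sd), rel1 and rel2 are the reliabilities of the two tests.
    """
    # Y_i = X_i / sqrt(rel_i) + M * (1 - 1 / sqrt(rel_i))
    y1 = x1 / rel1 ** 0.5 + mean * (1 - 1 / rel1 ** 0.5)
    y2 = x2 / rel2 ** 0.5 + mean * (1 - 1 / rel2 ** 0.5)

    # z_diff = (Y1 - Y2) / (SD * sqrt((1 - rel1)/rel1 + (1 - rel2)/rel2))
    denom = sd * ((1 - rel1) / rel1 + (1 - rel2) / rel2) ** 0.5
    z_diff = (y1 - y2) / denom

    # Two-sided p-value under the standard normal distribution (assumption)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_diff)))
    return z_diff, p_value


# Example: 115 on a scale with rel = .9 versus 100 on a scale with rel = .8
z, p = discrepancy_z(115, 100, rel1=0.9, rel2=0.8)
print(f"z_diff = {z:.3f}, p = {p:.4f}")
```

For these example values, z_diff is about 1.75, so the difference of 15 IQ points would not be significant in a two-sided test at the .05 level.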

5. Profile analysis: test for identity, structure and profile height of two profiles

Profiles of psychometric results can be analyzed with regard to equality (= profile identity), structure (= profile shape), or magnitude (= profile height) (cf. Huber, 1973, chap. 10). Such an approach can be applied, for example, to intelligence profiles or to the spectrum of clinical symptoms before and after therapy. Huber (1973, p.) gives as an example the results of a 35-year-old person in the Intelligence Structure Test (IST; Amthauer, 1953):

Subtest	Testing 1	Testing 2
Sentence completion	92	113
Vocabulary	103	96
Analogy	93	113
Similarities	100	116
Memory tasks	94	103
Calculation tasks	102	104
Cube tasks	109	98

Results are reported as standard scores (M = 100, SD = 10), and subtests have a uniform reliability of r = .9. Results can now be tested to determine whether the two profiles are identical and, if not, whether they differ in height, structure, or both.
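
As a rough illustration with the example data, the per-subtest RCI from section 3 can be computed for each row and the squared values summed. Under the null hypothesis of identical true profiles and independent measurement errors, this sum follows a chi-square distribution with as many degrees of freedom as there are subtests. Whether the calculator's identity test uses exactly this statistic is an assumption. The sketch does not reproduce Huber's (1973) procedures for profile shape or profile height, and all variable names are hypothetical.

```python
# Example data from the table above (standard scores, M = 100, SD = 10)
subtests = ["Sentence completion", "Vocabulary", "Analogy", "Similarities",
            "Memory tasks", "Calculation tasks", "Cube tasks"]
testing_1 = [92, 103, 93, 100, 94, 102, 109]
testing_2 = [113, 96, 113, 116, 103, 104, 98]
sd, rel = 10.0, 0.9


def rci(x1, x2, rel, sd):
    """Reliable Change Index as defined in section 3."""
    return (x1 - x2) / (sd * (2 * (1 - rel)) ** 0.5)


# Per-subtest RCI values
rcis = [rci(a, b, rel, sd) for a, b in zip(testing_1, testing_2)]

# Sum of squared RCIs: an illustrative identity statistic (assumption),
# chi-square distributed with df = number of subtests under the null
chi2_identity = sum(value ** 2 for value in rcis)
df = len(rcis)

for name, value in zip(subtests, rcis):
    print(f"{name:20s} RCI = {value:6.2f}")
print(f"chi2 = {chi2_identity:.1f}, df = {df}")
```

The statistic comes out at about 67.6 with 7 degrees of freedom, far above the critical chi-square value of about 14.07 at a significance level of .05, so under these assumptions the two profiles would not be regarded as identical.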

Calculator fields: hypothesis test, norm scale, number of scales, reliability, total reliability, and test results (Score 1, Score 2 and Rel for each of Scale 1 to Scale 9).

References

Chi² tests are conducted with the help of jStat.

Amthauer, R. (1953). IST-Intelligenz-Struktur-Test. Hogrefe.
Huber, H. (1973). Psychometrische Einzelfalldiagnostik [Psychometric diagnosis for single cases]. Beltz.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12-19.
Krum, S., Schmidt-Atzert, L., & Amelang, M. (2022). Grundlagen diagnostischer Verfahren [Basics of diagnostic procedures]. In L. Schmidt-Atzert, S. Krum & M. Amelang (Eds.), Psychologische Diagnostik (6th ed.) (pp. 39-209). Springer. doi:10.1007/978-3-662-61643-7

Citeable source:
Lenhard, W. & Lenhard, A. (2023). Confidence intervals, test of discrepancy and profile analysis for psychometric results. Available at: https://www.psychometrica.de/discrepancy.html. Psychometrica.