The site contains several tools for analyzing psychometric test results, such as calculating confidence intervals and comparing discrepancies.
Psychometric test results are never perfectly accurate. Confidence intervals estimate the range in which the true value lies with a specified probability. The following calculator determines the confidence interval based on the standard error of estimation. The result can also be corrected for regression to the mean; in this case, the estimated value is displayed as well.
| Field | Role |
|---|---|
| Type of score | Input |
| Score | Input |
| Reliability | Input |
| Confidence | Input |
| Regression to the mean | Input (optional correction) |
| Estimated value | Output |
| Confidence interval | Output |
The calculation uses the following formula (the correction for regression to the mean uses the estimated value $z_{predicted} = z_{score} \cdot rel$; for the standard error of estimation, see Krum, Amelang & Schmidt-Atzert, 2022, p. 149):
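A minimal Python sketch of such an interval, assuming the common textbook pairing: with the regression correction, the interval is centered on $z_{predicted} = z_{score} \cdot rel$ and uses the standard error of estimation $\sqrt{rel\,(1-rel)}$ (in z units); without it, the interval is centered on the observed z-score and uses the standard error of measurement $\sqrt{1-rel}$. The calculator's exact choice may differ. The scale table `SCALES` and the function `confidence_interval` are hypothetical names for illustration; percentile ranks are omitted for brevity.

```python
from statistics import NormalDist

# Hypothetical (mean, SD) table of common norm scales; not taken from the site.
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}


def confidence_interval(score, scale, rel, confidence=0.95, regression=True):
    """Return (estimated value, (lower, upper)) on the original scale.

    regression=True: center on z_predicted = z * rel and use the standard
    error of estimation sqrt(rel * (1 - rel)).
    regression=False: center on the observed z and use the standard error
    of measurement sqrt(1 - rel).
    """
    mean, sd = SCALES[scale]
    z = (score - mean) / sd
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    if regression:
        center, se = z * rel, (rel * (1 - rel)) ** 0.5
    else:
        center, se = z, (1 - rel) ** 0.5
    lower, upper = center - z_crit * se, center + z_crit * se
    # Transform the z values back to the chosen norm scale.
    return mean + sd * center, (mean + sd * lower, mean + sd * upper)


estimate, (low, high) = confidence_interval(120, "IQ", rel=0.90)
print(f"estimated value {estimate:.1f}, 95% CI [{low:.1f}, {high:.1f}]")
```

For an IQ of 120 with an assumed reliability of .90, this yields an estimated value of 118.0 and a 95% interval of roughly 109.2 to 126.8.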
If you want to test a result against a fixed value, it is sufficient to use the one-sided confidence interval, i.e., to test with $z_{1-\alpha}$ (Krum et al., 2022, p. 151 f.). However, the direction of the test must still be specified, and it depends on the exact formulation of the hypothesis. As with the confidence interval, the correction for regression to the mean should be applied because it makes the calculation more accurate.
For example, if a person scores an IQ of 135, one can test whether this result is significantly higher than 130, which is considered the threshold for giftedness. In this case, the score must be significantly higher than 130 (direction "... is higher than ..."). Another question may be whether giftedness can be ruled out, for example, for a result of 122. In this case, one tests in the opposite direction (direction "... is lower than ..."): a significant result means that giftedness can be ruled out, whereas a non-significant result means that giftedness cannot be ruled out.
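| Field | Role |
|---|---|
| Type of score | Input |
| Result | Input |
| Direction of test | Input |
| Cutoff | Input |
| Reliability | Input |
| Significance level | Input |
| Regression to the mean | Input (optional correction) |
| Result of the hypothesis test | Output |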
The test is one-tailed and uses the standard error of measurement, with the following formula (the correction for regression to the mean uses the estimated value $z_{predicted} = z_{score} \cdot rel$):
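A sketch of such a one-tailed test, under the assumption that the test statistic is the (optionally regression-corrected) z-score minus the z-value of the cutoff, divided by the standard error of measurement $\sqrt{1-rel}$ in z units. The reliability of .95 in the usage lines is an assumed value for illustration, and the function and parameter names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical (mean, SD) table of common norm scales.
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}


def one_sided_test(score, cutoff, scale, rel, direction, alpha=0.05, regression=True):
    """One-tailed test of a single score against a fixed cutoff.

    direction: "higher" (H1: score is higher than the cutoff) or
    "lower" (H1: score is lower than the cutoff).
    Returns (test statistic, significant?).
    """
    mean, sd = SCALES[scale]
    z = (score - mean) / sd
    z_cut = (cutoff - mean) / sd
    if regression:
        z = z * rel  # estimated value z_predicted = z_score * rel
    stat = (z - z_cut) / (1 - rel) ** 0.5  # divide by standard error of measurement
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value z_(1-alpha)
    if direction == "higher":
        return stat, stat > z_crit
    if direction == "lower":
        return stat, stat < -z_crit
    raise ValueError("direction must be 'higher' or 'lower'")


# The two examples from the text, with an assumed reliability of .95:
print(one_sided_test(135, 130, "IQ", rel=0.95, direction="higher"))
print(one_sided_test(122, 130, "IQ", rel=0.95, direction="lower"))
```

Under these assumptions, 135 is not significantly above 130 (statistic about 0.97), whereas 122 is significantly below 130 (statistic about -2.71), so in the second case giftedness could be ruled out.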
When a test is repeated with the same individual, the so-called Reliable Change Index (RCI; Jacobson & Truax, 1991; see also Krum et al., 2022, p. 153) can be determined. The RCI can be interpreted as the test statistic of a z-test. It indicates whether two test scores differ significantly, e.g., whether an intervention has led to a significant change in the measured characteristic.
| Field | Role |
|---|---|
| Type of score | Input |
| Result 1 | Input |
| Result 2 | Input |
| Reliability | Input |
| Test value | Output |
| Interpretation | Output |
As in calculators 1 and 2, percentile ranks are converted to z-scores using the inverse cumulative normal distribution before the calculation. The formula for calculating the RCI is based on Jacobson and Truax (1991; see Krum et al., 2022, p. 153):
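A minimal sketch of the RCI in z units, assuming the usual Jacobson and Truax formulation $RCI = (x_2 - x_1)/S_{diff}$ with $S_{diff} = \sqrt{2\,SE^2}$ and $SE = SD\sqrt{1-rel}$. The helper `to_z` and the scale codes (including `"PR"` for percentile ranks) are hypothetical names chosen for this example.

```python
from statistics import NormalDist

# Hypothetical (mean, SD) table of common norm scales; "PR" = percentile rank.
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}


def to_z(score, scale):
    """Convert a norm score, or a percentile rank (0 < PR < 100), to a z-score."""
    if scale == "PR":
        # Inverse cumulative normal distribution.
        return NormalDist().inv_cdf(score / 100)
    mean, sd = SCALES[scale]
    return (score - mean) / sd


def reliable_change_index(result1, result2, scale, rel):
    """RCI = (x2 - x1) / S_diff, computed in z units (SD = 1).

    SE = sqrt(1 - rel) is the standard error of measurement,
    S_diff = sqrt(2 * SE**2) the standard error of the difference.
    """
    se = (1 - rel) ** 0.5
    s_diff = (2 * se ** 2) ** 0.5
    return (to_z(result2, scale) - to_z(result1, scale)) / s_diff


rci = reliable_change_index(60, 70, "T", rel=0.85)
print(f"RCI = {rci:.2f}; reliable change (|RCI| > 1.96): {abs(rci) > 1.96}")
```

For T-scores of 60 and 70 with an assumed reliability of .85, the RCI is about 1.83, which stays below 1.96; a 10-point change would therefore not count as reliable at the 5% level (two-tailed).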
When a person is tested with different tests or with different scales of one test, it can be interesting to compare the results. For example, one might want to investigate whether logical reasoning is better developed than verbal comprehension, if the intelligence test does not already provide such an analysis. Or one might want to clarify whether the levels of distress on different clinical symptom scales differ.
| Field | Role |
|---|---|
| Type of score | Input |
| Result 1 | Input |
| Result 2 | Input |
| Reliability 1 | Input |
| Reliability 2 | Input |
| Test value | Output |
| Interpretation | Output |
As with the previous calculators, percentile ranks are converted to z-scores using the inverse cumulative normal distribution before the calculation. In principle, the procedure is also suitable for raw scores, provided that the population mean is known; this is automatically the case when norm scores are used. First, a value $Y_i$ is calculated for each of the two test results; then the test statistic z is determined. The formulas for the test statistic comparing the two results are as follows (Krum et al., 2022, p. 154 f.):
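The sketch below does not implement the $Y_i$ formulas cited above. As a stand-in, it uses a different but widely used approach, the classical standard error of the difference between two scores with reliabilities $rel_1$ and $rel_2$, $SD\sqrt{2 - rel_1 - rel_2}$; its results need not match the calculator's. Function names and the example values are illustrative.

```python
from statistics import NormalDist

# Hypothetical (mean, SD) table of common norm scales.
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}


def discrepancy_test(result1, result2, scale, rel1, rel2):
    """Two-tailed z-test for the difference between two scale scores.

    Stand-in method: classical standard error of the difference,
    sqrt(2 - rel1 - rel2) in z units (not the Y_i formulas).
    Returns (z, two-tailed p).
    """
    mean, sd = SCALES[scale]
    z1, z2 = (result1 - mean) / sd, (result2 - mean) / sd
    z = (z1 - z2) / (2 - rel1 - rel2) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# e.g. reasoning IQ 115 vs. verbal comprehension IQ 100 (illustrative values)
z, p = discrepancy_test(115, 100, "IQ", rel1=0.90, rel2=0.85)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these assumed reliabilities, a 15-point IQ difference gives z = 2.00 and p of about .046.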
Profiles of psychometric results can be analyzed with regard to equality (= profile identity), structure (= profile shape), or magnitude (= profile height) (cf. Huber, 1973, chap. 10). Such an approach can be applied, for example, to intelligence profiles or to the spectrum of clinical distress before and after therapy. As an example, Huber (1973, p.) reports the results of a 35-year-old person tested twice with the Intelligence Structure Test (IST; Amthauer, 1953):
| Subtest | Testing 1 | Testing 2 |
|---|---|---|
| Sentence completion | 92 | 113 |
| Vocabulary | 103 | 96 |
| Analogy | 93 | 113 |
| Similarities | 100 | 116 |
| Memory tasks | 94 | 103 |
| Calculation tasks | 102 | 104 |
| Cube tasks | 109 | 98 |
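Huber's own test statistics may differ from the following. As a rough illustration only, the sketch applies a simplified set of chi-square tests to the example data, assuming IST standard scores with SD = 10, a reliability of .90 for every subtest, and independent measurement errors (all three are assumptions, not values from Huber). Profile identity sums the squared standardized differences (df = k); this sum splits into a profile-height part based on the mean difference (df = 1) and a profile-shape part based on the deviations from that mean (df = k - 1). The site computes chi-square probabilities with jStat; here a short series for integer degrees of freedom stands in for it.

```python
import math

# IST scores of Huber's (1973) example: Testing 1 and Testing 2.
TESTING_1 = [92, 103, 93, 100, 94, 102, 109]
TESTING_2 = [113, 96, 113, 116, 103, 104, 98]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    # Start values: Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df ...
    if df % 2:
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:
        q, a = math.exp(-y), 1.0
    # ... then Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) up to a = df / 2.
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def profile_tests(scores1, scores2, sd, rel):
    """Simplified chi-square tests for profile identity, height and shape.

    Assumes equal reliability `rel` for all scales and independent
    measurement errors; each difference is standardized by the standard
    error of a difference, sqrt(2 * (1 - rel)) in z units.
    """
    k = len(scores1)
    var_diff = 2 * (1 - rel)
    d = [(b - a) / sd for a, b in zip(scores1, scores2)]  # differences in z units
    d_bar = sum(d) / k
    identity = sum(x * x for x in d) / var_diff           # df = k
    height = k * d_bar ** 2 / var_diff                    # df = 1
    shape = sum((x - d_bar) ** 2 for x in d) / var_diff   # df = k - 1
    return {
        "identity": (identity, chi2_sf(identity, k)),
        "height": (height, chi2_sf(height, 1)),
        "shape": (shape, chi2_sf(shape, k - 1)),
    }


# Assumptions: IST standard scores with SD = 10, reliability .90 for every subtest.
for name, (chi2, p) in profile_tests(TESTING_1, TESTING_2, sd=10, rel=0.90).items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2g}")
```

With these assumed values, all three tests are clearly significant (identity: chi2 = 67.6 with 7 df), so the two testings would differ in both level and shape; other reliabilities can change this conclusion.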
| Field | Role |
|---|---|
| Hypothesis test | Input |
| Norm scale | Input |
| Number of scales | Input |
| Reliability | Input |
| Total reliability | Input |
| Test results | Input (scores of each scale for both testings) |
If the results of several scales point in the same direction, for example, if they are all above or all below average, the overall result is more extreme than the average of the individual results. The following form estimates the overall result from the sum of the individual scale scores. The (average) intercorrelation of the scales is also required. Enter the individual results separated by spaces, specify the type of scale (T-score, IQ, z-score, or percentile rank), and enter the (mean) correlation. If necessary, correlations can be averaged beforehand with the corresponding calculator on Psychometrica. Caution is advised if the correlations are very heterogeneous.
Use case: If all scales of an intelligence test yield high (or all low) scores, the total IQ is even more extreme. This results from the intercorrelations between the scales, and the effect can be exploited: if different scales correlate only moderately with each other, such as the various subjective ratings in ADHD assessment (the mean correlation between parent, teacher, and self-ratings is approximately r = .3), it is possible that neither the individual scales nor their average reaches a critical threshold. If the intercorrelation of the scales is taken into account, the overall result is more extreme and may reach the threshold required for a diagnosis.
| Field | Role |
|---|---|
| Type of scale | Input |
| Results (separated by spaces) | Input |
| Mean correlation | Input |
| Standard deviation of sum | Output |
| Estimated total score | Output |
The variance of a sum (or difference) of correlated variables equals the sum of the variances plus (for a difference: minus) twice the covariance, where the covariance is the correlation multiplied by the product of the standard deviations. For standardized variables (z-scores with SD = 1), the calculation simplifies:
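Written out, these are the standard identities; the last line assumes k z-scores with a common mean intercorrelation $\bar r$, which is only an approximation when the individual correlations differ:

```latex
\begin{aligned}
\operatorname{Var}(X_1 \pm X_2) &= \sigma_1^2 + \sigma_2^2 \pm 2\,r_{12}\,\sigma_1\sigma_2 \\
\operatorname{Var}\Big(\sum_{i=1}^{k} X_i\Big) &= \sum_{i=1}^{k}\sigma_i^2 + \sum_{i \neq j} r_{ij}\,\sigma_i\sigma_j \\
SD_{\Sigma} &= \sqrt{k + k(k-1)\,\bar r} \qquad (\text{z-scores, } \sigma_i = 1)
\end{aligned}
```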
To estimate the total score, the sum of the results (as z-scores) is divided by the standard deviation of the sum.
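A minimal sketch of this estimate, assuming equal intercorrelations so that the standard deviation of the sum of k z-scores is $\sqrt{k + k(k-1)\bar r}$. The function name and scale table are hypothetical; percentile ranks are handled via the inverse normal distribution, as in the other calculators.

```python
from statistics import NormalDist

# Hypothetical (mean, SD) table of common norm scales; "PR" = percentile rank.
SCALES = {"z": (0.0, 1.0), "T": (50.0, 10.0), "IQ": (100.0, 15.0)}


def composite_score(results, scale, mean_r):
    """Estimate the total score of k correlated scales.

    Returns (SD of the sum, total as z, total on the input scale).
    Assumes all scales share the same (mean) intercorrelation mean_r.
    """
    if scale == "PR":
        zs = [NormalDist().inv_cdf(r / 100) for r in results]
    else:
        mean, sd = SCALES[scale]
        zs = [(r - mean) / sd for r in results]
    k = len(zs)
    sd_sum = (k + k * (k - 1) * mean_r) ** 0.5  # SD of a sum of k z-scores
    z_total = sum(zs) / sd_sum                  # re-standardize the sum
    if scale == "PR":
        return sd_sum, z_total, 100 * NormalDist().cdf(z_total)
    mean, sd = SCALES[scale]
    return sd_sum, z_total, mean + sd * z_total


ratings = [float(x) for x in "65 65 65".split()]  # space-separated input, as in the form
sd_sum, z_total, total = composite_score(ratings, "T", mean_r=0.3)
print(f"SD of sum = {sd_sum:.3f}, estimated total = T {total:.1f}")
```

Three ratings at T = 65 each with an assumed mean correlation of .3 combine to an estimated total of about T = 70.5, which illustrates the ADHD example above: the composite can cross a hypothetical threshold of T = 70 that none of the single ratings reaches.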
Chi² tests are computed with the jStat library.
Citeable source:
Lenhard, W. & Lenhard, A. (2023). Confidence intervals, test of discrepancy and profile analysis for psychometric results. Psychometrica. Available from https://www.psychometrica.de/discrepancy.html