 ## cNORM - Weighting

Representativeness of the norm sample is essential for estimating valid norm scores. Random sampling is usually applied to achieve this, but even without systematic bias in data collection, the resulting sample may deviate from the composition of the population. cNORM can integrate sampling weights into the norming process and thereby reduce the negative effects of non-representative norm samples on norm score quality. For this purpose, raking (also known as iterative proportional fitting) is implemented. It post-stratifies the norm sample with respect to one or more stratification variables (SVs), given the population marginals of those SVs. Cases are weighted so that the composition of the weighted data set matches that of the representative population.
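To make the idea of raking concrete, here is a minimal base R sketch of iterative proportional fitting on case weights. It is an illustration under stated assumptions, not cNORM's internal implementation: the helper `rake()`, the toy data frame `d` and the convergence settings are hypothetical, and the final division by the smallest weight is only one possible way to standardize.

```r
# Toy sample with two stratification variables, deliberately unbalanced
d <- data.frame(
  sex       = rep(c(1, 2), times = c(140, 60)),
  migration = rep(c(0, 1, 0, 1), times = c(130, 10, 50, 10))
)

# Target population marginals: variable, factor level, proportion
marginals <- data.frame(
  var   = c("sex", "sex", "migration", "migration"),
  level = c(1, 2, 0, 1),
  prop  = c(0.51, 0.49, 0.65, 0.35)
)

# Hypothetical helper: adjust weights one variable at a time until
# the weighted shares of all variables match their targets
rake <- function(data, marginals, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))
  for (i in seq_len(max_iter)) {
    w_old <- w
    for (v in unique(marginals$var)) {
      m <- marginals[marginals$var == v, ]
      target <- setNames(m$prop, m$level)
      # current weighted share of each level of variable v
      current <- tapply(w, data[[v]], sum) / sum(w)
      key <- as.character(data[[v]])
      # rescale every case by target share / current share of its level
      w <- as.numeric(w * target[key] / current[key])
    }
    # stop once a full pass no longer changes the weights
    if (max(abs(w - w_old)) < tol) break
  }
  w / min(w)  # assumption: scale so that the smallest weight equals 1
}

w <- rake(d, marginals)

# Weighted composition of the sample after raking
tapply(w, d$sex, sum) / sum(w)
tapply(w, d$migration, sum) / sum(w)
```

The two final lines print the weighted shares, which should agree with the `prop` column of `marginals` once the loop has converged.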

### Computation and standardization of raking weights

To compute the weights, provide a data frame with three columns that specifies the population marginals. The first column names the stratification variable, the second gives the factor level of that variable, and the third gives the proportion of that level in the representative population. The function 'computeWeights()' returns the weights; the original data and the marginals are passed as function parameters.

Afterwards, the norm sample is ranked with respect to the raking weights (weighted ranking). The standardized raking weights are also used in the weighted best-subset regression to obtain an adequate norm model. This step is the actual start of the regression-based norming approach, and it is applied automatically in the 'cnorm()' function as soon as weights are specified. The following example uses two stratification variables, sex and migration, each with two factor levels. Weights are computed for the ppvt dataset, which includes both SVs.

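```r
# Population marginals: stratification variable, factor level, proportion
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))

# Raking weights for the ppvt sample
weights <- computeWeights(data = ppvt, population.margins = marginals)
```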

Passing the raking weights to the 'cnorm()' function via the 'weights' parameter starts the initial weighted ranking and the actual norming process.
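As a sketch, the complete sequence could look as follows. The 'weights' parameter and the 'computeWeights()' call are taken from the text above; the argument names `raw` and `group` and the column names `ppvt$raw` and `ppvt$group` are assumptions about the package interface and the data set, so please check them against the package documentation for your installed version.

```r
library(cNORM)

# Population marginals for the two stratification variables
marginals <- data.frame(var = c("sex", "sex", "migration", "migration"),
                        level = c(1, 2, 0, 1),
                        prop = c(0.51, 0.49, 0.65, 0.35))

# Raking weights, one value per case in ppvt
weights <- computeWeights(data = ppvt, population.margins = marginals)

# Weighted ranking and weighted regression-based norming
# (assumed argument names: raw = raw scores, group = age group variable)
model <- cnorm(raw = ppvt$raw, group = ppvt$group, weights = weights)
```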

### Caveats and recommendation for use

We extensively simulated biased distributions and assessed whether our approach can mitigate the effects of unrepresentative samples. cNORM itself already corrects for several types of sampling error, namely deviations that occur in specific age groups and unbalanced joint probabilities of stratification variables (as long as the marginals are preserved). Weighted Continuous Norming also works very well in most, but not all, use cases. Please note the following:

• Non-representativeness in most cases leads to (moderately) increased error of the normed scores. It is - of course - always better to ensure the highest feasible degree of representativeness in the data collection.
• The data collection should be as random as possible.
• In most but not all cases, Weighted Continuous Norming reduces the negative effects of non-representative norm samples. If the mean of the standardized weights exceeds a value of m_weights = 2, this indicates that weighting should rather not be used.
• With cNORM, representativeness need not necessarily be established in every single age group. If the marginals are more or less correct, weighting is unnecessary.
• Only use stratification for variables with considerable influence on the dependent variable.
• If available, the probabilities of cross-classifications of the stratification variables can be used. You can recode several variables into one and directly specify the corresponding population marginals (especially in combination with the next point).
• Avoid too many stratification variables with many fine-grained levels. This leads to high weights in specific subgroups. Instead, combine different levels of stratification variables if the corresponding subgroups do not differ in the outcome variable.
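
Two of the recommendations above, recoding several variables into one cross-classified variable and checking the size of the resulting weights, can be sketched in base R. The toy data, the joint proportions and the variable name `sex_mig` are made up for illustration, and dividing by the smallest weight is an assumption about what "standardized" means here. The weights are computed by hand rather than with 'computeWeights()', which is possible because a single stratification variable needs only one post-stratification step.

```r
# Toy data with two stratification variables (illustrative values only)
d <- data.frame(
  sex       = rep(c(1, 2), times = c(140, 60)),
  migration = rep(c(0, 1, 0, 1), times = c(130, 10, 50, 10))
)

# Recode both variables into a single cross-classified variable
d$sex_mig <- paste(d$sex, d$migration, sep = "_")

# Hypothetical joint population proportions of the four cells
joint <- data.frame(
  var   = "sex_mig",
  level = c("1_0", "1_1", "2_0", "2_1"),
  prop  = c(0.33, 0.18, 0.32, 0.17)
)
stopifnot(isTRUE(all.equal(sum(joint$prop), 1)))

# One variable only: weight = target share / observed share of the cell
observed <- table(d$sex_mig) / nrow(d)
w <- joint$prop[match(d$sex_mig, joint$level)] /
  as.numeric(observed[d$sex_mig])

# Assumption: standardize so that the smallest weight equals 1
w_std <- w / min(w)

mean(w_std)  # compare with the threshold of 2 mentioned above
max(w_std)   # very large values point to sparsely filled cells
```

If the mean or the maximum turns out high, merging sparsely filled cells that do not differ in the outcome variable is one way to bring the weights down.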