Normalisation (normalization for Americans) attempts to remove score differentials caused by different point-scoring regimes (CW v phone), transmitter power level (high, low and QRP), and the availability of stations and multipliers in each geographical area (a function of propagation and proximity to population centres). Normalised scores should reflect operator ability as far as possible. This page describes the method employed in this attempt and discusses some of its limitations.
A critical underlying assumption in the score normalisation is that there is an equal proportion of serious, out-to-win contesters in each entry class and in each geographical area. Although I am sure there are contesters of this description in every category, some categories will attract more of these individuals than others (e.g. the multi-operator category will include a higher-than-average proportion of serious entries). I have therefore used only the top 10% of entries in each class to calculate the normalising factors; the assumption then becomes that the top 10% in each group comprise serious, out-to-win contesters.
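For the curious, the gist of the trimming step can be sketched in a few lines of Python. The exact calculation is not spelt out above, so this sketch assumes each class's factor is the ratio of the overall top-decile mean to that class's top-decile mean; that ratio, and the function names, are my illustrative assumptions, not necessarily the formula actually used:

```python
# Illustrative sketch only: assumes factor = overall top-10% mean divided
# by the class's top-10% mean. This is one plausible reading of the text,
# not necessarily the exact calculation behind the tables below.

def top_decile_mean(totals):
    """Mean of the top 10% of totals (at least one entry)."""
    ranked = sorted(totals, reverse=True)
    n = max(1, len(ranked) // 10)
    return sum(ranked[:n]) / n

def normalising_factors(totals_by_class):
    """totals_by_class maps class name -> list of QSO (or mult) totals."""
    overall = top_decile_mean(
        [t for totals in totals_by_class.values() for t in totals])
    return {cls: overall / top_decile_mean(totals)
            for cls, totals in totals_by_class.items()}
```

On this reading, a class whose top entries score below the overall top decile gets a factor greater than 1 (a boost), which matches the direction of the QRP and CW factors in the tables below.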
This is the methodology adopted in normalising the 1998 ARRL 10m contest scores:
| Mode | QSO Factor | Mult Factor |
| --- | --- | --- |
| A - Mixed | 1.2 | 0.8 |
| B - Phone | 1.4 | 1.3 |
| C - CW | 1.9 | 1.4 |
| D - Multi | 0.5 | 0.6 |

| Power | QSO Factor | Mult Factor |
| --- | --- | --- |
| A - QRP | 2.4 | 1.4 |
| B - Low | 1.4 | 1.2 |
| C - High | 0.7 | 0.9 |
Factors less than 1 mean QSOs or mults are reduced during normalisation; factors greater than 1 mean they are increased. These factors confirm that QSO totals are lowest for CW (that's why CW QSOs count double points!), higher for phone, and highest for mixed mode. Mixed mode obviously also has an advantage in multiplier availability. The power factors are clear too: both QSOs and multipliers are much easier to obtain running high power. The figures imply that for every 100 QSOs made by a QRP station, a low power station should make about 170 and a high power station about 340 to remain on par (all other things being equal).
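That parity arithmetic is easy to verify from the power-class QSO factors in the table, assuming a normalised QSO count is simply raw QSOs multiplied by the factor (which is how the factors read):

```python
# Power-class QSO factors from the table above.
qrp, low, high = 2.4, 1.4, 0.7

parity = 100 * qrp            # normalised credit for 100 QRP QSOs: 240
print(round(parity / low))    # -> 171 low-power QSOs for the same credit
print(round(parity / high))   # -> 343 high-power QSOs for the same credit
```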
The area factors include some unexplained patterns. In particular, I am at a loss to suggest why the factors for W2 are slightly but significantly higher than for W1 and W3. Indeed, W1-3 and W8-9 all have QSO factors greater than 1, suggesting the north-east USA was relatively impoverished in the availability of stations to work. This may in part be a function of latitude. It is not surprising that the southern states in W4 and W5 had a QSO-numbers advantage, but what about W7, W0 and VE, which also had better-than-average QSO totals? Maybe it's because these areas benefited from the sporadic-E openings I recall being reported. The western US was clearly less well off for multipliers than the east, presumably related to propagation access to Europe.
Europe is mostly quite high latitude, and we Europeans found QSOs harder to come by, as suggested by the surprisingly high QSO factor for Europe (perhaps due to the absence of sporadic E over Europe this time). But we did OK for multipliers. The southern continents (AF, SA) have good propagation to highly populated regions, hence the low QSO and multiplier factors for those areas. Asia is the worst-off continent, with poorer-than-average access to both QSOs and multipliers, while North America is the best off for both. I think this is what one would expect; on balance, the geographic normalisation works.
I know that many of the areas are large, with diverse population densities and/or different propagation characteristics, but the number of entries limits how finely they can be divided. I used the same areas as the ARRL does for listing scores, each of which happens to have a reasonable number of entries. Splitting into sub-areas might be an improvement if each sub-area contained enough entries. There might be scope for splitting VE and, for example, lumping VE1-3 with W1 and W2 (as suggested by Bob VE3SRE), as these areas have more-or-less common populations and propagation conditions.
I am pleased that this process highlights outstanding efforts that are otherwise overlooked; an example is the low power CW score of JA1PS. But I am also aware that the relative positions of stations can change under different normalising procedures. I suspect the chosen normalisation has been a little unfair to high-power entrants and to multi-op stations, perhaps because these classes have effectively been penalised for including a high proportion of contest-grade stations and top operators. Stations in North America may feel a little aggrieved by the severity of the penalties normalisation imposes on them, and I have already mentioned W2 stations as probably getting a more-than-fair boost.
Other variants on the above methodology that I tried (or might have tried) include standardisation (normalising using the difference from the mean divided by the standard deviation), log-transforming values before attempting normalisation, identifying outlier scores and removing them from the calculation of the normalising factors, other "trimming" strategies, and so on. I have not applied the statistical analysis I would employ if this were work and I were getting paid for it, and I am aware of the limitations this imposes. Improvements are possible, but I have other things to do.
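For completeness, the standardisation variant mentioned above amounts to converting each score to a z-score within its class. A minimal sketch, purely illustrative and not the method used for the tables above:

```python
import statistics

def standardise(scores):
    """Express each score as its distance from the class mean,
    in units of the class standard deviation (a z-score)."""
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)
    return [(s - mu) / sigma for s in scores]

# e.g. standardise([120, 95, 250, 60]) gives class-relative z-scores
```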
The astute may have noticed that GØAEV's normalised relative placing is quite flattering, whereas his real score isn't. Yes, there is some personal motive for doing all this: as spelt out in the introduction, I wanted to see how well I was doing relative to everyone else, and the fact that I just missed a DX top-ten place in almost the largest entry class (low power phone, 11th out of 212) did spur me on! However, the normalisation method chosen was not influenced by any ulterior motives. I have tried to get the fairest result possible and hope I have identified the areas where the method is weakest. I would be interested to hear from anyone with constructive comments on the process, or on the benefits or otherwise of normalising scores, but please don't flame me because your relative score is not what you'd expect.