The Measurement of Drivers' Mental Workload


Chapter 5.1

Traffic research: Self report measures

This chapter describes experience with drivers' self-report workload ratings. The Dutch RSME and the originally German activation scale are treated first. The activation scale of the RECL is 'an odd one out', but it is included because it was the only self-report rating available from the Weaving Section, Noise Barrier and Road Layout studies (see table 4). Results obtained by others with the Task Load Index and SWAT are discussed under other self-report measures.

RSME, Rating Scale Mental Effort
In traffic research, the RSME (Zijlstra & Van Doorn, 1985; Zijlstra & Meijman, 1989) was used in the car-phone study and in the simulator experiment, and its effects are compared with those of the sedative antihistamine Triprolidine and with those of alcohol and time-on-task (Car-Phone, Tutoring, Antihistamine and DREAM, respectively, in table 3). Figure 3 shows the absolute scores on the RSME scale for these four studies. Baseline ratings of the effort of driving are compared with ratings of effort while driving and using a car-phone (load), driving without (baseline) vs. with (load) a switched-on enforcement and feedback system (Tutoring), and driving under placebo (baseline) vs. under the influence of Triprolidine (load). Owing to the experimental design, no baseline ratings could be collected for the effects of 0.5 ‰ alcohol and fatigue (2.5 hours of driving), so these effects could not be compared with a baseline. Figure 4 shows the change in scale values of the load condition relative to baseline for the studies that included such a condition. All ratings were collected after completion of the driving task.


Figure 3. Average ratings of exerted effort on the unidimensional RSME of baseline driving and car-phone use (both on the motorway, Pmw, and on a busy ringroad, Prr), driving with and without an enforcement & tutoring system (Tut), driving under placebo and Triprolidine (overall rating, Tri), and driving on the motorway under the influence of alcohol (Alc(mw)) and while fatigued (Fat(mw)). If available, the 95% confidence interval is indicated.


Figure 4. Average change in ratings of exerted effort on the unidimensional RSME in the case of car-phone use (Pmw and Prr), driving with an enforcement and tutoring system (Tut), driving under influence of Triprolidine (Tri), all compared with the baseline (control, placebo) measurement.
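
The change scores shown in figure 4 can be illustrated with a minimal sketch. It assumes, hypothetically, that every subject gave one RSME rating in the baseline condition and one in the load condition; the function name and the ratings below are invented for illustration and are not data from the studies described here.

```python
from statistics import mean


def mean_change(baseline, load):
    """Average change in rating (load minus baseline) over subjects.

    Both lists are assumed to hold one rating per subject, in the
    same subject order (a within-subjects comparison).
    """
    if len(baseline) != len(load):
        raise ValueError("baseline and load must have one rating per subject")
    if not baseline:
        raise ValueError("at least one pair of ratings is required")
    # Per-subject difference: positive values mean more reported effort under load.
    differences = [l - b for b, l in zip(baseline, load)]
    return mean(differences)


# Hypothetical ratings for three subjects (not thesis data).
baseline_ratings = [30, 40, 50]
load_ratings = [45, 50, 70]
print(mean_change(baseline_ratings, load_ratings))
```

When the same subjects are rated in both conditions, the mean of the per-subject differences equals the difference between the two condition means, so either calculation gives the same change score.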

In all cases the RSME was able to distinguish between the task-load situation and baseline. An increase in effort was reported in the case of car-phone use and as a result of the behavioural adaptation required by the enforcement system. The sedative effect of Triprolidine also resulted in an increase in effort exerted. Important differences in baseline values were found between the Tutoring and Car-phone studies. These differences may reflect differences between the subjects who participated, but it is more likely that they reflect differences between the baseline tasks. As mentioned previously, the effects of task load were compared with baseline driving. For the Tutoring experiment, baseline driving included handling a simulator car and driving through a varied area, while in the Car-phone experiment an instrumented vehicle had to be driven through traffic. Judging from the absolute scores, the latter task is less effortful. Recently, this interpretation was supported by a study in which the same subjects performed the same task both in traffic and in a simulator (De Waard & Brookhuis, in press): driving in the simulator required more effort as measured with the RSME.

Activation scale
Bartenwerfer's activation scale was used in two studies that are listed in table 4. In the DREAM experiment, however, no baseline ratings could be collected (the average rating was 130.5 for Alc(mw) and 139.0 for Fat(mw)). The effect of Triprolidine on reported activation level, estimated over the whole journey, was not significant. The application of the activation scale to traffic research has mainly been limited to drug research. An indication of the measure's sensitivity to affected driver state can be obtained by looking at the results from these 'drugs & driving' studies. In figure 5 the change in scale values, compared with placebo, is shown for drugs as measured in five on-the-road studies. The average placebo value over all studies was 131, which is just below the reference point 'I am solving a crossword puzzle' (see appendix B for the scale). Data regarding antidepressants, hypnotics, analgesics, tranquillizers and antihistamines have been taken from Louwerens et al. (1983), Volkerts et al. (1984), Brookhuis et al. (1985a), Volkerts et al. (1987) and De Vries et al. (1989), respectively.
The most pronounced effect on reported activation level was the reduction found in the antidepressant study. One hypnotic reduced reported activation level, while the analgesic showed a dose-related effect. This last effect was not in the expected direction: activation level increased with an increase in dose of this pain-treatment drug. However, in that study performance measures did not decline with increasing dose either, nor did reaction-time performance in a laboratory task (see Brookhuis et al., 1985a).


Figure 5. Average change in rated activation in five drug studies compared with the placebo conditions

No effects on the scale were found in the tranquillizer and antihistamine studies. On the basis of these studies the scale seems suitable for measuring subjectively experienced effects on the central nervous system. The relation of the scale to mental workload in general is, at present, hard to assess. It seems likely that the scale is of particular use in the areas further away from optimal performance, hence in the D and C regions.

Other self-report measures used

RECL, Road Environment Construct List
The Road Environment Construct List (RECL; Steyvers, 1993; Steyvers et al., 1994) was developed to measure appraisal of road environments. The RECL is a three-factor scale. Each of the sixteen items loads on one of three factors. The factors are: 'Hedonic value', which denotes the aesthetic appraisal of the road and its environment, 'Perceptual variation', which denotes the heterogeneity in the road environment, and 'Activation value', which denotes the extent to which the road and environment are considered to be activating. The latter factor may be useful for workload measurement in a traffic environment.
The RECL was used in studies in which the RSME was not used, and therefore the Activation value of the RECL is included in the evaluation of its usefulness as an indicator of driver activation. Though the driver is asked to evaluate the road and its environment, an activating effect of the environment could be related to road-environment demands and might therefore influence driver mental activation.

Although the trend in scores of baseline and load conditions in the road-layout experiment was in the direction of increased load, differences between the two conditions on both roads (Wr and Mr) were not significant. In the two motorway studies no baseline measurements were taken. However, two other conditions of these experiments could be compared: driving without and with ('c') eye-movement equipment mounted on subjects' heads. In both studies subjects did not rate the activating influence of the environment differently as a result of the equipment (see table 5).

Table 5. Average rating on the Activation scale of the RECL. Baseline measurements were only collected in the road layout study. For the Weaving Section and Noise Barrier studies, significance refers to the comparison of driving without and with (c) eye-movement equipment.
                      Baseline   Load   Significance (t-test)
Weaving Section       -          3.6    ns
Weaving Section (c)   -          3.7    ns
Noise Barrier         -          3.6    ns
Noise Barrier (c)     -          3.7    ns
Road Layout Wr        4.2        4.5    ns
Road Layout Mr        3.2        3.8    ns

Other self-report measures in other studies

TLX, Task Load Index
In none of the studies listed in table 3 was the NASA Task Load Index (TLX) used. A few on-the-road studies reporting the use of this self-report measure were found. Fairclough et al. (1991) used the RTLX (Byers et al., 1989) in a dual-task performance study. They found an increase in overall workload in the dual-task condition, which consisted of driving plus having a conversation, compared with single-task performance, which was normal driving. The RTLX was also used in another study performed in the same vehicle (Vaughan et al., 1994). In that experiment RDS (Radio Data System) messages had to be attended to. The messages were presented to subjects in three conditions in a within-subjects design: (1) auditory, (2) auditory and continuously visible on a display, and (3) auditory and temporarily (15 s) visible on a display. The overall RTLX mental workload rating was lowest for condition 2, auditory plus constantly visible. The RTLX factors 'mental effort' and 'time pressure' showed a similar effect (the lowest rating for condition 2, the highest rating for condition 1, and a slightly lower rating for condition 3 than for condition 1). The results of this dual-task study illustrate the diagnosticity of the RTLX: scores on the time-pressure factor were higher when messages were auditory only or when the visual information disappeared quickly.
In a simulator study in which the effects of a hands-free car-phone were tested, Alm & Nilsson (1994) found an effect of the car-phone task on all subscales of the TLX. An interaction between car-phone use and driving-task difficulty (driving a straight as opposed to a winding road) was only found on the frustration subscale, and not on the mental-demand or operator-effort subscales.

SWAT
The SWAT was used in simulator and on-the-road experimental tests of the GIDS system (Janssen et al., 1994). The system supported the driver with route guidance messages and with respect to speed, collision avoidance and lane keeping (simulator trials only). Judging from the SWAT reference that was provided in the text, an adapted version was used in which the card-sort section was left out. The authors report the overall mental workload index, which is defined as the sum of three 3-point scales (time stress, mental effort and psychological stress), resulting in a sum-scale range from 3 to 9. SWAT ratings differed between integrated and non-integrated GIDS support both in the simulator trials and in the on-the-road tests. The difference between the two was that only in the integrated condition was support scheduled according to demand. Scheduling includes, for instance, postponing an incoming phone call in the event that a lead vehicle brakes suddenly.
In an on-the-road experiment Verwey & Veltman (1995) found that summational SWAT ratings were equally sensitive to increases in workload as ratings on the RSME. Inclusion of the card-sort task for SWAT did not yield more accurate workload estimates.
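
The summational index described above is simple enough to write out. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the three levels are assumed to be coded 1 to 3, which is what yields the reported sum-scale range of 3 to 9. It is not the scoring procedure of the original SWAT, which involves a card-sort step.

```python
from dataclasses import dataclass

# Assumption: each 3-point scale is coded 1 (low) to 3 (high).
SWAT_LEVELS = (1, 2, 3)


@dataclass(frozen=True)
class SwatRating:
    """One subject's ratings on the three SWAT dimensions (hypothetical type)."""

    time_stress: int
    mental_effort: int
    psychological_stress: int

    def __post_init__(self):
        # Reject values outside the assumed 1-3 coding.
        for name in ("time_stress", "mental_effort", "psychological_stress"):
            value = getattr(self, name)
            if value not in SWAT_LEVELS:
                raise ValueError(f"{name} must be 1, 2 or 3, got {value!r}")

    def overall_index(self) -> int:
        """Sum of the three ratings, giving a value between 3 and 9."""
        return self.time_stress + self.mental_effort + self.psychological_stress


# Hypothetical rating, not data from any of the studies cited.
rating = SwatRating(time_stress=2, mental_effort=3, psychological_stress=1)
print(rating.overall_index())
```

Summing the three scales treats the dimensions as equally weighted, which is the simplification that replaces the card-sort scaling in the adapted version.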

Properties of self-report measures

Sensitivity, selectivity, diagnosticity, validity and primary-task intrusion are of major importance for a measure of driver workload. These properties were assessed as adequately as possible on the basis of the above-described experiments. The region in which the measure was found to be sensitive is indicated under sensitivity, and region-sensitivity has to be considered the prime property.
The RSME is designed to reflect operator effort. In the car-phone and tutoring experiments the RSME was found to be sensitive to task-related effort, while in the antihistamine study the rating scale was sensitive to state-related effort. Accordingly, when performance is in Regions A1 and A3/B the RSME can be expected to reflect driver mental effort. The drug studies showed that the activation scale is particularly sensitive to an affected driver state resulting from (highly) sedative medicines such as hypnotics and antidepressants. Increased activation levels, e.g., as a result of the use of amphetamine (Sanders, 1983), can be expected to be reflected in higher activation scores, but as yet there is, to my knowledge, no evidence available from empirical studies to support this prediction.
Diagnosticity for the two unidimensional scales is low unless they are applied per task dimension as proposed by Zijlstra & Meijman (1989). Selectivity is difficult to assess as the main other factor to which the scales could be sensitive, physical workload, is very restricted in driving. Reliability is high, as sensitivity to mental workload in the different studies is high. Primary-task intrusion is low as long as the ratings are asked after completion of the task. Since hardly any equipment is required for collection of the measures the implementation requirements are low. No problems in operator acceptance have been encountered, so informal evidence supports high operator acceptance. In table 6 the results are summarized.

Table 6. Summary of properties of self-report workload measures.
  Measure  
Property RSME Activation
sensitivity (Region) (D-)A1, A3-B D, (B-C)
diagnosticity low low
selectivity prob. high (?)
reliability high high (?)
primary-task intrusion low low
implementation requirements low low
operator acceptance high high

© Dick de Waard 1996
You may only use (parts of) this thesis if you cite the source:
De Waard, D. (1996). The measurement of drivers' mental workload. PhD thesis, University of Groningen. Haren, The Netherlands: University of Groningen, Traffic Research Centre.
