th_define(1M)
NAME
th_define - create fault injection test harness error
specifications
SYNOPSIS
th_define [-n name -i instance | -P path] [-a acc_types]
[-r reg_number] [-l offset [length]] [-c count [failcount]]
[-o operator [operand]] [-f acc_chk]
[-w max_wait_period [report_interval]]
or
th_define [-n name -i instance | -P path]
[-a log [acc_types] [-r reg_number] [-l offset [length]]]
[-c count [failcount]] [-s collect_time] [-p policy]
[-x flags] [-C comment_string] [-e fixup_script [args]]
or
th_define [-h]
DESCRIPTION
The th_define utility provides an interface to the bus_ops
fault injection bofi device driver for defining error injec-
tion specifications (referred to as errdefs). An errdef
corresponds to a specification of how to corrupt a device
driver's accesses to its hardware. The command line argu-
ments determine the precise nature of the fault to be
injected. If the supplied arguments define a consistent
errdef, the th_define process will store the errdef with the
bofi driver and suspend itself until the criteria given by
the errdef become satisfied (in practice, this will occur
when the access counts go to zero).
Use the th_manage(1M) command with the start option to
activate the resulting errdef. Once the errdef is started,
the bofi driver waits for count matching hardware accesses.
An access matches if it is of a type specified in acc_types,
is made by instance number instance of the driver whose name
is name (or by the driver instance specified by path),
targets the register set (or DMA handle) specified by
reg_number, and lies within the range offset to
offset + length from the beginning of the register set or
DMA handle. The driver then applies operator and operand to
the next failcount matching accesses.
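The count and failcount behavior described above can be
pictured with a small shell sketch. It is an illustration
only and does not reflect how the bofi driver is implemented:
the function name classify_accesses is hypothetical, and it
simply labels each matching access as passed through or
faulted.

```shell
#!/bin/sh
# Hypothetical illustration of errdef count/failcount semantics.
# Prints one line per matching access: "<n> pass" or "<n> fault".
classify_accesses() {
    count=$1        # matching accesses to let through untouched
    failcount=$2    # subsequent matching accesses to corrupt
    total=$3        # number of matching accesses to simulate
    n=1
    while [ "$n" -le "$total" ]; do
        if [ "$n" -le "$count" ]; then
            echo "$n pass"
        elif [ "$n" -le $((count + failcount)) ]; then
            echo "$n fault"
        else
            echo "$n pass"      # errdef exhausted; accesses are normal again
        fi
        n=$((n + 1))
    done
}

# Similar in spirit to: th_define ... -c 2 3
classify_accesses 2 3 6
```

With count 2 and failcount 3, accesses 1 and 2 pass, accesses
3 through 5 are faulted, and access 6 passes again.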
If acc_types includes log, th_define runs in automatic test
script generation mode, and a set of test scripts (written
in the Korn shell) is created and placed in a sub-directory
of the current directory with the name <driver>.test.<id>
(for example, glm.test.978177106). A separate, executable
script is generated for each access handle that matches the
logging criteria. The log of accesses is placed at the top
of each script as a record of the session. If the current
directory is not writable, file output is written to stan-
dard output. The base name of each test file is the driver
name, and the extension is a number that discriminates
between different access handles. A control script (with the
same name as the created test directory) is generated that
will run all the test scripts sequentially.
Executing the scripts will install, and then activate, the
resulting error definitions. Error definitions are activated
sequentially and the driver instance under test is taken
offline and brought back online before each test (refer to
the -e option for more information). By default, logging
applies to all PIO accesses, all interrupts, and all DMA
accesses to and from areas mapped for both reading and writ-
ing. You can constrain logging by specifying additional
acc_types, reg_number, offset and length. Logging will con-
tinue for count matching accesses, with an optional time
limit of collect_time seconds.
Either the -n or -P option must be provided. The other
options are optional. If an option (other than -a) is speci-
fied multiple times, only the final value for the option is
used. If an option is not specified, its associated value is
set to an appropriate default, which will provide maximal
error coverage as described below.
OPTIONS
The following options are available:
-n name
Specify the name of the driver to test. (String)
-i instance
Test only the specified driver instance (-1 matches
all instances of driver). (Numeric)
-P path
Specify the full device path of the driver to test.
(String)
-r reg_number
Test only the given register set or DMA handle (-1
matches all register sets and DMA handles). (Numeric)
-a acc_types
Only the specified access types will be matched. Valid
values for the acc_types argument are log, pio, pio_r,
pio_w, dma, dma_r, dma_w and intr. Multiple access
types, separated by spaces, can be specified. The
default is to match all hardware accesses.
If acc_types is set to log, logging will match all PIO
accesses, interrupts and DMA accesses to and from
areas mapped for both reading and writing. log can be
combined with other acc_types, in which case the
matching condition for logging will be restricted to
the specified additional acc_types. Note that dma_r will
match only DMA handles mapped for reading only; dma_w
will match only DMA handles mapped for writing only;
dma will match only DMA handles mapped for both read-
ing and writing.
-l offset [length]
Constrain the range of qualifying accesses. To qualify,
an access of the type specified with the -a option, to
the register set or DMA handle specified with the -r
option, must lie at least offset bytes into the
register set or DMA handle and at most offset + length
bytes into it. The default for offset is 0. The default
for length is the maximum value that can be placed in
an offset_t C data type (see types.h). Negative values
are converted into unsigned quantities. Thus,
th_define -l 0 -1 is maximal.
-c count [failcount]
Wait for count number of matching accesses, then apply
an operator and operand (see the -o option) to the
next failcount number of matching accesses. If the
access type (see the -a option) includes logging, the
number of logged accesses is given by count + fail-
count - 1. The -1 is required because the last access
coincides with the first faulting access.
Note that access logging may be combined with error
injection if failcount and operator are nonzero and if
the access type includes logging and any of the other
access types (pio, dma and intr). See the description
of access types in the definition of the -a option,
above.
When the count and failcount fields reach zero, the
status of the errdef is reported to standard output.
When all active errdefs created by the th_define pro-
cess complete, the process exits. If acc_types
includes log, count determines how many accesses to
log. If count is not specified, a default value is
used. If failcount is set in this mode, it will simply
increase the number of accesses logged by a further
failcount - 1.
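As a worked example of the count + failcount - 1 rule for
logging errdefs, the following sketch computes the number of
logged accesses. The helper name logged_accesses is
hypothetical and is not part of th_define.

```shell
#!/bin/sh
# Number of accesses logged for a logging errdef, per the rule above.
logged_accesses() {
    count=$1
    failcount=$2
    echo $((count + failcount - 1))
}

logged_accesses 100 1   # failcount 1 adds nothing
logged_accesses 100 3   # failcount 3 adds failcount - 1 = 2
```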
-o operator [operand]
For qualifying PIO read and write accesses, the value
read from or written to the hardware is corrupted
according to the value of operator:
EQ operand is returned to the driver.
OR operand is bitwise ORed with the real value.
AND operand is bitwise ANDed with the real value.
XOR operand is bitwise XORed with the real value.
For PIO write accesses, the following operator is allowed:
NO Simply ignore the driver's attempt to write to the
hardware.
Note that a driver performs PIO via the ddi_getX(),
ddi_putX(), ddi_rep_getX() and ddi_rep_putX() routines
(where X is 8, 16, 32 or 64). An access made using ddi_getX()
or ddi_putX() is treated as a single access, whereas an
access made using the ddi_rep_*(9F) routines is broken down
into individual accesses, the number of which is given by the
repcount parameter to these DDI calls. If the access is
performed via a DMA handle, operator and operand are applied
to every access that comprises the DMA request. If
interference with interrupts has been requested, the operator
may take any of the following values:
DELAY After count accesses (see the -c option), delay
delivery of the next failcount number of interrupts
for operand number of microseconds.
LOSE After count number of interrupts, fail to deliver the
next failcount number of real interrupts to the
driver.
EXTRA After count number of interrupts, start delivering
operand number of extra interrupts for the next fail-
count number of real interrupts.
By default, operator is XOR and operand is -1, which corrupts
the data access by flipping each bit.
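The effect of the EQ, OR, AND and XOR operators on a value
can be sketched with shell arithmetic. The helper apply_op is
hypothetical and is not part of th_define. The 32-bit mask is
an assumption chosen so that the output resembles the result
of a ddi_get32() call.

```shell
#!/bin/sh
# Hypothetical helper: show what a driver would see for a given
# real value, operator and operand. Not part of th_define.
apply_op() {
    op=$1
    real=$2
    operand=$3
    case $op in
        EQ)  result=$(($operand)) ;;
        OR)  result=$(($real | $operand)) ;;
        AND) result=$(($real & $operand)) ;;
        XOR) result=$(($real ^ $operand)) ;;
        *)   echo "unknown operator: $op" >&2; return 1 ;;
    esac
    # Mask to 32 bits (an assumption made for display purposes).
    printf '0x%08x\n' $((result & 0xffffffff))
}

apply_op EQ  0x12345678 0x70003      # operand replaces the real value
apply_op OR  0x00000010 0x4          # sets bit 0x4
apply_op AND 0x0000ffff 0xffffefff   # clears bit 0x1000
apply_op XOR 0x0000000f -1           # default: flips every bit
```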
-f acc_chk
If the acc_chk parameter is set to 1 or pio, then the
driver's calls to ddi_check_acc_handle(9F) return
DDI_FAILURE when the access count goes to 1. If the
acc_chk parameter is set to 2 or dma, then the
driver's calls to ddi_check_dma_handle(9F) return
DDI_FAILURE when the access count goes to 1.
-w max_wait_period [report_interval]
Constrain the period for which an error definition
will remain active. The option applies only to non-
logging errdefs. If an error definition remains active
for max_wait_period seconds, the test will be aborted.
If report_interval is set to a nonzero value, the
current status of the error definition is reported to
standard output every report_interval seconds. The
default value is zero. The status of the errdef is
reported in a parsable format of eight fields separated
by colon (:) characters. The first seven fields are
integers and the last is a string enclosed in double
quotes:
ft:mt:ac:fc:chk:ec:s:"message"
The fields are defined as follows:
ft The UTC time when the fault was injected.
mt The UTC time when the driver reported the fault.
ac The number of remaining non-faulting accesses.
fc The number of remaining faulting accesses.
chk The value of the acc_chk field of the errdef.
ec The number of fault reports issued by the driver
against this errdef (mt holds the time of the
initial report).
s The severity level reported by the driver.
"message"
Textual reason why the driver has reported a
fault.
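Because the status report is colon-separated, it can be split
with the shell's read builtin. The sketch below is one
possible approach: the function name parse_status and the
output labels are hypothetical, and the sample status line is
made up for illustration rather than taken from a real run.

```shell
#!/bin/sh
# Split one th_define status line into its eight fields.
# The sample line below is made up for illustration.
parse_status() {
    # read assigns the remainder of the line to the last variable,
    # so colons inside the quoted message are preserved.
    IFS=: read -r ft mt ac fc chk ec s message <<EOF
$1
EOF
    # Strip the enclosing double quotes from the message.
    message=${message#\"}
    message=${message%\"}
    echo "fault_time=$ft report_time=$mt remaining_ok=$ac remaining_fail=$fc"
    echo "acc_chk=$chk reports=$ec severity=$s message=$message"
}

parse_status '978177106:978177107:0:0:1:1:2:"device fault: bad register"'
```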
-h Display the command usage string.
-s collect_time
If acc_types is given with the -a option and includes
log, the errdef will log accesses for collect_time
seconds (the default is to log until the log becomes
full). Note that, if the errdef specification matches
multiple driver handles, multiple logging errdefs are
registered with the bofi driver and logging terminates
when all logs become full or when collect_time expires
or when the associated errdefs are cleared. The
current state of the log can be checked with the
th_manage(1M) command, using the broadcast parameter.
A log can be terminated by running th_manage(1M) with
the clear_errdefs option or by sending a SIGALRM sig-
nal to the th_define process. See alarm(2) for the
semantics of SIGALRM.
-p policy
Applicable when the acc_types option includes log. The
parameter modifies the policy used for converting from
logged accesses to errdefs. All policies are
inclusive:
o Use rare to bias error definitions toward rare
accesses (default).
o Use operator to produce a separate error defini-
tion for each operator type (default).
o Use common to bias error definitions toward com-
mon accesses.
o Use median to bias error definitions toward
median accesses.
o Use maximal to produce multiple error defini-
tions for duplicate accesses.
o Use unbiased to create unbiased error defini-
tions.
o Use onebyte, twobyte, fourbyte, or eightbyte to
select errdefs corresponding to 1, 2, 4 or 8
byte accesses (if chosen, the -xr option is
enforced in order to ensure that ddi_rep_*()
calls are decomposed into multiple single
accesses).
o Use multibyte to create error definitions for
multibyte accesses performed using
ddi_rep_get*() and ddi_rep_put*().
Policies can be combined by adding together these
options. See the NOTES section for further informa-
tion.
-x flags
Applicable when the acc_types option includes log. The
flags parameter modifies the way in which the bofi
driver logs accesses. It is specified as a string con-
taining any combination of the following letters:
w Continuous logging (that is, the log will wrap
when full).
t Timestamp each log entry (access times are in
seconds).
r Log repeated I/O as individual accesses (for
example, a ddi_rep_get16(9F) call which has a
repcount of N is logged N times with each tran-
saction logged as size 2 bytes. Without this
option, the default logging behavior is to log
this access once only, with a transaction size
of twice the repcount).
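The difference the r flag makes to the log can be sketched as
follows. The helper log_shape is hypothetical. The source
describes only the ddi_rep_get16(9F) case, so extending the
same rule to other access widths is an assumption.

```shell
#!/bin/sh
# Hypothetical illustration of how a ddi_rep_getX() call appears
# in the log. Prints "<entries> <bytes_per_entry>".
log_shape() {
    width_bits=$1   # 8, 16, 32 or 64
    repcount=$2
    rflag=$3        # 1 if the r flag was given with -x, else 0
    bytes=$((width_bits / 8))
    if [ "$rflag" -eq 1 ]; then
        echo "$repcount $bytes"             # one entry per repetition
    else
        echo "1 $((bytes * repcount))"      # single entry, combined size
    fi
}

log_shape 16 5 1   # with -x r: 5 entries of 2 bytes each
log_shape 16 5 0   # default: 1 entry of 10 bytes
```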
-C comment_string
Applicable when the acc_types option includes log. It
provides a comment string to be placed in any gen-
erated test scripts. The string must be enclosed in
double quotes.
-e fixup_script [args]
Applicable when the acc_types option includes log. A
logging errdef generates a test script for each driver
access handle. Use this option to embed a command in
the resulting script before the errors are injected.
The generated test scripts will take an instance
offline and bring it back online before injecting
errors in order to bring the instance into a known
fault-free state. The executable fixup_script will be
called twice with the set of optional args: once just
before the instance is taken offline and again after
the instance has been brought online. The following
variables are passed into the environment of the
called executable:
DRIVER_PATH
Identifies the device path of the instance.
DRIVER_INSTANCE
Identifies the instance number of the device.
DRIVER_UNCONFIGURE
Has the value 1 when the instance is about to be taken
offline.
DRIVER_CONFIGURE
Has the value 1 when the instance has just been
brought online.
Typically, the executable ensures that the device under test
is in a suitable state to be taken offline (unconfigured) or
in a suitable state for error injection (for example config-
ured, error free and servicing a workload). A minimal script
for a network driver could be:
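#!/bin/ksh
# Interface name is the driver name followed by the instance number.
driver=xyznetdriver
ifnum=$driver$DRIVER_INSTANCE
if [[ $DRIVER_CONFIGURE = 1 ]]; then
    # Instance has just been brought online: configure it
    # and start a workload.
    ifconfig $ifnum plumb
    ifconfig $ifnum ...
    ifworkload start $ifnum
elif [[ $DRIVER_UNCONFIGURE = 1 ]]; then
    # Instance is about to be taken offline: stop the
    # workload and tear the interface down.
    ifworkload stop $ifnum
    ifconfig $ifnum down
    ifconfig $ifnum unplumb
fi
exit $?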
The -e option must be the last option on the command line.
If the -a log option is selected but the -e option is not
given, a default script is used. This script repeatedly
attempts to detach and then re-attach the device instance
under test.
EXAMPLES
Examples of Error Definitions
th_define -n foo -i 1 -a log
Logs all accesses to all handles used by instance 1 of the
foo driver while running the default workload (attaching and
detaching the instance). Then generates a set of test
scripts to inject appropriate errdefs while running that
default workload.
th_define -n foo -i 1 -a log pio
Logs PIO accesses to each PIO handle used by instance 1 of
the foo driver while running the default workload (attaching
and detaching the instance). Then generates a set of test
scripts to inject appropriate errdefs while running that
default workload.
th_define -n foo -i 1 -p onebyte median -e fixup arg -now
Logs all accesses to all handles used by instance 1 of the
foo driver while running the workload defined in the fixup
script fixup with arguments arg and -now. Then generates a
set of test scripts to inject appropriate errdefs while run-
ning that workload. The resulting error definitions are
requested to focus upon single byte accesses to locations
that are accessed a median number of times with respect to
frequency of access to I/O addresses.
th_define -n se -l 0x20 1 -a pio_r -o OR 0x4 -c 10 1000
Simulates a stuck serial chip command. After 10 matching
accesses, 0x4 is ORed into the next 1000 consecutive read
accesses made by any instance of the se driver to its command
status register, so that the register keeps returning status
busy.
th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -f 1 -o OR 0x100
Causes 0x100 to be ORed into the next physical I/O read
access from any register in register set 1 of instance 3 of
the foo driver. Subsequent calls in the driver to
ddi_check_acc_handle() return DDI_FAILURE.
th_define -n foo -i 3 -r 1 -a pio_r -c 0 1 -o OR 0x0
Causes 0x0 to be ORed into the next physical I/O read access
from any register in register set 1 of instance 3 of the foo
driver. This is of course a no-op.
th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_r -c 0 10 -o
EQ 0x70003
Causes the next ten physical I/O reads from the register
at offset 0x8100 in register set 1 of instance 3 of the
foo driver to return 0x70003.
th_define -n foo -i 3 -r 1 -l 0x8100 1 -a pio_w -c 100 3 -o
AND 0xffffffffffffefff
The next 100 physical I/O writes to the register at offset
0x8100 in register set 1 of instance 3 of the foo driver
take place as normal. However, on each of the three subse-
quent accesses, the 0x1000 bit will be cleared.
th_define -n foo -i 3 -r 1 -l 0x8100 0x10 -a pio_r -c 0 1 -f
1 -o XOR 7
Causes the bottom three bits to have their values toggled
for the next physical I/O read access to registers with
offsets in the range 0x8100 to 0x8110 in register set 1 of
instance 3 of the foo driver. Subsequent calls in the driver
to ddi_check_acc_handle() return DDI_FAILURE.
th_define -n foo -i 3 -a pio_w -c 0 1 -o NO 0
Prevents the next physical I/O write access to any register
in any register set of instance 3 of the foo driver from
going out on the bus.
th_define -n foo -i 3 -l 0 8192 -a dma_r -c 0 1 -o OR 7
Causes 0x7 to be ORed into each long long in the first 8192
bytes of the next DMA read, using any DMA handle for
instance 3 of the foo driver.
th_define -n foo -i 3 -r 2 -l 0 8 -a dma_r -c 0 1 -o OR
0x7070707070707070
Causes 0x70 to be ORed into each byte of the first long long
of the next DMA read, using the DMA handle with sequential
allocation number 2 for instance 3 of the foo driver.
th_define -n foo -i 3 -l 256 256 -a dma_w -c 0 1 -f 2 -o OR
7
Causes 0x7 to be ORed into each long long in the range from
offset 256 to offset 512 of the next DMA write, using any
DMA handle for instance 3 of the foo driver. Subsequent
calls in the driver to ddi_check_dma_handle() return
DDI_FAILURE.
th_define -n foo -i 3 -r 0 -l 0 8 -a dma_w -c 100 3 -o AND
0xffffffffffffefff
The next 100 DMA writes using the DMA handle with sequential
allocation number 0 for instance 3 of the foo driver take
place as normal. However, on each of the three subsequent
accesses, the 0x1000 bit will be cleared in the first long
long of the transfer.
th_define -n foo -i 3 -a intr -c 0 6 -o LOSE 0
Causes the next six interrupts for instance 3 of the foo
driver to be lost.
th_define -n foo -i 3 -a intr -c 30 1 -o EXTRA 10
When the thirty-first subsequent interrupt for instance 3 of
the foo driver occurs, a further ten interrupts are also
generated.
th_define -n foo -i 3 -a intr -c 0 1 -o DELAY 1024
Causes the next interrupt for instance 3 of the foo driver
to be delayed by 1024 microseconds.
NOTES
The policy option in the th_define -p syntax determines how
a set of logged accesses will be converted into the set of
error definitions. Each logged access will be matched
against the chosen policies to determine whether an error
definition should be created based on the access.
Any number of policy options can be combined to modify the
generated error definitions.
Bytewise Policies
These select particular I/O transfer sizes. Specifying a byte
policy will exclude other byte policies that have not been
chosen. If none of the byte type policies is selected, all
transfer sizes are treated equally. Otherwise, only those
specified transfer sizes will be selected.
onebyte
Create errdefs for one byte accesses (ddi_get8())
twobyte
Create errdefs for two byte accesses (ddi_get16())
fourbyte
Create errdefs for four byte accesses (ddi_get32())
eightbyte
Create errdefs for eight byte accesses (ddi_get64())
multibyte
Create errdefs for repeated byte accesses
(ddi_rep_get*())
Frequency of Access Policies
The frequency of access to a location is determined according
to the access type, location and transfer size (for
example, a two-byte read access to address A is considered
distinct from a four-byte read access to address A). The
algorithm counts the number of accesses (of a given type and
size) to each location and finds the locations that were most
and least accessed. Let maxa and mina be the number of times
these locations were accessed, and let mean be the total
number of accesses divided by the total number of locations
that were accessed. A rare access is then a location that was
accessed fewer than
(mean - mina) / 3 + mina
times. Similarly, a common access is a location that was
accessed more than
maxa - (maxa - mean) / 3
times. A location whose access count lies between these
cutoffs is regarded as a location that is accessed with
median frequency.
rare Create errdefs for locations that are rarely accessed.
common
Create errdefs for locations that are commonly
accessed.
median
Create errdefs for locations that are accessed with
median frequency.
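The rare and common cutoffs can be worked through with a
short script. This is a sketch under stated assumptions, not
the algorithm as implemented by th_define: the function name
classify_locations and the sample data are made up, and the
cutoffs are computed in floating point, whereas the real tool
may use integer arithmetic.

```shell
#!/bin/sh
# Classify locations as rare, median or common using the cutoffs above.
# Input: lines of "<location> <access_count>" (sample data is made up).
# Assumption: cutoffs are computed in floating point.
classify_locations() {
    awk '
        { loc[NR] = $1; cnt[NR] = $2; total += $2
          if (NR == 1 || $2 < mina) mina = $2
          if (NR == 1 || $2 > maxa) maxa = $2 }
        END {
            if (NR == 0) exit
            mean = total / NR
            rare_cut   = (mean - mina) / 3 + mina
            common_cut = maxa - (maxa - mean) / 3
            for (i = 1; i <= NR; i++) {
                if (cnt[i] < rare_cut)        class = "rare"
                else if (cnt[i] > common_cut) class = "common"
                else                          class = "median"
                print loc[i], cnt[i], class
            }
        }'
}

printf '%s\n' '0x00 1' '0x04 10' '0x08 12' '0x0c 97' | classify_locations
```

For this sample, mean is 30, mina is 1 and maxa is 97, so the
rare cutoff is about 10.67 and the common cutoff is about
74.67. The locations accessed 1 and 10 times are rare, the one
accessed 12 times is median, and the one accessed 97 times is
common.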
Policies for Minimizing errdefs
If a transaction is duplicated, either a single or multiple
errdefs will be written to the test scripts, depending upon
the following two policies:
maximal
Create multiple errdefs for locations that are repeat-
edly accessed.
unbiased
Create a single errdef for locations that are repeat-
edly accessed.
operators
For each location, a default operator and operand is
typically applied. For maximal test coverage, this
default may be modified using the operators policy so
that a separate errdef is created for each of the pos-
sible corruption operators.
SEE ALSO
kill(1), th_manage(1M), alarm(2), ddi_check_acc_handle(9F),
ddi_check_dma_handle(9F)