mhd(7i)




NAME

     mhd - multihost disk control operations


SYNOPSIS

     #include <sys/mhd.h>


DESCRIPTION

     The  mhd ioctl(2) control access rights of a multihost disk,
     using disk reservations on the disk device.

     The stability level of this interface (see attributes(5)) is
     evolving.  As  a  result, the interface is subject to change
     and you should limit your use of it.

     The mhd ioctls fall into two major categories:

      ioctls for non-shared multihost disks

      ioctls for shared multihost disks.

     One ioctl, MHIOCENFAILFAST, is applicable to both non-shared
     and shared multihost disks.  It is described after the first
     two categories.

     All the ioctls require root privilege.

     For all of the ioctls, the caller  should  obtain  the  file
     descriptor  for  the  device  by  calling   open(2) with the
     O_NDELAY flag; without the  O_NDELAY flag, the open may fail
     due to another host already having a conflicting reservation
     on the device.  Some of the ioctls below permit  the  caller
     to  forcibly clear a conflicting reservation held by another
     host, however, in order to call the ioctl, the  caller  must
     first obtain the open file descriptor.

  Non-shared multihost disks
     Non-shared multihost disks  ioctls  consist  of  MHIOCTKOWN,
     MHIOCRELEASE,  HIOCSTATUS,  and  MHIOCQRESERVE.  These ioctl
     requests control the access rights of  non-shared  multihost
     disks.  A  non-shared  multihost  disk  is one that supports
     serialized, mutually exclusive I/O mastery by the  connected
     hosts.   This  is  in  contrast to the shared-disk model, in
     which concurrent access is allowed from more than  one  host
     (see below).

     A non-shared multihost disk can be in one of two states:

        o  Exclusive access state, where only one connected  host
           has I/O access

        o  Non-exclusive access state, where all connected  hosts
           have  I/O access. An external hardware reset can cause
           the disk to enter the non-exclusive access state.

     Each multihost disk driver views the machine on  which  it's
     running  as  the "local host"; each views all other machines
     as "remote hosts".  For  each  I/O  or  ioctl  request,  the
     requesting host is the local host.

     Note that the non-shared ioctls are designed  to  work  with
     SCSI-2 disks.  The SCSI-2 RESERVE/RELEASE command set is the
     underlying hardware facility in the device that supports the
     non-shared ioctls.

     The function prototypes for the non-shared ioctls are:

     ioctl(fd, MHIOCTKOWN);
     ioctl(fd, MHIOCRELEASE);
     ioctl(fd, MHIOCSTATUS);
     ioctl(fd, MHIOCQRESERVE);

     MHIOCTKOWN
           Forcefully acquires exclusive  access  rights  to  the
           multihost  disk for the local host. Revokes all access
           rights  to  the  multihost  disk  from  remote  hosts.
           Causes the disk to enter the exclusive access state.

           Implementation Note:  Reservations  (exclusive  access
           rights)  broken via random resets should be reinstated
           by the driver upon their detection,  for  example,  in
           the automatic probe function described below.

     MHIOCRELEASE
           Relinquishes exclusive access rights to the  multihost
           disk  for the local host.  On success, causes the disk
           to enter the  non- exclusive access state.

     MHIOCSTATUS
           Probes a multihost disk to determine whether the local
           host  has access rights to the disk. Returns  0 if the
           local host has access to the disk,   1 if it  doesn't,
           and  -1 with errno set to  EIO if the probe failed for
           some other reason.

     MHIOCQRESERVE
           Issues, simply and only, a SCSI-2 Reserve command.  If
           the  attempt  to  reserve  fails due to the SCSI error
           Reservation Conflict (which implies  that  some  other
           host  has  the  device  reserved), then the ioctl will
           return   -1   with   errno   set   to    EACCES.   The
           MHIOCQRESERVE  ioctl does NOT issue a bus device reset
           or bus reset prior to attempting  the  SCSI-2  reserve
           command.   It  also does not take care of re-instating
           reservations that disappear due to bus resets  or  bus
           device  resets;  if that behavior is desired, then the
           caller can call MHIOCTKOWN after the MHIOCQRESERVE has
           returned  success.  If the device does not support the
           SCSI-2 Reserve command, then  the  ioctl  returns   -1
           with  errno set to ENOTSUP. The MHIOCQRESERVE ioctl is
           intended to be used by high-availability or clustering
           software  for  a  "quorum" disk, hence, the "Q" in the
           name of the ioctl.

  Shared Multihost Disks
     Shared multihost disks ioctls control access to shared  mul-
     tihost  disks.  The ioctls are merely a veneer on the SCSI-3
     Persistent Reservation facility. Therefore,  the  underlying
     semantic  model is not described in detail here, see instead
     the SCSI-3 standard. The SCSI-3 Persistent Reservations sup-
     port the concept of a group of hosts all sharing access to a
     disk.

     The function prototypes and descriptions for the shared mul-
     tihost ioctls are as follows:

     ioctl(fd,  MHIOCGRP_INKEYS, (mhioc_inkeys_t)  *k);
           Issues the SCSI-3 command Persistent Reserve  In  Read
           Keys  to the device.  On input, the field k->li should
           be  initialized  by  the  caller  with  k->li.listsize
           reflecting  how  big  of an array the caller has allo-
           cated for the k->li.list field and with  k->li.listlen
           == 0. On return, the field k->li.listlen is updated to
           indicate the number of  reservation  keys  the  device
           currently  has:  if  this  value  is  larger  than  k-
           >li.listsize  then  that  indicates  that  the  caller
           should  have  passed  a bigger k->li.list array with a
           bigger k->li.listsize.  The number of  array  elements
           actually  written by the callee into k->li.list is the
           minimum of k->li.listlen and k->li.listsize. The field
           k->generation  is updated with the generation informa-
           tion returned by the SCSI-3 Read Keys query.   If  the
           device  does  not  support  SCSI-3 Persistent Reserva-
           tions, then this ioctl returns -1 with  errno  set  to
           ENOTSUP.

     ioctl(fd,  MHIOCGRP_INRESVS, (mhioc_inresvs_t)  *r);
           Issues the SCSI-3 command Persistent Reserve  In  Read
           Reservations   to   the  device.  Remarks  similar  to
           MHIOCGRP_INKEYS apply to the array  manipulation.   If
           the device does not support SCSI-3 Persistent Reserva-
           tions, then this ioctl returns -1 with  errno  set  to
           ENOTSUP.

     ioctl(fd,  MHIOCGRP_REGISTER, (mhioc_register_t)  *r);
           Issues  the  SCSI-3  command  Persistent  Reserve  Out
           Register.  The  fields  of structure r are all inputs;
           none of the fields are modified  by  the  ioctl.   The
           field r->aptpl should be set to true to specify that
            registrations and reservations should persist  across
           device  power  failures,  or  to false to specify that
           registrations and reservations should be cleared  upon
           device power failure; true is the recommended setting.
           The  field  r->oldkey  is  the  key  that  the  caller
           believes  the  device  may  already have for this host
           initiator; if the caller believes that that this  host
           initiator  is not already registered with this device,
           it should pass the  special  key  of  all  zeros.   To
           achieve  the  effect of unregistering with the device,
           the caller should pass its  current  key  for  the  r-
           >oldkey  field  and  an r->newkey field containing the
           special key of all zeros.  If the device  returns  the
           SCSI  error  code  Reservation  Conflict,  this  ioctl
           returns -1 with errno set to EACCES.

     ioctl(fd,  MHIOCGRP_RESERVE, (mhioc_resv_desc_t)  *r);
           Issues  the  SCSI-3  command  Persistent  Reserve  Out
           Reserve.  The  fields  of  structure r are all inputs;
           none of the fields are modified by the ioctl.  If  the
           device  returns  the  SCSI error code Reservation Con-
           flict, this ioctl returns -1 with errno set to EACCES.


*r);

     ioctl(fd,   MHIOCGRP_PREEMPTANDABORT,   (mhioc_preemptandabort_t)
           Issues  the  SCSI-3  command  Persistent  Reserve  Out
           Preempt-And-Abort.  The fields of structure r are  all
           inputs; inputs; none of the fields are modified by the
           ioctl. The key of the victim host is specified by  the
           field  r->victim_key.  The  field r->resvdesc supplies
           the preempter's key and the  reservation  that  it  is
           requesting  as  part  of  the SCSI-3 Preempt-And-Abort
           command.  If the device returns the  SCSI  error  code
           Reservation Conflict, this ioctl returns -1 with errno
           set to  EACCES.

     ioctl(fd,  MHIOCGRP_PREEMPT, (mhioc_preemptandabort_t)  *r);
           Similar  to  MHIOCGRP_PREEMPTANDABORT,   but   instead
           issues  the  SCSI-3  command  Persistent  Reserve  Out
           Preempt.

     ioctl(fd,  MHIOCGRP_CLEAR, (mhioc_resv_key_t)  *r);
           Issues  the  SCSI-3  command  Persistent  Reserve  Out
           Clear. The input parameter r is the reservation key of
           the caller, which should have been already  registered
           with    the    device,   by   an   earlier   call   to
           MHIOCGRP_REGISTER.

     For each device, the non-shared ioctls should not  be  mixed
     with  the  Persistent  Reserve  Out shared ioctls, and vice-
     versa,  otherwise, the underlying device is likely to return
     errors,  because SCSI does not permit SCSI-2 reservations to
     be mixed with SCSI-3 reservations on a  single  device.   It
     is,  however,  legitimate  to call the Persistent Reserve In
     ioctls,  because  these  are  query   only.    Issuing   the
     MHIOCGRP_INKEYS  ioctl   is the recommended way for a caller
     to determine  if  the  device   supports  SCSI-3  Persistent
     Reservations  (the  ioctl  will return -1 with  errno set to
     ENOTSUP if the device does not).

  MHIOCENFAILFAST Ioctl
     The MHIOCENFAILFAST ioctl is applicable for both  non-shared
     and shared disks, and may be used with either the non-shared
     or shared ioctls.

     ioctl(fd,  MHIOENFAILFAST, (unsigned int *)  millisecs);
           Enables or disables the failfast option  in  the  mul-
           tihost  disk  driver and enables or disables automatic
           probing of a multihost  disk,  described  below.   The
           argument  is an unsigned integer specifying the number
           of milliseconds to  wait  between  executions  of  the
           automatic  probe  function.   An argument of zero dis-
           ables the failfast option and disables automatic prob-
           ing.   If  the  MHIOCENFAILFAST ioctl is never called,
           the effect is defined to be  that  both  the  failfast
           option and automatic probing are disabled.

  Automatic Probing
     The MHIOCENFAILFAST ioctl sets up a timeout in the driver to
     periodically  schedule  automatic  probes of  the  disk. The
     automatic probe function works in this manner: The driver is
     scheduled  to probe the multihost disk every n milliseconds,
     rounded up to the  next  integral  multiple  of  the  system
     clock's resolution. If

     1. the local host no longer has access  rights  to  the mul-
        tihost disk, and

     2. access rights were expected to  be  held  by  the   local
        host,

     the driver immediately panics the machine to comply with the
     failfast model.

     If the driver makes this discovery outside the timeout func-
     tion,  especially  during  a  read or write operation, it is
     imperative that it panic the system then as well.


RETURN VALUES


     Each request returns -1 on failure and sets errno  to  indi-
     cate the error.

     EPERM Caller is not root.

     EACCES
           Access rights were denied.

     EIO   The multihost disk or controller was  unable  to  suc-
           cessfully complete the requested operation.

     EOPNOTSUP
           The multihost disk does not support the operation. For
           example,    it    does    not   support   the   SCSI-2
           Reserve/Release command set, or the SCSI-3  Persistent
           Reservation command set.


ATTRIBUTES

     See attributes(5) for a description of the following  attri-
     butes:

     ____________________________________________________________
    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    | Availability                | SUNWhea                     |
    | Stability                   | Evolving                    |
    |_____________________________|_____________________________|


SEE ALSO

     ioctl(2), open(2), attributes(5)open(2)


NOTES

     The ioctls for shared multihost disks and the  MHIOCQRESERVE
     ioctl  are currently implemented only for SPARC and only for
     the following disk device drivers:  sd(7D), ssd(7D).


Man(1) output converted with man2html