Wednesday, 18 September 2013

In this post we will discuss one of the most common tasks on a UNIX server: replacing a faulty VxVM disk.
Every UNIX system administrator has faced the situation of removing or replacing a faulty VxVM disk. Here we walk through the procedure using the "vxdiskadm" utility; the same can also be done directly from the command line, which is often preferred.
Assume that disk group "unixrockdg" has one faulty disk, unixrockdg04 (c3t9d0s2), which needs to be replaced. Let us draw up a high-level plan before doing the replacement.

HIGH LEVEL PLAN:
Step1: Take a backup of the system and disk configuration (cfg2html or Explorer output is recommended).
Step2: Remove the disk at the VxVM level using the vxdiskadm utility.
Step3: Unconfigure the disk at the OS level using the cfgadm command.
Step4: Request replacement of the faulty disk.
Step5: Configure the new disk at the OS level using the cfgadm and devfsadm commands.
Step6: Replace the disk at the VxVM level using the vxdiskadm utility.
Step7: Start the volume and mount it.
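
Since the same job can be done without the menus, here is a hedged sketch of a command-line equivalent of the plan above. The function name print_replace_plan and its arguments are made up for illustration, and it only prints the commands rather than running them. vxdg -k rmdisk, vxdisksetup -i and vxdg -k adddisk are the commands usually documented for this task, but the path to vxdisksetup and the exact options are assumptions that should be checked against the man pages of the installed VxVM release.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a command-line equivalent
# of the vxdiskadm procedure. Arguments: disk group, disk media name,
# device (without slice), controller.
print_replace_plan() {
    dg=$1; dm=$2; dev=$3; ctrl=$4
    cat <<EOF
vxdg -g ${dg} -k rmdisk ${dm}
cfgadm -c unconfigure ${ctrl}::dsk/${dev}
# ... replace the disk physically ...
cfgadm -c configure ${ctrl}::dsk/${dev}
devfsadm -c disk
vxdctl enable
/etc/vx/bin/vxdisksetup -i ${dev}
vxdg -g ${dg} -k adddisk ${dm}=${dev}
vxrecover -g ${dg} -s
EOF
}

# Example with the names used in this post:
#   print_replace_plan unixrockdg unixrockdg04 c3t9d0 c3
```

Printing the plan first makes it easy to review each command against the server before typing anything destructive.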

Let us start the activity after taking a valid configuration backup. The output below confirms that unixrockdg04 (c3t9d0s2) is in failed status.
root@unixrock # vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    rootmirr     rootdg       online
c3t8d0s2     sliced    unixrockdg03    unixrockdg      online
c3t10d0s2    sliced    -            -            online
c3t11d0s2    sliced    unixrockdg01    unixrockdg      online
c3t12d0s2    sliced    unixrockdg02    unixrockdg      online
-            -         unixrockdg04    unixrockdg      failed was:c3t9d0s2
root@unixrock #
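
On a server with many disks, the failed record is easy to miss by eye. The following is a minimal sketch, assuming the vxdisk list layout shown above, where a failed or removed record ends in a "was:" field. The function name disks_in_state is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `vxdisk list` output on stdin and prints
# "diskname old_device" for each record in the given state
# (default: failed). Such records end in a "was:<device>" field.
disks_in_state() {
    awk -v want="${1:-failed}" '
        $NF ~ /^was:/ && $(NF-1) == want {
            sub(/^was:/, "", $NF)
            print $3, $NF
        }'
}

# On the server:  vxdisk list | disks_in_state failed
```

The same helper can be called with "removed" as its argument once the disk has been removed for replacement.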
Remove the failed disk from VxVM using the vxdiskadm command.
root@unixrock # vxdiskadm

Volume Manager Support Operations
Menu: VolumeManager/Disk

 1      Add or initialize one or more disks
 2      Encapsulate one or more disks
 3      Remove a disk
 4      Remove a disk for replacement
 5      Replace a failed or removed disk
 6      Mirror volumes on a disk
 7      Move volumes from a disk
 8      Enable access to (import) a disk group
 9      Remove access to (deport) a disk group
 10     Enable (online) a disk device
 11     Disable (offline) a disk device
 12     Mark a disk as a spare for a disk group
 13     Turn off the spare flag on a disk
 14     Unrelocate subdisks back to a disk
 15     Exclude a disk from hot-relocation use
 16     Make a disk available for hot-relocation use
 17     Prevent multipathing/Suppress devices from VxVM's view
 18     Allow multipathing/Unsuppress devices from VxVM's view
 19     List currently suppressed/non-multipathed devices
 20     Change the disk naming scheme
 21     Get the newly connected/zoned disks in VxVM view
 list   List disk information


 ?      Display help about menu
 ??     Display help about the menuing system
 q      Exit from menus

Select an operation to perform: 4

Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace

  Use this menu operation to remove a physical disk from a disk
  group, while retaining the disk name.  This changes the state
  for the disk name to a "removed" disk.  If there are any
  initialized disks that are not part of a disk group, you will be
  given the option of using one of these disks as a replacement.

Enter disk name [,list,q,?] list

Disk group: rootdg

DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE

dm rootdisk     c1t0d0s2     sliced   10175    143339136 -
dm rootmirr     c1t1d0s2     sliced   10175    143339136 -

Disk group: unixrockdg

DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE

dm unixrockdg01    c3t11d0s2    sliced   9919     143328960 -
dm unixrockdg02    c3t12d0s2    sliced   9919     143328960 -
dm unixrockdg03    c3t8d0s2     sliced   9919     143328960 -
dm unixrockdg04    -            -        -        -        NODEVICE

Enter disk name [,list,q,?] unixrockdg04

  The following volumes will be disabled as a result of this
  operation:

        unixrockvol

  These volumes will require restoration from backup.

Are you sure you want do do this? [y,n,q,?] (default: n) y

  The requested operation is to remove disk unixrockdg04 from disk group
  unixrockdg.  The disk name will be kept, along with any volumes using
  the disk, allowing replacement of the disk.

  Select "Replace a failed or removed disk" from the main menu
  when you wish to replace the disk.

Continue with operation? [y,n,q,?] (default: y) y

  Removal of disk unixrockdg04 completed successfully.

Remove another disk? [y,n,q,?] (default: n) n

Volume Manager Support Operations
Menu: VolumeManager/Disk

 1      Add or initialize one or more disks
 2      Encapsulate one or more disks
 3      Remove a disk
 4      Remove a disk for replacement
 5      Replace a failed or removed disk
 6      Mirror volumes on a disk
 7      Move volumes from a disk
 8      Enable access to (import) a disk group
 9      Remove access to (deport) a disk group
 10     Enable (online) a disk device
 11     Disable (offline) a disk device
 12     Mark a disk as a spare for a disk group
 13     Turn off the spare flag on a disk
 14     Unrelocate subdisks back to a disk
 15     Exclude a disk from hot-relocation use
 16     Make a disk available for hot-relocation use
 17     Prevent multipathing/Suppress devices from VxVM's view
 18     Allow multipathing/Unsuppress devices from VxVM's view
 19     List currently suppressed/non-multipathed devices
 20     Change the disk naming scheme
 21     Get the newly connected/zoned disks in VxVM view
 list   List disk information


 ?      Display help about menu
 ??     Display help about the menuing system
 q      Exit from menus

Select an operation to perform: q

Goodbye.
Now we can see that the status of the failed disk has changed to "removed".
root@unixrock # vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    rootmirr     rootdg       online
c3t8d0s2     sliced    unixrockdg03    unixrockdg      online
c3t9d0s2     sliced    -            -            error
c3t10d0s2    sliced    -            -            online
c3t11d0s2    sliced    unixrockdg01    unixrockdg      online
c3t12d0s2    sliced    unixrockdg02    unixrockdg      online
-            -         unixrockdg04    unixrockdg      removed was:c3t9d0s2
Once the disk has been removed at the VxVM level, we have to remove the faulty disk at the OS level by using cfgadm -c unconfigure.
root@unixrock # cfgadm -c unconfigure c3::dsk/c3t9d0
root@unixrock #
Once that is done, replace the faulty disk physically and configure the new disk at the OS level.
root@unixrock # cfgadm -c configure c3::dsk/c3t9d0
root@unixrock #
root@unixrock # devfsadm -c disk
root@unixrock # echo|format|grep -i c3t9d0
3. c3t9d0 SUN72G cyl 14087 alt 2 hd 24 sec 424
root@unixrock #
Now that the disk is available at the OS level, we have to bring it under VxVM control.
root@unixrock # vxdctl enable
root@unixrock # vxdiskadm

Volume Manager Support Operations
Menu: VolumeManager/Disk

 1      Add or initialize one or more disks
 2      Encapsulate one or more disks
 3      Remove a disk
 4      Remove a disk for replacement
 5      Replace a failed or removed disk
 6      Mirror volumes on a disk
 7      Move volumes from a disk
 8      Enable access to (import) a disk group
 9      Remove access to (deport) a disk group
 10     Enable (online) a disk device
 11     Disable (offline) a disk device
 12     Mark a disk as a spare for a disk group
 13     Turn off the spare flag on a disk
 14     Unrelocate subdisks back to a disk
 15     Exclude a disk from hot-relocation use
 16     Make a disk available for hot-relocation use
 17     Prevent multipathing/Suppress devices from VxVM's view
 18     Allow multipathing/Unsuppress devices from VxVM's view
 19     List currently suppressed/non-multipathed devices
 20     Change the disk naming scheme
 21     Get the newly connected/zoned disks in VxVM view
 list   List disk information


 ?      Display help about menu
 ??     Display help about the menuing system
 q      Exit from menus

Select an operation to perform: 5

Replace a failed or removed disk
Menu: VolumeManager/Disk/ReplaceDisk

  Use this menu operation to specify a replacement disk for a disk
  that you removed with the "Remove a disk for replacement" menu
  operation, or that failed during use.  You will be prompted for
  a disk name to replace and a disk device to use as a replacement.
  You can choose an uninitialized disk, in which case the disk will
  be initialized, or you can choose a disk that you have already
  initialized using the Add or initialize a disk menu operation.

Select a removed or failed disk [,list,q,?] list

Disk group: rootdg

DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE


Disk group: unixrockdg

DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE

dm unixrockdg04    -            -        -        -        REMOVED


Select a removed or failed disk [,list,q,?] unixrockdg04

Select disk device to initialize [,list,q,?] list

DEVICE       DISK            GROUP           STATUS
c1t0d0       rootdisk        rootdg          online
c1t1d0       rootmirr        rootdg          online
c3t8d0       unixrockdg03    unixrockdg      online
c3t9d0       -               -               error
c3t10d0      -               -               online
c3t11d0      unixrockdg01    unixrockdg      online
c3t12d0      unixrockdg02    unixrockdg      online

Select disk device to initialize [,list,q,?] c3t9d0

  The following disk device has a valid VTOC, but does not appear to have
  been initialized for the Volume Manager.  If there is data on the disk
  that should NOT be destroyed you should encapsulate the existing disk
  partitions as volumes instead of adding the disk as a new disk.
  Output format: [Device_Name]

  c3t9d0

Encapsulate this device? [y,n,q,?] (default: y) n

  c3t9d0

Instead of encapsulating, initialize? [y,n,q,?] (default: n) y

  The requested operation is to initialize disk device c3t9d0 and
  to then use that device to replace the removed or failed disk
  unixrockdg04 in disk group unixrockdg.

Continue with operation? [y,n,q,?] (default: y)

Use a default private region length for the disk? [y,n,q,?] (default: y)

  Replacement of disk unixrockdg04 in group unixrockdg with disk device
  c3t9d0 completed successfully.

Replace another disk? [y,n,q,?] (default: n)
Check the status:
root@unixrock # vxdisk list
DEVICE       TYPE      DISK         GROUP        STATUS
c1t0d0s2     sliced    rootdisk     rootdg       online
c1t1d0s2     sliced    rootmirr     rootdg       online
c3t8d0s2     sliced    unixrockdg03    unixrockdg      online
c3t9d0s2     sliced    unixrockdg04    unixrockdg      online
c3t10d0s2    sliced    -            -            online
c3t11d0s2    sliced    unixrockdg01    unixrockdg      online
c3t12d0s2    sliced    unixrockdg02    unixrockdg      online
root@unixrock #
We have successfully replaced the faulty disk; however, we still have to check the volume status. The output below shows that "unixrockvol" is in DISABLED status.
root@unixrock # vxprint -hvtg unixrockdg
V  NAME         RVG          KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

v  unixrockvol     -            DISABLED ACTIVE   573313024 SELECT   -        fsgen
pl unixrock-01     unixrockvol     DISABLED RECOVER  573315840 CONCAT   -        RW
sd unixrockdg03-01 unixrock-01     unixrockdg03 0       143328960 0        c3t8d0   ENA
sd unixrockdg04-01 unixrock-01     unixrockdg04 0       143328960 143328960 c3t9d0  ENA
sd unixrockdg01-01 unixrock-01     unixrockdg01 0       143328960 286657920 c3t11d0 ENA
sd unixrockdg02-01 unixrock-01     unixrockdg02 0       143328960 429986880 c3t12d0 ENA
root@unixrock #
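
When a disk group holds many volumes, it can be easier to filter the vxprint output than to read it. Here is a small sketch, assuming the -hvt column layout shown above (record type in column 1, name in column 2, KSTATE in column 4, STATE in column 5). The function name disabled_records is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `vxprint -hvt` output on stdin and prints
# "type name state" for every volume (v) or plex (pl) record whose
# kernel state (column 4) is DISABLED.
disabled_records() {
    awk '($1 == "v" || $1 == "pl") && $4 == "DISABLED" { print $1, $2, $5 }'
}

# On the server:  vxprint -hvtg unixrockdg | disabled_records
```

The upper-case header lines are skipped because the record type is matched in lower case.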
I tried the steps below to bring the volume to active status.
root@unixrock # vxrecover -s unixrockvol
root@unixrock # vxprint -hvtg unixrockdg
V  NAME         RVG          KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

v  unixrockvol     -            DISABLED ACTIVE   573313024 SELECT   -        fsgen
pl unixrock-01     unixrockvol     DISABLED RECOVER  573315840 CONCAT   -        RW
sd unixrockdg03-01 unixrock-01     unixrockdg03 0       143328960 0        c3t8d0   ENA
sd unixrockdg04-01 unixrock-01     unixrockdg04 0       143328960 143328960 c3t9d0  ENA
sd unixrockdg01-01 unixrock-01     unixrockdg01 0       143328960 286657920 c3t11d0 ENA
sd unixrockdg02-01 unixrock-01     unixrockdg02 0       143328960 429986880 c3t12d0 ENA
root@unixrock # vxtask list
TASKID  PTID TYPE/STATE    PCT   PROGRESS
root@unixrock # vxvol -g unixrockdg startall
vxvm:vxvol: ERROR: Volume unixrockvol has no CLEAN or non-volatile ACTIVE plexes
root@unixrock #
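
The error above is consistent with how the volume is built: unixrockvol has a single concatenated plex and no mirror, so vxrecover has no healthy copy to resynchronise from. A quick way to check this is sketched below with a made-up helper, plex_count, which counts the plexes of a volume in vxprint -hvt output (plex records carry the volume name in column 3).

```shell
#!/bin/sh
# Hypothetical helper: reads `vxprint -hvt` output on stdin and prints
# how many plex (pl) records belong to the volume named in $1.
plex_count() {
    awk -v vol="$1" '$1 == "pl" && $3 == vol { n++ } END { print n + 0 }'
}

# On the server:  vxprint -hvtg unixrockdg | plex_count unixrockvol
```

A count of 1 means the volume is not mirrored, which is the case for the volume in this post.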
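Then I followed the steps below to force the plex to CLEAN and start the volume. This only makes the volume startable; as vxdiskadm warned earlier, the data that was on the replaced disk has to be restored from backup.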
root@unixrock # vxmend -g unixrockdg fix stale unixrock-01
root@unixrock # vxmend -g unixrockdg fix clean unixrock-01
root@unixrock # vxvol -g unixrockdg start unixrockvol
root@unixrock #
root@unixrock # vxprint -hvtg unixrockdg
V  NAME         RVG          KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
DC NAME         PARENTVOL    LOGVOL
SP NAME         SNAPVOL      DCO

v  unixrockvol     -            ENABLED  ACTIVE   573313024 SELECT   -        fsgen
pl unixrock-01     unixrockvol     ENABLED  ACTIVE   573315840 CONCAT   -        RW
sd unixrockdg03-01 unixrock-01     unixrockdg03 0       143328960 0        c3t8d0   ENA
sd unixrockdg04-01 unixrock-01     unixrockdg04 0       143328960 143328960 c3t9d0  ENA
sd unixrockdg01-01 unixrock-01     unixrockdg01 0       143328960 286657920 c3t11d0 ENA
sd unixrockdg02-01 unixrock-01     unixrockdg02 0       143328960 429986880 c3t12d0 ENA
root@unixrock #
Then I tried to mount the volume, but I got the errors below.
root@unixrock # mount /unixrock
mount: /dev/vx/dsk/unixrockdg/unixrockvol is already mounted, /unixrock is busy,
        or the allowable number of mount points has been exceeded
root@unixrock # mount -v|grep -i /unixrock
root@unixrock #
Then I did a quick break-fix in order to mount the volume: I moved the old mount point aside and created a fresh one.
root@unixrock # mv /unixrock /unixrock_old
root@unixrock # mkdir /unixrock
root@unixrock # mount /unixrock
root@unixrock # df -k|grep -i unixrock
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/vx/dsk/unixrockdg/unixrockvol
                     282176390 29435764 249918863    11%    /unixrock
root@unixrock #
We have now successfully replaced the faulty disk and mounted the volume. Thanks for reading this post.