
Software >> OS >> Unix >> Solaris >> How to replace a failed disk in Solaris that is part of an SVM mirror

 

SVM = Solaris Volume Manager

Take a backup

# metastat -p >/var/tmp/metastat-p-before.txt

# metastat -t >/var/tmp/metastat-t-before.txt

# metadb -i >/var/tmp/metadb-i-before.txt

# echo | format >/var/tmp/echo-format-before.txt

# iostat -en >/var/tmp/iostat-en-before.txt
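The five captures above can be wrapped into one small script. This is a sketch only; the timestamped file naming is an assumption, and the per-command redirects above are the authoritative form.

```shell
#!/bin/sh
# Sketch: capture the pre-work state in one pass, using the same
# commands as above. The timestamp suffix is an added assumption.
ts=$(date +%Y%m%d-%H%M%S)
for cmd in 'metastat -p' 'metastat -t' 'metadb -i' 'iostat -en'; do
  name=$(echo "$cmd" | tr -d ' ')          # e.g. "metastat-p"
  $cmd > "/var/tmp/${name}-before-${ts}.txt" 2>&1
done
echo | format > "/var/tmp/echo-format-before-${ts}.txt" 2>&1
```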

 

 

Identify the failed disk 

 

# metastat

- look for submirrors in the "Needs maintenance" state

OR

# echo | format

- look for "unknown" etc.

OR

# iostat -en

OR

look in the /var/adm/messages log for disk error/failure events
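Scanning the log can be scripted; a minimal sketch, assuming typical Solaris disk-error wording (the exact message text varies by driver and release, so the keyword list is illustrative only):

```shell
# Sketch: grep a syslog-format file for common disk-failure phrases.
# The keyword list is illustrative, not exhaustive.
scan_disk_errors() {
  grep -iE 'disk not responding|offline|fatal error|transport failed' "$1"
}
# On a live server:
#   scan_disk_errors /var/adm/messages
```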

 

 

Example of metastat output with a failed disk:

# metastat

d0: Mirror
    Submirror 0: d20
      State: Okay         Tue 07 Jun 2011 06:03:48 PM SGT
    Submirror 1: d10
      State: Needs maintenance Tue 16 Dec 2014 11:35:50 PM SGT
...

d20: Submirror of d0
    State: Okay         Tue 07 Jun 2011 06:03:48 PM SGT
    Size: 10241505 blocks
    Stripe 0:
      Device     Start  Dbase  State        Hot Spare  Time
      c1t1d0s0       0  No     Okay                    Tue 07 Jun 2011 06:03:31 PM SGT

d10: Submirror of d0
    State: Needs maintenance Tue 16 Dec 2014 11:35:50 PM SGT
    Invoke: metareplace d0 c1t0d0s0 <new device>
    Size: 10241505 blocks
    Stripe 0:
      Device     Start  Dbase  State        Hot Spare  Time
      c1t0d0s0       0  No     Maintenance             Tue 16 Dec 2014 11:35:50 PM SGT

d3: Mirror
    Submirror 0: d23
      State: Okay         Tue 07 Jun 2011 06:04:41 PM SGT
    Submirror 1: d13
      State: Needs maintenance Tue 16 Dec 2014 11:35:15 PM SGT
...

d23: Submirror of d3
    State: Okay         Tue 07 Jun 2011 06:04:41 PM SGT
    Size: 56754405 blocks
    Stripe 0:
      Device     Start  Dbase  State        Hot Spare  Time
      c1t1d0s3       0  No     Okay                    Tue 07 Jun 2011 06:04:22 PM SGT

d13: Submirror of d3
    State: Needs maintenance Tue 16 Dec 2014 11:35:15 PM SGT
    Invoke: metareplace d3 c1t0d0s3 <new device>
    Size: 56754405 blocks
    Stripe 0:
      Device     Start  Dbase  State        Hot Spare  Time
      c1t0d0s3       0  No     Maintenance             Tue 16 Dec 2014 11:35:15 PM SGT

# metadb -i 

    flags        first blk    block count
     a m  p  luo       16        1034        /dev/dsk/c1t1d0s5
     a    p  luo       16        1034        /dev/dsk/c1t1d0s6
     a    p  luo       16        1034        /dev/dsk/c1t1d0s7
    M     p            unknown        unknown        /dev/dsk/c1t0d0s6
    M     p            unknown        unknown        /dev/dsk/c1t0d0s7
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

In this example, we deduce that disk c1t0d0 has failed

 

Identify submirrors to detach

 

# metastat -p

d0 -m d20 d10 1
d20 1 1 c1t1d0s0
d10 1 1 c1t0d0s0
d3 -m d23 d13 1
d23 1 1 c1t1d0s3
d13 1 1 c1t0d0s3

=> submirrors d10 and d13 are on disk c1t0d0 and need to be detached
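The same deduction can be scripted from the metastat -p output; a sketch using this example's values (the disk name c1t0d0 and the awk filter are specific to this scenario):

```shell
# Sketch: from "metastat -p" output, print submirrors whose slices
# are on the failed disk. Mirror definition lines carry "-m" in the
# second field and are skipped. On a live server:
#   metastat -p | awk '$2 != "-m" && $0 ~ /c1t0d0/ {print $1}'
metastat_p='d0 -m d20 d10 1
d20 1 1 c1t1d0s0
d10 1 1 c1t0d0s0
d3 -m d23 d13 1
d23 1 1 c1t1d0s3
d13 1 1 c1t0d0s3'
printf '%s\n' "$metastat_p" | awk '$2 != "-m" && $0 ~ /c1t0d0/ {print $1}'
```

which prints d10 and d13, matching the deduction above.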

Detach

# metadetach -f d0 d10
# metadetach -f d3 d13

The -f option forces the detach; without it, the server is likely to refuse to detach a submirror that is in the "Needs maintenance" state

 

Run metastat -p again to confirm the submirrors have been detached

# metastat -p

d0 -m d20 1
d20 1 1 c1t1d0s0
d3 -m d23 1
d23 1 1 c1t1d0s3
d10 1 1 c1t0d0s0
d13 1 1 c1t0d0s3

 

Clear

# metaclear d10

d10: Concat/Stripe is cleared

# metaclear d13

d13: Concat/Stripe is cleared

 

Run metastat -p again to verify they have been cleared

 

# metastat -p

d0 -m d20 1
d20 1 1 c1t1d0s0
d3 -m d23 1
d23 1 1 c1t1d0s3

 

Delete any state database replicas on the failed disk

In the example above, the state database replicas on /dev/dsk/c1t0d0s6 & /dev/dsk/c1t0d0s7 have the "M" flag beside them, indicating "replica had problem with master blocks".  Delete those and then run metadb -i again to verify they have been removed.

# metadb -d /dev/dsk/c1t0d0s6

# metadb -d /dev/dsk/c1t0d0s7

# metadb -i

    flags        first blk    block count
     a m  p  luo       16        1034        /dev/dsk/c1t1d0s5
     a    p  luo       16        1034        /dev/dsk/c1t1d0s6
     a    p  luo       16        1034        /dev/dsk/c1t1d0s7
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
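When several replicas are flagged, the metadb -d calls can be driven from the output itself. A sketch over this example's saved output; the xargs form in the comment is an assumption to review before running:

```shell
# Sketch: extract device paths of replicas whose first flag field is
# "M" (master-block problems) from "metadb -i" output.
# On a live server, after reviewing the list:
#   metadb -i | awk '$1 == "M" {print $NF}' | xargs -n1 metadb -d
metadb_i='    M     p            unknown        unknown        /dev/dsk/c1t0d0s6
    M     p            unknown        unknown        /dev/dsk/c1t0d0s7'
printf '%s\n' "$metadb_i" | awk '$1 == "M" {print $NF}'
```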


Remove the failed disk

Use either luxadm or cfgadm (depending on the type of disk) to prepare the HDD for removal

# /usr/sbin/luxadm remove_device /dev/rdsk/c1t0d0s2

or

# cfgadm -c unconfigure c1::dsk/c1t0d0

 

Remove the faulty disk, insert the replacement disk, and rebuild the device tree

# devfsadm -v

 

If cfgadm was used to unconfigure, configure it back

# cfgadm -c configure c1::dsk/c1t0d0

 

Confirm the disk is available:

# echo | format

Copy the disk structure (VTOC) from the unaffected disk to the replacement disk

# prtvtoc /dev/rdsk/c1t1d0s2 | fmthard -s - /dev/rdsk/c1t0d0s2

If the disk is used for booting, install the boot block:

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

Create replicas of the state database on the new disk.  In our example there was one copy of the replica on each of c1t0d0s5, c1t0d0s6 & c1t0d0s7 before the disk failed

# metadb -af /dev/dsk/c1t0d0s5

# metadb -af /dev/dsk/c1t0d0s6

# metadb -af /dev/dsk/c1t0d0s7

Verify the replicas are healthy on the replaced disk

# metadb -i

Initialise the new submirrors on the replaced disk

# metainit -f d10 1 1 c1t0d0s0

# metainit -f d13 1 1 c1t0d0s3

Attach the new submirrors to the mirrors

# metattach d0 d10

# metattach d3 d13

Confirm the mirrors are starting to sync

# metastat
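While the resync runs, metastat typically reports lines such as "Resync in progress: NN % done" (exact wording assumed here). A small polling loop can wait for completion; a sketch only, with interval and match keyword to adjust for your release:

```shell
# Sketch: poll until metastat no longer reports any resync activity.
# The 60-second interval and "resync" keyword are assumptions.
wait_for_sync() {
  while metastat 2>/dev/null | grep -i resync >/dev/null; do
    sleep 60
  done
  echo "mirrors in sync"
}
# On the server:
#   wait_for_sync
```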

 

 

 

[ © 2008-2021 myfaqbase.com - A property of WPDC Consulting ]