Sun Microsystems, Inc.

Repairing Transactional Volumes

Because a transactional volume is a "layered" volume, consisting of a master device and logging device, and because the logging device can be shared among file systems, repairing a failed transactional volume requires special recovery tasks.

Any device errors or panics must be managed by using the command line utilities.

Panics

If a file system detects any internal inconsistencies while it is in use, it will panic the system. If the file system is configured for logging, it notifies the transactional volume that it needs to be checked at reboot. The transactional volume transitions itself to the "Hard Error" state. All other transactional volumes that share the same log device also go into the "Hard Error" state.

At reboot, fsck checks and repairs the file system and transitions the file system back to the "Okay" state. fsck completes this process for all transactional volumes listed in the /etc/vfstab file for the affected log device.

Transactional Volume Errors

If a device error occurs on either the master device or the log device while the transactional volume is processing logged data, the device transitions from the "Okay" state to the "Hard Error" state. If the device is in either the "Hard Error" or the "Error" state, either a device error or a panic has occurred.

Any devices sharing the failed log device also go into the "Error" state.

Recovering From Soft Partition Problems

The following sections show how to recover configuration information for soft partitions. You should only use these techniques if all of your state database replicas have been lost and you do not have a current or accurate copy of metastat -p output, the md.cf file, or an up-to-date md.tab file.

How to Recover Configuration Data for a Soft Partition

At the beginning of each soft partition extent, a sector is used to mark the beginning of the soft partition extent. These hidden sectors are called extent headers and do not appear to the user of the soft partition. If all Solaris Volume Manager configuration is lost, the disk can be scanned in an attempt to generate the configuration data.

This procedure is a last option to recover lost soft partition configuration information. The metarecover command should only be used when you have lost both your metadb and your md.cf files, and your md.tab is lost or out of date.


Note - This procedure recovers soft partition information only. It does not help you recover from other lost configurations or recover configuration information for other Solaris Volume Manager volumes.



Note - If your configuration included other Solaris Volume Manager volumes that were built on top of soft partitions, you should recover the soft partitions before attempting to recover the other volumes.


Configuration information about your soft partitions is stored on your devices and in your state database. Since either of these sources could be corrupt, you must tell the metarecover command which source is reliable.

First, use the metarecover command to determine whether the two sources agree. If they do agree, the metarecover command cannot be used to make any changes. If the metarecover command reports an inconsistency, however, examine its output carefully to determine whether the disk or the state database is corrupt. Then use the metarecover command to rebuild the configuration based on the appropriate source.

  1. Read the "Background Information About Soft Partitions".

  2. Review the soft partition recovery information by using the metarecover command.

    # metarecover component -p -d

    In this case, component is the c*t*d*s* name of the raw component. The -d option tells the metarecover command to scan the physical slice for extent headers of soft partitions.

    For more information, see the metarecover(1M) man page.

Example--Recovering Soft Partitions from On-Disk Extent Headers

# metarecover c1t1d0s1 -p -d
The following soft partitions were found and will be added to
your metadevice configuration.
 Name            Size     No. of Extents
    d10           10240         1
    d11           10240         1
    d12           10240         1
WARNING: You are about to add one or more soft partition
metadevices to your metadevice configuration.  If there
appears to be an error in the soft partition(s) displayed
above, do NOT proceed with this recovery operation.
Are you sure you want to do this (yes/no)?yes
c1t1d0s1: Soft Partitions recovered from device.
# metastat
d10: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                        1                    10240

d11: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    10242                    10240

d12: Soft Partition
    Device: c1t1d0s1
    State: Okay
    Size: 10240 blocks
        Device              Start Block  Dbase Reloc
        c1t1d0s1                   0     No    Yes

        Extent              Start Block              Block count
             0                    20483                    10240

This example recovers three soft partitions from disk, after the state database replicas were accidentally deleted.
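
The start blocks in the metastat output above follow a simple pattern: each extent begins one sector after the end of the previous extent, which is consistent with a single hidden extent header sector sitting in front of every extent. The following is a minimal sketch of that arithmetic, not part of Solaris Volume Manager. The function name check_extents and the three-column input format are hypothetical, and the one-sector header size is an assumption read off this example. A reported gap is not necessarily an error, because soft partitions do not have to be contiguous.

```shell
#!/bin/sh
# Sketch: check whether soft partition extents on one slice are packed back
# to back, ASSUMING one hidden extent header sector immediately before each
# extent (inferred from the example output, not a documented guarantee).
# Input on stdin: lines of "name start_block block_count", sorted by start.
check_extents() {
    awk '
        NR > 1 && $2 != expected {
            printf "%s: starts at %d, expected %d\n", $1, $2, expected
            bad = 1
        }
        {
            # Next extent = this start + this length + 1 header sector.
            expected = $2 + $3 + 1
        }
        END { exit bad + 0 }
    '
}

# Values copied from the metastat output in the example above.
check_extents <<'EOF' && echo "extent layout is consistent"
d10 1 10240
d11 10242 10240
d12 20483 10240
EOF
```

For d10, 1 + 10240 + 1 gives 10242, the start block of d11. For d11, 10242 + 10240 + 1 gives 20483, the start block of d12.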

Recovering Configuration From a Different System

You can recover a Solaris Volume Manager configuration, even onto a different system from the original. For example, assume you have a system with an external Multipack of six disks and a Solaris Volume Manager configuration, including at least one state database replica, on some of those disks. If you experience a system failure, you can attach the Multipack to a different system and recover the complete configuration from the local disk set.


Note - Only recover a Solaris Volume Manager configuration onto a system with no preexisting Solaris Volume Manager configuration. Otherwise, you risk replacing a logical volume on your system with a logical volume that you are recovering, and possibly corrupting your system.



Note - This process only works to recover volumes from the local disk set.


How to Recover a Configuration


  1. Attach the disk or disks that contain the Solaris Volume Manager configuration to a system with no preexisting Solaris Volume Manager configuration.

  2. Do a reconfiguration reboot to ensure that the system recognizes the newly added disks.

    # reboot -- -r
  3. Determine the major/minor number for a slice containing a state database replica on the newly added disks.

    Use ls -lL, and note the two numbers between the group name and the date. Those are the major/minor numbers for this slice.
    # ls -Ll /dev/dsk/c1t9d0s7
    brw-r-----   1 root     sys       32, 71 Dec  5 10:05 /dev/dsk/c1t9d0s7

  4. If necessary, determine the major name corresponding with the major number by looking up the major number in /etc/name_to_major.
    # grep " 32" /etc/name_to_major 
    sd 32

  5. Update the /kernel/drv/md.conf file with two entries: one entry to tell Solaris Volume Manager where to find a valid state database replica on the new disks, and one entry to tell it to trust the new replica and ignore any conflicting device ID information on the system.

    In the example line that begins with mddb_bootlist1, replace sd with the major name you found in the previous step, and replace 71 with the minor number you identified in Step 3.
    #pragma ident   "@(#)md.conf    2.1     00/07/07 SMI"
    #
    # Copyright (c) 1992-1999 by Sun Microsystems, Inc.
    # All rights reserved.
    #
    name="md" parent="pseudo" nmd=128 md_nsets=4;
    # Begin MDD database info (do not edit)
    mddb_bootlist1="sd:71:16:id0"; 
    md_devid_destroy=1;
    # End MDD database info (do not edit)

  6. Reboot to force Solaris Volume Manager to reload your configuration.

    You will see messages similar to the following displayed to the console.
    volume management starting.
    Dec  5 10:11:53 lexicon metadevadm: Disk movement detected
    Dec  5 10:11:53 lexicon metadevadm: Updating device names in Solaris Volume Manager
    The system is ready.

  7. Verify your configuration by using the metadb and metastat commands.
    # metadb
            flags           first blk       block count
         a m  p  luo        16              8192            /dev/dsk/c1t9d0s7
         a       luo        16              8192            /dev/dsk/c1t10d0s7
         a       luo        16              8192            /dev/dsk/c1t11d0s7
         a       luo        16              8192            /dev/dsk/c1t12d0s7
         a       luo        16              8192            /dev/dsk/c1t13d0s7
    # metastat
    d12: RAID
        State: Okay         
        Interlace: 32 blocks
        Size: 125685 blocks
    Original device:
        Size: 128576 blocks
            Device              Start Block  Dbase State        Reloc  Hot Spare
            c1t11d0s3                330     No    Okay         Yes    
            c1t12d0s3                330     No    Okay         Yes    
            c1t13d0s3                330     No    Okay         Yes    
    
    d20: Soft Partition
        Device: d10
        State: Okay
        Size: 8192 blocks
            Extent              Start Block              Block count
                 0                     3592                     8192
    
    d21: Soft Partition
        Device: d10
        State: Okay
        Size: 8192 blocks
            Extent              Start Block              Block count
                 0                    11785                     8192
    
    d22: Soft Partition
        Device: d10
        State: Okay
        Size: 8192 blocks
            Extent              Start Block              Block count
                 0                    19978                     8192
    
    d10: Mirror
        Submirror 0: d0
          State: Okay         
        Submirror 1: d1
          State: Okay         
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 82593 blocks
    
    d0: Submirror of d10
        State: Okay         
        Size: 118503 blocks
        Stripe 0: (interlace: 32 blocks)
            Device              Start Block  Dbase State        Reloc  Hot Spare
            c1t9d0s0                   0     No    Okay         Yes    
            c1t10d0s0               3591     No    Okay         Yes    
    
    
    d1: Submirror of d10
        State: Okay         
        Size: 82593 blocks
        Stripe 0: (interlace: 32 blocks)
            Device              Start Block  Dbase State        Reloc  Hot Spare
            c1t9d0s1                   0     No    Okay         Yes    
            c1t10d0s1                  0     No    Okay         Yes    
    
    
    Device Relocation Information:
    Device       Reloc    Device ID
    c1t9d0       Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3487980000U00907AZ
    c1t10d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3397070000W0090A8Q
    c1t11d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3449660000U00904NZ
    c1t12d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS32655400007010H04J
    c1t13d0      Yes      id1,sd@SSEAGATE_ST39103LCSUN9.0GLS3461190000701001T0
    # metastat -p
    d12 -r c1t11d0s3 c1t12d0s3 c1t13d0s3 -k -i 32b
    d20 -p d10 -o 3592 -b 8192 
    d21 -p d10 -o 11785 -b 8192 
    d22 -p d10 -o 19978 -b 8192 
    d10 -m d0 d1 1
    d0 1 2 c1t9d0s0 c1t10d0s0 -i 32b
    d1 1 2 c1t9d0s1 c1t10d0s1 -i 32b
    #
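
Steps 3 through 5 of the procedure above amount to a small text transformation: take the major and minor numbers from the ls -lL output, map the major number to a driver name, and assemble the mddb_bootlist1 entry. The following is a minimal sketch of that transformation, run against sample text rather than a live system. The helper names major_minor and major_name are hypothetical, the "foo 132" line is an invented entry used only to show exact matching, and the block offset 16 and the id0 suffix are copied from the example md.conf and may differ on other systems.

```shell
#!/bin/sh
# Sketch: derive the mddb_bootlist1 entry from one "ls -lL" line and the
# contents of /etc/name_to_major. Nothing here edits md.conf.

# Print "major minor" from one line of ls -lL output for a block device.
major_minor() {
    # Field 5 is the major number with a trailing comma, field 6 the minor.
    awk '{ sub(/,$/, "", $5); print $5, $6 }'
}

# Print the driver name for major number $1; name_to_major text on stdin.
# Comparing the whole second field avoids partial hits such as 132 or 320.
major_name() {
    awk -v major="$1" '$2 == major { print $1 }'
}

# Sample line copied from Step 3 of the procedure above.
ls_line='brw-r-----   1 root     sys       32, 71 Dec  5 10:05 /dev/dsk/c1t9d0s7'
set -- $(printf '%s\n' "$ls_line" | major_minor)
major=$1
minor=$2

# On a real system, feed /etc/name_to_major to major_name instead of this
# sample text. "foo 132" is a made-up entry for illustration.
name=$(printf 'sd 32\nfoo 132\n' | major_name "$major")

# Offset 16 and "id0" are taken from the example md.conf (assumptions).
printf 'mddb_bootlist1="%s:%s:16:id0";\n' "$name" "$minor"
```

With the sample values, the sketch prints the same mddb_bootlist1 line that appears in the example md.conf file. Review the generated line by hand before you add it to /kernel/drv/md.conf.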

 
 
 