Solaris Volume Manager Administration Guide

Maintenance and Last Erred States

When a component in a mirror or RAID 5 volume experiences errors, Solaris Volume Manager puts the component in the "Maintenance" state. No further reads or writes are performed to a component in the "Maintenance" state. Subsequent errors on other components in the same volume are handled differently, depending on the type of volume. A RAID 1 volume might be able to tolerate many components in the "Maintenance" state and still be read from and written to. A RAID 5 volume, by definition, can only tolerate a single component in the "Maintenance" state.

When a component in a RAID 1 or RAID 5 volume experiences errors and there are no redundant components to read from, Solaris Volume Manager puts the component in the "Last Erred" state. For example, in a RAID 5 volume, after one component goes into the "Maintenance" state, no redundancy is left, so the next component to fail goes into the "Last Erred" state. When either a mirror or RAID 5 volume has a component in the "Last Erred" state, I/O is still attempted to the component marked "Last Erred." This happens because, from Solaris Volume Manager's point of view, a "Last Erred" component contains the last good copy of the data. With a component in the "Last Erred" state, the volume behaves like a normal device (disk) and returns I/O errors to an application. Usually, at this point some data has been lost.
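
The rule that decides between "Maintenance" and "Last Erred" can be sketched as a small function. This is a simplified model, not Solaris Volume Manager code: the State enum and state_after_error() are hypothetical names, and a real mirror tracks redundancy per region rather than per whole component.

```python
from enum import Enum
from typing import List


class State(Enum):
    OKAY = "Okay"
    MAINTENANCE = "Maintenance"
    LAST_ERRED = "Last Erred"


def state_after_error(volume_type: str, other_states: List[State]) -> State:
    """Return the state a newly failing component enters (simplified model).

    other_states holds the states of the volume's other components: the
    remaining RAID 5 components, or the matching components in the other
    submirrors of a mirror.
    """
    if volume_type == "raid5":
        # Parity can cover only one missing component.
        redundant = all(s is State.OKAY for s in other_states)
    elif volume_type == "mirror":
        # Redundancy remains while at least one other copy is healthy.
        redundant = any(s is State.OKAY for s in other_states)
    else:
        raise ValueError(f"unsupported volume type: {volume_type}")
    # Without redundancy, the failing component holds the last good copy.
    return State.MAINTENANCE if redundant else State.LAST_ERRED
```

In a three-component RAID 5 volume, the first failure lands in "Maintenance"; a second failure, while the first is still unrepaired, lands in "Last Erred".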

Always replace components in the "Maintenance" state first, followed by those in the "Last Erred" state. After a component is replaced and resynchronized, use the metastat command to verify its state, then validate the data to make sure it is good.
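
The replacement order amounts to a stable sort on state. A minimal sketch, assuming the component names and states have already been read from metastat output into (name, state) pairs; replacement_order() and the slice names in the example are hypothetical.

```python
from typing import List, Tuple

# Lower value = replace first. Other states (e.g. "Okay") need no repair.
PRIORITY = {"Maintenance": 0, "Last Erred": 1}


def replacement_order(components: List[Tuple[str, str]]) -> List[str]:
    """Return failed component names in the order they should be replaced."""
    failed = [(name, state) for name, state in components if state in PRIORITY]
    # list.sort() is stable: components in the same state keep their listed order.
    failed.sort(key=lambda item: PRIORITY[item[1]])
    return [name for name, _ in failed]
```

For example, replacement_order([("c1t1d0s0", "Last Erred"), ("c1t2d0s0", "Okay"), ("c1t3d0s0", "Maintenance")]) puts c1t3d0s0 ahead of c1t1d0s0 and skips the healthy slice.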

Mirrors - If components are in the "Maintenance" state, no data has been lost. You can safely replace or enable the components in any order. If a component is in the "Last Erred" state, you cannot replace it until you have replaced all the other mirrored components that are in the "Maintenance" state. Replacing or enabling a component in the "Last Erred" state usually means that some data has been lost. Be sure to validate the data on the mirror after you repair it.
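
The mirror rule can be written as a precondition check. This is a hypothetical helper, not part of Solaris Volume Manager, and it assumes a mapping from each component of the mirror to the state that metastat reports for it.

```python
from typing import Dict


def may_replace_in_mirror(target: str, states: Dict[str, str]) -> bool:
    """Apply the mirror rule to one component.

    "Maintenance" components may be replaced or enabled in any order;
    a "Last Erred" component must wait until no other component is
    still in the "Maintenance" state.
    """
    state = states[target]
    if state == "Maintenance":
        return True
    if state == "Last Erred":
        return all(s != "Maintenance"
                   for name, s in states.items() if name != target)
    return False  # the component is not in an error state
```

Once every "Maintenance" component has been repaired and shows "Okay", the check allows the "Last Erred" component to be replaced.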

RAID 5 Volumes - A RAID 5 volume can tolerate a single component failure. You can safely replace a single component in the "Maintenance" state without losing data. If an error then occurs on another component, that component is put into the "Last Erred" state, and the RAID 5 volume becomes a read-only device. You need to perform some type of error recovery to bring the RAID 5 volume back to a stable state and reduce the possibility of data loss. If a RAID 5 volume reaches the "Last Erred" state, it has probably lost data. Be sure to validate the data on the RAID 5 volume after you repair it.
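
Why one failure is survivable and a second is not follows from how parity works. The sketch below models generic XOR parity for a single stripe; it illustrates the RAID 5 technique in general and does not describe Solaris Volume Manager's on-disk layout. xor_blocks() and reconstruct() are hypothetical names.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a lost block (None) from the surviving blocks of one stripe.

    The stripe holds the data blocks plus one parity block equal to the XOR
    of the data blocks, so any single block is the XOR of all the others.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more blocks lost: parity cannot rebuild the stripe")
    rebuilt = list(stripe)
    if missing:
        rebuilt[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return rebuilt
```

With data blocks b"\x01\x02" and b"\x10\x20", the parity block is b"\x11\x22"; losing either data block is recoverable, but losing two blocks of the same stripe is not.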

Background Information For Replacing and Enabling Slices in Mirrors and RAID 5 Volumes

When you replace components in a mirror or a RAID 5 volume, follow these guidelines:

Always replace components in the "Maintenance" state first, followed by those components in the "Last Erred" state.
After a component is replaced and resynchronized, use the metastat command to verify the volume's state, then validate the data to make sure it is good. Replacing or enabling a component in the "Last Erred" state usually means that some data has been lost, so be sure to validate the data on the volume after you repair it. For a UFS file system, run the fsck command to validate the file system "metadata" (its structure), then check the actual user data; in practice, users have to examine their own files. A database or other application must have its own way of validating its internal data structures.
Always check for state database replicas and hot spares when you replace components. Any state database replica shown to be in error should be deleted before you replace the physical disk. The state database replica should be added back before enabling the component. The same procedure applies to hot spares.
RAID 5 volumes - During component replacement, data is recovered either from a hot spare currently in use or, when no hot spare is in use, from the RAID 5 parity.
RAID 1 volumes - When you replace a component, Solaris Volume Manager automatically starts resynchronizing the new component with the rest of the mirror. When the resynchronization completes, the replaced component becomes readable and writable. If a hot spare had been standing in for the failed component, the hot spare is placed in the "Available" state and made available for other hot spare replacements.
The new component must be large enough to replace the old component.
As a precaution, back up all data before you replace "Last Erred" devices.
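
The guidelines above can be laid out as an ordered command plan for the common case of replacing a failed disk and re-enabling the same slice. This Python sketch only builds the list of commands and runs nothing; repair_plan() is a hypothetical helper, and the volume and slice names (d10, c1t1d0s0, c1t1d0s7) are placeholders. It omits hot spares and replica counts, so check the metadb, metareplace, and metastat man pages before running anything.

```python
from typing import List, Optional


def repair_plan(volume: str, component: str, bad_replicas: List[str],
                raw_device: Optional[str] = None) -> List[str]:
    """Return the repair steps as an ordered list of shell commands.

    Nothing is executed; lines starting with '#' are manual steps.
    """
    # Delete state database replicas shown to be in error before the disk swap.
    plan = [f"metadb -d {r}" for r in bad_replicas]
    plan.append("# replace the physical disk")
    # Add the replicas back before enabling the component.
    plan += [f"metadb -a {r}" for r in bad_replicas]
    # Re-enable the same slice on the new disk; this starts resynchronization.
    plan.append(f"metareplace -e {volume} {component}")
    # Check the volume's state once resynchronization completes.
    plan.append(f"metastat {volume}")
    if raw_device is not None:
        # UFS metadata check; run it only when the file system is unmounted.
        plan.append(f"fsck {raw_device}")
    return plan
```

For example, repair_plan("d10", "c1t1d0s0", ["c1t1d0s7"], raw_device="/dev/md/rdsk/d10") yields the metadb deletion, the manual disk swap, the metadb re-addition, metareplace -e, metastat, and finally fsck, in that order.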

Note - A submirror or RAID 5 volume might be using a hot spare in place of a failed component. When that failed component is enabled or replaced by using the procedures in this section, the hot spare is marked "Available" in the hot spare pool, and is ready for use.
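
The hot spare life cycle described in the note can be modeled with two transitions: a spare goes "In use" when it stands in for a failed component, and returns to "Available" once that component is enabled or replaced. HotSparePool is a hypothetical in-memory model, not a Solaris Volume Manager API; it also ignores spare size, which a real pool must take into account.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HotSparePool:
    """Hypothetical model of a hot spare pool; not a Solaris Volume Manager API."""
    spares: Dict[str, str] = field(default_factory=dict)           # spare slice -> "Available" / "In use"
    standing_in_for: Dict[str, str] = field(default_factory=dict)  # failed component -> spare slice

    def substitute(self, failed_component: str) -> str:
        """Put the first available spare in place of a failed component."""
        for spare, state in self.spares.items():
            if state == "Available":
                self.spares[spare] = "In use"
                self.standing_in_for[failed_component] = spare
                return spare
        raise RuntimeError("no hot spare available")

    def component_repaired(self, component: str) -> None:
        """Return the spare to the pool once the failed component is enabled or replaced."""
        spare = self.standing_in_for.pop(component, None)
        if spare is not None:
            self.spares[spare] = "Available"
```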

