
Troubleshooting

Dell OpenManage™ Array Manager 3.4

  Disks and Volumes

  Common Troubleshooting Procedures

  Problem Situations and Solutions

  System Performance Problems

  Dell PowerVault 660F and 224F Storage Systems Troubleshooting

This chapter contains status message information, troubleshooting procedures, and common problems and solutions. It also has a separate section for troubleshooting the Dell PowerVault™ 660F and 224F storage systems.


Disks and Volumes

If a disk or volume fails, it is important to repair the disk or volume as quickly as possible to avoid data loss. Because time is critical, Array Manager makes it easy for you to locate problems quickly. In the Status column of the list view, you can view the status of a disk or volume. The status also appears in the graphical view of each disk or volume. If the status is not Healthy for volumes or Online for disks, use the status information to determine the problem and then fix it.

There are also various troubleshooting procedures for disks, volumes, and arrays.

Topics include:

Disk Status Descriptions

One of the following disk status descriptions will always appear in the Status column of the disk in the right pane of the console window. If there is a problem with a disk, you can use this troubleshooting chart to diagnose and correct the problem.

Status

Meaning

Online

The disk is accessible and has no known problems. This is the normal disk status. No user action is required. Both dynamic disks and basic disks display the Online status.

Online (Errors)

This status indicates that the disk is in an error state or that I/O errors have been detected on a region of the disk. All the volumes on the disk will display Failed or Failed Redundancy status, and you may not be able to create new volumes on the disk. Only dynamic disks display this status.

Right-click the failed disk and select Reactivate Disk to bring the disk to an Online status and bring all the volumes to a Healthy status.

Offline

The disk is not accessible. The disk may be corrupted or intermittently unavailable. An error icon appears on the offline disk. Only dynamic disks display the Offline status.

If the disk status is Offline and a separate corresponding icon labeled Missing Disk appears, the disk was recently available on the system but can no longer be located or identified. The missing disk may be corrupted, powered down, or disconnected, or it may be a virtual disk that has been deleted.

Unreadable

The disk is not accessible. The disk may have experienced hardware failure, corruption, or I/O errors. The disk's copy of the system's disk configuration database may be corrupted. An error icon appears on the Unreadable disk. Both dynamic and basic disks display the Unreadable status.

Disks may display the Unreadable status while they are spinning up or when Array Manager is rescanning all the disks on the system. In some cases, an Unreadable disk has failed and is not recoverable. For dynamic disks, the Unreadable status usually results from corruption or I/O errors on part of the disk, rather than failure of the entire disk. You can rescan the disks (using the Rescan Disks command) or reboot the computer to see if the disk status changes.

Unrecognized

The disk has an original equipment manufacturer's (OEM) signature and Array Manager will not allow you to use this disk. For example, a disk from a UNIX system displays the Unrecognized status. Only Unknown disk types display the Unrecognized status.

Foreign Disk

The disk has been moved to your computer from another Microsoft® Windows NT® or Windows® 2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.


Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show the Failed Redundancy or Failed error condition.
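If you track disk status from a script of your own (for example, a monitoring tool that captures the status strings shown in the console), the chart above reduces to a lookup table. The following Python sketch is illustrative only: `DISK_STATUS_ACTIONS` and `recommend_disk_action` are hypothetical names, and how the status strings are collected is left as an assumption.

```python
# Hypothetical triage table condensed from the disk status chart above.
# The keys mirror the console's status text; the values are plain-language
# reminders, not commands that Array Manager accepts programmatically.
DISK_STATUS_ACTIONS = {
    "Online": "No action required.",
    "Online (Errors)": "Right-click the disk and select Reactivate Disk.",
    "Offline": "Check whether the disk is corrupted, powered down, or "
               "disconnected; a Missing disk may also be a deleted virtual disk.",
    "Unreadable": "Run Rescan Disks or reboot, then check the status again.",
    "Unrecognized": "The disk has an OEM signature; Array Manager cannot use it.",
    "Foreign Disk": "Right-click the disk and select Merge Foreign Disk.",
}


def recommend_disk_action(status: str) -> str:
    """Return the suggested next step for a disk status string."""
    return DISK_STATUS_ACTIONS.get(status, f"Unlisted status: {status!r}")


print(recommend_disk_action("Online (Errors)"))
```

Unlisted strings are reported rather than guessed at, so a typo or a status added in a later release does not silently map to the wrong action.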

Array Disk Status Information

These definitions appear in the Status line and indicate the condition of array disks.

Status line entry

Status indication

Unknown

May signify a problem or indicate a transitional state. Additionally, a new disk that had previously been formatted or initialized by another type of RAID controller may show this state.

Ready

Means the array disk is operational. For PERC 2/SC, 3/SC, 2/DC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, and CERC ATA100/4ch controllers, Ready status applies to operational array disks that are not part of a virtual disk.

For the PERC 2, PERC 2/Si, PERC 3/Si, and PERC 3/Di controllers, operational array disks display Ready status regardless of whether they are a part of a virtual disk or not.

Failed

Not operational. A disk needs repair, has been removed, or has another problem that prevents operation.

Online

Operational. Applies to array disks contained in a virtual disk on PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, and CERC ATA100/4ch controllers.

Offline

The drive is not available to the RAID controller.

Degraded

Refers to a fault-tolerant array/virtual disk that has a failed disk. This state definition may also appear when resynching the array/virtual disk, since the array/virtual disk is not fault-tolerant during the resynchronization.

Recovering

Refers to the state of recovering from bad blocks on disks.

Removed

Indicates that the array disk has been removed.

Resynching

This state definition appears during the following types of disk operations: Transform Type, Reconfiguration, and Check Consistency.

Rebuilding

Refers to part of a virtual disk being rebuilt.

No Media

CD-ROM or removable disk has no media.

Formatting

Refers to an array disk in the process of formatting.

Diagnostics

Indicates that diagnostics are running.

Reconstructing

The configuration of a virtual disk has been changed. The individual array disks within the virtual disk are being modified to support the changes. The data on the virtual disk will be saved. You cannot cancel a virtual disk reconstruction.

Initializing

Applies only to virtual disks on PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, and CERC ATA100/4ch controllers. This prepares the virtual disk for use by Array Manager by deleting the configuration information on this virtual disk. The data on the virtual disk will be lost.
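Because the meaning of Ready and Online depends on the controller, a script that compares expected and reported array disk states has to branch on the controller model. A minimal Python sketch follows, using the controller lists given in the table above; the set and function names are hypothetical, and controllers not named in these entries are rejected rather than guessed.

```python
# Controllers for which Ready means "operational, not part of a virtual disk"
# and Online means "operational, part of a virtual disk" (per the table above).
SPLIT_STATUS_CONTROLLERS = {
    "PERC 2/SC", "PERC 2/DC", "PERC 3/SC", "PERC 3/DCL", "PERC 3/DC",
    "PERC 3/QC", "PERC 4/SC", "PERC 4/DC", "PERC 4/Di", "CERC ATA100/4ch",
}

# Controllers that report Ready for every operational array disk.
READY_ONLY_CONTROLLERS = {"PERC 2", "PERC 2/Si", "PERC 3/Si", "PERC 3/Di"}


def expected_operational_status(controller: str, in_virtual_disk: bool) -> str:
    """Status an operational (not failed) array disk is expected to show."""
    if controller in READY_ONLY_CONTROLLERS:
        return "Ready"
    if controller in SPLIT_STATUS_CONTROLLERS:
        return "Online" if in_virtual_disk else "Ready"
    raise ValueError(f"controller not covered by this table: {controller}")


print(expected_operational_status("PERC 4/DC", in_virtual_disk=True))
```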

Disk Troubleshooting Procedures

The following sections describe disk troubleshooting procedures:

See also the following sections for these and other troubleshooting procedures:

Volume Status Descriptions

One of the following volume status descriptions will always appear in the graphical view of the volume and in the Status column of the volume in list view. If there is a problem with a volume, you can use this chart to diagnose and correct the problem.

Status

Meaning

Healthy

The volume is accessible and has no known problems. This is the normal volume status. No user action is required. Both dynamic volumes and basic volumes display the Healthy status.

Healthy (At Risk)

The volume is currently accessible, but I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the Healthy (At Risk) status. A warning icon appears on the volume. Only dynamic volumes display the Healthy (At Risk) status.

When the volume status is Healthy (At Risk), an underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to Online status, the volume should return to the Healthy status.

Initializing

The volume is being initialized. Dynamic volumes display the Initializing status.

No user action is required. When initialization is complete, the volume's status becomes Healthy. Initialization should be completed very quickly.

Resynching

The volume's mirrors are being resynchronized so that both mirrors contain identical data. Both dynamic and basic mirrored volumes display the Resynching status.

No user action is required. When resynchronization is complete, the mirrored volume's status returns to Healthy. Resynchronization may take some time, depending on the size of the mirrored volume. Although you can access a mirrored volume while resynchronization is in progress, you should avoid making configuration changes (such as breaking a mirror) during resynchronization.

Regenerating

Data and parity are being regenerated for the RAID-5 volume. Both dynamic and basic RAID-5 volumes display the Regenerating status.

No user action is required. When regeneration is complete, the RAID-5 volume's status returns to Healthy. You can access a RAID-5 volume while data and parity regeneration is in progress.

Failed Redundancy

The data on the volume is no longer fault tolerant because one of the underlying disks is not online. A warning icon appears on the volume with Failed Redundancy. The Failed Redundancy status applies only to mirrored or RAID-5 volumes. Both dynamic and basic volumes display the Failed Redundancy status.

You can continue to access the volume using the remaining online disks, but if another disk that contains the volume fails, you will lose the volume and its data. To avoid such loss, you should attempt to repair the volume as soon as possible.

A Failed Redundancy status will also display if a disk was moved and the volume on it spanned more than the single disk. To correct the problem, you must move the entire disk set that contains all the appropriate volumes.

Failed Redundancy (At Risk)

The data on the volume is no longer fault tolerant, and I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the (At Risk) status. A warning icon appears on the volume. Only dynamic mirrored or RAID-5 volumes display the Failed Redundancy (At Risk) status.

When the volume status is Failed Redundancy (At Risk), the underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to the Online status, the volume status should change to Failed Redundancy.

Failed

The volume cannot be started automatically. An error icon appears on the failed volume. Both dynamic and basic volumes display the Failed status.

Formatting

The volume is being formatted using the specifications you chose for formatting.

No Media

No media has been inserted into the CD-ROM or removable drive. The volume status will become Online when you insert the appropriate media into the CD-ROM or removable drive. Only CD-ROM or removable disk types display the No Media status.
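The two (At Risk) statuses behave differently once the underlying disk is reactivated, and only some statuses clear on their own. The Python sketch below encodes just what the descriptions above state; the names are hypothetical, and any status the text does not describe is returned unchanged rather than predicted.

```python
# Statuses for which the descriptions above say no user action is required.
NO_ACTION_STATUSES = {"Healthy", "Initializing", "Resynching", "Regenerating"}

# Expected volume status after Reactivate Disk returns the underlying
# Online (Errors) disk to Online. Failed Redundancy (At Risk) only drops
# to Failed Redundancy: the volume still needs to be repaired.
AFTER_REACTIVATE_DISK = {
    "Healthy (At Risk)": "Healthy",
    "Failed Redundancy (At Risk)": "Failed Redundancy",
}


def status_after_reactivate_disk(status: str) -> str:
    """Expected volume status once its disk is reactivated (else unchanged)."""
    return AFTER_REACTIVATE_DISK.get(status, status)


def needs_user_action(status: str) -> bool:
    """True unless the status is one the text says resolves by itself."""
    return status not in NO_ACTION_STATUSES


print(status_after_reactivate_disk("Failed Redundancy (At Risk)"))
```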

Volume Troubleshooting Procedures

The following sections describe common volume troubleshooting procedures:

See also the following sections for these and other troubleshooting procedures:


Common Troubleshooting Procedures

This section describes commands and procedures that can be used in troubleshooting. Topics covered include:

Cables attached correctly

Verify that the power-supply cord and adapter cables are attached correctly. If the system is having trouble with read and write operations to a particular array (if the system hangs, for example), then make sure that the SCSI cables attached to the array are secure. If the connection is secure but the problem persists, you may need to replace a cable. See also the "Isolate SCSI device problems" section.

System Requirements

Make sure that the system meets all system requirements as described in the readme.txt file located in the installation directory. In particular, verify that the correct levels of firmware and drivers are installed on the system. For more information on drivers and firmware, see the "Drivers and Firmware" section.

Drivers and Firmware

Array Manager is tested with the supported controller firmware and drivers. The supported controllers and firmware are listed in the readme.txt file. To avoid possible conflicts or inconsistencies between the controller firmware and drivers, it is recommended that you use only the supported versions. The most current versions can be obtained from the Dell support site at http://support.dell.com.

In a SAN environment, all LS modules in an array should have the same firmware version. When upgrading the firmware on an LS module, make sure to upgrade the firmware on the other LS modules at the same time.
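If you record the firmware version reported by each LS module, a short check can flag modules that are out of step. This is a minimal Python sketch under the assumption that you have already gathered the versions into a dictionary; the module names and version strings are hypothetical placeholders. The most common version is used only as a reference point: the goal is that all modules run the same supported version, not necessarily the majority one.

```python
from collections import Counter


def firmware_mismatches(firmware_by_module: dict[str, str]) -> dict[str, str]:
    """Return the LS modules whose firmware differs from the most common version.

    An empty result means every module in the array reports the same version.
    """
    if not firmware_by_module:
        return {}
    majority, _count = Counter(firmware_by_module.values()).most_common(1)[0]
    return {module: version
            for module, version in firmware_by_module.items()
            if version != majority}


# Hypothetical inventory: module names and version strings are placeholders.
inventory = {"LS-A": "1.2", "LS-B": "1.2", "LS-C": "1.1"}
print(firmware_mismatches(inventory))  # {'LS-C': '1.1'}
```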

It is also recommended that you periodically obtain and apply the latest Dell PowerEdge™ Server System BIOS to benefit from the most recent improvements. Refer to the Dell PowerEdge system documentation for more information.

Isolate SCSI device problems

If you receive a "timeout" event related to a SCSI device or if you otherwise suspect that one of the SCSI devices is experiencing a hardware failure, then do the following to confirm the problem:

Rescan to Update Information

Use Rescan to update disk information. This operation may take a few minutes if many devices are attached to the system. The message "Getting hardware configuration. Please wait." is displayed while the rescan is in progress.

If this does not properly update the disk information, you may need to reboot your system.

Maintain integrity of redundant (mirrored and parity) information

The Check Consistency function determines the accuracy of mirrored data and parity information. When necessary, this feature rebuilds the parity information. For more information, see the following sections:

Reactivate a Disk

  1. Reboot your machine to update the list of existing disks.

  2. Right-click the dynamic disk marked Missing or Offline.

  3. Use Rescan to change the disk status to Online (Errors).

  4. Right-click the dynamic disk marked Missing or Offline and select Reactivate Disk from the context menu. The disk should be marked Online after it is reactivated.

  5. For any volumes that are not Healthy, right-click the volume and select Reactivate Volume from the context menu.

Caution When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.

Bring a Dynamic RAID-5 or Mirrored Volume Back Online

A RAID-5 volume's status can appear as Failed Redundancy while the disk's status is Offline. The disk's name may be Missing, and an error icon (X) appears on the missing or offline disk. In this case, do the following:

  1. Rescan the disk to make sure the disk, controller, or cable problem is fixed.

  2. Try to reactivate the disk by right-clicking on the disk and selecting Reactivate Disk.

  3. If the volume remains as Failed Redundancy or Failed, right-click the volume, then select Reactivate Volume. If all disks on this volume are Online, the volume should be brought back to a healthy state.

Caution When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.

Reactivate a Dynamic Volume

Reactivating a volume attempts to restart all volumes regardless of the volume's state. If data corruption exists, you can reactivate the volume and then run the chkdsk utility. However, in the case of a mirrored or RAID-5 volume, reactivating a volume with stale data can cause that data to be used when it is inaccurate.

Reactivating a volume should be done only if you understand that the volume's data, which might be corrupted, will be restored. For example, if one mirror in a mirrored volume fails and data is written to the remaining mirror, the data is now out of sync. Then, if the remaining mirror (the one with accurate data) fails and the first mirror is reactivated, the stale data becomes "real" data.

For this reason, it is important to act on data failures as soon as possible. You should use care when reactivating volumes.
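The stale-mirror scenario described above can be made concrete with a toy simulation. The Python sketch below models a two-way mirror as two dictionaries of blocks; `Mirror` and `write` are hypothetical names, and this is a deliberately simplified model, not how Array Manager stores data. It only shows why reactivating the out-of-date side turns old data into the volume's "real" data.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """One side of a two-way mirrored volume (toy model)."""
    name: str
    online: bool = True
    blocks: dict[int, str] = field(default_factory=dict)


def write(mirrors: list[Mirror], block: int, data: str) -> None:
    """Write to every mirror that is currently online."""
    for m in mirrors:
        if m.online:
            m.blocks[block] = data


a, b = Mirror("A"), Mirror("B")
mirrors = [a, b]
write(mirrors, 0, "v1")  # both mirrors hold v1
a.online = False         # mirror A fails
write(mirrors, 0, "v2")  # only B receives v2; A is now stale
b.online = False         # B, the mirror with accurate data, fails too
a.online = True          # reactivating A brings the stale copy back
print(a.blocks[0])       # v1: the out-of-date data is now served as current
```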

Repair Basic Volumes

Make sure that the underlying physical disk is turned on, plugged in, and attached to the computer. No other user action is possible for basic volumes unless the volumes are mirrored or RAID-5 volumes that were originally created in Windows NT Disk Administrator. The repair of these volumes is covered in "Repair Basic Mirrored or RAID-5 Volumes."

Repair Dynamic Volumes

  1. If the disks are not online, use the Rescan and then the Reactivate Disk commands to return the disk to the Online status. If this succeeds, the volume automatically restarts and returns to the Healthy status. A mirrored volume repairs itself by resynchronizing the data in its mirrors. A RAID-5 volume repairs itself by regenerating its parity and data.

  2. If the disk returns to the Online status but the volume does not return to the Healthy status, you can reactivate the volume manually (using the Reactivate Volume command).

Caution When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.

  3. If the volume is a mirrored or RAID-5 volume with stale data, bringing the underlying disk online will not automatically restart the volume. If the disks that contain non-stale data are disconnected, bring those disks online first (to allow the data to become synchronized). Otherwise, restart the mirrored or RAID-5 volume manually (using the Reactivate Volume command), and then run Chkdsk.exe. To run Chkdsk.exe, click Start, click Run, type chkdsk, and then click OK.

  4. If the disk does not return to the Online status and the volume does not return to the Healthy status, there may be something wrong with the disk. You should replace the failed mirror or RAID-5 disk region. To replace the failed mirror in a mirrored volume, use the Remove Mirror command to remove the failed mirror, then use the Add Mirror command to create a new mirror on another disk. To replace the failed disk region in a RAID-5 volume, use the Repair RAID-5 Volume command.
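The repair procedure for dynamic mirrored and RAID-5 volumes is a short decision sequence. The Python sketch below restates it as a function; the name `next_repair_step`, its parameters, and the returned strings are hypothetical, and the function only suggests which console command to try next. It does not call Array Manager.

```python
def next_repair_step(volume_type: str, disk_online: bool, volume_healthy: bool,
                     reactivate_disk_tried: bool = False,
                     stale_data: bool = False,
                     current_disks_disconnected: bool = False) -> str:
    """Suggest the next step for a dynamic mirrored or RAID-5 volume."""
    if volume_type not in ("mirror", "raid5"):
        raise ValueError("sketch covers only mirrored and RAID-5 volumes")
    if volume_healthy:
        return "No action required."
    if not disk_online:
        if not reactivate_disk_tried:
            # Try to bring the disk back before replacing anything.
            return "Rescan, then Reactivate Disk."
        # The disk will not come back online: replace the failed region.
        if volume_type == "mirror":
            return "Remove Mirror, then Add Mirror on another disk."
        return "Repair RAID-5 Volume."
    if stale_data and current_disks_disconnected:
        # Let the disks holding current data resynchronize the stale copy.
        return "Bring the disks with current data online first."
    if stale_data:
        return "Reactivate Volume, then run chkdsk."
    # The disk is Online but the volume did not restart on its own.
    return "Reactivate Volume."


print(next_repair_step("raid5", disk_online=False, volume_healthy=False,
                       reactivate_disk_tried=True))
```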

There are particular considerations regarding dynamic disks and volumes on NetWare, Windows Server 2003, and Linux. See "Dynamic Disk and Volume Support on NetWare, Windows Server 2003, and Linux" for more information.

Repair a Dynamic RAID-5 Volume

  1. Right-click the volume, then click Repair RAID-5 Volume.

  2. A message appears that indicates that the repair will be attempted if there is another dynamic disk with adequate unallocated space. Click Yes to confirm the repair.

  3. The volume should be brought back to a healthy state.

You should be able to repair a RAID-5 volume if it is in a state of Failed Redundancy, and if there is unallocated space on another dynamic disk available. To avoid data loss, you should attempt to repair the volume as soon as possible.

Repair Basic Mirrored or RAID-5 Volumes

Use Microsoft Windows NT Disk Administrator to repair basic mirrored or RAID-5 volumes if you are running Windows NT 4.0. For Windows 2000, there is a command available from the context menu for repairing basic mirrored or RAID-5 volumes.

Caution In Windows NT 4.0, Disk Administrator should never be used while Array Manager is running, especially if there are tasks running on the controller at the time. Data loss can occur if both applications are running simultaneously.

Drive Letters and Drive Mapping

A Drive Letter is Unavailable

After deleting a basic disk, the drive letter used by that disk may no longer be available. To correct this problem, reboot the server.

Drive Mapping is Not Working

Drive mapping may not work properly on Windows NT and Windows 2000 systems with PERC 3/DC, PERC 3/DCL, PERC 3/QC, PERC 2/DC, PERC 3/SC, PERC 2/SC, PERC 4/SC, PERC 4/DC, PERC 4/Di, PERC 4/IM, and CERC ATA100/4ch controllers. After creating a virtual disk on these controllers, the disk may not be visible in the Disks folder until the system is rebooted. After rebooting the system, the mapping between the newly created disk and the corresponding Windows NT or Windows 2000 disk may not be displayed in the Array Manager console.

Solution for Windows NT:

After creating a virtual disk and rebooting the system, do a console rescan by either clicking the Rescan button or selecting Rescan from the View pull-down menu.

Solution for Windows 2000:

When using a PERC 2/SC or 2/DC controller, upgrade your driver to MRAID35X.SYS version 2.68 or later.
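When checking several servers against the 2.68 minimum, comparing version strings as text can give wrong answers, so it is safer to compare them numerically. The Python sketch below assumes the driver version is a dot-separated string of integers (such as "2.68") that you have read from the driver file's properties; the helper names are hypothetical, and if the vendor's numbering is not purely numeric this comparison does not apply.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string such as "2.68" into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_needs_upgrade(installed: str, minimum: str = "2.68") -> bool:
    """True if the installed driver version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


print(driver_needs_upgrade("2.62"))  # True
print(driver_needs_upgrade("2.68"))  # False
```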

Recovering from Removing the Wrong Drive

If the drive that you mistakenly removed is part of a redundant virtual disk that also has a hot spare, then the virtual disk rebuilds automatically either immediately or when a write request is made. After the rebuild has completed, the virtual disk will no longer have a hot spare since data has been rebuilt onto the disk previously assigned as a hot spare. In this case, you should assign a new hot spare.

If the drive that you removed is part of a redundant virtual disk that does not have a hot spare, then replace the drive and do a rebuild.
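For a redundant virtual disk, the recovery path depends only on whether a hot spare was assigned. A minimal Python sketch of that choice follows; `wrong_drive_recovery` is a hypothetical name, and the sketch covers only the redundant case described above.

```python
def wrong_drive_recovery(has_hot_spare: bool) -> list[str]:
    """Steps after mistakenly removing a drive from a redundant virtual disk."""
    if has_hot_spare:
        return [
            "Let the virtual disk rebuild onto the hot spare "
            "(immediately or on the next write request).",
            # The old hot spare is now a member of the virtual disk.
            "Assign a new hot spare.",
        ]
    return ["Replace the drive.", "Rebuild the drive."]


for step in wrong_drive_recovery(has_hot_spare=True):
    print(step)
```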

See the following sections for information on rebuilding drives and assigning hot spares:

You can avoid removing the wrong drive by blinking the LED display on the drive that you intend to remove. See the following sections for information on blinking the LED display:

Resolving Windows Upgrade Problems

If you upgrade the Windows operating system on a server, you may find that Array Manager no longer functions after the upgrade. The installation process installs files and makes registry entries on the server that are specific to the operating system. For this reason, changing the operating system can disable Array Manager.

To avoid this problem, you should uninstall Array Manager before upgrading. If you have already upgraded without uninstalling Array Manager, however, you should uninstall Array Manager after the upgrade.

After you have uninstalled Array Manager and completed the upgrade, reinstall Array Manager using the Array Manager install media. You can download Array Manager from the Dell support site at http://support.dell.com.


Problem Situations and Solutions

This section contains additional troubleshooting problem areas. Topics include:

Note If you are using the Dell PowerVault 660F storage system and the PowerVault 224F enclosure, see "Dell PowerVault 660F and 224F Storage Systems Troubleshooting," for additional issues specific to the PowerVault 660F storage system and PowerVault 224F enclosure.

Rebuild does not work

A rebuild will not work in the following situations:

Cannot create a virtual disk (option is inactive)

Check:

Cannot create a RAID-5 volume

Check:

Cannot create a mirror

Check:

When expanding the Disks object, error icons appear

Situation:

Windows is not aware of the status of these disks. Most likely, the virtual disks that were associated with these have been deleted.

Check:

To remove these error status icons from the Disks object, the computer must be restarted to allow Windows to find the current information.

Situation:

If the type of disk shows No Signature, you need to write a signature to the disk. When creating a new virtual disk, the software must write a signature to the virtual disk that prepares it for use. This signature is not written automatically in case this disk has been merged from another operating system and the configuration information needs to be kept intact.

Check:

For instructions on writing a disk signature, see the section "Write a Disk Signature" in the "Disk Management" chapter.

Missing Disk displays error icon

The corresponding virtual disk has been removed, or the disk has been rendered inactive because of a problem.

Check:

Once you have repaired the disk, controller, or cable problem, you need to:

  1. Rescan to see the disk within Array Manager. If Array Manager finds the disk, this should bring the disk Online. If Array Manager does not find the disk, a reboot may be required.

  2. Reactivate Disk to bring all the volumes on the disk to the Healthy status.

Read and Write Operations Experience Problems

If the system is hanging, timing out, or experiencing other problems with read and write operations, then there may be a problem with the adapter cables or a SCSI device. For more information, see the "Cables attached correctly" and "Isolate SCSI device problems" sections.

Problems after Installing a PERC 2/SC or 2/DC Controller

If you install a PERC 2/SC or 2/DC controller after you have already installed Array Manager, you may experience problems with drive mapping, system hangs, and other performance problems. Reinstall Array Manager to resolve these problems.

I/O Stops on a Channel Redundant Channel

If you have implemented channel redundancy on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, or 4/Di controller, a failure of one channel causes I/O to stop on the other channels included in the channel redundant configuration. The resolution to this problem is described in the "Considerations for Implementing Channel Redundancy" section.

Error message that the connection to remote computer has terminated

The full message is: "The connection to the remote computer has terminated. Remote computer will be removed from view." The remote computer that you were connected to has been disconnected from your console. Most often, there is a problem with the network connection and the transmissions timed out. This can also occur if the remote machine was restarted or the service on the remote machine was stopped.

Check:

Make sure that the remote machine is turned on and is available to the network, and that the service is started. Reconnect to the resource.

Error Message: The stripe depth is out of range

Array Manager displays "The stripe depth is out of range" error message when you attempt to apply a RAID 0 or RAID 5 to more array disks than the controller can support in a single virtual disk. For example, the PERC 4/SC and 4/DC controllers can support up to 32 array disks in a virtual disk when using RAID 0 or RAID 5. Attempting to create a RAID 0 or RAID 5 using more than 32 array disks on these controllers will cause this error message to be displayed.
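You can check a planned configuration against the controller limit before attempting to create the virtual disk. The Python sketch below uses only the limit quoted above (32 array disks for RAID 0 or RAID 5 on the PERC 4/SC and 4/DC); the table and function names are hypothetical, and other controllers or RAID levels raise an error instead of assuming a limit.

```python
# Array-disk limits for RAID 0 and RAID 5 quoted in the text. Limits for
# other controllers are not given here and would need to be looked up.
MAX_ARRAY_DISKS_RAID0_RAID5 = {"PERC 4/SC": 32, "PERC 4/DC": 32}


def stripe_depth_out_of_range(controller: str, raid_level: int,
                              disk_count: int) -> bool:
    """True if the requested virtual disk would exceed the controller limit."""
    if raid_level not in (0, 5):
        raise ValueError("this sketch only covers RAID 0 and RAID 5")
    try:
        limit = MAX_ARRAY_DISKS_RAID0_RAID5[controller]
    except KeyError:
        raise KeyError(f"no documented limit for {controller!r}") from None
    return disk_count > limit


print(stripe_depth_out_of_range("PERC 4/SC", 5, 33))  # True
```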

Tree view object for PowerEdge RAID controller cannot be expanded after the software and driver are installed

The installation detects any drivers that you have installed for PowerEdge RAID controllers. If these drivers (and/or the card itself) are installed after the software is installed, support for the controller will need to be added.

Check:

Close the console. Open the Array Manager Utilities and check the box next to the appropriate controller. This action will restart the service, and the disks should be available the next time you launch the console.

An option is inactive

When an operation is inactive or dimmed in a menu, the task cannot be performed on the object at this time. Certain operations are valid only for certain types of objects. (For example: RAID levels that are not fault tolerant will not allow you to check the consistency of the virtual disk.) If there is a task currently running on that object, wait until it has finished and try again. Otherwise, the operation may not be appropriate at this time.
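The Check Consistency example above combines two conditions: the RAID level must be fault tolerant, and no other task may be running on the object. A minimal Python sketch follows; the function name is hypothetical, and the set of fault-tolerant levels is an assumption (which levels a given controller offers varies). RAID 0 has no redundant data, so there is nothing to check.

```python
# Assumed set of fault-tolerant RAID levels; availability varies by controller.
FAULT_TOLERANT_LEVELS = {1, 5, 10, 50}


def check_consistency_enabled(raid_level: int, task_running: bool) -> bool:
    """Mirror the two reasons given above for the option being dimmed."""
    return raid_level in FAULT_TOLERANT_LEVELS and not task_running


print(check_consistency_enabled(0, task_running=False))  # False
```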

To bring a disk that is Offline and Missing back online

If this was a virtual disk, then check that the virtual disk still exists. If it no longer exists, use the Remove Disk command to remove the disk from the list of disks.

Repair any disk, controller, or cable problems and make sure that the physical disk is turned on, plugged in, and attached to the computer. From the View pull-down menu, select Rescan. The disk should change from Offline to Online, but the volumes remain Failed. (If the disk does not change to Online, you may need to reboot.) Right-click the disk and select Reactivate Disk. The volumes on the disk should return to the Healthy status. (You can also select each volume one at a time and select Reactivate Volume.) It is recommended that you run chkdsk afterward.

Caution When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.

If the disk status remains Offline and Missing and you determine that the disk has a problem that cannot be repaired, you can remove the disk from the system (using the Remove Disk command). However, before you can remove the disk, you must delete all volumes on the disk. You can save any mirrored volumes on the disk by removing the mirror that is on the Missing disk instead of the entire volume. Deleting a volume destroys the data in the volume, so you should remove a disk only if you are absolutely certain that the disk is permanently damaged and unusable.

To bring a disk that is Offline (not Missing) and is still named Disk # back online

Use the Reactivate Disk command to bring the disk back online. If the disk status remains Offline, check the cables and disk controller, and make sure that the physical disk is healthy. Correct any problems and try to reactivate the disk again. If the disk reactivation succeeds, any volumes on the disk should automatically return to the Healthy status.

A disk on a PERC 4/Di controller does not return online after a Prepare to Remove

When you do a Prepare to Remove command on an array disk attached to a PERC 4/Di controller, you may find that the disk does not display in the Array Manager tree view even after doing a rescan or a reboot.

In this case, do the following to redisplay the disk in the Array Manager tree view:

  1. Manually remove and then replace the array disk.

  2. Either do a Rescan or reboot the system.

A disk is marked as Foreign

The disk has been moved to your computer from another Microsoft Windows NT/2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.

Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show the Failed Redundancy or Failed error condition.

A Disk is Marked as Offline or Foreign after Upgrading to Dynamic with a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch Controller

If you initialize a virtual disk that has been upgraded to dynamic, the status of the dynamic disk may change to "offline" or "foreign." You can view a disk's status by selecting the disk's General tab. When using a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch controller, you can resolve this problem by reverting the "offline" or "foreign" disk to a basic disk. See "Reverting a Dynamic Disk to Basic."

Virtual Disk Initialization Causes a Foreign, Offline, or Missing Disk

Because initializing a virtual disk destroys the data on the virtual disk, you may find that after initializing a virtual disk on a Windows system, a disk marked as "foreign" or "missing" is displayed under the Disks folder. In addition, initializing a virtual disk containing a dynamic disk changes the status of the dynamic disk to foreign or offline.

To reuse a Windows disk that is set to foreign or offline, right-click the disk and select Merge Foreign Disk or Revert to Basic Disk from the pop-up menu.

If the disk is marked as missing, right-click the disk and select Remove Disk.

A Disk's Functions become Inactive or It is Marked as Foreign or Offline after Upgrading to Dynamic with a PERC 2, 2/Si, 3/Si, or 3/Di Controller

If you format a virtual disk that has been upgraded to dynamic, the disk functions may become inactive or the status of the disk may change to "foreign" or "offline." You can view a disk's status by selecting the disk's General tab. When using a PERC 2, 2/Si, 3/Si, or 3/Di controller, you can resolve these problems by doing a global rescan.

The Online Help behaves strangely, or will not come up at all

The Help file uses a technology known as HTML Help, a Microsoft standard. Some software installs an older version of the HTML Help core files, which can make the Array Manager Help file unusable. The required HTML Help update is located in the Help Update folder on the Array Manager CD-ROM. Double-click HHUPD.EXE and follow the instructions.

When attempting to bring up the Help file, Dr. Watson reports an Access Violation in HH.EXE

HH.EXE is Microsoft's HTML Help viewer, which reads the precompiled HTML files for the Array Manager Help sections.

To resolve this problem, delete the HH.DAT file in your Windows directory. Deleting this file removes any customizations that have been made to your HTML Help files.

During reboot, a message displays about a "corrupt drive," suggesting that you run autocheck

Let autocheck run, but do not worry about the message. Autocheck will finish and the reboot will be complete. If you have a large system (more than 1 gigabyte), this may take about 10 minutes.

When attempting to access a remote computer, you are denied access or get an error message

There are several situations where this occurs.

You are denied access and do not even get a connection login box

This occurs when you are logged in to the local computer as a local user, local administrator, or domain user, and the remote computer is not in your domain or in a trusted domain. The Windows security model does not allow access under these circumstances. The workaround is to log in to your local computer with an account that has the same user name and password as an administrator account on the remote computer.

You are denied access after typing the login information in the connection box

Access can be denied here if you do not type in a user name and password that match a local or domain administrator account on the remote computer or if you mistype the login information.

"Connection Failed" message

If the remote computer is not on or there are network problems, you will get the message "Connection Failed."

For a NetWare system, see the topic "'The Connection Failed' message displays when connecting to the NetWare server."

You are unable to connect to a Windows 2000 server with Disk Management after a client-only installation

You may also get an error message if, immediately after a client-only installation of Array Manager, you start the Array Manager client and attempt to connect to a remote server running Windows 2000 Disk Management.

Array Manager assumes that its client will connect first to a remote server running Array Manager before connecting to a system running Windows 2000 Disk Management.

Once you connect to a server with Array Manager, you will then be able to connect successfully to a remote system running Disk Management.

Windows 2000 Disk Management is the disk and volume management program that comes with Windows 2000. Because Array Manager and Disk Management are related programs, Array Manager is able to remotely manage the storage on a Windows 2000 computer with Disk Management.

You are unable to connect to a NetWare server

If you are having problems connecting to a NetWare® server from a local machine, use the ping and nslookup TCP/IP network diagnostic tools to determine whether the NetWare server is accessible from the local machine and whether the system running the NetWare server has a valid DNS name. If the system running the NetWare server does not have a valid DNS name, you can add an entry for it to the Hosts file on the local machine. The Hosts file is located in the winnt\system32\drivers\etc directory. The entry should consist of the IP address followed by the host or server name of the system running the NetWare server.

If the server has neither a valid DNS name nor an entry in the Hosts file, you must connect by using its IP address.

When you want to connect to a NetWare server, Array Manager expects the server to be identified by one of three types of entries: a valid DNS name, an entry in the Hosts file on the local machine, or the server's IP address.

If you identify the machine by a NetWare server name that does not match one of these three entries, the connection will fail. We suggest that the name assigned to the NetWare server be the same as its DNS or Hosts file entry.

Note that DNS and Hosts file entries do not allow a computer name that consists entirely of numbers. In addition, a DNS name cannot start with a number. If the NetWare server has a numeric name or a name that starts with a number, you can use the IP address to identify that server. You can also put quotation marks around the computer's name in the DNS or Hosts file entry (such as "12345").
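The naming rules above can be checked before you edit the Hosts file. The following Python sketch is a hypothetical helper, not part of Array Manager; the names name_problems and hosts_entry, and the sample address 192.168.1.20, are illustrative. It encodes only the rules stated in this section: a Hosts entry is the IP address followed by the server name, a name made up entirely of digits is not allowed in DNS or the Hosts file, and a DNS name cannot start with a digit. It does not model the quotation-mark workaround.

```python
import ipaddress


def name_problems(name: str) -> list[str]:
    """List the naming rules from this guide that a NetWare server name breaks.

    Hypothetical helper: it checks only the rules stated in this section.
    An empty list means no problems were found.
    """
    if not name:
        return ["empty name"]
    if name.isdigit():
        return ["all digits: not allowed in DNS or the Hosts file"]
    if name[0].isdigit():
        return ["starts with a digit: not allowed as a DNS name"]
    return []


def hosts_entry(ip: str, server_name: str) -> str:
    """Build one Hosts file line: the IP address, a tab, then the server name."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if not server_name or server_name.isdigit():
        # The guide's fallback for an all-numeric name is to connect by IP address.
        raise ValueError(f"unusable Hosts file name {server_name!r}; connect by IP address instead")
    return f"{ip}\t{server_name}"
```

For example, hosts_entry("192.168.1.20", "nwserver1") produces a line you could append to the Hosts file on the console system, while name_problems("12345") reports that the name should be replaced by the IP address.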

The Hosts file has to be on the client computer (local machine) that has the Array Manager console.

In addition, connecting to a remote system requires that you have administrator authority on both the local and remote system.

Note Dell does not offer NetWare in Japan.

After creating a virtual disk with a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch controller, the virtual disk does not appear under the Disks storage object

If no virtual disks are configured at boot time on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch controller on Windows 2000, the Windows disk driver may not be loaded. To resolve this, either reboot after creating the first virtual disk, or create the first virtual disk in the controller BIOS utility (press Ctrl-M to invoke it).

"The Connection Failed" message displays when connecting to the NetWare server

If you are trying to connect to the NetWare server with the Array Manager console, you may receive a "The Connection Failed" error message. The connection between the Array Manager console and the NetWare managed system can fail for a variety of reasons. (See also "Connection Failed" message.)

To identify why the connection failed, perform the following steps:

  1. Ping the NetWare server from the system running the Array Manager console. If this fails, then you are experiencing network problems.

  2. Verify that the NDS tree, user ID, and password are correct for the target NetWare server. Also verify that the user ID has administrator rights.

  3. Verify that the server name is included in some form of DNS (DNS server, hosts file, and so forth).

  4. Verify that the server name does not start with a number. This can cause problems with DNS.

  5. Restart the console if an entry for the NetWare server was just added to the hosts file.
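Steps 3 and 4 of this checklist can be scripted on the console system. The Python sketch below is hypothetical (precheck is an illustrative name, not an Array Manager tool). It assumes that a successful socket.gethostbyname lookup means the server is listed in some form of DNS, and it does not perform steps 1, 2, or 5; ping, credentials, and the console restart still have to be checked by hand. The resolver is a parameter so the logic can be tried without network access.

```python
import ipaddress
import socket
from typing import Callable, List


def _is_ip_address(text: str) -> bool:
    """Return True if text is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def precheck(server: str,
             resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return checklist findings for a NetWare server name (empty list = no findings)."""
    if _is_ip_address(server):
        # Connecting by IP address bypasses name lookup, so steps 3 and 4 do not apply.
        return []
    findings = []
    if server[:1].isdigit():
        findings.append("step 4: name starts with a number")
    try:
        resolve(server)  # system resolver: DNS and the local Hosts file
    except OSError:  # socket.gaierror is a subclass of OSError
        findings.append("step 3: name not found in DNS or the Hosts file")
    return findings
```

Calling precheck("nwserver1") with the default resolver performs a real lookup; an empty result means only that the name resolves, not that the connection will succeed.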

Note You may be able to avoid this connection problem by using the NetWare server's IP address instead of the server name.
Note Dell does not offer NetWare in Japan.

A Disk is Marked as Failed when Rebuilding in a Cluster Configuration

If a system in a cluster attempts to rebuild a failed disk and the rebuild fails, another system takes over the rebuild. In this situation, the rebuilt disk may continue to be marked as failed on both systems even after the second system has completed the rebuild successfully. To resolve this problem, perform a rescan on both systems after the rebuild completes.

Erroneous Status and Error Messages after a Windows Hibernation

Activating the Windows hibernation feature may cause Array Manager to display erroneous status information and error messages. This problem resolves itself when the Windows operating system recovers from hibernation.

Cannot Connect to Remote System from Windows Server 2003

Certain conditions must be met before you can connect to a remote system from Windows Server 2003. For a description of these conditions, see "Remote Connection and Windows Server 2003."


System Performance Problems

This section describes problems that may degrade system performance.

Unusual CPU Usage

You may notice unusual surges in your system's CPU usage. These surges may be caused by Array Manager's volume capacity monitoring. This function monitors NTFS volumes on the local server for the amount of space used. When the space used on an NTFS volume reaches 90%, a warning event is logged in both the Array Manager and Windows event logs. When the space used reaches 98%, an error event is logged.
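The threshold rule described above can be expressed as a small function, for example to anticipate which volumes are likely to generate events. This Python sketch illustrates only the documented 90% and 98% thresholds and is not Array Manager's implementation; treating "reaches" as greater than or equal is an assumption, and capacity_event is a hypothetical name.

```python
from typing import Optional

WARNING_PERCENT = 90  # warning event logged when used space reaches 90%
ERROR_PERCENT = 98    # error event logged when used space reaches 98%


def capacity_event(used_bytes: int, total_bytes: int) -> Optional[str]:
    """Return "error", "warning", or None for an NTFS volume's used space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Integer arithmetic avoids floating-point rounding right at the thresholds.
    if used_bytes * 100 >= ERROR_PERCENT * total_bytes:
        return "error"
    if used_bytes * 100 >= WARNING_PERCENT * total_bytes:
        return "warning"
    return None
```

To try it against a real volume, pass the values from Python's shutil.disk_usage, for example usage = shutil.disk_usage("C:\\") followed by capacity_event(usage.used, usage.total).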

If the surges in CPU usage pose a problem, you can disable volume capacity monitoring.

To disable volume capacity monitoring:

  1. Launch the Array Manager Utilities by clicking Start | Programs | Dell OpenManage Applications | Array Manager and selecting Array Manager Utilities.

  2. Deselect the Volume Capacity Monitoring check box on the Windows tab.

  3. Click Apply and then Close to exit the Array Manager Utilities.

For information related to volume capacity monitoring, see the following:


Dell PowerVault 660F and 224F Storage Systems Troubleshooting

This section presents possible problem situations with accompanying solutions for the Dell PowerVault 660F and 224F storage systems. The problem situations are organized as follows:

The situations in the first three topics are categorized by their event number. A brief discussion of event messages is included at the beginning of this section in the topic "Event Monitoring and Logging." The fourth topic describes general problems not related to a specific event.

You will also find a full listing of the events associated with the Dell PowerVault 660F Fibre Channel RAID controller at the end of this section in the topic "Events Generated by the PowerVault 660F Storage System."
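When an event number appears under the Events tab or in the Windows application log, the following Python sketch shows one way to look up which topic in this section discusses it. The table is transcribed from the event headings in the Fibre Channel RAID Controller Status Events, Enclosure Status Events, and Drive Status Events topics; it is a hypothetical aid rather than a complete event list, and topic_for is an illustrative name.

```python
from typing import Optional

# Event numbers as they appear in the headings of each topic in this section.
_TOPICS = {
    "Fibre Channel RAID Controller Status Events":
        (708, 840, 857, 858, 870, 872, 873, 887),
    "Enclosure Status Events":
        (818, 820, 821, 823, 825, 828, 831, 844, 863),
    "Drive Status Events":
        (702, 703, 707, 709, 710, 711, 718, 721, 749, 753, 756, 758,
         765, 766, 767, 768, 769, 781, 786),
}

# Invert the table so each event number maps directly to its topic.
EVENT_TOPIC = {event: topic for topic, events in _TOPICS.items() for event in events}


def topic_for(event: int) -> Optional[str]:
    """Return the troubleshooting topic for an event number, or None if unlisted."""
    return EVENT_TOPIC.get(event)
```

An event that returns None may still be documented in "Events Generated by the PowerVault 660F Storage System."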

Event Monitoring and Logging

Event messages help identify significant incidents such as an array disk failure or an array disk addition. Event monitoring and logging starts when the Array Manager managed system starts up. If the managed system service (Disk Management Service) stops in Microsoft Windows NT, or the Array Manager Service stops in NetWare, event monitoring and logging stops. If array disks are S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) enabled, the RAID controllers check the array disks for failure predictions and pass any that they find to the Array Manager console. Array Manager immediately displays an alert icon on the array disk and also raises an alert under the Events tab and in the Windows NT event log. Windows NT has three event logs; Array Manager uses the application log.

Note When a controller's I/O is paused, Array Manager does not receive S.M.A.R.T. events.

Fibre Channel RAID Controller Status Events

The following incidents are included in this topic:

Event 708, Rebuild stopped with error

Cause of Problem

An unknown error on the Fibre Channel RAID controller caused the rebuild to fail.

Solution

Try rescanning the controller: from the Array Manager tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem object, and then select Rescan from the context menu that comes up. This action will update the controller status within the GUI.

If the controller has been removed and reinserted, check that it is inserted correctly: the DB9 connector should be located at the top of the module. For details, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Also check that all cables are correctly and firmly connected. Then try the rebuild again: right-click the Array Disk storage object in the tree view, and then select Rebuild from the context menu that comes up.

If the controller and connections are correct and the problem continues, contact customer service.

Event 840, Controller has been reset

Cause of Problem

One of the following may have occurred on one of the Fibre Channel RAID controllers in the PowerVault 660F storage system:

Solution

Then access the Array Manager console, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.

Event 857, Warm boot failed

Cause of Problem

A memory error was detected during the warm boot scan, which may result in data loss.

Solution

Power cycle the PV660F subsystem.

If the error persists:

Event 858, Controller entered Conservative Cache mode

Cause of Problem
Solution

If Enclosure Management has been enabled, check whether one or more LS modules have failed. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting tips.

If the controller entered Conservative Cache mode because of an intentional user action, proceed as intended. When finished, right-click the controller and select either Enable Partner controller or Enable BBU to exit Conservative Cache mode.

If the BBU battery is low, recondition the battery. If the battery needs to be replaced, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.

If there is an Expand Capacity or Add Virtual Disk operation in progress, wait until this activity has finished. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.

See also "Conservative Cache Mode" in the "The Dell PowerVault 660F Storage System" chapter.

Event 870, Killed partner

Cause of Problem
Note Nexus refers to the state in which both redundant controllers are in communication. In this state, each controller can copy write-back data to its partner controller and can determine whether the other controller is operating.
Solution

Go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the partner controller. If the situation does not improve, try one of the following (Rescan as before when necessary after troubleshooting the partner controller):

If none of these solutions apply, contact customer service.

Event 872, Controller boot ROM image needs to be reloaded

Cause of Problem

The Media Access Control (MAC) address is corrupted.

Solution

Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.

Event 873, Controller is using default nonunique world-wide name

Cause of Problem

The Media Access Control (MAC) address has been lost or was not set.

Solution

Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.

Event 887, Back End Fibre Dead

Cause of Problem
Note LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.
Solution

Check that the Fibre Channel cable is connected to the controller and the switch box. If not, reconnect it as necessary. If the Fibre Channel cable is connected, try replacing the cable. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting the LS and I/O modules.

When troubleshooting is complete, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the Fibre Channel status within the GUI.

If the problem persists, contact customer service.

Enclosure Status Events

The following incidents are included in this topic:

Events 818 and 820, Fan failure or Fan is not present

Cause of Problem
Solution

To locate a fan, right-click the bad fan and click Properties. The Enclosure ID field indicates the ID number of the enclosure where this fan is located. Be aware that the enclosure ID number displayed by Array Manager does not match the Enclosure ID set on the switch module ID indicator on the back of the PowerVault 660F or 224F enclosures. When the switch module ID indicators on all the enclosures are configured properly, the enclosure ID numbers start at 0 and continue sequentially through 7. The enclosure ID numbers displayed by Array Manager, however, start at 1 and continue sequentially through 8. For this reason, the enclosure ID number displayed by Array Manager will be one greater than the number set on the switch module ID indicator on the back of the PowerVault 660F or 224F enclosures. For example, if the switch module ID indicator identifies the enclosure as 0, the Array Manager enclosure ID will be 1.
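The ID conversion described above is a fixed offset of one. A minimal Python sketch, assuming the ranges given in this paragraph (switch module IDs 0 through 7, Array Manager enclosure IDs 1 through 8); the function names are hypothetical.

```python
def to_array_manager_id(switch_module_id: int) -> int:
    """Convert the switch module ID on the enclosure back panel (0-7)
    to the enclosure ID shown in Array Manager (1-8)."""
    if not 0 <= switch_module_id <= 7:
        raise ValueError("switch module ID must be 0 through 7")
    return switch_module_id + 1


def to_switch_module_id(array_manager_id: int) -> int:
    """Convert an Array Manager enclosure ID (1-8) to the switch module ID (0-7)."""
    if not 1 <= array_manager_id <= 8:
        raise ValueError("Array Manager enclosure ID must be 1 through 8")
    return array_manager_id - 1
```

For example, a fan whose Properties show Enclosure ID 3 is in the enclosure whose switch module ID indicator is set to 2, because to_switch_module_id(3) returns 2.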

See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.

After troubleshooting or replacing the ACM, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the fan status within the GUI.

Events 821 and 823, Power supply failure or Power supply is not present

Cause of Problem
Solution

See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the power supply. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.

Note LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.

After troubleshooting or replacing the power supply, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the power supply status within the GUI.

Event 825, Temperature is above working limit

Cause of Problem
Solution

Check all fans to see whether they are functioning properly. If they are, check that the ambient temperature is within limits, and adjust the room temperature if necessary. If the problem persists, power-cycle the system. If this does not solve the problem, replace the affected Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the ACM. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.

After fixing the temperature problem, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the temperature and/or the fan status within the GUI.

Event 828, Enclosure access is Critical

Cause of Problem

The LS module connection may be broken, or the management hardware may be faulty. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.

Solution

Check to see that the LS module is enabled.

Follow the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting an LS module. For part replacement, see the Dell PowerVault 660F and 224F Storage Systems Service Manual.

After correcting the hardware problem, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the enclosure status within the GUI.

Event 831, Enclosure Soft Addressing detected

Cause of Problem

The enclosure has duplicate loop IDs (soft addressing).

Solution

Make sure the shelf ID switches on all PV660s and PV224s in the subsystem are set to different numbers. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on setting shelf IDs.

Event 844, BBU power low

Cause of Problem

A battery backup unit (BBU) with a low charge was found on the controller.

Solution

If this message occurs when there has been no power failure, replace the BBU.

To replace the BBU, see the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.

The BBU requires two reconditioning cycles prior to first time use. This reconditioning process will take several hours and cannot be interrupted. Refer to "Recondition" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions on performing a BBU recondition.

After troubleshooting or replacing the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the BBU status within the GUI.

Event 863, BBU Recondition suggested

Cause of Problem
Solution

This message will appear each time the controller is booted or rebooted until the BBU is manually reconditioned. In this state, the battery is still protecting the controller cache, but the battery's maximum level may be uncertain. To re-establish the maximum level, you can perform a BBU recondition at your convenience using the procedure described in the "Recondition" section. Be aware that the reconditioning process takes several hours and must complete its full cycle without interruption. Interrupting a recondition invalidates it, so the next recondition starts again at the beginning of the cycle. In other words, the recondition process cannot be suspended and then resumed partway through.

The BBU requires an automatic and a manual recondition before the maximum level will be accurate. Until the first manual recondition is completed, the maximum level from the battery specifications is used. The automatic recondition will start if one of the following events occurs:

After the automatic recondition is complete, Array Manager will show "BBU Recondition Suggested" when the system is rebooted.

After manually reconditioning the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action updates Array Manager on the status of the BBU.

If the "BBU Recondition Suggested" error message continues to be displayed after you have manually reconditioned the BBU twice, then the battery may be defective and require replacement. In this case, see the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.

Drive Status Events

The following incidents are included in this topic:

Procedure for Replacing a Drive

Many of the incidents in this section can be resolved by replacing one or more drives. The procedure for replacing a drive is as follows:

Event 702, Hard disk error found

Cause of Problem
Solution

If the virtual disk is offline, try forcing it online with the Force Online command. Right-click the disk and select Force Online from the context menu that appears. See "Force Online" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.

If you cannot force the virtual disk online, remove and replace the affected hard drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing a drive.

If you still get a hard disk error after replacing the drive, contact customer service.

Event 703, Hard disk PFA condition found

Cause of Problem

A hard disk predicted a future failure condition. This disk may fail soon.

Solution

Replace and rebuild the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 707, Rebuild is cancelled

Cause of Problem
Solution

If the rebuild was cancelled, that disk will remain in an unusable state until a successful rebuild has been performed on it.

See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.

Event 709, Rebuild stopped with error: New device failed

Cause of Problem
Solution

Try to rebuild the drive again. See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.

If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive.

See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing a drive.

Event 710, Rebuild stopped because logical drive failed

Cause of Problem
Solution

It may not be possible to recover from this error; therefore, you may lose your virtual disk.

Try to rebuild the virtual disk. See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.

If the problem persists, contact customer service.

Event 711, A hard disk has failed

Cause of Problem

A hard disk failed because the user either changed the status to Offline or removed the hard disk.

Solution

A drive is usually taken offline manually so that it can be replaced. If the drive was physically removed from the enclosure, replace and rebuild the drive, using a drive at least as large as the other disk drives in the virtual disk.

See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 718, SCSI command timeout on hard device

Cause of Problem
Solution

If the drive has been removed or has failed, replace the drive.

If the time-out cannot be reset on the existing array disk, replace the disk.

See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

It may be necessary to do a complete reboot after the drive is replaced.

If you need more help, contact customer service.

Event 721, Parity error found

Cause of Problem
Solution

Check all cables, making sure they are correctly and firmly connected and that none are crossed. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for cabling procedures.

Replace the affected disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 749, Physical device status changed to Offline

Cause of Problem

The array disk encountered too many errors, causing the drive to fail and its status to change to Offline.

Solution

It is not possible to recover this physical drive. Replace the disk drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

If the replacement drive still does not work, contact customer service.

Event 753, Physical device failed to start

Cause of Problem

The drive failed to spin up during controller startup.

Solution

Check that the new array disk is seated properly. If not, remove and reinsert the disk.

If the problem persists, see the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on troubleshooting the Fibre Channel hard disk drives.

When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the array disk status within the GUI.

Event 756, Physical drive missing on startup

Cause of Problem

A previously configured array disk no longer appears in the Array Manager GUI.

Solution

Make sure that all enclosures are powered on.

Remove and reinsert the physical drive. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the drive has been recognized.

If the drive is still missing or not found, try replacing the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

If the replacement drive does not work, contact customer service.

Event 758, Physical drive is switching from one channel to another

Cause of Problem

Communication to a drive on a particular channel has failed.

Solution

If this event appears for all existing drives, a Loop ID problem may be present. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on troubleshooting the I/O module.

If the drive has failed, replace the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

If the replacement drive does not work, contact customer service.

Event 765, Consistency check on logical drive error

Cause of Problem
Solution

Try performing a Consistency Check again. If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 766, Consistency check on logical drive failed

Cause of Problem
Note A virtual disk that is Critical will be shown as Degraded status within the Array Manager console.
Solution

Try performing a consistency check again. If the problem persists, replace the disk drive(s). It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive(s). See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 767, Consistency check failed due to physical device failure

Cause of Problem

An array disk failed.

Solution

Replace the affected disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 768, Logical drive has been made offline

Cause of Problem

If the virtual disk is not fault tolerant, a single array disk failure may have caused it to go offline. If the virtual disk is fault tolerant, multiple array disk failures may have caused it to go offline.
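As a rough illustration of why fault tolerance determines whether a virtual disk goes offline, the following Python sketch maps a RAID level and a count of failed array disks to a virtual disk state. It is a simplified model and does not reproduce Array Manager's behavior. The function name, the state strings, and the per-level failure tolerances (none for RAID 0, one for a two-disk RAID 1 mirror or for RAID 5) are all assumptions made for illustration.

```python
# Hypothetical model of virtual disk state; not an Array Manager API.
# Assumed number of array disk failures each RAID level can survive.
TOLERATED_FAILURES = {
    "RAID 0": 0,  # striping only, no redundancy
    "RAID 1": 1,  # two-disk mirror (assumption)
    "RAID 5": 1,  # striping with distributed parity
}


def virtual_disk_state(raid_level: str, failed_disks: int) -> str:
    """Return "Online", "Critical", or "Offline" for a virtual disk."""
    if failed_disks < 0:
        raise ValueError("failed_disks must be non-negative")
    tolerated = TOLERATED_FAILURES[raid_level]
    if failed_disks == 0:
        return "Online"
    if failed_disks <= tolerated:
        # Data is still readable, but redundancy is reduced.
        # The Array Manager console shows this state as Degraded.
        return "Critical"
    # More failures than the RAID level can absorb: the virtual disk goes offline.
    return "Offline"


for level, failed in [("RAID 0", 1), ("RAID 5", 1), ("RAID 5", 2)]:
    print(level, failed, virtual_disk_state(level, failed))
```

Under these assumptions, one failed disk takes a RAID 0 virtual disk offline. A RAID 5 virtual disk instead becomes Critical and goes offline only after a second failure.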

Solution

Try bringing the virtual disk back online.

Check the enclosure LEDs to verify that the enclosure is receiving power.

Identify the location of the failed drive(s). If necessary, refer to the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide.

Replace the array disk(s) if necessary. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

If these steps do not resolve the problem, contact customer service. It may not be possible to recover from this error.

Event 769, Logical drive is Critical

Cause of Problem

A fault-tolerant virtual disk has become degraded, typically because one of its array disks has failed.

Solution

Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 781, Logical drive initialization failed

Cause of Problem
Solution

Reseat the controller. Then power on the system, the controller, or both.

Initialize the virtual disk manually. See "Initialize" in the chapter "The Dell PowerVault 660F Storage System" for instructions.

When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the virtual disk has been initialized and is recognized.

Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Event 786, Expand Capacity stopped with error

Cause of Problem
Solution

Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

See "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.

Event 787, Bad blocks found

Cause of Problem

A bad sector was found on an array disk during one of the following operations: consistency check, rebuild, or RAID expansion.

Solution

For information on consistency check and rebuild functions, see "Check Consistency" and "Rebuild" in the chapter "The Dell PowerVault 660F Storage System."

If the problem persists, replace the hard disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.

Error Messages

This section describes error messages for the PowerVault 660F storage subsystem that may be displayed by Array Manager.

Degraded, LS module chip missing or failed

This error message can indicate a missing or failed LS module chip. It can also appear when the SES firmware of the active LS module is version 1.9.5 and the SES firmware of the failover LS module is version 1.3b2. Both LS modules should run the same SES firmware level. To resolve this error, upgrade the SES firmware so that both LS modules have the same version.

Unsafe Removal of Device

When assigning virtual disk 0 (LUN 0) to a Windows server, a Windows "Unsafe Removal of Device" message may display for the PV660F PSEUDO SCSI Disk Device.

You can disregard the Unsafe Removal of Device message. It occurs because the newly assigned virtual disk replaces the PV660F Pseudo Disk, which is used to manage the Fibre Channel array when no virtual disks are assigned.

Disk Removed

A Windows Disk Removed dialog may display on the server under the following conditions:

You can disregard the Disk Removed message. This is caused by the PV660F Pseudo Disk (used to manage the Fibre Channel array when no virtual disks are assigned) being replaced by the newly assigned virtual disk.

Other Problems

This section contains the following general problem situations:

How to Detect, Recondition, and Replace a Failed Battery Backup Unit

The following sections describe how to troubleshoot problems with the PowerVault 660F battery backup unit (BBU).

How to Detect a Failed BBU

The PowerVault 660F generates error messages that indicate the battery is either not installed, not recognized, or too low on power to function properly. Here are a few examples of the types of event messages you may receive:

How to Recondition a Failed BBU

Under certain conditions, the BBU must be reconditioned (sometimes called recharged) before it can be used. This applies if the power went off, or if you are installing a new controller with a new BBU. Before you start a recondition process, be aware that it cannot be interrupted for any reason. Once the process has started, do not try to start a fast charge or another recondition cycle. If the cycle is interrupted for any reason, you must restart the process from the beginning.

To recondition a battery, right-click a controller, click C0 Battery or C1 Battery, and then click Recondition.

How to Replace a Failed BBU

If all attempts to recharge or recondition the BBU fail, replace it. Before you remove the BBU, put the system into the Shutdown State. Then replace the component and clear the Shutdown State to re-enable caching. The BBU is contained in the Fan CRU (Dell/Euro). You can replace it separately from the controller without shutting down the system.

Note You can replace a BBU or a controller "hot." Leave the main system running, and shut off power to, or disable, only the component being replaced.

For more information, refer to "Battery Properties."

How to Detect and Replace a Failed Controller

The PowerVault 660F has two controllers. Each controller is encased in its own canister on one side of the enclosure. If you receive an event notice that a controller has failed or is not responding, follow the steps in this section. They let you shut down and replace the failed controller while keeping your system accessible and functional. If one controller goes down, the other controller takes over its functions (controller fault tolerance). Performance is degraded until the failed controller is replaced and brought back online.

Causes of Controller Failure
Solutions

Before performing any of these procedures, make sure that the surviving controller is functioning at full performance. You can then remove the faulty controller by pulling it out of the enclosure.

Note You can test the faulty controller by removing it and inserting it into another system for diagnostics. In either case, make sure that the good controller is working, and replace the faulty controller with a new one as soon as possible.
Note Do not power cycle the subsystem to prepare for a hot swap. If the system is online through the surviving controller, you will lose access to data.

World Wide Name (WWN) Change

Upgrading a PowerVault 660F RAID controller from firmware earlier than version 7.7 to version 7.7 or later can change the controller's World Wide Name (WWN). After the WWN changes, you may not be able to see the storage in Array Manager or another management application. To resolve this, you may need to update the pathing of the FC HBA. If zoning is performed by WWN, you may also need to update the FC switch zoning configuration. For information on editing the FC switch zone table and resolving problems associated with a new WWN, see the latest SAN or PowerVault 660F documentation. This documentation came with your system and is also available from the Dell support site at http://support.dell.com.
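Before upgrading, it can help to check whether a planned upgrade crosses the version 7.7 boundary described above. The Python sketch below performs that check. The helper names are hypothetical. The sketch also assumes that firmware versions are dot-separated integers (for example, "7.6"), which may not match every version string the controller reports.

```python
# Hypothetical pre-upgrade check; not part of Array Manager.
WWN_CHANGE_VERSION = (7, 7)  # first firmware version that can change the WWN


def parse_version(text: str) -> tuple[int, ...]:
    # Assumes dot-separated integers such as "7.6" or "7.7".
    return tuple(int(part) for part in text.split("."))


def upgrade_may_change_wwn(current: str, target: str) -> bool:
    """True if the upgrade goes from below 7.7 to 7.7 or later."""
    return parse_version(current) < WWN_CHANGE_VERSION <= parse_version(target)


print(upgrade_may_change_wwn("7.6", "7.7"))  # True: review HBA pathing and zoning
```

If the check returns True, plan the HBA pathing and WWN-based zoning updates before you perform the upgrade.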

How to Detect and Replace a Failed Disk Drive and Restore a System to a Fault-tolerant Condition

The following sections describe how to troubleshoot problems with a failed drive on the PowerVault 660F storage system.

Detect and Replace a Failed Disk Drive

If you receive an event notice that a disk drive has failed or is not responding, follow the procedures in this section. They help you identify which disk drive has failed and spin it down. Because the enclosure contains several disk drives, configured to your system requirements, use the Blink Disk command to locate the failed disk.

Restore a System to a Fault-tolerant Configuration

After you replace the faulty disk drive, use the Rebuild and Rescan functions to restore the system to full operation. In a RAID configuration, fault tolerance comes from redundancy. RAID 1 provides it through mirroring, and RAID 5 provides it through parity. RAID 5 requires three or more disk drives in an array. For more information, see "Dell PowerVault 660F Storage System Configuration Overview."
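For RAID 5, a rebuild can regenerate the replaced drive's contents because of parity. The minimal Python illustration below shows the idea for one stripe. Parity is the bytewise XOR of the data blocks, so XORing the surviving blocks with the parity block reproduces the missing block. The block contents and the four-drive layout are made up, and the sketch does not model the controller's actual stripe layout or rebuild process.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical four-drive RAID 5 array: three data blocks plus parity.
data = [b"\x10\x20\x30", b"\x01\x02\x03", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1], then rebuild its block
# from the surviving data blocks and the parity block.
surviving = [data[0], data[2], parity]
rebuilt = xor_blocks(surviving)
print(rebuilt == data[1])  # True
```

The same property explains why a second failure in the same RAID 5 array loses data: with two blocks missing from a stripe, one XOR equation cannot recover both.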

Cause of Problem

Other error (event) messages that may indicate a disk drive problem:

Solutions

Before you remove a physical disk, consider the logical drives that are mapped to it. If a drive is configured and is not marked dead, do not remove it, or data will be lost. Remove only the disk drive that is known to have failed. If a second drive fails for any reason, data will be lost.

Logical Drives: A logical drive is a partition you create within an extended partition on a basic disk. A logical drive can be formatted and assigned a drive letter. Only basic disks can contain logical drives. A logical drive cannot span multiple disks. For more information on basic disks, basic volumes, and extended partitions, see the chapters on "Disk Management" and "Volume Management."

If Automatic Rebuild Management (ARM) is enabled and the firmware detects a failed drive, the firmware looks for a spare. If it finds a suitable spare, it does the following:

- Removes the failed drive from every logical drive the failed drive belonged to.
- Inserts the spare into those logical drives.
- Starts a rebuild.

If ARM is not enabled, you can start a rebuild manually after you remove the failed drive and install its replacement in the same slot.
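The ARM behavior described above can be modeled roughly as follows. This Python sketch is hypothetical. The classes, the `handle_failed_drive` function, and the definition of a "suitable" spare (a healthy spare at least as large as the failed drive) are assumptions for illustration. They do not describe the controller firmware's actual logic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: drives compare by identity, not by field values
class Drive:
    slot: int
    capacity_gb: int
    is_spare: bool = False
    failed: bool = False


@dataclass
class LogicalDrive:
    name: str
    members: list = field(default_factory=list)  # Drive objects in this logical drive
    rebuilding: bool = False


def handle_failed_drive(failed: Drive, logical_drives: list, drives: list,
                        arm_enabled: bool) -> Optional[Drive]:
    """Return the spare that replaced the failed drive, or None."""
    failed.failed = True
    if not arm_enabled:
        # Without ARM: replace the drive in the same slot and start a rebuild manually.
        return None
    # Assumed definition of a suitable spare: a healthy spare that is large enough.
    spare = next((d for d in drives
                  if d.is_spare and not d.failed
                  and d.capacity_gb >= failed.capacity_gb), None)
    if spare is None:
        return None
    spare.is_spare = False
    for ld in logical_drives:
        if failed in ld.members:
            # Put the spare in the failed drive's position and start a rebuild.
            ld.members[ld.members.index(failed)] = spare
            ld.rebuilding = True
    return spare


disks = [Drive(0, 36), Drive(1, 36), Drive(2, 36), Drive(3, 36, is_spare=True)]
ld0 = LogicalDrive("LD0", disks[:3])
spare = handle_failed_drive(disks[1], [ld0], disks, arm_enabled=True)
print(spare.slot, [d.slot for d in ld0.members], ld0.rebuilding)  # 3 [0, 3, 2] True
```

Identity-based equality (`eq=False`) keeps two drives with identical slot and capacity values from being treated as the same physical drive.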

To use ARM, open the Advanced controller options. Right-click a controller, click Controller Options, and then click the Advanced tab. Make your changes on the Advanced tab. For more information, refer to "Advanced Fibre Channel RAID Controller Options."

Use the following procedures related to locating and removing the failed drive:

Cannot add a hot spare

Cause of Problem

The Assign Global Hot Spare function will be selectable only when disks are available within the system.

Solution

Add drives to the system. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on adding new drives.

After adding the new drives, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan.

This action will update the drive status within the GUI. You are now ready to create a hot spare.
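The rule that Assign Global Hot Spare is selectable only when disks are available can be sketched as a simple filter. In this Python sketch, a disk counts as available only if it is not part of a virtual disk, not already a hot spare, and not failed. That definition, the `ArrayDisk` class, the function names, and the disk labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayDisk:
    name: str                       # hypothetical label such as "0:2"
    in_virtual_disk: bool = False   # already used by a virtual disk
    is_hot_spare: bool = False      # already assigned as a hot spare
    failed: bool = False


def hot_spare_candidates(disks: list[ArrayDisk]) -> list[str]:
    """Names of disks that are unused, not spares, and not failed."""
    return [d.name for d in disks
            if not (d.in_virtual_disk or d.is_hot_spare or d.failed)]


def can_assign_global_hot_spare(disks: list[ArrayDisk]) -> bool:
    # Selectable only if at least one disk is available.
    return len(hot_spare_candidates(disks)) > 0


disks = [ArrayDisk("0:0", in_virtual_disk=True), ArrayDisk("0:1", in_virtual_disk=True)]
print(can_assign_global_hot_spare(disks))  # False: no available disk
disks.append(ArrayDisk("0:2"))             # a newly added drive, seen after Rescan
print(hot_spare_candidates(disks))         # ['0:2']
```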

Cannot expand a disk group

Cause of Problem
Solution

Make sure that your configuration supports the Expand Capacity command. See "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.

After creating a virtual disk, cannot locate it in the Disks folder

Cause of Problem

A virtual disk has not been made visible to the operating system.

Solution

For a direct-attach configuration, reboot the system so that it recognizes the virtual disk.

For a SAN connection, two steps must be performed after creating a virtual disk before the virtual disk will appear in the Disks folder.

Once these two steps have been performed, the virtual disk will appear in the Disks folder of the Array Manager console.

Cannot create a volume on an NT disk

Cause of Problem

All of the available space on the NT disk has already been used to create one or more volumes on that disk. You cannot create a volume on an NT disk that has no unused space.

Solution

Add a new virtual disk or expand the capacity of the existing array disk. See "Add Virtual Disk" and "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.

Cannot delete a volume

Cause of Problem

A volume that has been marked as a primary partition cannot be deleted. Primary partitions are protected because they contain a bootable operating system.

Solution

To override this protection feature, delete the virtual disk that the primary partition belongs to. See "Delete Virtual Disk" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.

Cannot delete a virtual disk

Cause of Problem

Only the last virtual disk that was created can be deleted.

Solution

Delete the virtual disks in the reverse of the order in which they were created. See the topic "Delete Virtual Disk" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.
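Because only the most recently created virtual disk can be deleted, virtual disks behave like a stack: last created, first deleted. The Python sketch below models that rule. The `VirtualDiskList` class and its methods are hypothetical and are not part of Array Manager.

```python
class VirtualDiskList:
    """Hypothetical model: only the newest virtual disk can be deleted."""

    def __init__(self) -> None:
        self._created: list[str] = []  # names in creation order

    def create(self, name: str) -> None:
        self._created.append(name)

    def can_delete(self, name: str) -> bool:
        # Only the most recently created virtual disk is deletable.
        return bool(self._created) and self._created[-1] == name

    def delete(self, name: str) -> None:
        if not self.can_delete(name):
            raise ValueError(f"{name} is not the last virtual disk created")
        self._created.pop()

    def deletion_order(self) -> list[str]:
        # The reverse of the creation order.
        return list(reversed(self._created))


vdisks = VirtualDiskList()
for name in ("VD0", "VD1", "VD2"):
    vdisks.create(name)
print(vdisks.deletion_order())  # ['VD2', 'VD1', 'VD0']
for name in vdisks.deletion_order():
    vdisks.delete(name)
```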

Firmware version mismatch

Cause of Problem

This problem may occur in the following situations:

Solution: Firmware mismatch warning after downloading SES firmware

Rescan the PV660F storage system.

Solution: The partner controller has different firmware

Both controllers must have the same version of firmware to operate in a redundant configuration. If a failed controller is replaced with a controller with a different version of firmware, the replacement controller will not be allowed to start and will be disabled by the existing controller.

Use the following procedure to download a common firmware image:

  1. Check the firmware version of the existing controller. See "Fibre Channel RAID Controller Properties" in the chapter "The Dell PowerVault 660F Storage System."

  2. Power off the subsystem. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.

  3. Remove the existing controller. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.

  4. Insert the replacement controller. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.

  5. Power on the subsystem. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.

  6. Check the firmware version of the replacement controller. See "Fibre Channel RAID Controller Properties" in the chapter "The Dell PowerVault 660F Storage System."

  7. If the replacement controller has an older version of firmware, download the same firmware version as that on the existing controller. This may not be the latest version of firmware available.

  8. If the replacement controller has more recent firmware, power off the subsystem, exchange controllers, and download the newer firmware to the existing controller.

  9. Downgrading firmware versions is not recommended. See "Fibre Channel RAID Controller Properties" in the chapter "The Dell PowerVault 660F Storage System."

  10. Insert the second controller. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.

  11. Issue Enable Partner if the second controller is in the Disable Partner mode (held in reset). See "Enable Partner" in the chapter "The Dell PowerVault 660F Storage System."

  12. Once both controllers have the same firmware version, the latest version of firmware can be downloaded to both controllers at the same time. See "Fibre Channel RAID Controller Firmware" in the chapter "The Dell PowerVault 660F Storage System."

Note If an empty enclosure is available, the firmware can be downloaded to the replacement controller without having to power off the subsystem. However, this will not work if the replacement controller has more recent firmware.
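The version comparison in steps 7 and 8 can be summarized as a small decision function. This Python sketch returns the actions implied by steps 7, 8, 10, and 11 for a given pair of firmware versions. The function names and action strings are hypothetical. It assumes that versions are dot-separated integers such as "7.6".

```python
def parse_version(text: str) -> tuple[int, ...]:
    # Assumes dot-separated integers such as "7.6".
    return tuple(int(part) for part in text.split("."))


def plan_firmware_sync(existing: str, replacement: str) -> list[str]:
    """Ordered actions to give both controllers the same firmware."""
    old, new = parse_version(existing), parse_version(replacement)
    plan: list[str] = []
    if new < old:
        # Step 7: match the existing version (not necessarily the latest).
        plan.append(f"Download firmware {existing} to the replacement controller")
    elif new > old:
        # Step 8: avoid downgrading; bring the existing controller up instead.
        plan += ["Power off the subsystem",
                 "Exchange the controllers",
                 f"Download firmware {replacement} to the existing controller"]
    # Steps 10 and 11 apply once both controllers run the same version.
    plan += ["Insert the second controller",
             "Issue Enable Partner if the second controller is held in reset"]
    return plan


for step in plan_firmware_sync("7.7", "7.5"):
    print(step)
```

When the versions already match, only the last two actions remain. You can then download the latest firmware to both controllers, as step 12 describes.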

Unable to manage objects below the controller object

Cause of Problem

The Enclosure Management Advanced controller option is disabled.

Solution

Change the Enclosure Management Advanced controller option to enabled. See "Advanced Fibre Channel RAID Controller Options" in the chapter "The Dell PowerVault 660F Storage System" for details.

If the Enclosure Management Advanced controller option is already enabled, perform a Rescan on the PV660F Subsystem storage object. If this is not successful, perform a Reset on the controller object. See "Reset" in the chapter "The Dell PowerVault 660F Storage System" for details.

No objects appear under Physical Array and Logical Array

Some PV660F Fibre Channel arrays might not be discovered if any of the controllers have not fully rebooted.

To resolve this problem, right-click PV660F Subsystem and select Rescan.

Unable to reset the PowerVault 660F storage system

Cause of Problem

Either controller activity is in progress, or the Queue Limit is set to zero.

Solution

Verify that no I/O is occurring to the PowerVault 660F array. You can verify this by viewing the activity LEDs on the controllers.

Verify that no controller activity is occurring. A virtual disk initialization, Expand Capacity, or any other controller activity prevents a reset.

Verify that the Queue Limit in the controller options is not set to zero. If set to zero, reset it to a higher value (such as 32).
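The three checks above can be collected into a single pre-reset checklist. In the Python sketch below, the `ControllerSnapshot` fields are hypothetical stand-ins for what you observe on the LEDs and in the controller options, and the default queue limit of 32 is only the example value from the text. No real Array Manager interface is implied.

```python
from dataclasses import dataclass, field


@dataclass
class ControllerSnapshot:
    host_io_active: bool                                       # from the activity LEDs
    background_tasks: list[str] = field(default_factory=list)  # e.g. "Expand Capacity"
    queue_limit: int = 32                                      # Queue Limit option value


def reset_blockers(snapshot: ControllerSnapshot) -> list[str]:
    """Reasons a reset would be refused; an empty list means none were found."""
    blockers = []
    if snapshot.host_io_active:
        blockers.append("I/O to the array is in progress")
    for task in snapshot.background_tasks:
        blockers.append(f"Controller activity in progress: {task}")
    if snapshot.queue_limit == 0:
        blockers.append("Queue Limit is 0; set it to a higher value such as 32")
    return blockers


print(reset_blockers(ControllerSnapshot(host_io_active=False, queue_limit=0)))
```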

Application Transparent Failover autofails all LUNs on the PV650F

Suppose the cable between a QLogic HBA and the switch fails on one of the servers in a SAN fabric. Starting the Array Manager service or performing a rescan may then cause Application Transparent Failover to autofail all LUNs on the PV650F that are assigned to other servers.

To resolve this problem, disable PV660F management on the server with the failed HBA cable, and enable PV660F management on another server in the SAN until the cable failure is resolved.

