Dell OpenManage Array Manager 3.4
Common Troubleshooting Procedures
Problem Situations and Solutions
Dell PowerVault 660F and 224F Storage Systems Troubleshooting
This chapter contains status message information, troubleshooting procedures, and common problems and solutions. It also has a separate section for troubleshooting the Dell PowerVault 660F and 224F storage systems.
If a disk or volume fails, it is important to repair the disk or volume as quickly as possible to avoid data loss. Because time is critical, Array Manager makes it easy for you to locate problems quickly. In the Status column of the list view, you can view the status of a disk or volume. The status also appears in the graphical view of each disk or volume. If the status is not Healthy for volumes or Online for disks, use the status information to determine the problem and then fix it.
There are also various troubleshooting procedures for disks, volumes, and arrays.
Topics include:
One of the following disk status descriptions will always appear in the Status column of the disk in the right pane of the console window. If there is a problem with a disk, you can use this troubleshooting chart to diagnose and correct the problem.
These definitions appear in the Status line and indicate the condition of array disks.
The following sections describe disk troubleshooting procedures:
See also the following sections for these and other troubleshooting procedures:
One of the following volume status descriptions will always appear in the graphical view of the volume and in the Status column of the volume in list view. If there is a problem with a volume, you can use this troubleshooter to diagnose and correct the problem.
The following sections describe common volume troubleshooting procedures:
See also the following sections for these and other troubleshooting procedures:
This section describes commands and procedures that can be used in troubleshooting. Topics covered include:
Verify that the power-supply cord and adapter cables are attached correctly. If the system is having trouble with read and write operations to a particular array (if the system hangs, for example), then make sure that the SCSI cables attached to the array are secure. If the connection is secure but the problem persists, you may need to replace a cable. See also the "Isolate SCSI device problems" section.
Make sure that the system meets all system requirements as described in the readme.txt file located in the installation directory. In particular, verify that the correct levels of firmware and drivers are installed on the system. For more information on drivers and firmware, see the "Drivers and Firmware" section.
Array Manager is tested with the supported controller firmware and drivers. The supported controllers and firmware are listed in the readme.txt file. To avoid possible conflicts or inconsistencies between the controller firmware and drivers, it is recommended that you only use the supported versions. The most current versions can be obtained from the Dell support site at http://support.dell.com.
In a SAN environment, all LS modules in an array should have the same firmware version. When upgrading the firmware on an LS module, make sure to upgrade the firmware on the other LS modules at the same time.
It is also recommended to obtain and apply the latest Dell PowerEdge Server System BIOS on a periodic basis to benefit from the most recent improvements. Please refer to the Dell PowerEdge system documentation for more information.
If you receive a "timeout" event related to a SCSI device or if you otherwise suspect that one of the SCSI devices is experiencing a hardware failure, then do the following to confirm the problem:
Use Rescan to update disk information. This operation may take a few minutes if there are a number of devices attached to the system. You will see a message "Getting hardware configuration. Please wait." while the rescan is occurring.
If this does not properly update the disk information, you may need to reboot your system.
The Check Consistency function determines the accuracy of mirrored data and parity information. When necessary, this feature rebuilds the parity information. For more information, see the following sections:
Caution: When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.
A RAID 5 volume's status can appear as Failed Redundancy and the disk's status is Offline. The disk's name may be Missing, and an error icon (X) appears on the missing or offline disk. In this case, do the following.
Caution: When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.
Reactivating a volume attempts to restart all volumes regardless of the volume's state. If data corruption exists, you can reactivate the volume and then run the chkdsk utility. However, in the case of a mirrored or RAID-5 volume, reactivating a volume with stale data can cause that data to be used when it is inaccurate.
Reactivating a volume should be done only if you understand that the volume's data, which might be corrupted, will be restored. For example, if one mirror in a mirrored volume fails and data is written to the remaining mirror, the data is now out of sync. Then, if the remaining mirror (the one with accurate data) fails and the first mirror is reactivated, the stale data becomes "real" data.
For this reason, it is important to act on data failures as soon as possible. You should use care when reactivating volumes.
Make sure that the underlying physical disk is turned on, plugged in, and attached to the computer. No other user action is possible for basic volumes unless the volumes are mirrored or RAID-5 volumes that were originally created in NT Disk Administrator. The repair of these volumes is covered in the next topic.
Caution: When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.
There are particular considerations regarding dynamic disks and volumes on NetWare, Windows Server 2003, and Linux. See "Dynamic Disk and Volume Support on NetWare, Windows Server 2003, and Linux" for more information.
You should be able to repair a RAID-5 volume if it is in a state of Failed Redundancy, and if there is unallocated space on another dynamic disk available. To avoid data loss, you should attempt to repair the volume as soon as possible.
Use Microsoft Windows NT Disk Administrator to repair basic mirrored or RAID-5 volumes if you are running Windows NT 4.0. For Windows 2000, there is a command available from the context menu for repairing basic mirrored or RAID-5 volumes.
Caution: In Windows NT 4.0, Disk Administrator should never be used while Array Manager is running, especially if there are tasks running on the controller at the time. Data loss can occur if both applications are running simultaneously.
After deleting a basic disk, the drive letter used by that disk may no longer be available. To correct this problem, reboot the server.
Drive mapping may not work properly on Windows NT and Windows 2000 systems with PERC 3/DC, PERC 3/DCL, PERC 3/QC, PERC 2/DC, PERC 3/SC, PERC 2/SC, PERC 4/SC, PERC 4/DC, PERC 4/Di, PERC 4/IM, and CERC ATA100/4ch controllers. After creating a virtual disk on these controllers, the disk may not be visible in the disk folder until the system is rebooted. After rebooting the system, the mapping between the newly created disk and the corresponding Windows NT or Windows 2000 disk may not be displayed in the Array Manager console.
Solution for Windows NT:
After creating a virtual disk and rebooting the system, do a console rescan by either clicking the Rescan button or selecting Rescan from the View pull-down menu.
Solution for Windows 2000:
When using a PERC 2/SC or 2/DC controller, upgrade your driver to MRAID35X.SYS version 2.68 or later.
If the drive that you mistakenly removed is part of a redundant virtual disk that also has a hot spare, then the virtual disk rebuilds automatically either immediately or when a write request is made. After the rebuild has completed, the virtual disk will no longer have a hot spare since data has been rebuilt onto the disk previously assigned as a hot spare. In this case, you should assign a new hot spare.
If the drive that you removed is part of a redundant virtual disk that does not have a hot spare, then replace the drive and do a rebuild.
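The two cases above can be summarized as a small decision helper. This is only an illustrative sketch: the function name `recovery_steps` and its return strings are hypothetical and are not part of Array Manager.

```python
def recovery_steps(redundant, has_hot_spare):
    """Return the follow-up actions after a drive is removed by mistake.

    Hypothetical helper that mirrors the two cases described in the text;
    it covers redundant virtual disks only.
    """
    if not redundant:
        raise ValueError("the text describes only redundant virtual disks")
    if has_hot_spare:
        # The rebuild onto the hot spare starts automatically, which
        # leaves the virtual disk without a hot spare afterward.
        return ["wait for the automatic rebuild to complete",
                "assign a new hot spare"]
    # No hot spare: the removed drive must be put back and rebuilt manually.
    return ["replace the drive", "rebuild the drive"]
```

For example, `recovery_steps(True, False)` returns the manual path: replace the drive, then rebuild it.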
See the following sections for information on rebuilding drives and assigning hot spares:
You can avoid removing the wrong drive by blinking the LED display on the drive that you intend to remove. See the following sections for information on blinking the LED display:
If you upgrade the Windows operating system on a server, you may find that Array Manager no longer functions after the upgrade. The installation process installs files and makes registry entries on the server that are specific to the operating system. For this reason, changing the operating system can disable Array Manager.
To avoid this problem, you should uninstall Array Manager before upgrading. If you have already upgraded without uninstalling Array Manager, however, you should uninstall Array Manager after the upgrade.
After you have uninstalled Array Manager and completed the upgrade, reinstall Array Manager using the Array Manager install media. You can download Array Manager from the Dell support site at http://support.dell.com.
This section contains additional troubleshooting problem areas. Topics include:
Note: If you are using the Dell PowerVault 660F storage system and the PowerVault 224F enclosure, see "Dell PowerVault 660F and 224F Storage Systems Troubleshooting" for additional issues specific to the PowerVault 660F storage system and PowerVault 224F enclosure.
A rebuild will not work in the following situations:
Check:
Check:
Check:
Situation:
Windows is not aware of the status of these disks. Most likely, the virtual disks that were associated with these have been deleted.
Check:
To remove these error status icons from the Disks object, the computer must be restarted to allow Windows to find the current information.
Situation:
If the type of disk shows No Signature, you need to write a signature to the disk. When creating a new virtual disk, the software must write a signature to the virtual disk that prepares it for use. This signature is not written automatically in case this disk has been merged from another operating system and the configuration information needs to be kept intact.
Check:
For instructions on writing a disk signature, see the section "Write a Disk Signature" in the "Disk Management" chapter.
The corresponding virtual disk has been removed, or the disk has been rendered inactive because of a problem.
Check:
Once you have repaired the disk, controller, or cable problem, you need to:
If the system is hanging, timing out, or experiencing other problems with read and write operations, then there may be a problem with the adapter cables or a SCSI device. For more information, see the "Cables attached correctly" and "Isolate SCSI device problems" sections.
If you install a PERC 2/SC or 2/DC controller after you have already installed Array Manager, you may experience problems with drive mapping, system hangs, and other performance problems. Reinstall Array Manager to resolve these problems.
If you have implemented channel redundancy on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, or 4/Di controller, a failure of one channel causes I/O to stop on the other channels included in the channel redundant configuration. The resolution to this problem is described in the "Considerations for Implementing Channel Redundancy" section.
The full message is: "The connection to the remote computer has terminated. Remote computer will be removed from view." The remote computer that you were connected to has been disconnected from your console. Most often, there is a problem with the network connection and the transmissions timed out. This can also occur if the remote machine was restarted or the service on the remote machine was stopped.
Check:
Make sure that the remote machine is turned on and is available to the network, and that the service is started. Reconnect to the resource.
Array Manager displays the "The stripe depth is out of range" error message when you attempt to apply RAID 0 or RAID 5 to more array disks than the controller can support in a single virtual disk. For example, the PERC 4/SC and 4/DC controllers can support up to 32 array disks in a virtual disk when using RAID 0 or RAID 5. Attempting to create a RAID 0 or RAID 5 virtual disk using more than 32 array disks on these controllers will cause this error message to be displayed.
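The limit described above can be checked before a virtual disk is created. The following is a minimal sketch, not Array Manager code: the table and function names are hypothetical, and only the 32-disk figure for the PERC 4/SC and 4/DC controllers comes from this section.

```python
# Hypothetical lookup table. Only the 32-disk limit for these two
# controllers is taken from the text; other controllers are not listed.
MAX_DISKS_PER_VIRTUAL_DISK = {
    "PERC 4/SC": 32,
    "PERC 4/DC": 32,
}


def stripe_depth_in_range(controller, raid_level, disk_count):
    """Return True if disk_count is within the documented limit."""
    if raid_level not in (0, 5):
        raise ValueError("this sketch covers RAID 0 and RAID 5 only")
    limit = MAX_DISKS_PER_VIRTUAL_DISK.get(controller)
    if limit is None:
        raise KeyError(f"no documented limit for {controller!r}")
    return disk_count <= limit
```

With this sketch, a request for 33 array disks in a RAID 5 virtual disk on a PERC 4/DC returns False, which corresponds to the situation in which the error message is displayed.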
The installation detects any drivers that you have installed for PowerEdge RAID controllers. If these drivers (and/or the card itself) are installed after the software is installed, support for the controller will need to be added.
Check:
Close the console. Open the Array Manager Utilities and check the box next to the appropriate controller. This action will restart the service, and the disks should be available the next time you launch the console.
When an operation is inactive or dimmed in a menu, the task cannot be performed on the object at this time. Certain operations are valid only for certain types of objects. (For example: RAID levels that are not fault tolerant will not allow you to check the consistency of the virtual disk.) If there is a task currently running on that object, wait until it has finished and try again. Otherwise, the operation may not be appropriate at this time.
If this was a virtual disk, then check that the virtual disk still exists. If it no longer exists, use the Remove Disk command to remove the disk from the list of disks.
Repair any disk, controller, or cable problems and make sure that the physical disk is turned on, plugged in, and attached to the computer. From the View pull-down menu, select Rescan. The disk should change from Offline to Online, but the volumes remain Failed. (If they do not change to Online, you may need to reboot.) Right-click the disk and select Reactivate Disk. The disk status changes to Healthy. (You can also select each volume one at a time and select Reactivate Volume.) It is recommended that you run chkdsk afterward.
Caution: When reactivating a volume, be aware that the volume's data is restored, even if it is stale, corrupt, or out-of-date. See "Reactivate a Dynamic Volume" for more information on the consequences of reactivating a volume.
If the disk status remains Offline and Missing and you determine that the disk has a problem that cannot be repaired, you can remove the disk from the system (using the Remove Disk command). However, before you can remove the disk, you must delete all volumes on the disk. You can save any mirrored volumes on the disk by removing the mirror that is on the Missing disk instead of the entire volume. Deleting a volume destroys the data in the volume, so you should remove a disk only if you are absolutely certain that the disk is permanently damaged and unusable.
Use the Reactivate Disk command to bring the disk back online. If the disk status remains Offline, check the cables and disk controller, and make sure that the physical disk is healthy. Correct any problems and try to reactivate the disk again. If the disk reactivation succeeds, any volumes on the disk should automatically return to the Healthy status.
When you do a Prepare to Remove command on an array disk attached to a PERC 4/Di controller, you may find that the disk does not display in the Array Manager tree view even after doing a rescan or a reboot.
In this case, do the following to redisplay the disk in the Array Manager tree view:
The disk has been moved to your computer from another Microsoft Windows NT/2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.
Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show Failed Redundancy or Failed error condition.
If you initialize a virtual disk that has been upgraded to dynamic, the status of the dynamic disk may change to "offline" or "foreign." You can view a disk's status by selecting the disk's General tab. When using a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch controller, you can resolve this problem by reverting the "offline" or "foreign" disk to a basic disk. See "Reverting a Dynamic Disk to Basic."
Because initializing a virtual disk destroys the data on the virtual disk, you may find that after initializing a virtual disk on a Windows system, a disk marked as "foreign" or "missing" is displayed under the Disks folder. In addition, initializing a virtual disk containing a dynamic disk changes the status of the dynamic disk to foreign or offline.
To reuse a Windows disk that is set to foreign or offline, right-click the disk and select Merge Foreign Disk or Revert to Basic Disk from the pop-up menu.
If the disk is marked as missing, right-click the disk and select Remove Disk.
If you format a virtual disk that has been upgraded to dynamic, the disk functions may become inactive or the status of the disk may change to "foreign" or "offline." You can view a disk's status by selecting the disk's General tab. When using a PERC 2, 2/Si, 3/Si, and 3/Di controller, you can resolve these problems by doing a global rescan.
The Help file uses a technology known as HTML Help, a Microsoft standard. Some software will attempt to update the core files with an older version of HTML Help and make Array Manager's Help file unusable. The required HTML Help update is located on the Array Manager CD-ROM in the Help Update folder. Double-click HHUPD.EXE and follow the instructions.
HH is Microsoft's HTML Help format, which reads precompiled HTML files for Array Manager's Help sections.
Check:
Delete the HH.DAT file in your Windows directory. Deleting this file will remove any customizations that have been made to your HTML help files.
Let autocheck run, but do not worry about the message. Autocheck will finish and the reboot will be complete. If you have a large system (more than 1 gigabyte), this may take about 10 minutes.
There are several situations where this occurs.
This occurs when you log in to the local computer originally as a local user, local administrator, or domain user and the remote computer is not in your domain or a trusted domain. The Windows security model does not allow you to have access under these circumstances. The workaround is to log in to your local computer with an account that has the same user name and password as an administrator account on the remote computer.
Access can be denied here if you do not type in a user name and password that match a local or domain administrator account on the remote computer or if you mistype the login information.
If the remote computer is not on or there are network problems, you will get the message "Connection Failed."
For a NetWare system, refer to the section on the "The Connection Failed" message that displays when connecting to the NetWare server.
Another situation where you may get an error message is when you have just done a client-only installation of Array Manager and you bring up the Array Manager client and attempt to connect to a remote server that has Windows 2000 Disk Management.
Array Manager assumes that its client will connect first to a remote server running Array Manager before connecting to a system running Windows 2000 Disk Management.
Once you connect to a server with Array Manager, you will then be able to connect successfully to a remote system running Disk Management.
Windows 2000 Disk Management is the disk and volume management program that comes with Windows 2000. Because Array Manager and Disk Management are related programs, Array Manager is able to remotely manage the storage on a Windows 2000 computer with Disk Management.
If you are having problems connecting to a NetWare® server from a local machine, use the ping and nslookup TCP/IP network diagnostic tools to determine whether the NetWare server is accessible from the local machine and whether the system running the NetWare server has a valid DNS name. If the system running the NetWare server does not have a valid DNS name, then you can edit the Hosts file on the local machine with an entry for the system running the NetWare server. The Hosts file is located in the winnt/system32/drivers/etc directory. The entry in the Hosts file should consist of the IP address and the host or server name of the system running the NetWare server.
If you do not connect by using a valid DNS name or an entry in the Hosts file, then you will need to use the IP address.
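The name-resolution check that ping and nslookup perform can also be scripted. The sketch below is illustrative only: the function names are hypothetical, and the address and server name in the usage example are placeholders. It uses the standard `socket.gethostbyname` call, which consults both the Hosts file and DNS on the local machine.

```python
import socket


def resolve_server(name):
    """Return the IPv4 address for name, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when
        # neither the Hosts file nor DNS can resolve the name.
        return None


def hosts_entry(ip_address, server_name):
    """Format a Hosts file line: the IP address, then the server name."""
    return f"{ip_address}\t{server_name}"
```

If `resolve_server` returns None for the NetWare server, a line such as the one produced by `hosts_entry("192.168.1.10", "nwserver1")` could be added to the Hosts file on the local machine, or the IP address could be used directly.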
When you want to connect to a NetWare server, Array Manager expects the server to be identified by one of three types of entries:
If you identify the name of the machine by a NetWare server's name that is not one of the three items above, the connection will fail. It is suggested that the name assigned to the NetWare server be the same name as its DNS or Hosts file entry.
Note that the DNS and Hosts file entries do not allow for a computer name that consists of all numbers. In addition, the DNS name does not allow a computer name that starts with a number. If the NetWare server has a numeric name or a name that starts with a number, you can use the IP address to identify that server. You can also put quotation marks around the computer's name for the entry in DNS or the Hosts file (such as "12345").
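The naming rules above can be expressed as a short check. This is a sketch under the assumption that the two rules stated in the text are the only ones that matter; the function names and return labels are hypothetical.

```python
def name_usable_in_dns(name):
    """Per the text, a DNS name may not start with a number."""
    return bool(name) and not name[0].isdigit()


def name_usable_in_hosts(name):
    """Per the text, a Hosts file name may not consist of all numbers."""
    return bool(name) and not name.isdigit()


def how_to_identify(name):
    """Suggest how to identify a NetWare server with the given name."""
    if name_usable_in_dns(name):
        return "dns-or-hosts"
    if name_usable_in_hosts(name):
        # Starts with a digit but is not all digits.
        return "hosts-ip-or-quoted"
    # All-numeric name such as 12345.
    return "ip-or-quoted"
```

For an all-numeric name such as 12345, the sketch points to the IP address or a quoted entry, which matches the guidance in the paragraph above.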
The Hosts file has to be on the client computer (local machine) that has the Array Manager console.
In addition, connecting to a remote system requires that you have administrator authority on both the local and remote system.
Note: Dell does not offer NetWare in Japan.
If there are no virtual disks configured at boot time on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4/Di, or CERC ATA100/4ch controller on Windows 2000, the Windows disk driver may not be loaded. The solution is to reboot after creating the first virtual disk, or to create the first virtual disk in the BIOS (press Ctrl-M to invoke the BIOS utility).
If you are trying to connect to the NetWare server with the Array Manager console, you may receive a "The Connection Failed" error message. There can be a variety of reasons for why the connection between the Array Manager console and the NetWare managed system fails. (See also "Connection Failed" message.)
To identify why the connection failed, perform the following steps:
Note: You may be able to avoid this connection problem by using the NetWare server's IP address instead of the server name.
Note: Dell does not offer NetWare in Japan.
When a system in a cluster attempts to rebuild a failed disk but the rebuild fails, then another system takes over the rebuild. In this situation, you may notice that the rebuilt disk continues to be marked as failed on both systems even after the second system has rebuilt successfully. To resolve this problem, perform a rescan on both systems after the rebuild completes successfully.
Activating the Windows hibernation feature may cause Array Manager to display erroneous status information and error messages. This problem resolves itself when the Windows operating system recovers from hibernation.
Certain conditions must be met before you can connect to a remote system from Windows Server 2003. For a description of these conditions, see "Remote Connection and Windows Server 2003."
This section describes problems that may deteriorate system performance.
You may notice unusual surges in your system's CPU usage. These surges may be caused by Array Manager's volume capacity monitoring. This function monitors NTFS volumes on the local server for the amount of space used. When the space used on an NTFS volume reaches 90%, a warning event is logged in both the Array Manager event log and the Windows event log. When the space used reaches 98%, an error event is logged.
If the surges in CPU usage pose a problem, you can disable volume capacity monitoring.
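The two thresholds can be illustrated with a short sketch. This is not how Array Manager implements the monitoring; the function name and return labels are hypothetical, and only the 90% and 98% figures come from this section.

```python
WARNING_PERCENT = 90  # warning event threshold from the text
ERROR_PERCENT = 98    # error event threshold from the text


def capacity_event(used_bytes, total_bytes):
    """Return "warning", "error", or None for a volume's space usage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    percent_used = 100 * used_bytes / total_bytes
    if percent_used >= ERROR_PERCENT:
        return "error"
    if percent_used >= WARNING_PERCENT:
        return "warning"
    return None
```

The inputs could come from the standard `shutil.disk_usage(path)` call, which returns the total, used, and free bytes for the volume that contains the given path.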
To disable volume capacity monitoring:
For information related to volume capacity monitoring, see the following:
This section presents possible problem situations with accompanying solutions for the Dell PowerVault 660F and 224F storage systems. The problem situations are organized as follows:
The situations in the first three topics are categorized by their event number. A brief discussion of event messages is included at the beginning of this section in the topic "Event Monitoring and Logging." The fourth topic describes general problems not related to a specific event.
You will also find a full listing of the events associated with the Dell PowerVault 660F Fibre Channel RAID controller at the end of this section in the topic "Events Generated by the PowerVault 660F Storage System."
Event messages help identify significant incidents such as an array disk failure or an array disk addition. Event monitoring and logging starts when the Array Manager managed system starts up. If the managed system service (Disk Management Service) stops in Microsoft Windows NT or the Array Manager Service stops in NetWare, then event monitoring and logging stops. If array disks are S.M.A.R.T. (Self Monitoring Analysis and Reporting Technology) enabled, the RAID controllers check array disks for failure predictions, and if found, pass this information on to the Array Manager console. Array Manager immediately displays an alert icon on the array disk and also raises an alert under the Events tab and in the Windows NT event log. Windows NT has three event logs; Array Manager uses the application log.
Note: When a controller's I/O is paused, Array Manager does not receive S.M.A.R.T. events.
The following incidents are included in this topic:
Because of some unknown error on the Fibre Channel RAID controller, the rebuild has failed.
Try rescanning the controller: from the Array Manager tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem object, and then select Rescan from the context menu that comes up. This action will update the controller status within the GUI.
If the controller has been removed and reinserted, check to see that the controller is inserted correctly: the DB9 connector should be located at the top of the module. For details, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Also, check that all cable connections are correctly and firmly connected. Try to rebuild again: right-click the Array Disk storage object in the tree view, and then select Rebuild from the context menu that comes up.
If the controller and connections are correct and the problem continues, contact customer service.
One of the following may have occurred on one of the Fibre Channel RAID controllers in the PowerVault 660F storage system:
Then access the Array Manager console, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.
A memory error was detected during the warm boot scan, which may result in possible data loss.
Power cycle the PV660F subsystem.
If the error persists:
If Enclosure Management has been enabled, check to see whether one or more LS modules has failed. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting tips.
If the controller entered Conservative Cache mode as the result of a deliberate user action, complete the intended task. When finished, right-click the controller and select either Enable Partner controller or Enable BBU to exit Conservative Cache mode.
If the BBU battery is low, recondition the battery. If the battery needs to be replaced, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.
If there is an Expand Capacity or Add Virtual Disk operation in progress, wait until this activity has finished. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.
See also "Conservative Cache Mode" in the "The Dell PowerVault 660F Storage System" chapter.
Note: Nexus refers to the state in which both redundant controllers are in communication. In this state, each controller can copy write-back data to its partner controller and can determine whether the other controller is operating.
Go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the partner controller. If the situation does not improve, try one of the following (Rescan as before when necessary after troubleshooting the partner controller):
If none of these solutions apply, contact customer service.
The Media Access Control (MAC) address is corrupted.
Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.
The Media Access Control (MAC) address has been lost or was not set.
Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.
Note: LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.
Check that the Fibre Channel cable is connected to the controller and the switch box. If not, reconnect it as necessary. If the Fibre Channel cable is connected, try replacing the cable. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting the LS and I/O modules.
When troubleshooting is complete, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the Fibre Channel status.
If the problem persists, contact customer service.
The following incidents are included in this topic:
To locate a fan, right-click the bad fan and click Properties. The Enclosure ID field indicates the ID number of the enclosure where this fan is located. Be aware that the enclosure ID number displayed by Array Manager does not match the Enclosure ID set on the switch module ID indicator on the back of the PowerVault 660F or 224F enclosures. When the switch module ID indicators on all the enclosures are configured properly, the enclosure ID numbers start at 0 and continue sequentially through 7. The enclosure ID numbers displayed by Array Manager, however, start at 1 and continue sequentially through 8. For this reason, the enclosure ID number displayed by Array Manager will be one greater than the number set on the switch module ID indicator on the back of the PowerVault 660F or 224F enclosures. For example, if the switch module ID indicator identifies the enclosure as 0, the Array Manager enclosure ID will be 1.
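The offset between the two numbering schemes can be written as a pair of conversions. This is a minimal sketch with hypothetical function names; the ranges 0 through 7 and 1 through 8 are the ones given above.

```python
def array_manager_enclosure_id(switch_module_id):
    """Convert the switch module ID (0-7) to the Array Manager ID (1-8)."""
    if not 0 <= switch_module_id <= 7:
        raise ValueError("switch module ID must be between 0 and 7")
    return switch_module_id + 1


def switch_module_id(array_manager_id):
    """Convert the Array Manager enclosure ID (1-8) to the switch module ID (0-7)."""
    if not 1 <= array_manager_id <= 8:
        raise ValueError("Array Manager enclosure ID must be between 1 and 8")
    return array_manager_id - 1
```

For example, if the Properties dialog shows Enclosure ID 3, the enclosure to look for is the one whose switch module ID indicator reads 2.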
See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
After troubleshooting or replacing the ACM, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the fan status within the GUI.
See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the power supply. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
Note: LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.
After troubleshooting or replacing the power supply, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the power supply status within the GUI.
Check all fans to see whether they are functioning properly. If they are, check that the ambient temperature is within limits. If necessary, adjust the room temperature. If the problem persists, power-cycle the system. If this does not solve the problem, replace the affected Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for information on how to troubleshoot the ACM. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
After fixing the temperature problem, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the temperature and/or the fan status within the GUI.
The LS module connection may be broken, or the management hardware may be bad. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.
Check to see that the LS module is enabled.
Follow the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting an LS module. For part replacement, see the Dell PowerVault 660F and 224F Storage Systems Service Manual.
After resolving the hardware problem and providing corrective action, go to the Array Manager interface and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the enclosure status within the GUI.
The enclosure has duplicate loop IDs (soft addressing).
Make sure the shelf ID switches on all PowerVault 660F and 224F enclosures in the subsystem are set to different numbers. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on setting shelf IDs.
A battery backup unit (BBU) with a low charge was found on the controller.
If this message occurs without power failure, replace the BBU.
To replace the BBU, see the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
The BBU requires two reconditioning cycles prior to first time use. This reconditioning process will take several hours and cannot be interrupted. Refer to "Recondition" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions on performing a BBU recondition.
After troubleshooting or replacing the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the BBU status within the GUI.
This message will appear each time the controller is (re)booted until the BBU is manually reconditioned. In this state, the battery is still protecting the controller cache, but the maximum level of the battery may be uncertain. To re-establish the maximum level, you can perform a BBU recondition at your convenience using the procedure described in the "Recondition" section. Be aware that the reconditioning process takes several hours. The recondition process must complete its full cycle without interruption. Interrupting the recondition will invalidate the recondition, thus the next invocation will start at the beginning of the cycle. In other words, the recondition process cannot be suspended and then restarted in the middle of the process.
The BBU requires an automatic and a manual recondition before the maximum level will be accurate. Until the first manual recondition is completed, the maximum level from the battery specifications is used. The automatic recondition will start if one of the following events occurs:
After the automatic recondition is complete, Array Manager will show "BBU Recondition Suggested" when the system is rebooted.
After manually reconditioning the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action updates Array Manager on the status of the BBU.
If the "BBU Recondition Suggested" error message continues to be displayed after you have manually reconditioned the BBU twice, then the battery may be defective and require replacement. In this case, see the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
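The guidance above for the "BBU Recondition Suggested" message can be summarized as a simple decision. The following Python sketch is only an illustration of that reasoning. The function and its return values are hypothetical, and only recondition cycles that ran to completion without interruption should be counted.

```python
def bbu_next_action(message_shown: bool, completed_manual_reconditions: int) -> str:
    """Suggest the next step for a BBU based on the guidance in this section.

    completed_manual_reconditions counts only manual recondition cycles that
    finished without interruption; an interrupted cycle is invalid.
    """
    if not message_shown:
        return "none"
    if completed_manual_reconditions >= 2:
        # The message persists after two manual reconditions:
        # the battery may be defective.
        return "replace"
    return "recondition"
```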
The following incidents are included in this topic:
Many of the incidents in this section can be resolved by replacing one or more drives. The procedure for replacing a drive is as follows:
If the virtual disk is offline, try forcing it online with the Force Online command. Right-click the disk and select Force Online from the context menu that appears. See "Force Online" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.
If you cannot force the virtual disk online, remove and replace the affected hard drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing a drive.
If you still get a hard disk error after replacing the drive, contact customer service.
A hard disk predicted a future failure condition. This disk may fail soon.
Replace and rebuild the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If the rebuild was cancelled, that disk will remain in an unusable state until a successful rebuild has been performed on it.
See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.
Try to rebuild the drive again. See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.
If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive.
See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing a drive.
It may not be possible to recover from this error; therefore, you may lose your virtual disk.
Try to rebuild the virtual disk. See "Rebuild" in the chapter "The Dell PowerVault 660F Storage System" for instructions.
If the problem persists, contact customer service.
A hard disk failed because the user either changed the status to Offline or removed the hard disk.
A drive is usually manually taken offline to replace it. If the drive was physically removed from the enclosure, replace and rebuild the drive (using a drive at least as large as the other disk drives in the virtual disk).
See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If the drive has been removed or has failed, replace the drive.
If the time-out cannot be reset on the existing array disk, replace the disk.
See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
It may be necessary to do a complete reboot after the drive is replaced.
If you need more help, contact customer service.
Check all cables, making sure they are correctly and firmly connected and that none are crossed. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for cabling procedures.
Replace the affected disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
The array disk encountered too many errors, causing the drive to fail and its status to change to Offline.
It is not possible to recover this physical drive. Replace the disk drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive still does not work, contact customer service.
Drive failed to spin up during controller bootup.
Check that the new array disk is seated properly. If not, remove and reinsert the disk.
If the problem persists, see the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the Fibre Channel hard disk drives.
When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the array disk status within the GUI.
A previously configured array disk no longer appears in the Array Manager GUI.
Make sure that all enclosures are powered on.
Remove and reinsert the physical drive. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the replacement drive has been recognized.
If the drive is still missing or not found, try replacing the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive does not work, contact customer service.
Communication to a drive on a particular channel has failed.
If this event appears for all existing drives, then a Loop ID problem may be present. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the I/O module.
If the drive has failed, replace the drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive does not work, contact customer service.
Try performing a Consistency Check again. If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
Note: A virtual disk that is Critical will be shown with Degraded status within the Array Manager console.
Try performing a consistency check again. If the problem persists, replace the disk drive(s). It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive(s). See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
An array disk failed.
Replace the affected disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
If you have a non-fault-tolerant virtual disk, a single array disk failure may have caused the virtual disk to go offline. If you have a fault-tolerant virtual disk, multiple array disk failures may have caused the virtual disk to go offline.
Try making the virtual drive Online.
Verify through the LED lights that power is supplied to the enclosure.
Identify the location of the failed drive(s). If necessary, refer to the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide.
Replace the array disk(s) if necessary. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
It may not be possible to recover from this error. Contact customer service.
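The relationship between array disk failures and an offline virtual disk, as described for this incident, can be sketched as follows. This Python example is a simplified illustration. It assumes that a fault-tolerant virtual disk survives exactly one array disk failure (as with RAID 1 or RAID 5); the function name is hypothetical.

```python
def virtual_disk_state(fault_tolerant: bool, failed_disks: int) -> str:
    """Estimate the state of a virtual disk from the number of failed array disks.

    Assumption: a fault-tolerant virtual disk tolerates one failed array disk,
    and a non-fault-tolerant virtual disk tolerates none.
    """
    if failed_disks < 0:
        raise ValueError("failed_disks cannot be negative")
    if failed_disks == 0:
        return "Online"
    tolerated = 1 if fault_tolerant else 0
    if failed_disks <= tolerated:
        return "Degraded"
    return "Offline"
```

Under this assumption, a non-fault-tolerant virtual disk goes offline on the first failure, while a fault-tolerant virtual disk is first degraded and goes offline only when a second array disk fails.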
One fault-tolerant virtual disk has been degraded.
Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
Reinsert the controller and power on the system and/or the controller.
Initialize the virtual disk manually. See "Initialize" in the chapter "The Dell PowerVault 660F Storage System" for instructions.
When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the virtual disk has been initialized and is recognized.
Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
Replace the array disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
See "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.
A bad sector was found on an array disk during one of the following operations: consistency check, rebuild, or RAID expansion.
For information on consistency check and rebuild functions, see "Check Consistency" and "Rebuild" in the chapter "The Dell PowerVault 660F Storage System."
If the problem persists, replace the hard disk. See the topic "Procedure for Replacing a Drive" at the beginning of this section for instructions on replacing and rebuilding a drive.
This section describes error messages for the PowerVault 660F storage subsystem that may be displayed by Array Manager.
In addition to a missing or failed LS module chip, this error message may also be displayed when the SES firmware of the active LS module is version 1.9.5 while the SES firmware for the failover LS module is version 1.3b2. The SES firmware for both LS modules should be at the same level. You can resolve this error by upgrading the SES firmware so that both LS modules have the same SES firmware version.
When assigning virtual disk 0 (LUN 0) to a Windows server, a Windows "Unsafe Removal of Device" message may display for the PV660F PSEUDO SCSI Disk Device.
You can disregard the Disk Removed message. This is caused by the PV660F Pseudo Disk (used to manage the Fibre Channel array when no virtual disks are assigned) being replaced by the newly assigned virtual disk.
A Windows Disk Removed dialog may display on the server under the following conditions:
You can disregard the Disk Removed message. This is caused by the PV660F Pseudo Disk (used to manage the Fibre Channel array when no virtual disks are assigned) being replaced by the newly assigned virtual disk.
This section contains the following general problem situations:
The following sections describe how to troubleshoot problems with the PowerVault 660F battery backup unit (BBU).
The PowerVault 660F generates error messages that indicate the battery is either not installed, not recognized, or too low on power to function properly. Here are a few examples of the types of event messages you may receive:
Certain conditions require the BBU to be reconditioned before it can be used. If the power went off or if you are installing a new controller with a new BBU, the BBU needs to be reconditioned (sometimes referred to as recharged). Before you start a recondition process, be aware that this operation cannot be interrupted for any reason. Once you have started a recondition process, do not try to initiate a fast charge or another recondition cycle. If the cycle is interrupted for any reason, the process must be restarted from the beginning.
To recondition a battery, right-click on a controller, click on C0 Battery or C1 Battery, and click on Recondition.
When all attempts to recharge or recondition the BBU have failed, it is time to replace it. However, before you remove a BBU, it is best to put the system into a ready for Shutdown State. Then replace the component and clear the Shutdown State to re-enable caching. The BBU is contained in the Fan CRU (Dell/Euro) and can be replaced separately from the controller without shutting down the system.
Note: Replacing a BBU or a controller can be done in a "hot" condition: you can leave the main system running and shut off power to, or disable, only the component being replaced.
For more information, refer to "Battery Properties."
The PowerVault 660F has two controllers. Each controller is encased in its own canister, and each is on one side of the enclosure. If you receive an event notice that indicates a controller has failed or is not responding, there are certain steps to follow to shut down and replace the bad controller while maintaining accessibility and functionality to your system. If one controller goes down, the other controller handles the same functions (controller fault tolerance), although performance will be degraded until the bad controller is replaced and brought online.
Before performing any of these procedures, be sure the good controller is functioning at full performance. You may now remove the faulty controller (pull it out of the enclosure).
Note: You may test the faulty controller by removing it and inserting it into another system for diagnostics. In any case, be sure that the good controller is working. Replace the bad controller with a new one as soon as possible.
Note: Do not use the power cycle as a preparation for a hot swap. You will lose access to data if the system is currently online through the surviving controller.
Upgrading to PowerVault 660F RAID controller firmware version 7.7 or higher can change the controller's World Wide Name (WWN) if the controller is at a version prior to 7.7. When the WWN changes, you may not be able to see the storage with Array Manager or another management application. Resolving this problem can include updating pathing of the FC HBA, as well as FC switch zoning configuration if zoning is being performed by WWN. See the latest SAN or PowerVault 660F documentation that came with your system or that is available from the Dell support site at http://support.dell.com for information on editing the FC switch zone table and resolving problems associated with a new WWN.
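The condition under which a firmware upgrade can change the WWN can be checked before the upgrade is performed. The following Python sketch is illustrative only. It assumes that firmware versions are dotted numeric strings such as "7.7", and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted numeric version string, for example "7.7", into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def upgrade_may_change_wwn(current_fw: str, target_fw: str) -> bool:
    """Return True if upgrading from current_fw to target_fw crosses version 7.7.

    Per the text above, the WWN can change when a controller running firmware
    earlier than 7.7 is upgraded to version 7.7 or higher.
    """
    threshold = (7, 7)
    return parse_version(current_fw) < threshold <= parse_version(target_fw)
```

If the check returns True, plan to review the FC HBA pathing and any WWN-based switch zoning after the upgrade.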
The following sections describe how to troubleshoot problems with a failed drive on the PowerVault 660F storage system.
If you receive an event notice that indicates a disk drive has failed or is not responding, there are certain steps to follow to help detect which disk drive has failed and how to do a spindown. There are several disk drives installed in the enclosure, so that you can configure them to your system requirements. You can use the Blink Disk command to locate a failed disk.
After the faulty disk drive has been replaced, it needs to be restored to full operation. Do this by applying the rebuild and rescan functions. Fault tolerance in RAID configurations comes from redundancy such as mirroring or parity. RAID 5, a commonly used fault-tolerant level, uses parity and requires three or more disk drives in an array. For more information, see "Dell PowerVault 660F Storage System Configuration Overview."
Other error (event) messages that may indicate a disk drive problem:
Before you remove a physical disk, you must consider the logical drives that are mapped to the disk. If the drive is not marked dead, and it is configured, do not remove it, or data will be lost. It is imperative that only the known failed disk drive is the one removed. If a second drive fails for any reason, data will be lost.
Logical Drives: A logical drive is a partition you create within an extended partition on a basic disk. A logical drive can be formatted and assigned a drive letter. Only basic disks can contain logical drives. A logical drive cannot span multiple disks. For more information on basic disks, basic volumes, and extended partitions, see the chapters on "Disk Management" and "Volume Management."
If Automatic Rebuild Management (ARM) is enabled, then when the firmware detects a failed drive, it will look for a spare. If a suitable spare is found, then the failed drive is removed from all logical drives that it was part of, the spare is inserted into those logical drives, and a rebuild is started.
If ARM is not enabled, and the failed drive is removed and replaced in the same slot, the user may start a rebuild manually.
To use the ARM feature, go to the Advanced Controller Options settings: right-click a controller, click Controller Options, and click the Advanced tab. When the Advanced tab opens, make your custom changes. For more information, refer to "Advanced Fibre Channel RAID Controller Options."
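The Automatic Rebuild Management behavior described above can be sketched as follows. This Python example is a simplified model, not the controller firmware's actual implementation. The Drive class and function names are hypothetical, and the rule that a "suitable" spare is one at least as large as the failed drive is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Drive:
    name: str
    size_mb: int
    failed: bool = False


def find_spare(spares: List[Drive], failed: Drive) -> Optional[Drive]:
    """Return the smallest healthy spare that is at least as large as the failed drive."""
    candidates = [s for s in spares if not s.failed and s.size_mb >= failed.size_mb]
    return min(candidates, key=lambda s: s.size_mb) if candidates else None


def handle_drive_failure(
    failed: Drive,
    logical_drives: Dict[str, List[Drive]],
    spares: List[Drive],
    arm_enabled: bool,
) -> List[str]:
    """Return the names of the logical drives on which a rebuild was started."""
    if not arm_enabled:
        # Without ARM, replace the drive in the same slot and start a rebuild manually.
        return []
    spare = find_spare(spares, failed)
    if spare is None:
        return []
    spares.remove(spare)
    rebuilding = []
    for name, members in logical_drives.items():
        if failed in members:
            # Swap the spare into the failed drive's position and rebuild.
            members[members.index(failed)] = spare
            rebuilding.append(name)
    return rebuilding
```

In this model, an empty result means that no automatic rebuild was started, either because ARM is disabled or because no suitable spare was available.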
Use the following procedures related to locating and removing the failed drive:
The Assign Global Hot Spare function will be selectable only when disks are available within the system.
Add drives to the system. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on adding new drives.
After adding the new drives, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan.
This action will update the drive status within the GUI. You are now ready to create a hot spare.
Make sure that the situation you have supports the Expand Capacity command. See "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.
A virtual disk has not been made visible to the operating system.
For a direct connect, reboot the system to recognize the virtual disk.
For a SAN connection, two steps must be performed after creating a virtual disk before the virtual disk will appear in the Disks folder.
Once these two steps have been performed, the virtual disk will appear in the Disks folder of the Array Manager console.
All of the available space on the NT disk has already been used in the creation of one or more volumes for that disk. A volume cannot be created on an NT disk when there is no unused space available.
Add a new virtual disk or expand the capacity of the existing array disk. See "Add Virtual Disk" and "Expand Capacity" in the chapter "The Dell PowerVault 660F Storage System" for more information.
A volume that has been marked as a primary partition cannot be deleted. Primary partitions are protected because they contain a bootable operating system.
To override this protection feature, delete the virtual disk that the primary partition belongs to. See "Delete Virtual Disk" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.
Only the last virtual disk that was created can be deleted.
Delete the virtual disks in the reverse order that they were created. See the topic "Delete Virtual Disk" in the chapter "The Dell PowerVault 660F Storage System" for detailed instructions.
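The last-created, first-deleted rule described above behaves like a stack. The following Python sketch illustrates the ordering only; the function names and the virtual disk labels are hypothetical.

```python
def deletion_order(created_in_order):
    """Given virtual disks in creation order, return the order in which they can be deleted."""
    return list(reversed(created_in_order))


def can_delete(virtual_disk, existing_in_creation_order):
    """Only the most recently created virtual disk that still exists can be deleted."""
    return bool(existing_in_creation_order) and existing_in_creation_order[-1] == virtual_disk
```

For example, if virtual disks 0, 1, and 2 were created in that order, virtual disk 2 must be deleted before virtual disk 1 can be deleted.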
This problem may occur in the following situations:
Rescan the PV660F storage system.
Both controllers must have the same version of firmware to operate in a redundant configuration. If a failed controller is replaced with a controller with a different version of firmware, the replacement controller will not be allowed to start and will be disabled by the existing controller.
Use the following procedure to download a common firmware image:
Note: If an empty enclosure is available, the firmware can be downloaded to the replacement controller without having to power off the subsystem. However, this will not work if the replacement controller has more recent firmware.
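The firmware requirement for redundant controllers amounts to an equality check on the two firmware versions. The following Python sketch is illustrative only. It assumes dotted numeric version strings, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted numeric version string, for example "7.7", into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def replacement_controller_will_start(existing_fw: str, replacement_fw: str) -> bool:
    """Return True if both controllers run the same firmware version.

    Per the text above, a replacement controller with a different firmware
    version is disabled by the existing controller.
    """
    return parse_version(existing_fw) == parse_version(replacement_fw)
```

A result of False indicates that a common firmware image needs to be downloaded before the controllers can operate in a redundant configuration.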
The Enclosure Management Advanced controller option is disabled.
Change the Enclosure Management Advanced controller option to enabled. See the "Advanced Fibre Channel RAID Controller Options" section of the "The Dell PowerVault 660F Storage System" chapter for details.
If the Enclosure Management Advanced controller option is already enabled, perform a Rescan on the PV660F Subsystem storage object. If this isn't successful, perform a Reset on the controller object. See "Reset" in the "The Dell PowerVault 660F Storage System" chapter for details.
It is possible that the Fibre Channel array might not discover some of the PV660F Fibre Channel arrays if any of the controllers are not fully rebooted.
To resolve this problem, right-click the PV660F Subsystem storage object and select Rescan.
Either the controller has activity occurring, or the queue limit is set to zero.
Verify that no I/O is occurring to the PowerVault 660F array. You can verify this by viewing the activity LEDs on the controllers.
Verify that no controller activity is occurring. Virtual disk initializations, Expand Capacity, or any other controller activity will prohibit resets from occurring.
Verify that the Queue Limit in the controller options is not set to zero. If set to zero, reset it to a higher value (such as 32).
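The three checks above can be combined into a single pre-reset checklist. The following Python sketch is illustrative only. The function name and parameters are hypothetical, and the values passed in must be gathered manually (for example, by viewing the activity LEDs and the controller options).

```python
def reset_blockers(host_io_active: bool, controller_task_active: bool, queue_limit: int) -> list:
    """Return the conditions from this section that would prevent a controller reset."""
    blockers = []
    if host_io_active:
        blockers.append("I/O is occurring to the array")
    if controller_task_active:
        # For example, a virtual disk initialization or Expand Capacity operation.
        blockers.append("a controller operation is in progress")
    if queue_limit == 0:
        blockers.append("Queue Limit is set to zero")
    return blockers
```

An empty list means that none of the conditions described in this section is blocking the reset.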
Starting the Array Manager service or performing a rescan while a QLogic HBA-to-switch cable failure exists on one of the servers in a SAN fabric may cause Application Transparent Failover to autofail all LUNs on the PV650F that are assigned to other servers.
To resolve this problem, disable PV660F management on the server with the HBA failure and enable PV660F management on another server in the SAN until the cable failure is resolved.