SCSI and SAS Failures Troubleshooting Flowchart
Copyright 2013 by Morris Rosenthal All Rights Reserved
Excerpted from Computer Repair with Diagnostic Flowcharts Third Edition
Diamond symbols are linked to the decision text. Unplug the ATX power supply before working inside the PC.
SCSI and RAID Hard Drive and performance issues for SAS and SATA
Are you using a SAS (Serial Attached SCSI) adapter? SAS has replaced traditional SCSI in all modern applications, but SCSI was popular in high performance PCs and PC based servers before SATA came along. SAS adapters can usually work with both SAS drives and SATA drives. Both are serial bus devices, and the difference is mainly in software. SAS drives can run higher signaling voltage, allowing for longer cable lengths.
Is your SCSI adapter recognized by the BIOS at power up? All modern SCSI (pronounced "skuzzy") adapters carry their own SCSI BIOS that must be recognized and loaded at boot time. There are some ancient "dumb" SCSI cards kicking around for running scanners or old CD drives which are run through an operating system driver, but I've never seen a PCI version. When the SCSI adapter BIOS loads, it will flash an on-screen message, like "Press CTRL-A" to access the SCSI BIOS (Adaptec).
SCSI adapters are pretty sophisticated, practically single board computers. There are still some 5V PCI SCSI adapters kicking around in old business servers, but new PCI slots only support 3.3V adapters. This is the first thing to check in the documentation of your SCSI adapter and motherboard if you've done a motherboard replacement.
Have you tried moving the adapter to a new slot? Make sure you screw in the hold-down screw and that the card is evenly seated in the slot. The sophistication of SCSI adapters makes them a little more finicky than other PCI adapters, and the order in which the PC BIOS reckons them up may make a difference. You can go into CMOS Setup and play with the PCI bus legacy settings if the adapter BIOS won't load. Quality SCSI adapters are equipped with an onboard LED to confirm the adapter status and report error codes. If you can't get the adapter to report in on power up, proceed to Conflict Resolution.
Does the BIOS screen generated by the SCSI Adapter list any of the SCSI devices you have installed? If the SCSI support is integrated on the motherboard, this information may be combined with the standard BIOS boot screen. The device should be identified by manufacturer, model, SCSI ID and LUN (Logical Unit Number, largely irrelevant unless you're building a large array). The SCSI adapter itself should appear in this accounting, generally on SCSI ID 7.
Does the SCSI BIOS see all of the SCSI devices you've installed? If it does, whatever access problem you're having is most likely the result of outdated operating system drivers or SCSI application software. If everything works fine but you have intermittent SCSI problems, proceed as if you had answered "no" to this question, and pay special attention to termination, cable quality and SCSI limit issues.
Are all of your SCSI device IDs unique? It's almost certain that you believe they are unique since you set them before you installed the SCSI devices, but double check. The most common failing that can leave you with two devices sharing the same SCSI ID is a misplaced or defective jumper. It's not the black plastic that makes the connection, it's the little metal spring clip within the plastic housing. If the jumper is defective, it won't set the ID bit. Some SCSI jumpers are so tiny that it's easy to miss one of the pins when you place them. If you were trying to set two drives on SCSI addresses "0" and "2" and the jumper on the ID 1 pair missed one of the posts, you'd end up with two drives set to ID 0.
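If you've written down the ID each device is actually set to, the uniqueness check is easy to sketch in Python. This is only an illustration: the function name find_id_conflicts and the device names are made up for the example, and the adapter is assumed to sit on ID 7, which is typical but should be confirmed in your adapter BIOS.

```python
from collections import defaultdict


def find_id_conflicts(device_ids, adapter_id=7):
    """Return {scsi_id: [names]} for every ID claimed more than once.

    device_ids maps a device name to the SCSI ID it is really set to.
    adapter_id=7 is an assumption; check what your adapter reports.
    """
    claimed = defaultdict(list)
    claimed[adapter_id].append("adapter")  # the adapter uses an ID too
    for name, scsi_id in device_ids.items():
        claimed[scsi_id].append(name)
    return {i: names for i, names in claimed.items() if len(names) > 1}


if __name__ == "__main__":
    # Two drives that both ended up on ID 0 show up as a conflict.
    print(find_id_conflicts({"disk A": 0, "disk B": 0, "tape": 4}))
```

An empty result only means the IDs you wrote down are unique. It can't tell you whether a bad jumper has put a device on a different ID than you intended.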
Although the binary addressing should be fully explicated in your SCSI documentation, you're probably working with old stuff for which the documentation is long gone and may not even exist on the Internet, so the general deal is as follows.
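Older, "Narrow" SCSI controllers supported up to 8 devices (including the controller), so they required three ID bits to select a unique set of addresses. The standard is to label these as ID 0, ID 1, and ID 2, where ID 0 is the low bit. In the table below, an "X" represents a jumper and a "-" represents an open pair of pins.

Address  ID 2  ID 1  ID 0
   0      -     -     -
   1      -     -     X
   2      -     X     -
   3      -     X     X
   4      X     -     -
   5      X     -     X
   6      X     X     -
   7      X     X     X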
Newer, "Wide" SCSI controllers and devices support up to 16 addresses, where the controller takes up one, so a single controller can support 15 devices. The only difference between the jumper setting for new and old devices is that newer devices have a fourth SCSI ID selection, ID 3, which is jumpered for addresses 8 - 15. The lower ID bits are set exactly as above, but you add 8 to each address when ID 3 is jumpered.
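The binary scheme above can be sketched in a few lines of Python. Treat it as an illustration only: the function names jumpers_for_id and id_from_jumpers are invented for this example, and the code assumes your device follows the ID 0 through ID 3 labeling described here, with ID 0 as the low bit.

```python
def jumpers_for_id(scsi_id, wide=False):
    """Return the jumper positions to install for a SCSI ID, e.g. ["ID 1"].

    Hypothetical helper: assumes ID 0 is the low bit, as described above.
    Narrow devices have three ID bits, Wide devices have four.
    """
    bits = 4 if wide else 3
    if not 0 <= scsi_id < 2 ** bits:
        raise ValueError("SCSI ID out of range for this bus width")
    return [f"ID {bit}" for bit in range(bits) if scsi_id & (1 << bit)]


def id_from_jumpers(installed):
    """Return the SCSI ID selected by the jumpers that really make contact."""
    return sum(1 << int(label.split()[1]) for label in installed)


if __name__ == "__main__":
    # Jumper table for a Wide device; "X" marks an installed jumper.
    print("Addr  ID3 ID2 ID1 ID0")
    for scsi_id in range(16):
        fitted = jumpers_for_id(scsi_id, wide=True)
        row = " ".join(" X " if f"ID {b}" in fitted else " - " for b in (3, 2, 1, 0))
        print(f"{scsi_id:4d}  {row}")

    # A drive meant for address 2 whose ID 1 jumper missed a post has no
    # working jumpers at all, so this prints the address it really lands on.
    print(id_from_jumpers([]))
```

The second function is the useful one for troubleshooting: feed it only the jumpers you've confirmed are seated on both posts and see which address the device is really claiming.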
Is the SCSI bus terminated on both ends? This has gotten particularly tricky since more recent SCSI adapters could auto-sense termination requirements and handle it themselves, and the newer (but still old) LVD scheme provides termination on one end of the cable. The newer SCSI devices that work with the Wide LVD cable should ship with termination disabled, though they may have onboard termination available for compatibility with the older SCSI technology. You really need to check the documentation or hop onto the manufacturer's website for details about the SCSI termination for particular devices.
The important thing to know about SCSI termination is that both ends of the bus must be terminated. The bus is a physical thing, not a theoretical conception. All of the devices on a SCSI bus share the same parallel transmission line for data and signals, and the devices at both ends of the line (or the line itself) must provide termination. There are four basic possibilities for the SCSI bus architecture.
1) You have an internal SCSI adapter and one or more internal SCSI devices. The devices are attached to the SCSI adapter by a ribbon cable. The SCSI adapter is at one end of the bus and must have termination enabled or set on automatic (usually done through the SCSI BIOS, but the oldest SCSI adapters employed a physical jumper). The SCSI device at the end of the bus must be terminated if you're using a 50 wire ribbon cable, or it must be attached to the last connector before the terminator on the end of a 68 wire LVD cable.
2) You have an internal SCSI adapter and one or more external SCSI devices. The SCSI adapter must be terminated, and the last external SCSI device on the daisy chain must be terminated. Some external SCSI devices are equipped with a termination switch, others require installation of a SCSI terminator on their outgoing SCSI connector.
3) You have one or more internal SCSI devices and one or more external SCSI devices, all attached to the same internal SCSI adapter. The last external device on the daisy chain must be terminated and the last internal device on a 50 pin ribbon cable must be terminated or attached to the last connector on the 68 wire LVD cable. The adapter must have termination disabled because it is in the middle of the bus.
4) You combine any of the above scenarios with a SCSI adapter that supports two internal SCSI busses, a high speed 68 wire LVD bus and an older 50 pin bus. If you have both types of SCSI devices, it's recommended that you install them on separate cables for best performance, even though adapters or dual connectors may be available for the device. Any time you have two internal cables attached to the SCSI adapter, they must be terminated according to their type. The last connector on the 50 wire cable must be connected to a terminated drive, while the last connector on the 68 wire LVD cable must be connected to an unterminated drive.
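The first three scenarios boil down to one rule: the adapter is terminated unless it sits in the middle of the bus, and the device at each far end carries termination unless the cable does it for the device. Here is a rough Python sketch of that rule. The function name termination_plan and the cable labels are invented for the example, and it doesn't attempt to model a dual-bus adapter (scenario 4), where the adapter documentation has the final word.

```python
def termination_plan(internal_devices, external_devices, internal_cable="50-pin"):
    """Describe termination settings for a single SCSI bus.

    Hypothetical helper covering scenarios 1 to 3 above.
    internal_cable is "50-pin" or "68-wire LVD" (labels made up for this sketch).
    """
    if internal_cable not in ("50-pin", "68-wire LVD"):
        raise ValueError("unknown cable type")
    if internal_devices < 0 or external_devices < 0:
        raise ValueError("device counts cannot be negative")
    if internal_devices + external_devices == 0:
        raise ValueError("no devices on the bus")

    plan = {}
    # The adapter terminates the bus only when it sits at one end of it.
    adapter_in_middle = internal_devices > 0 and external_devices > 0
    plan["adapter"] = "disabled" if adapter_in_middle else "enabled"

    if internal_devices:
        if internal_cable == "50-pin":
            plan["last internal device"] = "enabled"
        else:
            # The LVD cable carries its own terminator, so the drive must not.
            plan["last internal device"] = "disabled (cable terminator)"
    if external_devices:
        plan["last external device"] = "enabled"
    return plan


if __name__ == "__main__":
    # Scenario 3: internal drive on an LVD cable plus an external device.
    print(termination_plan(1, 1, "68-wire LVD"))
```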
Are all of the cable connectors seated? When you start getting up to 50 pins mating into an old fashioned connector, it can take a bit of pushing. After you think the connector is seated, push on each end of the connector in turn to make sure that it isn't rocking on an obstruction in the middle. A properly seated cable connector won't move, wherever you push on it. The newer 68 pin LVD connectors make the connection in a smaller form than the older 50 pin connectors, and the unfortunate side effect is that the pins themselves are more fragile. Be careful when seating the connectors, and if you have to pull the connector off for troubleshooting, inspect the pins to make sure none are bent over.
Have you obeyed all the SCSI limits? These limits are entirely dependent on the SCSI adapter and devices you are using, not to mention the number of devices on the bus and the type and quality of the cables. Ultra160, Ultra320 and Ultra640 SCSI have focused on data throughput rather than increasing the maximum bus length. The limitations subject is far too complicated for this discussion, but keep in mind it works both ways. For example, if you build your own cables, you can't put the connectors two inches apart just to keep the cabling in your case neat. There are minimum as well as maximum distances involved, depending on the SCSI technology and the speed at which you are running the bus.
Have you replaced the cables? SCSI cables are the weak links in older machines, particularly when you've made and unmade the connections a number of times. The strain relief on the connector can fail, particularly if you use the cable to pull the connection apart. Pushing the two halves back together just doesn't cut it in high speed communications.
Can you get the adapter to recognize a single device? If the adapter termination is on automatic, you can try putting it on manual and forcing termination on, though it's rarely the issue. If you're using an older Narrow SCSI device for the test, make sure it's terminated. If you're using a Wide device on an LVD cable, make sure it's unterminated and connected to the last connector on the terminated end of the LVD cable. If you can't get the SCSI adapter to see the device, try another one, if you have one available. SCSI adapters ship with pretty good onboard diagnostics, and there may be a further piece of diagnostic software available on the driver CD or the manufacturer's web site. SCSI adapters are one of the higher quality items in the PC industry, but they do fail, so if you absolutely can't get a SCSI adapter to register a device, even when testing with multiple cables and devices, it's probably dead.
If you get the SCSI adapter working with one or more devices but still have problems, it comes down to process of elimination. If it's a reliability issue, try running the bus at lower speed, or with the slower or older devices temporarily detached. If you never got all of the SCSI devices recognized, try them in different combinations and triple check the IDs, though it's always possible that some devices are good and others are dead.
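If you have more than a couple of devices, it helps to keep the process of elimination organized. The Python sketch below lists the combinations to try, smallest groups first, and then names the devices that never appeared in a working group. The function names and device names are invented for the example, and a device that shows up as a suspect may simply not have been tested in a good combination yet.

```python
from itertools import combinations


def elimination_order(devices):
    """Yield groups of devices to attach, single devices first."""
    for size in range(1, len(devices) + 1):
        for group in combinations(devices, size):
            yield group


def suspects(devices, results):
    """Return devices that never appeared in a fully recognized group.

    results maps a tuple of attached devices to True if the adapter
    recognized every device in that group, otherwise False.
    """
    cleared = set()
    for group, recognized in results.items():
        if recognized:
            cleared.update(group)
    return [d for d in devices if d not in cleared]


if __name__ == "__main__":
    devices = ["disk0", "disk1", "tape"]  # hypothetical device names
    for group in elimination_order(devices):
        print(group)
    notes = {("disk0",): True, ("disk1",): False, ("disk0", "tape"): True}
    print(suspects(devices, notes))
```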
Is your SAS (Serial Attached SCSI) adapter live and reporting a BIOS screen on boot? All SAS adapters feature their own BIOS which you can access through a hot key combination (CTRL-A for Adaptec) which will show you the status of the adapter, all attached drives, and allow you to control various adapter functions.
If the adapter is not recognized and loading its BIOS, the first step is to reseat it in the slot. Make sure the motherboard PCI version is 2.2 or better, unless it's an early SAS adapter that explicitly states it will run on an older version. Make sure that the number of PCI Express lanes is correct. An SAS x4 adapter requires a PCIe slot with at least four lanes (x4 or wider). There's no point running an expensive SAS adapter in a lower speed slot than it requires because even if it limped by, you'd just be throwing away your investment in the high performance adapter and drives. And check the adapter maker website for an updated list of hardware the SAS adapter has been tested on, especially motherboards.
Are you using an external enclosure for SAS or SATA drives? Most SAS implementations use external enclosures for grouping large numbers of hard drives that can't fit into a PC case. In fact, if your application doesn't involve at least three hard drives, you can probably get better performance cheaper by using solid state drives on a PCIe or SATA controller.
Is a single drive in the enclosure reported as bad? SAS controllers are designed for working with large numbers of external drives in enclosures, and some offer the neat feature of being able to blink the status LED of a particular drive in an enclosure to help you find it. The enclosure itself may be able to identify certain drive failures with a status LED.
If you are having trouble getting any of the drives in an enclosure to work, check the fan-out cable, make sure that the drives, adapter and enclosure are all certified as compatible by the adapter maker, and of course, check the enclosure power. It also pays never to mix SAS and SATA drives in an enclosure, even if the specs appear to support it. It doesn't make any sense to use an SAS enclosure as a dumping ground for random drives when performance and reliability are the only justification for the cost.
Does the SAS adapter BIOS see every drive? If you are having trouble with a single drive, check the connections, try replacing the cable, and try swapping the drive power if it doesn't spin up. Download the test software from the drive maker for SAS drives for full diagnostics. SATA drives aren't supported quite as well by the manufacturer's software, but there should be some test software available.
Did an auto rebuild initiated by the RAID result in failure? The time to test your RAID implementation is before it fails, not after, and you should follow the adapter maker's guidelines for testing. Don't just pull the power plug on one of your server drives to see what happens. In the event an auto rebuild fails, check the obvious things first, like that the replacement hard drive is at least as large as the failed drive, that it's on the adapter manufacturer compatibility list, that it's the same technology (SAS or SATA), and that it's been initialized. Otherwise, you'll want to try a manual rebuild after consulting the documentation for your adapter.
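The "obvious things" above make a short checklist, sketched here in Python. Everything in it is illustrative: the Drive record, the rebuild_problems function and the model names are invented for the example, and the compatibility list has to come from your adapter maker, not from code.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Minimal drive record for this sketch (hypothetical fields)."""
    model: str
    capacity_gb: int
    interface: str  # "SAS" or "SATA"
    initialized: bool = False


def rebuild_problems(failed, replacement, compatible_models):
    """Return the reasons a replacement drive may cause a rebuild to fail."""
    problems = []
    if replacement.capacity_gb < failed.capacity_gb:
        problems.append("replacement is smaller than the failed drive")
    if replacement.model not in compatible_models:
        problems.append("replacement is not on the adapter compatibility list")
    if replacement.interface != failed.interface:
        problems.append("replacement is not the same technology (SAS or SATA)")
    if not replacement.initialized:
        problems.append("replacement has not been initialized")
    return problems


if __name__ == "__main__":
    failed = Drive("EXAMPLE-SAS-600", 600, "SAS", initialized=True)
    spare = Drive("EXAMPLE-SATA-500", 500, "SATA")
    for problem in rebuild_problems(failed, spare, {"EXAMPLE-SAS-600"}):
        print(problem)
```

An empty list doesn't guarantee the rebuild will succeed. It just means the easy explanations are off the table and it's time for the manual rebuild procedure in your adapter documentation.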
Are you using a mixture of SAS and SATA drives? Many SAS controllers support a mix of SAS and SATA drives but that doesn't mean it's a good idea. Sometimes people start with an SAS RAID and get sticker shock when they need to replace a single drive, comparing the SAS to SATA pricing. But there's little reason to invest in an SAS controller and any SAS drives if you're going to end up mixing the technologies, and it can result in compatibility problems, even if the adapter maker states that the drives are supported.
Have you flashed the adapter BIOS? Unless you have specific instructions from the adapter maker that you need to flash the adapter BIOS for compatibility reasons, it's a sort of last ditch measure. The adapter maker is the only source for the BIOS upgrade, and they should also supply a program and detailed instructions for carrying out the update.
If you've already flashed the BIOS and none of the drives in the array are bad, go back to the adapter maker's website and check compatibility of the adapter with the motherboard and with all attached SAS devices, by model and manufacturer. If you've been trying to install drives in a new enclosure, take one out and mount it in the PC just to see if it will function there. The fan-out cables for SAS are the apex of PC cabling, so inspect them carefully and make sure you're using a quality cable.
If all of the hardware is in good working order, at least as far as you can determine by status LEDs and the adapter BIOS reporting for the drives, the problem is likely in the operating system setup. Make sure that the adapter and the operating system have been tested together, that any updated drivers have been installed, and try the adapter maker's support forums. Just because an SAS adapter is new and came with a DVD that included drivers for your operating system doesn't mean they are up to date.