Troubleshooting Hard Drive Boot and Performance
Copyright 2013 by Morris Rosenthal All Rights Reserved
Excerpted from Computer Repair with Diagnostic Flowcharts Third Edition
Diamond symbols linked to decision text. Unplug ATX power supply before working inside PC.
SATA AHCI, SMART Errors, Speed, SSD and RAID Hardware
Do you get at least a partial operating system load on a boot drive? This includes the Windows splash screen, or even just the running dots when a kernel is loading. Has the drive been moved from another system? Try booting in Safe Mode, since all the drivers for the underlying hardware will be incorrect in Windows. It's possible for a drive that's been moved from another system to begin booting even if the drive parameters and addressing mode have been detected incorrectly. Old PCs accepted manual input of hard drive parameters that can't always be auto-detected by a new motherboard. Otherwise, boot failure may be unrelated to the drive itself and due to a hardware conflict, data corruption, a bad install, etc. If your performance issue is with a second (non-boot) hard drive, follow the "Yes" branch of the flowchart.
Have you checked the boot order of all the installed devices in CMOS Setup? The boot order specifies which drive should be tried first and should be set to your boot hard drive unless you are having trouble with an operating system install. Boot order problems come up quite frequently these days since practically all newer PCs support booting from a USB storage device, a memory stick or an SSD. If the boot order is set to USB and anything that can be interpreted as a drive is attached, including the memory card of a camera left in a digital film reader, the BIOS may try to boot it and simply lock up rather than going on to the next device. If you absolutely prefer to leave USB or the DVD as the first boot device, unplug any USB attached devices and remove any media from those drives. If the hard drive still fails to boot, set it to be the first boot device and try again.
Can you boot from a bootable USB memory stick, SSD or from an operating system DVD or CD? If not, this is where you change the boot order to try the USB device or the DVD first. The USB test isn't perfect because if you've never tried booting a USB device on the PC before, the BIOS support may not be 100%. You should use a USB port on the back of the PC (in the motherboard I/O core) rather than a front mounted USB port, which may be slower or shared. Likewise, if you are trying to boot an operating system DVD or an old CD, it could be that the CD/DVD drive is taking too long to spin up and the BIOS is timing out. Try ejecting the disc, hitting the reset button on the PC, and then sliding the tray back in so that the drive will be spinning up by itself by the time the BIOS checks for a CD. Confirm that the DVD is bootable in another PC, and try wiping it off with a flannel shirt if it's covered with fingerprints.
If you can boot from a USB device, but not from a hard drive with a known good operating system installed or a bootable operating system DVD, it sounds like an ATA controller or cabling problem and you need to return to the ATA Failure chart. If you can't boot from a USB device or a DVD when the PC has done so in the past, it's more likely a motherboard failure issue, so see the Motherboard, CPU and RAM chart.
Have you just switched to AHCI in CMOS Setup to boost the performance for your SATA hard drive or SSD? AHCI (Advanced Host Controller Interface) is an Intel-created standard that allows SATA devices to perform at their best, rather than being limited by a BIOS that only understands the older IDE devices and basically runs the SATA devices in compatibility mode. But switching to AHCI after the operating system has already been installed on the boot drive can lead to boot failure. The reason is that some Windows releases that support SATA (Windows Vista and Windows 7) don't install the AHCI drivers unless AHCI is enabled in the BIOS before the operating system install.
If you have this problem, it's too late to tell you that you should have enabled AHCI first, but it's not too late to make the switch. You will have to revert to the original BIOS setting in order to get the PC booted, and then you'll have to install any required motherboard drivers for the SATA controller in Windows and edit the registry entry. This varies with the operating system version and service pack, so do an Internet search for exact instructions with screen shots. After you've made all of the required changes, you can re-enable AHCI in the BIOS in CMOS Setup and reboot.
After you've booted from a USB device or an operating system DVD, can you exit to the command line and read the information on the hard drive that failed to boot? Can you read the hard drive data if it's installed in an external USB shell? If you can, it's likely the operating system has been corrupted. This could be due to a virus, an actual error writing to the hard drive, or a piece of software running amok and writing data to the wrong location on the drive. Back up any critical data while you can access the drive, then try running ScanDisk, CHKDSK or an equivalent and see if it can repair the drive. Otherwise, you can try to use the Windows DVD to repair or reinstall (options depend on particular OS and PC manufacturer). Most operating systems allow you to reinstall without wiping out any of your data or programs. They (should) always prompt you to see if you want to continue before actually destroying any data. But if the hard drive was making a repeated clicking sound when it tried to boot, there's a good chance there is physical damage in the boot sector, and getting the data off is the best you can hope for.
Do you have a good backup? A good backup doesn't just mean that you've plugged in an external hard drive that's supposed to handle automatic back-ups, it means that you've actually checked the external hard drive to make sure that the files you need are on there and that you can access them. Internet back-up services are actually more reliable than local back-ups, and if you only care about a small number of files, like the novel you are writing, e-mailing it as an attachment to your Yahoo!, Hotmail or Gmail account is all that it takes. If you do have a good backup and the OS recovery has failed, you can try deleting the partitions on the drive through Windows Disk Management or FDISK and starting from scratch. This means losing all of the information on your drive, so if you have any critical data and you aren't sure of what you're doing, seek professional help.
If you don't have a good backup and the data is critical, you might want to invest in the latest disk doctor and virus doctor software you can find. If the data is going-out-of-business critical, you can send the drive out to a data recovery outfit. Data recovery is expensive, from the mid-hundreds to thousands of dollars, but they can usually recover data from a drive as long as it hasn't been maliciously wiped out and the data platters haven't been physically damaged. Unfortunately, data recovery from SSD drives is more difficult as SSD drives are subject to catastrophic software control failures and the information on the chip level is not organized as simply as the data on magnetic hard drives.
Is the drive slow, meaning slower than it used to be or slower than you expected based on experience? Is the drive noisy, either in terms of volume ("That's one loud drive") or simply because it never stops seeking? You can download test software from the Internet that will report on hard drive performance through read/write tests and can turn up problems with the interface or BIOS settings, but hardware testing won't tell you anything about the presence of malware or data fragmentation. If the hard drive status LED on the front of the computer blinks continually after the operating system has finished loading and before you've begun working, count that as "slow."
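The kind of read test that such software performs can be sketched in a few lines of Python. This is only a rough illustration, not a substitute for a real benchmark: the function name `sequential_read_mb_per_s` and the 16 MB scratch file are assumptions made up for the example, and because the operating system may serve a freshly written file from its cache, the number it prints can be far higher than what the drive itself can deliver.

```python
import os
import tempfile
import time


def sequential_read_mb_per_s(path, block_size=1024 * 1024):
    """Read the whole file in fixed-size blocks and return the rate in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total_bytes += len(block)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / 1_000_000 / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Write a 16 MB scratch file (hypothetical size), then time reading it back.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        scratch = tmp.name
    try:
        print(f"{sequential_read_mb_per_s(scratch):.0f} MB/s")
    finally:
        os.remove(scratch)
```

To get a figure that reflects the drive rather than the cache, point the function at a large existing file that hasn't been opened since the last reboot.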
Can you actually hear the drive spinning up whenever you do something that requires drive access? If you have an SSD, you must be hearing fan noise from the power supply or a heatsink fan because the drive has no mechanical parts. When "regular" hard drives cycle up and down, it indicates a bad power supply, bad instructions from the controller, or a failing drive.
You can eliminate bad instructions from the controller by simply disconnecting the data cable and seeing if the cycling stops. Obviously, the system won't be able to boot during this test and you'll probably see a "hard drive failure" message. It could just be that the power management settings for the drive are too aggressive and that either the operating system or the BIOS (through CMOS Setup) is telling the hard drive to power down any time you don't access it for 60 seconds. Turn off power management in CMOS Setup, at least for the hard drive, if that's an option. Check the power management settings in Windows Control Panel and turn off hard drive power management.
If the hard drive cycles up and down even without the data cable connected, it's either the power supply or failing drive electronics. Try a different power supply lead to the drive, and if that doesn't help, test the hard drive in another PC or in an external powered USB cage to see if it still cycles up and down before giving up. Since the failure is with the drive electronics, it's a good candidate to send out for data recovery if you have critical data that isn't backed up.
SMART (Self-Monitoring, Analysis and Reporting Technology) has been around forever and it's generally implemented well by hard drive manufacturers. But it's not always supported by the motherboard BIOS, and is often disabled in the BIOS by default even when it is supported. SMART tracks dozens of hard drive operational parameters, error counts, temperatures, etc., which can be accessed through the operating system if you download a free testing program that includes SMART data. The hard drive manufacturers all supply free hard drive diagnostic software on their websites, but for their brand of drive only. When SMART is implemented properly through the BIOS, you'll get a warning of impending failure when the error counts reach a pre-defined critical point for the drive. If you get a legitimate SMART error message from the BIOS, it's time to back up critical data and replace the drive.
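As one illustration of reading SMART data through the operating system, the sketch below parses an attribute table in the layout printed by `smartctl -A` from the free smartmontools package. That choice of tool is an assumption, since the text above doesn't name a specific program, and the `SAMPLE` table is made-up data for illustration, not output from a real drive. The function name `failing_attributes` is likewise hypothetical. It flags any attribute whose normalized value has dropped to or below the drive's threshold, which is the "pre-defined critical point" idea in miniature.

```python
# Made-up sample in the column layout of `smartctl -A` (not real drive output).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 1480
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36
"""


def failing_attributes(report):
    """Return names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        # A threshold of zero means the attribute is informational and never trips.
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing


print(failing_attributes(SAMPLE))  # prints ['Reallocated_Sector_Ct']
```

On a real system you would feed the function the text captured from the tool instead of the sample string.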
Hard drive data errors can manifest as error messages from the BIOS that specify HDD errors, they can show up when running ScanDisk or CHKDSK, or they can cause BSOD (Blue Screen Of Death) errors, commonly followed by a Windows message on the next boot-up that the system was shut down improperly and a scan to test data integrity is required. Newer hard drives all have the built-in ability to transparently manage a reasonable number of magnetic media failures by moving around data and closing off bad sectors on the drive, so when errors are serious enough to report, it's worth taking notice.
Does your drive accumulate data errors over time? Have you had to reinstall the operating system more than once, or is CHKDSK or ScanDisk constantly telling you that it's recovering lost files? Try downloading and running the hard drive manufacturer's diagnostic software, or on older PCs, running a "Thorough" scan with ScanDisk, which verifies the physical disk surfaces and can take all night on a large hard drive. If physical errors on the drive are identified and repaired (the software marks them as bad and removes them from use), and new errors are discovered the next time you run the test, the drive is failing.
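The "new errors on the next run" test amounts to comparing two lists of bad sectors. A minimal sketch, assuming you have noted the bad sector numbers reported by each scan (the function names and the sector numbers below are hypothetical):

```python
def new_bad_sectors(previous_scan, current_scan):
    """Return sectors flagged in the current scan that the previous scan had not flagged."""
    return sorted(set(current_scan) - set(previous_scan))


def drive_is_failing(previous_scan, current_scan):
    """Treat any newly discovered bad sector as a sign the drive is failing."""
    return bool(new_bad_sectors(previous_scan, current_scan))


# Hypothetical sector numbers noted from two successive scans.
first_scan = [1024, 20480]
second_scan = [1024, 20480, 773120]

print(new_bad_sectors(first_scan, second_scan))   # prints [773120]
print(drive_is_failing(first_scan, second_scan))  # prints True
```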
As always, check the IDE or SATA data cable, and if you've been fooling around in the case quite a bit, it's worth a shot to replace it. Check for viruses. Errors can result from the drive running too hot, so if it's a hundred degrees in the room, consider air conditioning or moving the PC. It can also get pretty hot in the case even in an air-conditioned environment if there isn't enough air circulation in the case and the drives are stacked in like pancakes. RF interference on the data cable is another (remote) possibility, caused by a poorly designed or partially failed adapter on the bus that's acting like a broadcast antenna at just the wrong frequency and overwhelms error correction. Back up all of your data (really the first thing you should do when you start seeing data errors on a drive) and reformat the drive. Do a slow format, not the fast format some OS installs allow. You know it's a slow format when it takes hours.
The most common RAID implementation these days is RAID 0, which is strictly for performance. If you had two drives installed in your PC for the sake of data security, then you are probably running a RAID 1 array. RAID 1 mirrors every bit of data on both drives, so if one drive fails, the system should inform you that it needs to be replaced but continue running as usual. The problem is, many people confuse drive integrity with data security. If you get a virus or other malware infestation, or if you accidentally delete data, the deletion or infestation is mirrored to both drives. The only true security is in incrementally backing up your data and storing multiple copies offsite.
If you have three or more hard drives installed in a server, you're probably running a RAID 5 configuration. RAID 5 employs both data striping and parity (a form of error checking) so that in the case of a single drive failure, the array can rebuild itself with no data loss when the failed drive is replaced with an empty new drive. If you've reached this point without finding your hard drive performance issue, it's likely a software problem (some copy protection software can slow drive performance) or an OS compatibility problem with the controller.
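The parity rebuild can be illustrated with a toy example. The sketch below assumes simple XOR parity over a single stripe with two data blocks and one parity block. Real RAID 5 controllers rotate the parity block among the drives and work on much larger blocks, and the names `xor_blocks` and `rebuild_missing` are made up for the illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """With XOR parity, the missing block is the XOR of all surviving blocks."""
    return xor_blocks(surviving_blocks)


# One stripe spread over three drives: two data blocks and their parity.
data_a = b"HARD"
data_b = b"DISK"
parity = xor_blocks([data_a, data_b])

# Pretend the drive holding data_b failed and rebuild it from the other two.
print(rebuild_missing([data_a, parity]))  # prints b'DISK'
```

The same XOR recovers whichever single block is lost, which is why the array survives one drive failure but not two.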
Have you installed the SATA drivers that came with the motherboard and updated the SATA drivers to the latest version? Many home builders skip this step in their eagerness to get the system up and running, and they never know the difference (except in performance) because the BIOS can manage the interface between the SATA controller and Windows without the drivers, it just doesn't do as good a job of it. Performance is strongly dependent on efficient communications between the SATA controller and the operating system, and that's why the drivers are needed.
The #1 reason for slow boot times followed by slower than normal performance for the first half hour or so of operation is that a virus scan is running in the background. Don't launch any programs after starting the PC, just wait for the hard drive activity LED to stop. If the hard drive LED is still indicating heavy usage five minutes and fifteen minutes after boot, it usually means that virus software is running a full scan on start-up. This happens when the virus software is set to scan the PC on boot, or the scan is scheduled for the middle of the night (when the PC is powered down) and the software tries to catch up on its schedule the next time the PC is powered up.
First make sure the drive isn't getting too full. My rule of thumb used to be keeping 20% free space on a drive for temporary files and the Windows swap file, but the drives have gotten so big that 10% is probably enough. Many programs create large temporary files without really documenting the fact, so you can be sure you always need more free space than you think. When you delete files and directories to free up space on the drive, you also have to empty your Windows trash, or the space doesn't actually become available. The main culprits are downloaded music and videos. It would take ten thousand Shakespeares writing for a thousand years to fill up a hard drive with text.
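If you'd rather have the PC do the arithmetic, the 10% rule of thumb is easy to check with Python's standard `shutil.disk_usage` call. This is a minimal sketch: the helper names `free_fraction` and `is_too_full` are made up for the example, and the check runs against the root of the current drive, so adjust the path for a second drive.

```python
import os
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the drive that is still free, from 0.0 to 1.0."""
    return free_bytes / total_bytes


def is_too_full(total_bytes, free_bytes, min_free=0.10):
    """True when free space has dropped below the rule-of-thumb minimum."""
    return free_fraction(total_bytes, free_bytes) < min_free


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # "/" on Linux or Mac, the current drive root on Windows
    usage = shutil.disk_usage(root)
    print(f"{root}: {free_fraction(usage.total, usage.free):.0%} free")
    if is_too_full(usage.total, usage.free):
        print("Drive is getting too full, time to clean up.")
```

Pass `min_free=0.20` if you prefer the older 20% rule.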
Have you checked the BIOS selections for the hard drive controller in CMOS Setup? SATA drives should use AHCI (Advanced Host Controller Interface) unless the BIOS doesn't support it, but if AHCI isn't selected, see the decision point for "Just switched to AHCI?" before taking action. Parallel ATA (IDE) controllers should be set for the fastest DMA or UDMA mode supported, and never for PIO. You should also check older versions of Windows for the hard drive interface mode if you had some hard drive errors after which Windows slowed to a crawl. Some Windows versions would automatically revert to PIO after a number of hard drive time-outs. This problem is more common with secondary drives than the boot drive, but it can happen with any PATA device, including DVD and CD drives. If you search the Internet for "DMA reverts to PIO" you'll find some free tools that will save you from having to edit the registry if that's not in your comfort zone.
Is the drive with performance issues an SSD? Both SSD and mechanical drives share a number of performance troubleshooting steps associated with the operating system and these were addressed above. But the underlying technology differences between SSD and magnetic hard drives mean the final troubleshooting steps diverge. For example, you never want to run DEFRAG on an SSD.
Have you defragmented the disk recently? Run Defrag by right clicking the drive in My Computer, choosing Properties and then Tools (in older Windows versions, this is found under Programs > Accessories > System Tools). If Defrag gives you any grief, try running CHKDSK or ScanDisk first. Some Windows versions also have a tool called System File Checker which runs as the Admin command: "sfc /scannow" for which Microsoft offers a tutorial on their website.
If CHKDSK or ScanDisk doesn't make it all the way through the drive, make sure you aren't running any other programs (you can often use ctrl-alt-del to end all the non-critical tasks), and try again. If it still doesn't work, restart in Safe Mode and try. If you still can't get through ScanDisk, consider backing up your data and reformatting the drive. It's a tough call at this point if you've defragged the drive, it's not full, and it's still slowly beating itself to death. Buy or download some decent virus doctor and spyware eradicator software, because you probably have a virus or a mess of spyware on the drive making life miserable.
If your drive is continually noisy, sounds like a quiet airplane as it spins up and then makes high pitched rotational noise all the time, it's just mechanical noise. While it's not a good sign, and I would suggest replacing the drive, I've also seen noisy hard drives linger for years and years. If the drive starts sounding like a penny rattling around in a tin can, it's time to back up your data, but if it's sounded that way for years, it's just poorly built. Make sure there are four screws securing the drive in both cases.
If there were no physical symptoms or signs of error, and the drive diagnostics software didn't turn anything up, it's possible that a third party driver is interfering with hard drive performance. Some gamers report that the Starforce copy protection scheme for games, at least in some versions, uses drivers that can degrade drive performance. So check the user forums for the game software installed on your PC and see if there are known issues. Otherwise, it's possible that the performance bottleneck you are experiencing with the hard drive is really due to a different hardware issue, so check the other flowcharts.
Is your SSD brand new? Some home builders and upgraders add an SSD for use with games or to replace their boot drive and immediately start running performance tests to see what they got for their money. The problem is that SSD drive performance can degrade rapidly in the first few hours and weeks of use as the drive fills up. This isn't a failure as long as the performance stops degrading at a level that's equal to or higher than advertised.
There are several figures of merit used for specifying SSD performance, but I like using MB/s for apples-to-apples comparison with standard hard drives. For starters, keep in mind that the fastest standard interface for hard drives, SATA 3, has a maximum transfer rate of around 600 MB/s, but the best mechanical hard drives rarely manage 100 MB/s, unless the data happens to be in cache.
Rates of 600 MB/s and higher are possible with SSDs, though currently, the only drives capable of rates above the SATA 3 maximum are found in servers using Serial Attached SCSI or Fibre Channel interfaces. You may see your SSD rated in IOPS, Input/Output Operations Per Second, which can actually be translated into MB/s if you know the size of the test transfer in bytes. When rated in IOPS, high performance magnetic hard drives (10,000 rpm) on an SATA 3 interface can achieve ratings as high as 150 IOPS and Serial Attached SCSI drives for servers can break 200 IOPS. The IOPS ratings for SSD drives are at least an order of magnitude higher, in the thousands or tens of thousands (higher for server drives). But the IOPS metric favors SSDs for their random access ability. The transfer rate for large amounts of data is not always that much higher in PC applications, sometimes as low as 100 MB/s.
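The IOPS to MB/s translation is just a multiplication: operations per second times bytes per operation, divided by a million. The sketch below assumes a 4 KB (4096 byte) test transfer, which is a common benchmark size but not a figure from any particular drive's spec sheet, and the function name is made up for the example.

```python
def iops_to_mb_per_s(iops, transfer_bytes):
    """Convert an IOPS rating to MB/s (decimal megabytes) for a given transfer size."""
    return iops * transfer_bytes / 1_000_000


# Assumed 4096 byte transfers; substitute the size your benchmark reports.
print(f"{iops_to_mb_per_s(150, 4096):.2f} MB/s")     # prints 0.61 MB/s
print(f"{iops_to_mb_per_s(20_000, 4096):.2f} MB/s")  # prints 81.92 MB/s
```

The tiny result for the 150 IOPS magnetic drive shows why small random transfers are the worst case for spinning disks, even though the same drive can stream large files much faster.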
Have you tweaked the Windows and BIOS settings for your particular SSD? SSD drives are relatively new, so both manufacturers and gamers are still determining how to get the most out of them. Certainly you should disable any scheduled drive maintenance tasks in Windows (especially DEFRAG), and disable Indexing under the properties tab for the SSD. Other suggestions range from disabling Windows write-back cache to enabling disk data cache. You should also make sure that TRIM is enabled (TRIM is a command, not an acronym) in Windows 7 and later versions. Some experts will suggest you disable System Restore, but that means losing the ability to easily cure many simple malware and accidental registry corruption issues.
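One way to check the TRIM setting in Windows 7 and later is the built-in command `fsutil behavior query DisableDeleteNotify`, run from an administrator command prompt. A reported value of 0 means delete notifications (TRIM) are enabled, and 1 means they are disabled. The sketch below interprets that output. The function name `trim_enabled` is made up for the example, and the exact wording of the output line varies between Windows versions, so the pattern only looks for the setting name and its value.

```python
import re
import subprocess
import sys


def trim_enabled(fsutil_output):
    """Return True if TRIM is enabled, False if disabled, None if the text isn't recognized."""
    match = re.search(r"DisableDeleteNotify\s*=\s*(\d)", fsutil_output)
    if match is None:
        return None
    # The setting is a "disable" flag, so 0 means TRIM is turned on.
    return match.group(1) == "0"


if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        capture_output=True,
        text=True,
    )
    print("TRIM enabled:", trim_enabled(result.stdout))
```

If several lines are reported (newer Windows versions list more than one file system), this sketch only reads the first.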
If you've tried performance tweaks in software settings and you still aren't happy with the transfer rate, make doubly sure you have all of the drivers for the SSD and the SATA controller installed. And although SATA cables aren't always rated for speed, if your SATA 3 SSD is on an SATA 3 controller, make sure the cable is rated for 6 Gb/s. Check the SSD manufacturer website to see if they recommend a firmware update, remembering that a failed firmware update can leave you with nothing. See also the Motherboard, CPU and RAM Performance diagnostics in case the problem is really slow overall performance and not the fault of the SSD.