Troubleshooting CPU, RAM and Motherboard Performance
Excerpted from Computer Repair with Diagnostic Flowcharts, Third Edition
Copyright 2018 by Morris Rosenthal. All Rights Reserved.
Unplug ATX power supply before working inside PC.
Slow Speed and Random Freeze-ups from Overheating and BIOS or Driver Issues

Does the system reboot itself for no apparent reason, either during the boot process or at any point once you're up and running? Random reboots and freezes are often caused by mechanical or thermal problems. However, keep in mind that an inadvertent power management setting may be sending your system into sleep mode after one minute of inactivity. A corrupted operating system or a virus can also cause repeated reboots or freeze-ups, as can the power supply and bad memory. We'll work through these by process of elimination.

I've seen bad power supplies that cause a system to reboot if somebody walks across the room or sets a coffee cup down on the table. But people are often quick to blame the power supply for random issues that are actually caused by problems with the electric utility distribution system that the power supply is drawing on. If you live somewhere with frequent brown-outs (the brightness of incandescent light bulbs will droop noticeably), or with frequent fluctuations in the distribution voltage caused by demand from large industrial facilities, or with off-the-grid power from a variety of alternative energy solutions, the fix may be to purchase a quality UPS (uninterruptible power supply) that protects from surges and brown-outs. It also makes sense to use a simple plug-in tester to see if the circuits in your house are properly grounded.

Is the CPU temperature stable? If the CPU temperature continues to rise and nears the maximum allowed by the manufacturer, it will usually result in a reboot or lock-up as the CPU shuts down to protect itself. If the fan on the CPU heatsink never spins up, you've found the problem and can probably replace the fan without replacing the heatsink.
Also remember that a working fan's ability to cool the heatsink depends on the temperature of the air in the case, so if the case fans have failed or are blocked, or if intake and exhaust fans are working against each other (both doing intake or exhaust) the air temperature is probably too high for efficient CPU cooling. The same is true for running in hot weather without air conditioning. If you need to replace the fan and heatsink, see the text for "Fan and heatsink active?" in the Motherboard, CPU and RAM failure chapter. If you have a multi-core CPU, another option is to disable one or more of the cores to see if that cures the overheating problem. While you may be unwilling to live with a CPU that performs below its potential, it's a great troubleshooting step because it can pin the problem squarely on CPU overheating. If the CPU and GPU are combined on the same chip, you can also try disabling the GPU and using an add-in graphics adapter as a test.

Have you run a full virus and malware scan with an updated security suite and have you tried setting the system back to a pre-problem Windows restore point? An author could write a much fatter book than this one simply about software issues that can cause random glitches, but our focus in this book is hardware, so we can only treat software issues with a big hammer. First, update your security suite or prepare a bootable DVD or memory stick with the latest trial software from a reputable anti-virus company and do a full scan. This can take several hours if you have a lot of stuff on your hard drive. If the security scan doesn't turn up any problems, it doesn't necessarily mean that you're in the clear, just that it's nothing obvious. Next, on Windows systems starting with XP, you can run System Restore from the System Tools, which is buried in the Programs menu under Accessories.
System Restore is not a panacea. It often runs through the restoration process only to announce on reboot that the restoration attempt has failed, but I've never had it leave a system in worse shape than it started. Sometimes choosing a different Restore point will fix the problem, but other times you'll find that System Restore simply won't run to completion due to some conflicting software that's been installed. Microsoft online support is actually helpful in these cases if the operating system is still supported. If you don't have automatic Windows updates turned on, visit the Microsoft website and install their tool that will check the status of your operating system and download and install all of the patches required. This can be a lengthy process, especially if you've just reinstalled an older operating system that requires several service pack updates before it can even get to the latest patches. If Microsoft abandons support for a particular configuration, as they did with the original Windows XP years before abandoning support for the last XP version, you may need to find the service pack to update your factory installation disc before you can make use of Microsoft's update tool. Another test is to boot in Safe Mode and see if the problems go away. If so, you can be pretty confident that the issue is software. Try updating any drivers, and if the problem started after you installed a new program or a new piece of hardware, by all means remove it and see if the problem clears up. You can also try reinstalling Windows, which usually won't affect your data (pictures, documents, etc.), though you may have trouble with some installed programs. Windows is pretty good about warning you if a repair or reinstallation effort is likely to make things worse.
If you haven't updated the BIOS in a long time (or ever), do an Internet search to see if the motherboard manufacturer or chipset maker has a new version that has been tested with your motherboard. Remember that your operating system has likely been updated several times since the original BIOS was created, not to mention new hardware coming on the market that didn't exist when your motherboard was manufactured, even though the standards indicated the new hardware would be supported. Flashing the BIOS may clear up any problems you are having, but take heed of any warnings about the flash process, and make sure that the BIOS upgrade is compatible with your motherboard lest a blown update leave you with a brick.

Have you run a software based memory test? Download a free memory testing tool from a reputable website and allow it to run as long as required to thoroughly test the installed RAM. Unlike a hard drive, which can tolerate and hide a certain number of hardware errors through dynamic reallocation, memory modules must test out as error free.

Have you done everything within your means to test the power supply? Start with the crude tests you can do easily, like tapping on the power supply to see if it causes the system to reboot, and moving the computer to a different circuit in the house. If you've noticed the PC locking up or rebooting when your old laser printer cycles on or when it gets cold and your electric heater comes into use, it could be that the circuit the power supply is plugged into is experiencing its own private brown-outs. Very few people own a power supply tester and repair shops usually do without them because most only test whether the correct voltage appears on each output, without varying the load or detecting ripple.
If you are comfortable with using a DVM on live circuitry in an open case, you can measure the voltages through the top of the ATX connector if there's room. An accurate picture of AC ripple on a DC voltage really requires an oscilloscope, but some sophisticated multimeters may do a decent job if used properly. See the last text section of the power supply troubleshooting chapter for testing suggestions.

Have you upgraded the graphics adapter drivers to the latest version and confirmed that it's cooling properly? The latest software drivers may be necessary to work with software that has been released since the original drivers were installed. For OEM (generic) video cards, you can usually obtain driver updates through the GPU maker's website (Nvidia, AMD Radeon) or from Intel (chipset and integrated CPU/GPU). A video BIOS update is rarely necessary, and I wouldn't do one without an explicit suggestion from the manufacturer. Most of the video BIOS update information you'll find on the Internet is generated by and for overclockers. High performance GPUs monitor their own temperatures and report the temperature out to the configuration and control software provided by the manufacturer. GPU temperature will vary greatly in accordance with the tasks being performed, with gaming and certain engineering modeling applications causing the greatest sustained temperature spikes. Of course, if the fan on the GPU heatsink has failed, you can expect the temperature to run away and eventually cause the GPU to throttle back performance to protect itself, and failing that, overheat and sustain damage. If you have a dual graphics card system, one of the cards will be further from the power supply and auxiliary case fan than the other, and will see appreciably less air flow.
Motherboard manufacturers generally have the sense to space their SLI or CrossFire compatible PCI Express slots as far apart as real estate allows, but if the cooling fan on the second card ends up a fraction of an inch from the back of the first card, you know it's begging for overheating problems. Some graphics adapters use two slot spaces at the back of the case, with one serving as the exhaust port for a doublewide fan/heatsink system.

Have you tested the system in minimal bare-bones configuration, both hardware and software? Normally when we talk about bare-bones, we mean the minimal hardware configuration required to test against a particular problem. Bare-bones testing for a no-video condition when you power on doesn't require a hard drive, while bare-bones testing for no-boot does. The hardware bare-bones test is easy to do, so strip the PC down to the minimum required to boot and get Windows loaded, and see if your random freezes or reboots go away. If so, you can rebuild a component at a time to find the culprit. But if the problem isn't due to hardware such as RAM, the power supply or an add-in video adapter, possibilities you were able to eliminate through bare-bones testing or swapping with known good components, you're left looking at the motherboard and CPU and guessing that one could be defective. If the CPU isn't overheating, it's more likely the motherboard, but another major possibility is software. There are two basic options for testing bare-bones software. The first is to boot in Safe Mode and try to keep yourself busy playing in Windows Paint or some other innocuous software for long enough to see if your random reboot or freeze is going to repeat. If the PC now tests OK, it means either that you have software corruption, conflicts or a malware infestation, or that the particular game you were playing or Internet site you were visiting whenever the PC froze was responsible.
The second and more thorough way to test bare-bones software is to swap in a new hard drive and do a fresh operating system installation. Don't connect to the Internet, pick a single task like playing a movie DVD to keep the PC busy, and see if the problem clears up. If it does, connect to the Internet and let all of the Windows updates install, and give it a day to see if the PC remains stable. Then you can start reinstalling all of your necessary applications, and try to restrain yourself from loading the PC up immediately with software you don't use every day. Either you will eventually get everything installed and the PC will run great, which means the problem was software corruption or malware, or the problem will return, which will tell you that the most recent software installed is the problem.

Does the system clock keep losing the date and time, or does the system ever enter CMOS Setup for no apparent reason? Some ancient desktop boards may even give a "Low Battery" warning at boot. Some motherboards have a replaceable battery, likely a large watch-type battery, though universal replacements for any given voltage are available. The battery really shouldn't fail during the usable life of the PC, so if it does, the problem may turn out to be that something is causing it to drain too quickly. If the forgotten CMOS settings are stored in an EEPROM, then the EEPROM needs replacing, and a new battery won't help.

Does the PC go to sleep from your choice on the Shut Down menu, from the "sleep" key on an enhanced keyboard, or from power saver settings, and then fail to wake up when you move the mouse, hit a key, or press the power switch, depending on whether it's sleep or stand-by? You may see a similar issue with a screensaver not giving up the desktop despite keyboard activity. The issue is usually conflicting software drivers or the need to patch a driver.
Start with keyboard, mouse and motherboard drivers, and if you upgraded any hardware or installed a new program recently, try undoing the change for the sake of testing.

Does the PC hang up when you are installing a new driver? This isn't the same as a true freeze, as you can usually access the task list (Ctrl-Alt-Del) and close the driver installation window. But the problem remains if it's a required driver and you are unable to install it, especially if it's the initial motherboard or graphics adapter driver as opposed to an update. The first step is to just wait, since it may take five minutes or more for the software to complete its hardware detection and assessment process. Next, make sure that the driver you have downloaded or received on DVD actually matches the hardware you have installed. Manufacturer websites often do a poor job of guiding you to the correct driver or update for your installed hardware, especially if the components are generic and you are relying on a driver from the chipset maker rather than the component manufacturer. If the driver fails to install from the manufacturer DVD, check the DVD for scratches and if possible, see if you can obtain the same software from the manufacturer website. If you only have it on DVD, also see the DVD performance flowchart for possible issues. If the driver software is downloaded from the Internet, try downloading it again. Corrupted downloads are more common than people think. The error checking depends on the download type and it's not perfect in any case. It's also possible that the manufacturer has recently posted a driver that hasn't been extensively tested for compatibility, so see if they allow you to download a previous version.

Did PC performance degrade noticeably in a short span of time in recent days or weeks?
The answer here is usually malware, though it can also be something as simple as the hard drive filling up and the operating system running out of room for swapping memory to disk. Right click on your C: drive symbol in Windows (My Computer and other locations) and click Properties to ensure that free space is at least 10% of the drive. I recommend against allowing compression to save space. Take the time to delete files you are no longer using or archive them on DVD. Rather than repeating an entire page of text here, see the suggested actions for answering "No" to "Ran scan, System Restore?" a couple pages back.

Are you seeing high CPU temperatures reported in CMOS Setup or through a monitoring utility after boot, without suffering from lock-ups or reboots? CPUs didn't always offer DTS (Digital Thermal Sensors or Digital Temperature Signaling depending on the manufacturer), so older motherboards included a thermocouple to read the CPU temperature by contact. Before you get worried about a high CPU temperature, make sure that it is generated by DTS and not a thermocouple that can't actually measure the temperature inside the processor die. Both multi-core CPUs and CPUs that integrate the GPU on the same chip may only record a single temperature. In this case, you may see widely varying readings depending on the loading of the cores and the GPU, and you may be able to prevent overheating by switching to an add-in graphics adapter or disabling one or more CPU cores through the operating system.

Does your operating system fail to individually register all of the cores of your multi-core CPU? Rather than assuming hardware failure, which is very rare, start by making sure that multi-core operation hasn't been disabled. A quick Internet search will get you instructions for enabling or disabling cores for your particular operating system.
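Before changing any settings, it can help to confirm how many processors the operating system actually reports. The following is a minimal Python sketch, not a diagnostic tool: the function name `missing_cores` is made up for illustration, and `os.cpu_count()` returns logical processors, so on a CPU with hyperthreading the expected figure is the thread count rather than the physical core count.

```python
import os
from typing import Optional


def missing_cores(expected_logical: int, reported: Optional[int] = None) -> int:
    """Return how many logical processors the OS fails to report.

    expected_logical: the count you expect from the CPU's spec sheet
                      (an assumption you supply, not something detected here).
    reported:         the count the OS sees; defaults to os.cpu_count().
    """
    if reported is None:
        # os.cpu_count() can return None if the count is undetermined.
        reported = os.cpu_count() or 0
    # Never return a negative number if the OS reports more than expected.
    return max(expected_logical - reported, 0)


if __name__ == "__main__":
    expected = 4  # hypothetical quad-core CPU without hyperthreading
    shortfall = missing_cores(expected)
    if shortfall:
        print(f"OS is not registering {shortfall} of {expected} logical processors")
    else:
        print("OS registers at least the expected number of logical processors")
```

A shortfall only tells you that something is limiting the count. It doesn't say whether the cause is an operating system setting, the BIOS or the drivers.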
If the cores are enabled, you may need to update both the BIOS and the motherboard drivers. If you've upgraded a single core processor with a multi-core, you may have to jump through some hoops to get them all working. Search the Microsoft Forums, and be warned ahead of time that some people just give up and reinstall Windows from scratch.

Are you seeing lower than expected speed for your installed RAM on the boot splash screen or through memory testing software? There are a number of reasons for RAM to underperform its specifications, from being installed in combinations the motherboard doesn't recognize to improper settings in CMOS Setup. For example, some motherboards offer more DIMM sockets than the chipset supports at top speed, resulting in all memory operations defaulting to a lower speed if every socket is populated. See your motherboard manual for details.

Are the DIMMs exactly matched and are the DIMMs within each bank ganged? While motherboard specifications generally allow for different DIMMs using the same speed chips and the identical technology (single sided, double sided), it's much safer to use DIMMs that are exactly matched, by speed, brand, and circuit card generation. All of the DIMMs on the motherboard should ideally be identical, and certainly the DIMMs within a single bank. And in most cases, the motherboard will detect the lowest speed DIMM and force all memory operations to default to that speed. Whether or not ganging DIMMs in banks will increase performance depends on the CPU's memory bus, as not all CPUs can take advantage of 128 bit or even 192 bit (triple ganged DIMMs) wide memory. And it's important that the BIOS settings for memory and chipset timing are all correct, since they may not be autodetected.
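To see what mismatched DIMMs and bus width mean in numbers, here is a rough Python sketch of the arithmetic. It assumes each DIMM has a 64 bit data path and that the board drops every module to the slowest one, as described above. The DDR3-1600 and DDR3-1333 figures are illustrative values I've picked, the function names are hypothetical, and the results are theoretical peaks that real systems never sustain.

```python
from typing import List


def effective_speed_mt_s(dimm_speeds_mt_s: List[int]) -> int:
    """Speed all DIMMs run at if the board defaults to the slowest module."""
    return min(dimm_speeds_mt_s)


def peak_bandwidth_mb_s(speed_mt_s: int, bus_width_bits: int) -> int:
    """Theoretical peak: mega-transfers per second times bytes per transfer."""
    return speed_mt_s * bus_width_bits // 8


if __name__ == "__main__":
    # Hypothetical mix: one DDR3-1600 module and one DDR3-1333 module.
    speed = effective_speed_mt_s([1600, 1333])
    print(f"All DIMMs run at {speed} MT/s")

    # 64 bit = single DIMM, 128 bit = two ganged, 192 bit = three ganged.
    for width in (64, 128, 192):
        print(f"{width:3d} bit bus: {peak_bandwidth_mb_s(speed, width)} MB/s peak")
```

The wider figures only apply if the CPU's memory controller can actually use the wider bus.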
Ganging was seen as a step up from the old interleaving (memory banks have been around forever), but modern multi-core CPUs probably benefit more from unganged operation where the DIMMs are divvied up between the cores.

Do the PCI Express versions of the PCIe adapters you are using match the PCIe version of the motherboard slots, and are the adapters in slots that run the proper number of lanes? Five different revisions of PCI Express (1.0a, 1.1, 2.0, 2.1 and 3.0) have been used in PCs and you may still encounter adapters and motherboard slots that only support the early versions. PCIe 4.0 is scheduled but wasn't available at press time. Each major revision of the PCIe standard has doubled, or nearly doubled, the transfer rate of each serial bus lane, and each slot offers from 1 to 16 lanes depending on the design. Adapters have generally been backwards compatible in terms of fitting in slots of the rated lane width (x1, x4, x8, x16), but were not always backwards compatible with their power demands. Motherboards with PCI Express 1.1 slots may have a BIOS update available to handle the requirements of 2.1 and higher adapters, but performance will be limited to PCIe 1.x speeds. The performance increase from PCI Express 2.0 or 2.1 to 3.0 won't generally be seen by users since bandwidth was not yet an issue for PCIe 2.0 adapters when PCIe 3.0 adapters became available. The more lanes a PCIe slot requires, the more expensive it is to add that slot to a motherboard. To save money while still physically supporting multiple high performance adapters, motherboard makers may offer a full complement of high capacity slots without providing the motherboard circuitry to run all of the lanes. So you might see a motherboard identify slots as x4 (x1) or x16 (x8), which means they accept an x4 adapter and run it as x1, or allow you to install an x16 adapter, but only provide x8 support in the slot.
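To put rough numbers on the lane arithmetic above, here is a minimal Python sketch. The signaling rates and encoding overheads come from the published PCIe specifications rather than from this text, the function names are made up for illustration, and the results are theoretical peaks in one direction that real adapters won't reach. It assumes a link runs at the older of the two PCIe generations and the smaller of the two lane counts.

```python
# Per-lane signaling rate in GT/s and line-encoding efficiency by generation.
# Figures are from the PCIe specifications, not from the text.
PCIE_GENERATIONS = {
    "1.x": (2.5, 8 / 10),     # 8b/10b encoding
    "2.x": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}
GEN_ORDER = ["1.x", "2.x", "3.0"]  # oldest to newest


def lane_bandwidth_mb_s(generation: str) -> float:
    """Theoretical one-direction bandwidth of a single lane in MB/s."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return gt_per_s * 1e9 * efficiency / 8 / 1e6


def link_bandwidth_mb_s(adapter_gen: str, slot_gen: str,
                        adapter_lanes: int, wired_lanes: int) -> float:
    """Bandwidth of the negotiated link between an adapter and a slot.

    wired_lanes is the number in parentheses on the motherboard,
    for example 8 for a slot marked x16 (x8).
    """
    # The link runs at the older generation and the smaller lane count.
    gen = min(adapter_gen, slot_gen, key=GEN_ORDER.index)
    lanes = min(adapter_lanes, wired_lanes)
    return lanes * lane_bandwidth_mb_s(gen)


if __name__ == "__main__":
    for gen in GEN_ORDER:
        print(f"PCIe {gen}: {lane_bandwidth_mb_s(gen):.0f} MB/s per lane")
    # Hypothetical case: PCIe 3.0 x16 adapter in a PCIe 2.x slot marked x16 (x8).
    total = link_bandwidth_mb_s("3.0", "2.x", 16, 8)
    print(f"Negotiated link: {total:.0f} MB/s")
```

The jump from 2.x to 3.0 is the "nearly doubled" case: the raw signaling rate only rises from 5 to 8 GT/s, and the more efficient encoding makes up most of the difference.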
Note also that some older motherboards designed for AMD CrossFire or Nvidia SLI required a terminator, sometimes called a PCIe switch, to be installed in the empty x16 slot if only one x16 slot was used. A less sophisticated approach taken by some motherboard makers is to leave the end of lower performance (x1, x4, x8) slots open, so that a higher performance adapter can be installed with the end hanging out in space. Some adapters will correctly sense the number of available lanes and adjust down, but performance will be limited by the number of lanes. And make sure that the exposed contacts on the free edge of the adapter aren't in danger of contacting any motherboard components.

Do you have a problem with any of the motherboard integrated I/O (sound, network, keyboard or mouse ports, modem, USB) or with PATA or SATA drive controllers? For all of the above, refer to the relevant flowchart (mouse, keyboard and USB are included under the general category of peripherals). Every one of these motherboard functions can be upgraded or replaced with inexpensive add-in adapters, providing you have open motherboard slots and can disable the motherboard device. If you've reached this point in the flowchart and you haven't found your complaint, try looking through the other performance flowcharts that may relate to the problem. Also, make sure that any of the hardware involved is approved by the motherboard manufacturer, and upgrade the motherboard BIOS if the manufacturer is still in business and an update is available. For older no-name motherboards, you may just be out of luck, and all manufacturers eventually abandon support for PC components, so you may be unable to get a perfectly good component to function properly under the latest Windows. Microsoft also abandons support for their older Windows versions as the years pass, so you can't run a PC forever unless you keep it off the Internet and don't try to upgrade components.
It's always possible that the components you have installed are too new for the BIOS to know what to make of them or for the motherboard to take advantage of their capabilities. Sometimes the BIOS will identify the component correctly, like a CPU or a hard drive, but will only operate it at the highest speed or capacity that the motherboard is capable of. This problem isn't anybody's fault. It's just not possible for motherboard manufacturers to be prepared for everything that may come down the pike in the next couple years, not to mention testing compliance with hardware that doesn't exist yet.