The Scientific Method
This page discusses a general technique that can be applied to any troubleshooting situation. The technique reduces all troubleshooting to a set of common steps that can be adapted to suit your needs. It is not specific to any technology, so no specific tools are covered here. The idea is to learn the principles of troubleshooting. You can learn how to use specific troubleshooting tools elsewhere on this site.
To get the most out of this tutorial, it is highly recommended that you either know or learn the OSI Model. Learn to combine the Scientific Method with the OSI Model and network troubleshooting will be far easier.
The Scientific Method is an investigative process that uses logic to test theories through observation and methodical experimentation. It is the basis of how mankind derives knowledge from the natural world around him. The Scientific Method has been around since mankind first started investigating why and how and shows up as early as 3,000 years ago in India's and Egypt's historical records.
How does the Scientific Method apply to Information Technologies and specifically to troubleshooting? If you want to solve a technical problem, you need a logical and systematic procedure that can be used to sift through the available information, discard what is irrelevant, discover other useful facts and make logical conclusions in order to arrive at the source of the problem. In most cases, you will use the Scientific Method not once but several times to arrive at the source of the problem.
THE SCIENTIFIC METHOD
- Gather Information
- State the Problem
- Form a hypothesis
- Test the hypothesis
- Observe Results & Draw conclusions
- Repeat when necessary
- Gather Information
- You must gather information about what is occurring in order to discover what is not functioning properly. That information should come from multiple sources:
- Observe power lights, status indicators, readouts, LCD panels etc.
- Seek commonalities and exclusivities between the symptoms. Are all applications having the same problem? Does the same problem occur after a certain number of minutes regardless of application? Is the problem occurring only when the user opens a specific file?
- Ask the users what they are experiencing, but treat this information source with extreme caution. Most users are not technical people and thus make unwarranted conclusions about what is wrong. Users also lie on occasion, especially when they think they might be held responsible for whatever is broken or inoperative.
- Ask yourself: "Is the problem a real technical failure, or is it just not doing what the user thinks it should be doing?"
- ALWAYS try to reproduce the fault or failure. Observe the actual failure as it occurs. It is often a good idea to turn on additional logging or diagnostic modes, run the command in verbose mode or use other diagnostic tools to gather information.
- Check error and log files for the system and/or for the application you are troubleshooting. Examples include the Windows event logs (viewed with Event Viewer) and the UNIX/Linux syslog facility.
- Check or verify the configuration of the application or the computer. You can often compare the failing system against a known-working installation of the same application. Are different users getting the same result? Is the same problem occurring in more than one application?
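The step of seeking commonalities between symptoms can be sketched in a few lines of Python. This is only an illustration, not a tool from this site: the function name `shared_attributes` and the sample reports (users, applications and file names) are made up for the example. It keeps only the details that are identical in every failure report, which is often where to look first.

```python
def shared_attributes(failure_reports):
    """Return the attribute/value pairs that are identical in every failure report."""
    if not failure_reports:
        return {}
    first = failure_reports[0]
    return {
        key: value
        for key, value in first.items()
        # Keep a pair only if every other report has the same value for that key.
        if all(report.get(key) == value for report in failure_reports[1:])
    }

# Hypothetical reports gathered from three users; not real data.
reports = [
    {"user": "alice", "application": "editor", "file": "report.doc"},
    {"user": "bob", "application": "viewer", "file": "report.doc"},
    {"user": "carol", "application": "editor", "file": "report.doc"},
]

common = shared_attributes(reports)
print(common)  # {'file': 'report.doc'}
```

In this made-up case the users and applications differ but the file is always the same, which suggests the file itself deserves the next look.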
- State the Problem
- This is the process of reviewing all available information and getting a clear understanding of the perceived failure or dysfunction. For example, let's say you are supporting a user who is complaining that they can't drag and drop an Outlook folder onto a floppy disk. The true problem is that they want to make a copy of the contents of that folder in another location. This is NOT a technical issue. It is a user training issue, because drag and drop of Outlook folders into real directories is not supported in Outlook 2002 and earlier.
- Form A Hypothesis
- After collecting information and clearly stating the problem, express your hypothesis in the form of a question that can be investigated and proven correct or incorrect. For example:
- "_______ was working before and is not working now. At what point in time did _______ break and what broke _______?"
- "Why won't _______ install/start/run etc.?"
- Test Your Hypothesis
- Once you have formed a hypothesis, devise a method to test it. Each test must help you eliminate a single possible cause by changing one and only one setting, value or condition at a time. If the user is having a problem with a particular PDF file, shut down all other programs and try opening the same file again. If the file will not open, reboot the computer and try opening the file. If you make more than one change per test, you will not be able to tell which change made the difference, so you cannot eliminate possible causes and will probably never find the root cause of the actual problem. If you do not identify the root cause, the problem will recur at a less opportune time.
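The "one change at a time" rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `single_change_variants` and the setting names `other_programs_running` and `rebooted` are hypothetical, loosely modeled on the PDF example above. Each generated test configuration differs from the baseline in exactly one setting, so any change in behavior can be attributed to that setting alone.

```python
def single_change_variants(baseline, candidate_changes):
    """Yield (setting, config) pairs, each differing from baseline in exactly one setting."""
    for setting, new_value in candidate_changes.items():
        variant = dict(baseline)  # copy, so the baseline itself is never modified
        variant[setting] = new_value
        yield setting, variant

# Hypothetical conditions for the "PDF will not open" example.
baseline = {"other_programs_running": True, "rebooted": False}
candidate_changes = {"other_programs_running": False, "rebooted": True}

for setting, config in single_change_variants(baseline, candidate_changes):
    # In practice you would apply `config`, retry the failing action,
    # and record the result before moving to the next variant.
    print(setting, config)
```

Because every variant starts from a fresh copy of the baseline, the tests stay independent of each other and of the order in which they run.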
- Observe Results & Draw Conclusions
- After each test, note whether the change you made did or did not solve the problem. You must note the results of your test, gather any new information from the system, application or user, and draw a conclusion as to whether the problem is solved or whether the change you made had any effect on the problem. Once you have drawn conclusions, you can devise new tests to eliminate other possible causes. To quote Sir Arthur Conan Doyle's Sherlock Holmes:
"...when you have eliminated the impossible, whatever remains, however improbable, must be the truth."
- Repeat when Necessary
- The entire process must be repeated until a solution is found. This troubleshooting method relies on identifying possible causes and eliminating them one by one until the true root cause of the problem is found. Without a systematic approach such as the Scientific Method, you are unlikely to find and fix the true root cause.
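The repeat-and-eliminate cycle described above can be sketched as a simple loop. This is an illustration only: the function name `find_root_cause`, the list of possible causes and the stand-in test are all hypothetical. In real troubleshooting the test is something you perform and observe yourself, not a function call.

```python
def find_root_cause(possible_causes, test_cause):
    """Test one possible cause at a time.

    Returns (confirmed_cause, eliminated), where confirmed_cause is None
    if every candidate was eliminated.
    """
    eliminated = []
    for cause in possible_causes:
        if test_cause(cause):        # the test confirmed this cause
            return cause, eliminated
        eliminated.append(cause)     # ruled out; move on to the next candidate
    return None, eliminated

# Hypothetical candidates, ordered from most to least likely.
causes = ["loose cable", "wrong configuration", "failed power supply"]

# Stand-in for a real test: pretend the configuration turns out to be at fault.
def simulated_test(cause):
    return cause == "wrong configuration"

cause, eliminated = find_root_cause(causes, simulated_test)
print(cause)       # wrong configuration
print(eliminated)  # ['loose cable']
```

If the function returns `None`, every hypothesis has been eliminated, which is the signal to go back to gathering information and form new hypotheses.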