Debugging is an art
Debugging is an art. It’s detective work, and it’s not easy. Software engineering, with its myriad data flows and multiple people working on the same project, makes debugging complex and challenging. Modern software is a combination of many technologies – there is a frontend technology for user interaction, and there is storage, which can itself be of many kinds. We have queues and caches. Much modern software probably also has ML models behind it, aiding user experience and decision making. And then we have third-party libraries and services, which make our lives easier by letting us use ready-made solutions instead of reinventing the wheel. With all this complexity, bugs are not uncommon. However, solving them needs some guidelines.
I came across a book – Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems. I found it highly useful for setting up a debugging framework that we can use in our day-to-day work. I will try to summarise the key points here.
Know what you are debugging
Unless you know the system you are trying to debug thoroughly, you will be ill-equipped to solve the problem. For example, if you see your latency graph sitting at 60 seconds, the first thing that should strike you is that the default timeout of the HTTP library is probably set to 60 seconds.
How do you know your system better?
- Start writing code in the system – The best way to understand any system is to get your hands dirty and get to know it by writing code for small features and small bug fixes. This will make a tremendous difference the next time you are trying to debug.
- Read the documentation – Time and again, you will read or hear a developer’s war story and come across a familiar phrase – “There was one line in the documentation which I found after going through the documentation for the millionth time, and that solved the problem.”
- Read the code – Follow it from the point where the first method of the system is called to the very last method, where the data leaves your system.
Reproduce the bug
It’s imperative that you reproduce the bug under exactly the same conditions as those in which it was reported. This will help you get a feel for it. You will likely notice what kind of input triggers the bug and whether time has an impact on it. Does the bug depend on the user or a particular geography? Does it happen at all times, or is it intermittent? What is the error message displayed?
Debugging becomes easier if the following points are taken into consideration while writing code:
- Proper instrumentation – Instrument your code with appropriate metrics and visualize them properly. We at nurture farm do that with a combination of Prometheus, Grafana, and Elastic APM. It helps greatly if the metrics are well visualized on ever-ready dashboards. A dashboard should ideally give you a ready view of whatever could potentially go wrong – memory usage, CPU utilization, latency, and throughput.
- Proper logging – Add logs wherever you think the program can go wrong and a log could point you towards the source of the bug. Logs should not be generic; they should contain important contextual information – say, a user id or booking id. They should also give some more insight into the bug, for example the first few lines of the stack trace. Be careful not to add too many logs, though, as the extra noise can make the relevant lines harder to find.
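The logging advice above can be sketched with Python's standard `logging` module. This is only an illustration: the logger name `booking`, the function `confirm_booking`, and its validation rule are all hypothetical and stand in for whatever operation you want to make debuggable.

```python
import logging
import traceback

logger = logging.getLogger("booking")  # hypothetical logger name


def short_trace(exc: BaseException, max_lines: int = 3) -> str:
    """Return only the first few lines of the formatted stack trace."""
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "\n".join(text.strip().splitlines()[:max_lines])


def confirm_booking(user_id: str, booking_id: str, seats: int) -> bool:
    """Hypothetical operation that logs failures with identifying context."""
    try:
        if seats <= 0:
            raise ValueError(f"seats must be positive, got {seats}")
        return True
    except ValueError as exc:
        # Specific message: which operation, which user, which booking,
        # what went wrong, and a shortened trace showing where.
        logger.error(
            "confirm_booking failed user_id=%s booking_id=%s error=%r trace=%s",
            user_id,
            booking_id,
            exc,
            short_trace(exc),
        )
        return False
```

Calling `confirm_booking("u-1", "b-42", 0)` emits a single error line that can be searched by user id or booking id, instead of a generic "something went wrong".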
Know your tools
You can’t change a flat tyre effectively if you don’t have a jack and other appropriate tools. Similarly, debugging is easier if you have the right tools at your disposal. For example, if you are debugging an out-of-memory error in the JVM, the Memory Analyzer Tool (MAT) helps you analyze the heap dump and find which class is taking up memory. JMeter is a very useful tool if you want to load test your application. If you know the different functions provided by Prometheus, you can make your visualisations more effective and useful.
Once you are stuck on a particular problem, chances are that someone has run into that bug before, and a simple Google search might expose you to tools and tricks to solve it.
Binary Search is your friend
Say your data flows through methods as depicted in the image below. Method D() is the buggy method, where the bug is introduced into the system.
Now, one quick way to narrow things down is to run your data flow from the start and compare the expected and actual output at method C. If they match, chances are that your flow from the start to method C is working as expected, and you can focus your attention on the data flow from method D to the end.
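Here is a minimal sketch of that idea, assuming you can run the pipeline up to any stage and that you know the correct intermediate value after each stage. The stage functions `method_a` to `method_e` are hypothetical, as is the assumption that once the output diverges it stays wrong downstream.

```python
from typing import Any, Callable, Sequence

Stage = Callable[[Any], Any]


def run_prefix(stages: Sequence[Stage], data: Any, count: int) -> Any:
    """Run only the first `count` stages and return the intermediate value."""
    for stage in stages[:count]:
        data = stage(data)
    return data


def first_bad_stage(stages: Sequence[Stage], expected: Sequence[Any], data: Any) -> int:
    """Return the index of the first stage whose output is wrong, or -1.

    expected[i] is the known-good value after stage i.
    """
    if not stages or run_prefix(stages, data, len(stages)) == expected[-1]:
        return -1
    lo, hi = 0, len(stages) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if run_prefix(stages, data, mid + 1) == expected[mid]:
            lo = mid + 1  # stages 0..mid are fine, look later
        else:
            hi = mid  # the bug is at or before mid
    return lo


# Hypothetical five-stage flow; method_d should multiply by 10 but does not.
def method_a(x): return x + 1
def method_b(x): return x * 2
def method_c(x): return x - 3
def method_d(x): return x * 100  # bug introduced here
def method_e(x): return x + 5

stages = [method_a, method_b, method_c, method_d, method_e]
expected = [2, 4, 1, 10, 15]  # known-good values for input 1
bad_index = first_bad_stage(stages, expected, 1)
```

With these values, the search checks the output after `method_c`, finds it correct, and then only looks at the second half of the flow, ending at index 3 (`method_d`).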
You can also utilize binary search when the bug is data-dependent: split the failing input in half, check which half still reproduces the bug, and repeat.
However, it’s important to note that existing bugs in the system might interfere with your debugging process. You should fix existing, known bugs first in order to avoid being caught in unnecessary noise.
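For a data-dependent bug, the same halving can be applied to the input itself. The sketch below assumes the failure is deterministic and caused by exactly one record in the batch; `total_price`, `fails`, and the sample records are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_failing_record(records: Sequence[T], fails: Callable[[Sequence[T]], bool]) -> T:
    """Narrow a failing batch down to the single record that triggers the bug."""
    if not fails(records):
        raise ValueError("this batch does not reproduce the failure")
    candidates = list(records)
    while len(candidates) > 1:
        half = len(candidates) // 2
        left = candidates[:half]
        # Keep whichever half still reproduces the failure.
        candidates = left if fails(left) else candidates[half:]
    return candidates[0]


def total_price(batch):
    """Hypothetical processing step that breaks on malformed records."""
    return sum(record["price"] for record in batch)


def fails(batch) -> bool:
    """Report whether processing this batch reproduces the bug."""
    try:
        total_price(batch)
        return False
    except KeyError:
        return True


records = [{"id": i, "price": 10} for i in range(8)]
records[5] = {"id": 5}  # malformed record: no price
culprit = find_failing_record(records, fails)
```

Eight records are narrowed to one in three checks. If several records can trigger the failure, this still returns one of them, and you can remove it and run the search again.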
What changed since the last deployment?
The first instinct when encountering a problem is to do something: “Let’s try this small experiment.”
One of the senior engineers I work with asks a simple question when called in to debug any issue.
“When was the last deployment?”
He does not change a thing until he understands what the nature of the problem is, how it started, when it started, and what the last code change was and when it was made. It’s important not to change anything before you know what you are dealing with, because you might start another problem totally orthogonal to the issue at hand.
Another senior engineer always stresses making one change at a time and writing down each change, step by step, during any outage or debugging session so that it’s easy to revert if necessary.
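One way to make that habit concrete is a small change journal that records each change together with how to undo it. This is only a sketch under assumptions: `ChangeJournal`, the in-memory `config` dictionary, and its keys are hypothetical, and in a real outage the "do" and "undo" steps would be commands or config pushes rather than lambdas.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class Change:
    description: str
    undo: Callable[[], None]
    applied_at: datetime


@dataclass
class ChangeJournal:
    changes: List[Change] = field(default_factory=list)

    def apply(self, description: str, do: Callable[[], None], undo: Callable[[], None]) -> None:
        """Apply exactly one change and write it down with its undo step."""
        do()
        self.changes.append(Change(description, undo, datetime.now(timezone.utc)))

    def revert_last(self) -> str:
        """Undo the most recent change and return its description."""
        if not self.changes:
            raise RuntimeError("nothing to revert")
        change = self.changes.pop()
        change.undo()
        return change.description

    def revert_all(self) -> List[str]:
        """Undo every recorded change, newest first."""
        reverted = []
        while self.changes:
            reverted.append(self.revert_last())
        return reverted


# Hypothetical settings being adjusted during a debugging session.
config = {"pool_size": 10, "timeout_s": 60}
journal = ChangeJournal()
journal.apply(
    "pool_size 10 -> 20",
    lambda: config.update(pool_size=20),
    lambda: config.update(pool_size=10),
)
journal.apply(
    "timeout_s 60 -> 30",
    lambda: config.update(timeout_s=30),
    lambda: config.update(timeout_s=60),
)
```

Reverting newest first matters because later changes may depend on earlier ones; `journal.revert_all()` restores the original `config` in that order.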
Ask for help
There is always someone out there who has encountered this problem before, or someone who has worked on that particular system for longer than you have. You should take advantage of their expertise and experience, but make sure that you have spent enough time trying to solve the problem on your own before you approach someone else. While explaining the problem to someone else, be patient and help them with their queries. Try not to tell them your theories; wait for them to come up with their own.
You can also try the rubber duck approach, where you explain your code line by line to someone (or something, such as a rubber duck) that does not know anything about your problem.