Troubleshooting Topic - Ask the Right Questions, Get the Right Answers

This article was last updated on: February 7, 2024

In many companies, when IT, data center, or business systems fail, a crowd of people is called into the war room (a conference room where everyone is brought together to solve the problem), yet often no one knows whether they can actually fix the problem or whether it is even their responsibility.

Figure: War room

The “evidence” (infrastructure monitoring data, log files, user complaints, etc.) shows symptoms, but it does not necessarily point to the root cause. A pile of log messages and high-level alerts alone won't give you an answer that is actually tied to the root cause of the problem.

To avoid this scenario, what counts as real “evidence”? What questions should you ask?

Is it a single user's complaint, or are all users affected?

Is it “just” the CEO complaining because a BI report doesn't work in his old IE7? Or is it “just” end users on the Unicom network? The top priority is to determine whether the problem affects a very small group of users or users all across China.

Figure: Alert world map
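One quick way to answer this question is to break failure rates down by user attribute. The sketch below assumes each request record is a dict with hypothetical `browser`, `isp`, and `failed` fields; in practice these would come from your own access logs or real-user monitoring data, and the sample records are illustrative, not real measurements. Failures concentrated in one segment (one browser, one ISP) suggest a localized problem, while uniformly high rates suggest something that affects everyone.

```python
from collections import Counter

def error_rate_by(requests, key):
    """Return {segment: error_rate} for one attribute such as 'browser' or 'isp'.

    Assumes each record is a dict with a boolean 'failed' field plus the
    attribute named by `key` (hypothetical field names; adapt to your schema).
    """
    total, failed = Counter(), Counter()
    for r in requests:
        segment = r[key]
        total[segment] += 1
        if r["failed"]:
            failed[segment] += 1
    return {segment: failed[segment] / total[segment] for segment in total}

# Illustrative sample data, not real measurements.
sample = [
    {"browser": "IE7", "isp": "Unicom", "failed": True},
    {"browser": "IE7", "isp": "Telecom", "failed": True},
    {"browser": "Chrome", "isp": "Unicom", "failed": False},
    {"browser": "Chrome", "isp": "Telecom", "failed": False},
]

if __name__ == "__main__":
    print(error_rate_by(sample, "browser"))  # failures concentrated in IE7
    print(error_rate_by(sample, "isp"))      # failures spread evenly across ISPs
```

Here the browser breakdown isolates the problem to IE7, while the ISP breakdown shows no difference between networks.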

Is there a problem with the delivery chain (e.g. CDN, third parties, ISP, cloud provider, hosting service, mobile network)?

Modern web applications, mobile services, Internet services, O2O services, and others depend on a long delivery chain. Knowing how each link in that chain affects the service tells you whether to check your own data center or call the service provider.
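A first step toward locating a slow link is to time the phases of a connection separately. The sketch below is one assumed approach using only the Python standard library: it measures DNS resolution and the TCP handshake on their own, so a slow lookup points toward the resolver or ISP, while a slow handshake points toward the network path to the server or CDN edge. It only tries the first address returned and does not cover TLS or application response time.

```python
import socket
import time

def connection_phases(host, port, timeout=5.0):
    """Time DNS resolution and the TCP handshake separately, in milliseconds."""
    start = time.perf_counter()
    # DNS lookup: slowness here points at the resolver or ISP side of the chain.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    resolved = time.perf_counter()
    # TCP handshake: slowness here points at the network path to the server or CDN edge.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
    connected = time.perf_counter()
    return {
        "dns_ms": (resolved - start) * 1000,
        "connect_ms": (connected - resolved) * 1000,
    }
```

Running it against each link you depend on (for example a hypothetical `cdn.example.com` on port 443, your origin server, a third-party API) from hosts on different networks lets you compare where the time goes before deciding whom to call.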

Are critical transactions affected?

Are critical business transactions, such as insurance applications, affected? Or is the page reporting the error one that nobody uses anymore? You need to monitor the performance of your most critical business transactions.
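Prioritizing by business impact can be as simple as checking error rates only for an explicit list of critical transactions. In the minimal sketch below, the transaction names, the `(calls, errors)` counts, and the 1% threshold are all illustrative assumptions, not values from any particular monitoring product.

```python
def critical_breaches(stats, critical, max_error_rate=0.01):
    """Return {transaction: error_rate} for critical transactions over the threshold.

    stats maps a transaction name to a (calls, errors) tuple; the names and the
    1% default threshold are illustrative assumptions.
    """
    breaches = {}
    for name in critical:
        calls, errors = stats.get(name, (0, 0))
        if calls and errors / calls > max_error_rate:
            breaches[name] = errors / calls
    return breaches

# Hypothetical counts over one monitoring window.
stats = {
    "submit_insurance_application": (1000, 50),
    "login": (5000, 5),
    "legacy_report_page": (10, 10),  # 100% errors, but not business-critical
}
critical = {"submit_insurance_application", "login"}

if __name__ == "__main__":
    print(critical_breaches(stats, critical))
```

Only the insurance application is flagged: the legacy page fails every time, but it is not on the critical list, and the login error rate stays under the threshold.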

Is it a problem with the application itself?

Applications are complex. Once you know the problem lies in the application, you need to isolate it so that the responsible developers and architects can pinpoint it more efficiently.

If customers see slow page loads, a poor experience, and slow application response times, the first question should be whether bad code is to blame. Analyze code-level performance hotspots to find out whether the cause is an inefficient algorithm or a failure to follow coding and architectural best practices.
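For Python code, the standard library's `cProfile` and `pstats` modules are one way to find such hotspots. The sketch below profiles a made-up workload in which `slow_dedupe` uses an inefficient algorithm (list membership tests, O(n²) overall) and `fast_dedupe` produces the same result with a set; both functions are hypothetical stand-ins for real application code.

```python
import cProfile
import pstats

def slow_dedupe(items):
    # Deliberately inefficient: `x not in seen` scans a list, so the loop is O(n^2).
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def fast_dedupe(items):
    # Same result, but set membership tests are O(1) on average.
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def workload():
    data = list(range(3000)) * 2
    slow_dedupe(data)
    fast_dedupe(data)

def profile(func, *args):
    """Run func(*args) under cProfile and return a pstats.Stats object."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    return pstats.Stats(profiler)

if __name__ == "__main__":
    # List the five entries with the highest cumulative time.
    profile(workload).sort_stats("cumulative").print_stats(5)
```

In the printed report, `slow_dedupe` dominates the cumulative time, which is exactly the kind of signal that tells you the fix belongs with the developers rather than with operations.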

Is the problem in virtual machines, containers, middleware…?

If a virtual machine (e.g. VMware, EC2…), a container (Docker), middleware, or an application runtime (e.g. Tomcat) is not sized correctly, or competes for resources with other virtual machines and containers, performance can suffer. If you know that the virtual machine's performance is affecting the application, you know to bring in VM experts, not application developers, to solve the problem.
The same goes for containers, middleware, and application runtimes.
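On a Linux guest, one common signal of contention with other virtual machines is CPU "steal" time: time the guest wanted to run but the hypervisor gave to someone else. The sketch below assumes Linux and reads the aggregate `cpu` line of `/proc/stat` twice, then computes the stolen share of the interval. It does not detect contention between containers sharing one kernel.

```python
import os
import time

def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line from /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()

def steal_fraction(before, after):
    """Share of CPU time stolen by the hypervisor between two 'cpu' lines."""
    def parse(line):
        fields = line.split()
        if fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # user nice system idle iowait irq softirq steal; guest/guest_nice are
        # already included in user/nice, so they are left out of the total.
        return [int(v) for v in fields[1:9]]
    a, b = parse(before), parse(after)
    total = sum(b) - sum(a)
    return (b[7] - a[7]) / total if total > 0 else 0.0

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1)
    print(f"steal: {steal_fraction(first, read_cpu_line()):.1%}")
```

A steal share that stays noticeably above zero while the application slows down is a hint to involve the virtualization team rather than the developers.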

Is the infrastructure causing the problem?

What if the problem is not the application itself, but the under-resourced infrastructure it runs on? What if the CPU time needed for garbage collection is unavailable because the CPU is overloaded? Then it's time to consider splitting the application or scaling the infrastructure.
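A rough check for an overloaded CPU is to compare the load average with the number of CPUs the process can use. The sketch below is a heuristic under stated assumptions: `os.getloadavg()` is Unix-only, the Linux load average also counts tasks in uninterruptible sleep (often waiting on disk I/O), and CPU affinity does not reflect container CPU quotas, so treat the ratio as a hint rather than a measurement.

```python
import os

def available_cpus():
    """CPUs this process may run on (respects affinity where the OS exposes it)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

def cpu_saturation(load1=None, cpus=None):
    """1-minute load average divided by available CPUs.

    Values persistently above 1.0 mean runnable work is queuing for CPU.
    """
    if load1 is None:
        load1 = os.getloadavg()[0]  # Unix only
    if cpus is None:
        cpus = available_cpus()
    return load1 / cpus

if __name__ == "__main__" and hasattr(os, "getloadavg"):
    print(f"saturation: {cpu_saturation():.2f}")
```

For example, a load of 8 on 4 CPUs gives a ratio of 2.0: on average, twice as much work is ready to run as there are CPUs to run it.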

Is it an application server problem?

Application servers can also cause performance problems when they are misconfigured or deployed incorrectly. The sizing of resource pools (threads, data sources, etc.), the security configuration, and logging parameters can all affect performance. If you find that the application server is the problem, contact experts from IBM, Oracle, or Microsoft if it is a commercial application server, or your own middleware experts if it is an open-source one.
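For resource pools, one common starting point (an assumption here, not a vendor recommendation) is Little's Law, L = λ × W: the average number of busy pooled resources equals the arrival rate multiplied by how long each request holds a resource. The sketch below adds a headroom factor; the 20% default is illustrative only.

```python
import math

def pool_size(requests_per_sec, avg_hold_sec, headroom=1.2):
    """Estimate a pool size with Little's Law (L = lambda * W) plus headroom.

    requests_per_sec: peak arrival rate of work needing a pooled resource.
    avg_hold_sec: average time each request holds a thread/connection.
    headroom: safety margin; 1.2 (20%) is an illustrative assumption.
    """
    busy = requests_per_sec * avg_hold_sec  # average concurrently busy resources
    return max(1, math.ceil(busy * headroom))
```

For example, 200 requests per second that each hold a database connection for 50 ms keep about 10 connections busy on average, so a pool of 12 leaves 20% headroom. This covers only the average; bursty traffic needs more margin, and a pool larger than the database can serve just moves the contention downstream.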

Summary

Figure: How, What, Who, Why

With the answers to these questions, you can do away with war rooms, quickly find the root cause of problems, and work out optimizations and fixes. Instead of a 20-person war room, you need only three people, one each from development, testing, and operations, to evaluate detailed performance insights and bring in whichever experts are needed. Perfect!


https://e-whisper.com/posts/20032/
Author: east4ming
Posted on: September 27, 2021
Licensed under