Just read a blog series (App in a box) from Peter Hack at https://www.dynatrace.com/news/blog/app-in-a-box/, https://www.dynatrace.com/news/blog/app-in-a-box-customer-perspective/ and https://www.dynatrace.com/news/blog/app-in-a-box-part-3-logs/.
Infrastructure monitoring (HW, OS, processes, network) is important, but not enough, because it can’t tell about the application health, neither the customers’ perspective of your applications.
Health checks may tell you whether your application is available or not. However, such tests should be done from a certain “distance”, as close to your users as possible. A health check may be fine checked from the next host in the same data center. But what if your host becomes unavailable from the Internet, because the network access of the datacenter is down? Then your green health check results won’t help the users. So synthetic tests are best performed from another location, another datacenter, etc. Also note that uptime is not the same as availability.
Real-User Monitoring (RUM) helps you understand the behavior of your users better. Using some monitoring tools you may follow your users’ journeys on your site to detect behavioral bottlenecks in your applications, and even the need for design optimizations. Developers may identify and fix page load problems and performance bottlenecks in the browser.
However, resource usage, customer experience, and availability only can tell whether it’s working, but can’t tell why it’s not working. This is where the logs of your application may help you. Logs can help identify problems that have occurred, and pinpoint areas where improvements of the application are necessary.
You also need some metrics of the application as well. Piler has a tool (pilerstats) to reveal some inside info about the application: