Observe all your applications (With TimeLine)


Editor's Review: A great talk, on a subject very different from developing application functionality. The speaker is an expert on ensuring production reliability and performance. He talks about identifying, analyzing, and debugging problems in production systems using Sentry, CollectD, Grafana, and OpenTracing.


You just deployed a new version of an application or micro-service; how do you know everything works as expected? You ran your comprehensive test suite to verify functional correctness for known scenarios, and your performance tests, before deploying. But does your application really work at the moment, or is it just responding with error messages to all incoming requests?

I'm part of the team that runs a huge infrastructure for SAP HANA development. This infrastructure is vital for nearly all development and testing activities of SAP HANA. As it is powered by multiple in-house developed applications, we want to know immediately if an application starts to fail, and we need to be able to quickly diagnose what caused the failure.

This talk will give you an overview of how we monitor our full stack, from the 2000 physical machines up to the 10,000 parallel running Python application processes, micro-service instances and batch processing jobs. It includes a review of the tools used, bad and good examples of instrumentation in Python code, the resulting visualization, and an outlook on upcoming improvements.
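The talk's own code examples are not reproduced on this page. As a rough illustration of what instrumentation in Python code can look like, here is a minimal sketch using only the standard library. It tags every log line with a request ID, so that lines belonging to one request can be found again when many processes log at once, and it logs failures together with their traceback. The names request_id_var, RequestIdFilter, build_logger, handle_request and demo_service are hypothetical and chosen for this example; they do not come from the talk or from any of the tools it covers.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical name: holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    """Create a logger that writes one key=value style line per event."""
    logger = logging.getLogger("demo_service")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s level=%(levelname)s "
        "request_id=%(request_id)s msg=%(message)s"
    ))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, payload, request_id=None):
    """Process one request; log success, or the failure with its traceback."""
    token = request_id_var.set(request_id or uuid.uuid4().hex)
    try:
        result = 100 / payload["divisor"]
        logger.info("request ok result=%s", result)
        return result
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("request failed payload=%r", payload)
        return None
    finally:
        request_id_var.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    handle_request(log, {"divisor": 4})
    handle_request(log, {"divisor": 0})   # triggers ZeroDivisionError
    print(buffer.getvalue())
```

Searching the collected logs for a single request_id value then returns every line written while that request was being handled, including the traceback of a failure.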


0:00 Describes his job in Quality Assurance.

2:15 Anything which could go wrong, will go wrong.

2:40 Identify something is wrong

3:30 Analyze the problem

4:15 Observability

6:05 How To Log

10:00 How to look at logs

10:53 Distributed Logs

15:20 Can you fix the problem? / Sentry

18:26 Error Metrics / CollectD

20:25 Visualize Metrics / Grafana

22:25 Distributed Tracing / OpenTracing

25:05 Visualization of Distributed Tracing

26:15 Conclusion

27:10 Include Developers in Decisions

Editor's Note:

I actually know two developers with this rare skill set. One guy wrote the software for monitoring 500 servers. If you are looking to hire someone with these skills, then please contact me.

Editor's Note:

I would like to work with open source projects to create a branch of the tree with all of the best videos for your open source project. Please send me an email if you are interested.