Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
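As a rough illustration of a rule deriving a new series from stored samples, the sketch below computes an average per-second increase from a list of scraped counter samples. It is a simplified stand-in, not Prometheus's own `rate()` implementation (which also extrapolates to the window boundaries). The function name `simple_rate` and the sample values are assumptions made for this example.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value); layout assumed for this sketch
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the given samples.

    A counter that drops is treated as having been reset to zero,
    for example after a process restart.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset, everything counted since zero is new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical samples scraped every 15 seconds, with one counter reset.
scraped = [(0.0, 100.0), (15.0, 130.0), (30.0, 160.0), (45.0, 10.0), (60.0, 40.0)]
print(simple_rate(scraped))
```

In a real deployment this kind of aggregation would be written as a recording rule in PromQL rather than in application code.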
Metrics worth collecting include:
- number of requests
- response time of requests
- agent activity
- memory, thread, CPU, and I/O usage
- replication times
- cache efficiency
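A few of the metrics listed above could be exposed for scraping roughly as follows. This is a minimal sketch using only the Python standard library to render the Prometheus text exposition format. A real service would more likely use an official client library, and response times would normally be exposed as a histogram or summary rather than a single value. All metric names and values here (`app_requests_total`, `app_cache_hits_total`, and so on) are hypothetical.

```python
from typing import Dict, List, Tuple

# metric name -> (metric type, help text, current value)
Metric = Tuple[str, str, float]


def render_metrics(metrics: Dict[str, Metric]) -> str:
    """Render metrics in the Prometheus text exposition format."""
    lines: List[str] = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    # The exposition format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical metric names and values, for illustration only.
EXAMPLE_METRICS: Dict[str, Metric] = {
    "app_requests_total": ("counter", "Number of requests handled.", 1027.0),
    "app_cache_hits_total": ("counter", "Number of cache hits.", 940.0),
    "app_cache_misses_total": ("counter", "Number of cache misses.", 87.0),
    "app_memory_bytes": ("gauge", "Memory currently in use, in bytes.", 52428800.0),
}

print(render_metrics(EXAMPLE_METRICS), end="")
```

Exposing cache hits and misses as two separate counters leaves the efficiency ratio to be computed at query time, so the time window can be chosen when the data is queried.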