<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Monitoring on Janik von Rotz</title>
    <link>https://janikvonrotz.ch/tags/monitoring/</link>
    <description>Recent content in Monitoring on Janik von Rotz</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 07 Sep 2020 11:26:05 +0200</lastBuildDate>
    <atom:link href="https://janikvonrotz.ch/tags/monitoring/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Monitor cron jobs with Prometheus, Grafana and Node exporter</title>
      <link>https://janikvonrotz.ch/2020/09/07/monitor-cron-jobs-with-prometheus-grafana-and-node-exporter/</link>
      <pubDate>Mon, 07 Sep 2020 11:26:05 +0200</pubDate>
      <guid>https://janikvonrotz.ch/2020/09/07/monitor-cron-jobs-with-prometheus-grafana-and-node-exporter/</guid>
      <description>&lt;p&gt;Nobody wants to be notified by email anymore, especially if it&amp;rsquo;s about a failed cron job. We have advanced monitoring systems that tell us if something&amp;rsquo;s wrong. In my case I use &lt;a href=&#34;https://grafana.com/&#34;&gt;Grafana&lt;/a&gt;, &lt;a href=&#34;https://prometheus.io/&#34;&gt;Prometheus&lt;/a&gt; and &lt;a href=&#34;https://github.com/prometheus/node_exporter&#34;&gt;Node exporter&lt;/a&gt; to collect host metrics, visualize them and send out alerts. Usually, one would set up an exporter to monitor a new piece of software, but for cron there isn&amp;rsquo;t any exporter available. On the other hand, there are a lot of online services to monitor your cron jobs, such as &lt;a href=&#34;https://cronitor.io/&#34;&gt;Cronitor.io&lt;/a&gt;. But we do not want to add another dependency for simply monitoring cron jobs.&lt;/p&gt;&#xA;&lt;p&gt;In this tutorial I will elaborate on how I look after cron jobs with Prometheus and Grafana. We are going to configure the textfile collector of the Node exporter, define custom metrics and visualize them in a Grafana dashboard.&lt;/p&gt;&#xA;&lt;p&gt;I assume that there is a machine running multiple cron jobs with a configured Node exporter. The Node metrics are scraped by Prometheus and visualized in Grafana.&lt;/p&gt;&#xA;&lt;p&gt;First, we are going to add a bash script to write custom Node exporter metrics. 
Copy the script below to the host.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;/usr/local/bin/write-node-exporter-metric&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/bin/bash&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Display Help&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Help&lt;span style=&#34;color:#f92672&#34;&gt;()&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;write-node-exporter-metric&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;##########################&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Description: Write node-exporter metric.&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Syntax: write-node-exporter-metric [-n|-c|-v|help]&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span 
style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Example: write-node-exporter-metric -n cron_job -c \&amp;#34;Renew certs for proxy01\&amp;#34; -v 0&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;options:&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;  -n    Reference of custom metric type. Defaults to &amp;#39;cron_job&amp;#39;&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;  -c    Code for metric value.&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;  -v    Value of metric.&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;  help  Show write-node-exporter-metric help.&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Show help and exit&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[[&lt;/span&gt; $1 &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#39;help&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt;; &lt;span style=&#34;color:#66d9ef&#34;&gt;then&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Help&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    exit&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;fi&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Process params&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;while&lt;/span&gt; getopts &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;:n:c:v:&amp;#34;&lt;/span&gt; opt; &lt;span style=&#34;color:#66d9ef&#34;&gt;do&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;case&lt;/span&gt; $opt in&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    n&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; TYPE&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$OPTARG&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    c&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; CODE&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$OPTARG&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    v&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; VALUE&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$OPTARG&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#ae81ff&#34;&gt;\?&lt;/span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;)&lt;/span&gt; echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Invalid option -&lt;/span&gt;$OPTARG&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &amp;gt;&amp;amp;&lt;span style=&#34;color:#ae81ff&#34;&gt;2&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    Help&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    exit;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#66d9ef&#34;&gt;esac&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;done&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;# Fallback to environment vars and default values&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;${&lt;/span&gt;TYPE:=&lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#39;cron_job&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;[[&lt;/span&gt; -z &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$CODE&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt; echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Parameter -c|code is empty&amp;#34;&lt;/span&gt; ; exit 1; &lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;[[&lt;/span&gt; -z &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$VALUE&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]]&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;{&lt;/span&gt; echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Parameter -v|value is empty&amp;#34;&lt;/span&gt; ; exit 1; &lt;span style=&#34;color:#f92672&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;[&lt;/span&gt; &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt;$TYPE&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;==&lt;/span&gt; &lt;span 
style=&#34;color:#e6db74&#34;&gt;&amp;#34;cron_job&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#f92672&#34;&gt;]&lt;/span&gt;; &lt;span style=&#34;color:#66d9ef&#34;&gt;then&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;Write metric node_cron_job_exit_code for code \&amp;#34;&lt;/span&gt;$CODE&lt;span style=&#34;color:#e6db74&#34;&gt;\&amp;#34; with value &lt;/span&gt;$VALUE&lt;span style=&#34;color:#e6db74&#34;&gt;.&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ID&lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;$(&lt;/span&gt;echo $CODE | shasum | cut -c1-5&lt;span style=&#34;color:#66d9ef&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    cat &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;lt;&amp;lt; EOF &amp;gt;&amp;gt; /var/tmp/node_cron_job_exit_code.$ID.prom.$$&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;# HELP node_cron_job_exit_code Last exit code of cron job.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;# TYPE node_cron_job_exit_code gauge&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;node_cron_job_exit_code{code=&amp;#34;$CODE&amp;#34;} $VALUE&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#e6db74&#34;&gt;EOF&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    mv 
/var/tmp/node_cron_job_exit_code.$ID.prom.$$ /var/tmp/node_cron_job_exit_code.$ID.prom&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;fi&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Then make it executable.&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;chmod +x /usr/local/bin/write-node-exporter-metric&lt;/code&gt;&lt;/p&gt;&#xA;&lt;p&gt;By default this script writes metric text files to &lt;code&gt;/var/tmp&lt;/code&gt;. Node exporter has to watch this folder, so point the &lt;a href=&#34;https://github.com/prometheus/node_exporter#textfile-collector&#34;&gt;textfile collector directory flag&lt;/a&gt; &lt;code&gt;--collector.textfile.directory&lt;/code&gt; of the Node exporter to it. If you are using Docker to run the exporter, set the following config:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-yml&#34; data-lang=&#34;yml&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;volumes&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  - &lt;span style=&#34;color:#ae81ff&#34;&gt;/:/hostfs&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#f92672&#34;&gt;command&lt;/span&gt;: &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;--collector.textfile.directory=/hostfs/var/tmp&amp;#39;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;...&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Let&amp;rsquo;s write a custom metric and see if it is scraped by 
Prometheus.&lt;/p&gt;&#xA;&lt;p&gt;Run &lt;code&gt;write-node-exporter-metric -c &#39;Renew certs for proxy01&#39; -v 0&lt;/code&gt; on the command line.&lt;/p&gt;&#xA;&lt;p&gt;Check the metrics interface of the host and search for &lt;code&gt;node_cron_job_exit_code&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Use this curl command if you want to stick to the console:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;curl --silent --user username:password &lt;span style=&#34;color:#ae81ff&#34;&gt;\&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  https://host.example.com/node-exporter/metrics | &lt;span style=&#34;color:#ae81ff&#34;&gt;\&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  grep node_cron_job_exit_code&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If the value has been exposed, open Grafana and explore the metrics.&lt;/p&gt;&#xA;&lt;p&gt;Create a new panel and use this query:&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;sum by (instance) (node_cron_job_exit_code)&lt;/code&gt;&lt;/p&gt;&#xA;&lt;p&gt;This query sums the exit codes of all cron jobs by instance. If the sum is not zero, something went wrong.&lt;/p&gt;&#xA;&lt;p&gt;Create an alert that triggers if the metric is greater than 0.&lt;/p&gt;&#xA;&lt;p&gt;From now on, when setting up cron jobs with &lt;code&gt;crontab -e&lt;/code&gt;, you simply have to add the write metric command at the end of the line. 
Here is an example:&lt;/p&gt;&#xA;&lt;p&gt;&lt;code&gt;45 0 * * 0 /usr/share/cerbot/renew-certs; write-node-exporter-metric -c &#39;Renew certs for proxy&#39; -v $?&lt;/code&gt;&lt;/p&gt;&#xA;&lt;p&gt;No matter if the job succeeds or fails, the exit code is written and forwarded to Prometheus.&lt;/p&gt;&#xA;&lt;p&gt;What do you think? Do you like this solution? Let me know how you monitor cron jobs.&lt;/p&gt;</description>
    </item>
    <item>
      <title>System Observability and Chaos Engineering</title>
      <link>https://janikvonrotz.ch/2018/09/16/system-observability-and-chaos-engineering/</link>
      <pubDate>Sun, 16 Sep 2018 14:58:02 +0200</pubDate>
      <guid>https://janikvonrotz.ch/2018/09/16/system-observability-and-chaos-engineering/</guid>
      <description>&lt;p&gt;At the last &lt;a href=&#34;https://jazoon.com/&#34;&gt;Jazoon tech days&lt;/a&gt; I learned about the current state of DevOps and other related topics. During the talks &amp;ldquo;chaos engineering&amp;rdquo; and &amp;ldquo;observability&amp;rdquo; were mentioned many times. I didn&amp;rsquo;t know either term and became curious what they were about.&lt;/p&gt;&#xA;&lt;h1 id=&#34;observability&#34;&gt;Observability&lt;/h1&gt;&#xA;&lt;p&gt;TL;DR: Monitoring tells you whether a system is working, observability lets you ask why it isn&amp;rsquo;t working.&lt;/p&gt;&#xA;&lt;p&gt;A very insightful presentation on observability was given by &lt;a href=&#34;https://twitter.com/mipsytipsy&#34;&gt;Charity Majors&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;&#xA;      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; src=&#34;https://www.youtube.com/embed/oGC8C9z7TN4?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; title=&#34;YouTube video&#34;&gt;&lt;/iframe&gt;&#xA;    &lt;/div&gt;&#xA;&#xA;&lt;p&gt;In her talk she covered the shortcomings of traditional metrics and logs. She pointed out that monitoring as we know it is dead (including dashboards), and that we need to focus on observability.&lt;/p&gt;&#xA;&lt;p&gt;Enhancing system observability for a software project requires a culture shift rather than implementing a new technology. Software projects become more complex, and thus we cannot detect and prevent every kind of error that may occur in the future. 
Increased observability makes finding, debugging and fixing an issue easier.&lt;/p&gt;&#xA;&lt;p&gt;On an abstract level the following question provides a good viewpoint on what observability is about.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Can you understand what&amp;rsquo;s happening inside your system, just by asking questions from the outside?&lt;br&gt;&#xA;&amp;ndash; Charity Majors&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;If the system is a black box, it becomes very difficult to understand its behavior. That might sound obvious, but of course it is impossible to build a complex system without unknown components.&lt;/p&gt;&#xA;&lt;p&gt;One of the drivers that makes monitoring obsolete and supports the idea of observability is the unpredictability of system failures. The next quote puts it very well.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Distributed systems have an infinitely long list of almost-impossible failure scenarios that make staging environments particularly worthless.&lt;br&gt;&#xA;&amp;ndash; Charity Majors&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;This fact has probably led us to the discipline of Chaos Engineering.&lt;/p&gt;&#xA;&lt;h1 id=&#34;chaos-engineering&#34;&gt;Chaos Engineering&lt;/h1&gt;&#xA;&lt;p&gt;TL;DR: Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.&lt;/p&gt;&#xA;&lt;p&gt;Aaron Blohowiak worked as a Chaos Engineer at Netflix and showed how their multi-region infrastructure deals with outages.&lt;/p&gt;&#xA;&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;&#xA;      &lt;iframe allow=&#34;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen&#34; loading=&#34;eager&#34; referrerpolicy=&#34;strict-origin-when-cross-origin&#34; 
src=&#34;https://www.youtube.com/embed/nkndbc_Qp7Q?autoplay=0&amp;amp;controls=1&amp;amp;end=0&amp;amp;loop=0&amp;amp;mute=0&amp;amp;start=0&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; title=&#34;YouTube video&#34;&gt;&lt;/iframe&gt;&#xA;    &lt;/div&gt;&#xA;&#xA;&lt;p&gt;The fundamental idea of chaos engineering is already part of the company&amp;rsquo;s culture.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;In general, freedom and rapid recovery is better than trying to prevent error. We are in a creative business, not a safety-critical business.&lt;br&gt;&#xA;&amp;ndash; jobs.netflix.com/culture&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;During a backstage talk, Aaron spoke about Chaos Engineering in practice. The goal of Chaos Engineering is the improvement of system resilience. Instead of practicing error prevention, you focus on running experiments with your system and making sure that it is able to recover from failures.&lt;/p&gt;&#xA;&lt;blockquote&gt;&#xA;&lt;p&gt;Due to the inevitability of errors, investing in isolation, recovery and remediation is superior to investing in prevention.&lt;br&gt;&#xA;&amp;ndash; Aaron Blohowiak&lt;/p&gt;&#xA;&lt;/blockquote&gt;&#xA;&lt;p&gt;Once system resilience improves, the system can withstand unpredictable errors in production.&lt;/p&gt;&#xA;&lt;p&gt;If you are interested in this topic, start by reading the &lt;a href=&#34;https://principlesofchaos.org/&#34;&gt;Principles of Chaos Engineering&lt;/a&gt; manifesto.&lt;/p&gt;&#xA;&lt;p&gt;Hope you learned as much as I did! 😃💡 Otherwise leave a question in the comments!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Monitor and audit Active Directory user and group management</title>
      <link>https://janikvonrotz.ch/2017/10/12/monitor-and-audit-active-directory-user-and-group-management/</link>
      <pubDate>Thu, 12 Oct 2017 15:54:08 +0000</pubDate>
      <guid>https://janikvonrotz.ch/2017/10/12/monitor-and-audit-active-directory-user-and-group-management/</guid>
      <description>&lt;p&gt;Traceability is key when collaborating in the Active Directory (AD). When multiple admins change and update permissions and policies, it becomes difficult to stay compliant with the company&amp;rsquo;s policies. It is therefore important to monitor mutations in the directory. By default audit policies are disabled for Domain Controllers (DC) and must be enabled explicitly. Enabling auditing for the DCs is quite easy; querying the logs for a specific event is a bit more difficult.&lt;/p&gt;&#xA;&lt;p&gt;In this guide you&amp;rsquo;ll learn how to enable auditing for a specific case and how to query the audit logs for a specific event.&lt;/p&gt;&#xA;&lt;p&gt;The tutorial assumes that you have:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Domain Controller.&lt;/li&gt;&#xA;&lt;li&gt;Group policies, security groups, users, &amp;hellip;&lt;/li&gt;&#xA;&lt;li&gt;Admins with DC access.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;enable-auditing&#34;&gt;Enable Auditing&lt;/h1&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s start by having a look at the audit categories that are already enabled.&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Log into the DC.&lt;/li&gt;&#xA;&lt;li&gt;Open PowerShell as admin.&lt;/li&gt;&#xA;&lt;li&gt;Run &lt;code&gt;auditpol /get /category:*&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;The command returns a list of audit categories and their status. These settings have been enabled by either the auditpol tool or via GPOs.&lt;/p&gt;&#xA;&lt;p&gt;In our scenario we would like to track management of users and groups, which is part of the &lt;strong&gt;Audit Account Management&lt;/strong&gt; category. 
To enable this audit category, create a new group policy for the DC.&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Open the GPO management console.&lt;/li&gt;&#xA;&lt;li&gt;Right-click the &lt;em&gt;Domain Controllers&lt;/em&gt; organizational unit.&lt;/li&gt;&#xA;&lt;li&gt;Create a new GPO and open it in the GPO editor.&lt;/li&gt;&#xA;&lt;li&gt;Enable logging for subcategories: &lt;code&gt;Computer Configuration &amp;gt; Policies &amp;gt; Windows Settings &amp;gt; Security Settings &amp;gt; Local Policies &amp;gt; Security Options &amp;gt; Audit: Force audit policy subcategory settings...&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;Increase the security log size to 4 GB: &lt;code&gt;Computer Configuration &amp;gt; Windows Settings &amp;gt; Security Settings &amp;gt; Event Log &amp;gt; Maximum security log size: 4268032&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;Then navigate to &lt;code&gt;Computer Configuration &amp;gt; Windows Settings &amp;gt; Security Settings &amp;gt; Advanced Audit Policy Configuration &amp;gt; Account Management&lt;/code&gt;&lt;/li&gt;&#xA;&lt;li&gt;Enable the required audit categories.&lt;/li&gt;&#xA;&lt;li&gt;Run &lt;code&gt;gpupdate /force&lt;/code&gt; on the DC.&lt;/li&gt;&#xA;&lt;li&gt;Run &lt;code&gt;auditpol /get /Category:*&lt;/code&gt; and double-check whether the settings are correct.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;If you open the security event log on the DC, there should now be events logging account management mutations.&lt;/p&gt;&#xA;&lt;p&gt;Source: &lt;a href=&#34;https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/plan/security-best-practices/monitoring-active-directory-for-signs-of-compromise&#34;&gt;Microsoft Docs - Monitoring Active Directory for Signs of Compromise&lt;/a&gt;&lt;/p&gt;&#xA;&lt;h1 id=&#34;query-audit-logs&#34;&gt;Query Audit Logs&lt;/h1&gt;&#xA;&lt;p&gt;As mentioned, querying the event log is a bit more difficult. 
The event log viewer offers limited features for filtering events and searching by specific keywords. With PowerShell, in contrast, it is possible to filter and search the event log by any property and keyword.&lt;/p&gt;&#xA;&lt;p&gt;Here is a simple example:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-powershell&#34; data-lang=&#34;powershell&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$LogName = &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;security&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$StartTime = Get-Date(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;2017-10-12 12:50&amp;#34;&lt;/span&gt;)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$EndTime = Get-Date(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;2017-10-12 13:00&amp;#34;&lt;/span&gt;)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$SearchKey = &lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#34;username&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Get-WinEvent -FilterHashtable @{LogName=$LogName; StartTime=$StartTime;EndTime=$EndTime} | Where-Object {$_.Message &lt;span style=&#34;color:#f92672&#34;&gt;-match&lt;/span&gt; $SearchKey} | select Id, TimeCreated, Message | Format-List&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Source: &lt;a href=&#34;https://blogs.technet.microsoft.com/heyscriptingguy/2015/10/20/filtering-event-log-events-with-powershell/&#34;&gt;Hey, Scripting Guy! Blog - Filtering Event Log Events with PowerShell&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Monitoring a SharePoint Environment with Zabbix</title>
      <link>https://janikvonrotz.ch/2014/04/14/monitoring-a-sharepoint-environment-with-zabbix/</link>
      <pubDate>Mon, 14 Apr 2014 15:16:08 +0000</pubDate>
      <guid>https://janikvonrotz.ch/2014/04/14/monitoring-a-sharepoint-environment-with-zabbix/</guid>
      <description>&lt;p&gt;&lt;em&gt;This post is part of my &lt;a href=&#34;https://janikvonrotz.ch/projects/install-sharepoint-2013-three-tier-farm/&#34;&gt;Install SharePoint 2013 Three-tier Farm&lt;/a&gt; project.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;Zabbix is an open-source monitoring tool.&#xA;It offers everything from simple storage availability reporting up to end-to-end web service calls.&#xA;It&amp;rsquo;s extensible with plugins and shell scripts. There isn&amp;rsquo;t a thing you can&amp;rsquo;t monitor with Zabbix.&lt;/p&gt;&#xA;&lt;p&gt;In our company we use Zabbix to monitor the SharePoint infrastructure.&#xA;One of the great features for this case is the web monitor. Zabbix can request specific websites and authenticate against the web server.&#xA;That&amp;rsquo;s why I don&amp;rsquo;t need an IIS warm-up script for our SharePoint infrastructure. Zabbix always makes a request against the SharePoint website after an IIS recycle.&lt;/p&gt;&#xA;&lt;p&gt;In case you already have a running Zabbix server, you can monitor your SharePoint web application with this web rule:&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://janikvonrotz.ch/wp-content/uploads/2014/04/SharePoint-2013-Zabbix-web-monitoring-scenario.jpg&#34; alt=&#34;SharePoint 2013 - Zabbix web monitoring scenario&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://janikvonrotz.ch/wp-content/uploads/2014/04/SharePoint-2013-Zabbix-web-monitoring-step.jpg&#34; alt=&#34;SharePoint 2013 - Zabbix web monitoring step&#34;&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
