Managing Kubernetes certificates with Python

I ran into a small stumbling block the other evening while working on my ‘site domain manager’ project (for want of a better name). It is essentially a REST API running as a daemon service that manages the mappings of domains to websites, and uses ‘agents’ to automate the configuration via API calls to the various services involved (domain registrars, DNS servers, WAF providers, SSL certificates and so on).

The problem arose because the manager class I was writing to interact with Kubernetes needed to manage certificates. The Kubernetes Python client library has a whole bunch of useful higher-level API and model classes for listing, creating and updating the main resources I needed to manage, such as ConfigMap, DaemonSet, Service and Ingress, but because Certificate is a custom resource provided by the cert-manager package, it doesn’t have the equivalent higher-level methods I needed.

In the end, I solved it by using the lower-level call_api method, as follows:

        siteid = sitename.split(".")[0]
        rawyaml = """
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: {{siteid}}-cert
  namespace: {{namespace}}
spec:
  acme:
    config:
    - dns01:
        provider: route53
      domains:
      - {{mainhostname}}
  commonName: {{mainhostname}}
  dnsNames:
  - {{mainhostname}}
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-production
  secretName: {{siteid}}-tls
"""
        rawyaml = rawyaml.replace("{{siteid}}", siteid)
        rawyaml = rawyaml.replace("{{namespace}}", self.namespace)
        rawyaml = rawyaml.replace("{{mainhostname}}", mainhostname)
        model = yaml.load(rawyaml, Loader=yaml.SafeLoader)
        model['spec']['acme']['config'][0]['domains'] = aliases
        model['spec']['dnsNames'] = aliases

        path_params = {}
        auth_settings = ['BearerToken']
        header_params = {
            "Content-Type": "application/json"
        }
        query_params = []

        # Fetch latest state of resource to apply changes to
        response = self.api_client.call_api(f"/apis/certmanager.k8s.io/v1alpha1/namespaces/{self.namespace}/certificates", 'GET',
            path_params,
            query_params,
            header_params,
            auth_settings=auth_settings,
        )
        r = json.loads(self.api_client.last_response.data)
        certs = {x['metadata']['name']:x for x in r['items']}

        # If it doesn't exist, create it...
        certid = f"{siteid}-cert"
        try:
            if certid not in certs.keys():
                _logger.info(f"Creating certificate '{certid}' with {len(aliases)} hostnames...")
                response = self.api_client.call_api(f"/apis/certmanager.k8s.io/v1alpha1/namespaces/{self.namespace}/certificates", 'POST',
                    path_params,
                    query_params,
                    header_params,
                    body=model,
                    auth_settings=auth_settings,
                )
            else:
                _logger.info(f"Updating certificate '{certid}' with {len(aliases)} hostnames...")
                # Transplant metadata to allow update to work
                model['metadata'] = certs[certid]['metadata']

                # Attempt to update...
                response = self.api_client.call_api(f"/apis/certmanager.k8s.io/v1alpha1/namespaces/{self.namespace}/certificates/{certid}", 'PUT',
                    path_params,
                    query_params,
                    header_params,
                    body=model,
                    auth_settings=auth_settings,
                )
        except Exception as e:
            _logger.exception(e)
            return str(e)

You can see this in context here.
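
Incidentally, the Kubernetes Python client also provides a generic CustomObjectsApi, which can be a slightly tidier way to manage custom resources such as cert-manager Certificates without dropping down to call_api. Here is a minimal sketch of the same create-or-update logic using it — the placeholder variables stand in for the values built up in the snippet above, so treat it as an illustration rather than a drop-in replacement:

# A rough sketch only: create-or-update a cert-manager Certificate using the
# generic CustomObjectsApi rather than raw call_api. The placeholders below
# stand in for the namespace, certificate name and model dict built above.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

group, version, plural = "certmanager.k8s.io", "v1alpha1", "certificates"
namespace = "default"      # self.namespace in the class above
certid = "example-cert"    # f"{siteid}-cert" in the snippet above
model = {}                 # the dict loaded from the YAML template above

existing = api.list_namespaced_custom_object(group, version, namespace, plural)
names = {item["metadata"]["name"] for item in existing["items"]}

if certid not in names:
    api.create_namespaced_custom_object(group, version, namespace, plural, model)
else:
    # Carry over the live metadata (resourceVersion etc.) so the replace is accepted
    current = api.get_namespaced_custom_object(group, version, namespace, plural, certid)
    model["metadata"] = current["metadata"]
    api.replace_namespaced_custom_object(group, version, namespace, plural, certid, model)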

Testing SMTP creds with Docker

One of our sites stopped sending its mail a few days ago. Unfortunately, the SMTP plugin it uses does not provide any debug logs of the SMTP connection, and its ‘test’ tool just says that it sent the mail successfully. The SMTP service provider’s logs suggest they never saw the connection. I issued a new password and reconfigured, but the symptoms remained the same.

So, I just wanted to do a simple check of the new SMTP creds. I also wanted to use a simple tool, to avoid having to recall (Google) the magic SMTP protocol incantations and perform them via ‘telnet’ or whatever. The ‘ssmtp’ tool sprang to mind for some reason, so I figured I’d see how simple it would be to use in this case.

I fired up a disposable docker container to perform this in.

docker run --rm -ti alpine sh

Then, installed and configured ssmtp as follows:

apk -U add ssmtp
cat >/etc/ssmtp/ssmtp.conf <<EOF
Mailhub=smtp.mailserver.com:587
Hostname=mydomain.com
AuthUser=29a0ca06-ec73-488f-883e-f8bda2225a99
AuthPass=29a0ca06-ec73-488f-883e-f8bda2225a99
UseSTARTTLS=YES
EOF

Now I can send test mail like this:

echo "Test mail" | ssmtp -v you@yourdomain.com

The output will show the SMTP transaction, and ultimately whether or not the authentication was successful.
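
Alternatively, if you have Python to hand, the same credential check can be done with the standard library's smtplib — a minimal sketch, assuming the same server and STARTTLS on port 587 as in the ssmtp.conf above (hostname, username and password are placeholders):

#!/usr/bin/env python3
# Minimal SMTP credential check: connect, STARTTLS, then authenticate.
# The hostname, username and password below are placeholders.
import smtplib

host, port = "smtp.mailserver.com", 587
username = "your-smtp-username"
password = "your-smtp-password"

with smtplib.SMTP(host, port, timeout=30) as smtp:
    smtp.set_debuglevel(1)          # prints the SMTP conversation, much like ssmtp -v
    smtp.starttls()
    smtp.login(username, password)  # raises SMTPAuthenticationError on bad creds
    print("Authentication OK")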

Monitoring Windows processes from Nagios

So, something I had to do recently was to set up monitoring for a couple of specific Windows processes, so that we get notifications via a Discord channel if those processes are not running on various hosts.

Typically, you’d do this with something like NSClient++, but that was proving too problematic and time-consuming to get configured and working correctly on the client side.

So, I decided to try a different approach to keep things quick and simple. However, things turned out to be neither quick nor simple. I’ll spare you the rant; needless to say, I generally leave managing our Windows hosts to our Windows people. I haven’t used Windows since ’98. I typically steer clear, and I’m not even sure why I took this on, or why I’m writing this up. I hope it helps someone else, but I personally hope never to have to come back here!

My assessment of the requirement suggested that all that is really needed here is an endpoint the monitoring server can contact that returns ‘OK, all is well’ or ‘ERROR, your process isn’t running’ as a text response. I wanted to do this with tools already available on Windows if possible, without having to download and install any third-party agents or libraries. As far as I know, that pretty much means PowerShell. So, I took a deep breath and rolled up my sleeves…

Try {
    $listener = New-Object System.Net.HttpListener
    $listener.Prefixes.Add("http://+:8123/")
    $listener.Start()
} Catch {
    Write-Host "Failed to configure or start listener: $($_.exception.message)"
    Exit 1
}

while($listener.IsListening) {
    $listenerContext = $listener.GetContext()
    $process = $listenerContext.Request.Url.LocalPath -replace "/",""

    $running = Get-Process $process -ErrorAction SilentlyContinue

    if($running) {
        $Content = [System.Text.Encoding]::UTF8.GetBytes("OK - $($process) is running")
    }
    else {
        $Content = [System.Text.Encoding]::UTF8.GetBytes("ERROR - $($process) is not running")
    }

    $listenerContext.Response.ContentType = "text/plain"
    $listenerContext.Response.OutputStream.Write($Content, 0, $Content.Length)
    $listenerContext.Response.Close()

    #$listener.Stop()
}

That needs to go in a file somewhere, such as C:\Scripts\ProcessCheck.ps1.

Now, for security purposes, we need a user this script can run as. I created a user in our Active Directory called ‘processcheck’ for this purpose. This user must have ‘Log on as a batch job’ rights, which can be set up as follows:

  1. In the Control Panel, open Administrative Tools, then Local Security Policy.
  2. Beneath Security Settings, open Local Policies, and highlight User Rights Assignment.
  3. Locate Log on as a batch job. Open the properties and add the ‘processcheck’ user.
  4. When finished, save your changes and close the Local Security Settings window.

This script will still fail if the user running it does not have permission to listen at the URL it registers to. To work around this, run a PowerShell window ‘As Administrator’ (or as a user with sufficient perms) and run:

netsh http add urlacl url=http://+:8123/ user=processcheck

Next, we need to ensure our monitoring servers can access our endpoint, but the rest of the world cannot. This is done by adding a new rule to the Windows Firewall. The main configuration for the new rule can be seen in the ‘General’ and ‘Protocols and Ports’ tabs…

Firewall – General tab
Firewall – Protocols and Ports tab

Now, we need to ensure that this script is running in the background at all times. Typically, you’d do this by running it as a service, but as it’s PowerShell, that doesn’t look simple (possible?) without a third-party wrapper providing an ‘.exe’ that has all the expected properties of a Windows service. Instead, we’re just going to tell Task Scheduler to run it as a recurring task. The theory here is that the script will remain running, and if it dies the scheduler will restart it automatically within a minute or two.

To create the task, start ‘Task Scheduler’, then ‘Create Task’. The main settings that needed to be configured can be seen in the ‘General’, ‘Actions’ and ‘Settings’ tabs…

Task Scheduler – General tab
Task Scheduler – Actions tab
Task Scheduler – Settings tab

A simple browser-based check on the Windows machine itself confirms that the endpoint is working locally:

Now we should be able to check this using curl from our monitoring servers.

# curl http://host1.ournetwork.lan:8123/iexplore
ERROR - iexplore is not running

So, on to the monitoring server side of things. I will assume you’re using Nagios or a derivative. We use Icinga2, so I’ll give examples using its configuration scheme.

If you’re feeling lazy at this point, you could just configure a basic check_http ensuring the response contains ‘OK’. However, that doesn’t make for a very meaningful error message when the process is not found or an error occurs.

So, best to add a custom check_process command that will make the HTTP call to the endpoint and act on the ‘OK’ or ‘ERROR’ response, or set the service to ‘UNKNOWN’ if it gets an unexpected outcome.

#!/usr/bin/env python3

"""
    Nagios plugin to check status of a process on a remote Windows server
    via a custom URL.
"""

from optparse import OptionParser
import sys
import requests

pluginver = '0.1'

# Parse commandline options:
parser = OptionParser(usage="%prog -H <host_and_port> -p <process_name> [ -h ]", version="%prog " + pluginver)
parser.add_option("-H", "--host",
    action="store", type="string", dest="host", help="Host and port")
parser.add_option("-p", "--process",
    action="store", type="string", dest="process", help="Process name")
(options, args) = parser.parse_args()


def main():
    # Check commandline options
    if not options.host:
        print("UNKNOWN: Missing host value (e.g. 'yourhost1.yournetwork.lan:8123').")
        sys.exit(3)
    if not options.process:
        print("UNKNOWN: Missing process name (e.g. 'iexplore').")
        sys.exit(3)

    # Fetch status message
    try:
        r = requests.get("http://{}/{}".format(options.host, options.process), timeout=30)
    except requests.exceptions.RequestException as e:
        print("UNKNOWN: Failed to contact endpoint: " + str(e))
        sys.exit(3)
    if r.status_code != 200:
        print("UNKNOWN: Endpoint returned unexpected HTTP status " + str(r.status_code))
        print(r.text)
        sys.exit(3)

    # Parse first line of status message
    l = r.content.decode("utf-8").split('\n')[0]
    if l[0:2] != "OK":
        print(l)
        sys.exit(1)
    else:
        print(l)
        sys.exit(0)

if __name__ == '__main__':
    main()

For the purposes of this blog article, I’ll assume the script has been installed as /etc/icinga2/scripts/check_windows_process and that we are checking hosts for the presence of a running iexplore.exe process. Also, as I’m using Icinga2, you may need to adapt the following to your monitoring system.

I’ll start with the command definition:

object CheckCommand "windows-process" {
  import "ipv4-or-ipv6"

  #command = [ PluginDir + "/check_windows_process" ]
  command = [ "/etc/icinga2/scripts/check_windows_process" ]

  arguments = {
    "-H" = {
      value = "$host_address$"
      description = "Hostname (and port) to check (i.e. 'myhost1:8123')"
    }
    "-p" = {
      value = "$process_name$"
      description = "Name of process to check for (i.e. 'iexplore')"
    }
  }

  vars.host_address = "$address$:8123"
  vars.process_name = false
}

Then, a template that can be used by the service definitions:

template Service "application-scheduler" {
    check_command = "windows-process"
    vars.process_name = "MyScheduler2"
    vars.disable_sms_alerts = true
}

You’ll have to fill in the rest according to your specific scenario and needs, but otherwise I hope you find this article useful.

Nginx Ingress access logs to ElasticSearch via syslog

There are several ways to extract the logs from a Kubernetes Nginx Ingress deployment into an ElasticSearch instance. One way I found was to use Elastic’s Filebeat, but I couldn’t find any really good examples of how to apply that to our cluster, and I felt it would clutter up the proxy servers with more containers they don’t necessarily need.

Instead, I chose to use nginx’s syslog facility, which is a little more lightweight, and serves our purposes for now.

Essentially, in this post I will configure syslog access and error logging on the K8S nginx ingress deployment, and have that send access logs via syslog UDP packets to a NodeRED flow which passes them through to ElasticSearch.

I’ve chosen to use NodeRED to translate the syslog packets for consumption by ElasticSearch, as it gives me a lot more flexibility, but you could probably also use the syslog input plugin for Logstash if that’s more suitable for you.

This is just a high-level overview, to keep it brief. I do not try to cover the setup or configuration of the components used (e.g. the K8S manifests).

Pre-requisites

We’ll assume that you have ElasticSearch running somewhere, and that it is accessible via the hostname ‘elasticsearch’, and we can post entries to it via HTTP port 9200. We’ll also assume you have configured Kibana, for querying the ElasticSearch indexes.

We’ll assume that you have a NodeRED instance or container running in close proximity to the ElasticSearch instance or container, and that it is configured to listen on UDP port 1514, using the hostname ‘nodered’. In my scenario, it’s another deployment in the same K8S namespace, on the same host node. For the real setup I prefer to use an internal IP address rather than a hostname, because if the hostname lookup fails the nginx ingress can fail to start up, which is not a good thing to happen to a production cluster when you least expect it.

We’ll assume you have a Kubernetes cluster running Nginx Ingress, and that it refers to a ConfigMap called ‘nginx-configuration’.

Nginx to NodeRED

First, update the ConfigMap to direct the access logs via syslog…

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  enable-syslog: "true"
  log-format-escape-json: "true"
  log-format-upstream: '{ "time": "$time_iso8601", "remote_addr": "$proxy_protocol_addr","x-forward-for":
    "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user":"$remote_user",
    "bytes_sent": $bytes_sent, "request_time": $request_time, "status":$status, "vhost":
    "$host", "request_proto": "$server_protocol", "path": "$uri","request_query":
    "$args", "request_length": $request_length, "duration": $request_time,"method":
    "$request_method", "http_referrer": "$http_referer", "http_user_agent":"$http_user_agent"
    }'
  syslog-host: nodered
  syslog-port: "1514"

To check that the changes have been applied, I tend to get a shell on one of the containers and grep log_format /etc/nginx/nginx.conf.

If that shows the JSON log format, the sites are still working, and there are no significant errors/warnings in the container logs for the nginx containers, then we can move on.

NodeRED to ElasticSearch

Now, on the NodeRED instance, you need to import a new flow…

https://gist.github.com/rossigee/6510ca9226c31bb021d40dcf72855101

This will look a bit like this…

If you enable the toggle on the poorly-named ‘msg.payload’ node, you should start to see the flow of entries in the debug panel on the right.

If you do not see any activity in the debug logs, and you are sure your proxies are receiving hits, you should check that the syslog packets are being transmitted correctly from the nginx proxy hosts to the NodeRED flow. You can check for transmission errors in the nginx proxy logs, and you can use tcpdump/wireshark to confirm or deny the presence of UDP port 1514 traffic between the two.
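
For reference, all the flow really does is: receive the syslog UDP packet, strip the syslog header, parse the JSON access-log entry and index it into ElasticSearch. If NodeRED isn't part of your stack, a rough Python sketch of that same translation step might look like this (the 'proxylogs' index name and the 'elasticsearch' hostname are assumptions, and this is an illustration rather than the flow from the gist):

#!/usr/bin/env python3
# Rough equivalent of the NodeRED flow: listen for nginx syslog packets on
# UDP 1514, extract the JSON access-log payload and index it into ElasticSearch.
import json
import socket

import requests

ES_URL = "http://elasticsearch:9200/proxylogs/_doc"  # assumed index name

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 1514))

while True:
    data, addr = sock.recvfrom(65535)
    message = data.decode("utf-8", errors="replace")

    # The JSON entry starts after the syslog header, at the first '{'
    start = message.find("{")
    if start == -1:
        continue
    try:
        entry = json.loads(message[start:])
    except ValueError:
        continue

    resp = requests.post(ES_URL, json=entry, timeout=5)
    if resp.status_code not in (200, 201):
        print(f"Dropped entry from {addr[0]}: HTTP {resp.status_code}")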

Elasticsearch and Kibana

So, if all is going well, ElasticSearch is now gathering a ‘proxylogs’ index containing all the latest nginx ingress access logs. You just need to tell Kibana about that index. Go to ‘Management > Index Patterns’ in the Kibana UI, then select the ‘proxylogs’ index.

Then, in the ‘Dashboard’, you should be able to start drilling down into the logs gathered so far.

Monitoring

Additionally, the NodeRED flow above incorporates a simple health check designed to be monitored by our Prometheus servers. If ElasticSearch doesn’t return a success response, the flow increments a ‘dropped entry’ counter, which it presents in Prometheus scrape format at the /web-proxy-logs/metrics endpoint. Prometheus can then raise an alert if that goes above zero for any significant amount of time. You may not need this, and can remove that functionality from the flow if it’s not of use to you.

Alerting

Of course, once your live access logs are being fed into ES you can start to consider setting up alerts to notify you of undesirable conditions, such as excessive 50x status codes, excessive login attempts, high page load times and so on. There are lots of things you might like to be alerted about, and several mechanisms for doing this. Hopefully I will have more time to look into ElasticSearch’s own Alerting solution and maybe write a future blog post about that.

However, for the purposes of this post I’ll demonstrate yet another use case for NodeRED. It really is an incredibly flexible tool, and worthy of a place in any sysadmin’s toolkit. In this case, I added a simple flow to check the ES logs every two minutes and let me know via Discord whether there were any 50x status codes since the last run.

https://gist.github.com/rossigee/3ec5db7a68660474dfa96c5e04bb3696
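
In essence, that flow just counts recent 50x responses in ElasticSearch and posts to a Discord webhook when the count is non-zero. For anyone not using NodeRED, here is a rough Python sketch of the same idea (the index name, field names and webhook URL are placeholders):

#!/usr/bin/env python3
# Counts 50x responses logged in the last two minutes and notifies a Discord
# webhook if any are found. Intended to be run from cron every two minutes.
# The index name, field names and webhook URL are placeholders.
import requests

ES_COUNT_URL = "http://elasticsearch:9200/proxylogs/_count"
WEBHOOK_URL = "https://discord.com/api/webhooks/XXXX/XXXX"

query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"time": {"gte": "now-2m"}}},           # 'time' from log-format-upstream
                {"range": {"status": {"gte": 500, "lt": 600}}},   # any 50x status
            ]
        }
    }
}

count = requests.post(ES_COUNT_URL, json=query, timeout=10).json().get("count", 0)
if count > 0:
    requests.post(
        WEBHOOK_URL,
        json={"content": f"{count} requests returned a 50x status in the last 2 minutes"},
        timeout=10,
    )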

Security

It is worth protecting the UDP port from public access. You should create a firewall rule or NetworkPolicy to ensure that UDP packets from the proxy host IPs to port 1514 are accepted, and all others rejected.

Simple LDAP proxy container

So, you have an LDAP server running happily on port 636, but one of your client applications doesn't seem to be happy with the SSL connection for whatever reason. You need an intermediary container to handle the SSL connection to the LDAP server on port 636, presenting it to the local application as plain LDAP on port 389.

First, we write a Dockerfile describing a container that runs up an haproxy daemon.

FROM alpine:latest
RUN apk -U add haproxy
COPY haproxy.cfg /etc/haproxy/haproxy.cfg
EXPOSE 389
CMD ["/usr/sbin/haproxy", "-db", "-f", "/etc/haproxy/haproxy.cfg"]

Now, we just need to provide the haproxy configuration:

frontend main
    bind *:389
    default_backend ldapserver

backend ldapserver
    server static ldap.yourdomain.com:636 ssl verify none

Now, run it up to test it:

docker build -t ldap-proxy .
docker run -d --rm -p 389:389 ldap-proxy
ldapsearch -W -x -H ldap://localhost
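
If you'd like to verify the proxied connection from application code too, a quick test using the Python ldap3 package might look something like this (the bind DN, password and base DN are placeholders for your directory):

#!/usr/bin/env python3
# Binds and runs a trivial search through the proxy's plaintext port 389.
# Requires 'pip install ldap3'; bind DN, password and base DN are placeholders.
from ldap3 import Connection, Server

server = Server("ldap://localhost:389")
conn = Connection(
    server,
    user="cn=admin,dc=yourdomain,dc=com",
    password="yourpassword",
    auto_bind=True,          # raises an exception if the connection or bind fails
)
conn.search("dc=yourdomain,dc=com", "(objectClass=*)", attributes=["cn"])
print(f"Bind OK, {len(conn.entries)} entries returned")
conn.unbind()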

Merging Confluence users

One of my clients is running Confluence. Somewhere along the line, two user accounts had been created for one user, and content had been added using both users.

So muggins here to the rescue. Unfortunately, not much help to be found Googling, so I roll up my sleeves and dig into the Confluence DB schema. Oh what fun.

The main users table appears to be the ‘user_mapping’ table, where each user has a record. Each user has a long hash-like ID that represents them in any other records. I chose the ID of the account I would be merging from, and attempted to find the other tables in the database that reference the record I am about to delete.

mysqldump --extended-insert=0 confluence | grep 8a8181c846f172ee014700f866ee0003 | cut -d\` -f2 | uniq

Which returns…

AO_6384AB_DISCOVERED
AO_92296B_AORECENTLY_VIEWED
AO_9412A1_AOUSER
AO_B8E7F9_TALK_SETTINGS
AO_CB7416_KARMA_USER
ATTACHMENTS
BODYCONTENT
CONTENT
CONTENT_LABEL
CONTENT_PERM
FOLLOW_CONNECTIONS
LABEL
NOTIFICATIONS
OS_PROPERTYENTRY
logininfo
user_mapping

Great. That’s 15 tables I need to process, excluding the user_mapping table. By ‘process’, I mean I need to check the schema, identify the user ID field, work out what it’s used for in that table and write an appropriate SQL statement to repoint things to the target user.
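
Rather than eyeballing each table by hand, you can enumerate the candidate columns programmatically. Here is a rough sketch using PyMySQL that just reports where the old ID appears, without changing anything (credentials and database name are placeholders):

#!/usr/bin/env python3
# Reports which table/column combinations contain the old user key, so the
# UPDATE/DELETE statements can be written against the right fields.
# Requires 'pip install pymysql'; credentials and database name are placeholders.
import pymysql

OLD_ID = "8a8181c846f172ee014700f866ee0003"
DATABASE = "confluence"

conn = pymysql.connect(host="localhost", user="confluence",
                       password="secret", database=DATABASE)
with conn.cursor() as cur:
    # All string-typed columns in the Confluence schema
    cur.execute(
        """SELECT table_name, column_name
           FROM information_schema.columns
           WHERE table_schema = %s
             AND data_type IN ('char', 'varchar', 'text', 'mediumtext', 'longtext')""",
        (DATABASE,),
    )
    for table, column in cur.fetchall():
        cur.execute(
            f"SELECT COUNT(*) FROM `{table}` WHERE `{column}` LIKE %s",
            (f"%{OLD_ID}%",),
        )
        if cur.fetchone()[0]:
            print(f"{table}.{column} references the old ID")
conn.close()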

So, as they say on Blue Peter, here’s one I prepared earlier.

#!/bin/sh

OLDID=8a8181c846f172ee014700f866ee0003
NEWID=8a8181b34817190d014855c3954c0003

cat <<EOF | mysql -vv -f confluence
UPDATE AO_6384AB_DISCOVERED SET USER_KEY = '$NEWID' WHERE USER_KEY = '$OLDID';
UPDATE AO_92296B_AORECENTLY_VIEWED SET USER_KEY = '$NEWID' WHERE USER_KEY = '$OLDID';
DELETE FROM AO_9412A1_AOUSER WHERE USERNAME = '$OLDID';
DELETE FROM AO_B8E7F9_TALK_SETTINGS WHERE \`KEY\` LIKE "%$OLDID";
DELETE FROM AO_CB7416_KARMA_USER WHERE USER_KEY = '$OLDID';
UPDATE ATTACHMENTS SET CREATOR = '$NEWID' WHERE CREATOR = '$OLDID';
UPDATE ATTACHMENTS SET LASTMODIFIER = '$NEWID' WHERE LASTMODIFIER = '$OLDID';
UPDATE BODYCONTENT SET BODY = REPLACE(BODY, '$OLDID', '$NEWID') WHERE BODY LIKE "%$OLDID%";
UPDATE CONTENT SET CREATOR = '$NEWID' WHERE CREATOR = '$OLDID';
UPDATE CONTENT SET LASTMODIFIER = '$NEWID' WHERE LASTMODIFIER = '$OLDID';
UPDATE CONTENT SET USERNAME = '$NEWID' WHERE USERNAME = '$OLDID';
UPDATE CONTENT_PERM SET CREATOR = '$NEWID' WHERE CREATOR = '$OLDID';
UPDATE CONTENT_PERM SET LASTMODIFIER = '$NEWID' WHERE LASTMODIFIER = '$OLDID';
UPDATE CONTENT_PERM SET USERNAME = '$NEWID' WHERE USERNAME = '$OLDID';
UPDATE FOLLOW_CONNECTIONS SET FOLLOWER = '$NEWID' WHERE FOLLOWER = '$OLDID';
UPDATE FOLLOW_CONNECTIONS SET FOLLOWEE = '$NEWID' WHERE FOLLOWEE = '$OLDID';
UPDATE LABEL SET OWNER = '$NEWID' WHERE OWNER = '$OLDID';
UPDATE NOTIFICATIONS SET CREATOR = '$NEWID' WHERE CREATOR = '$OLDID';
UPDATE NOTIFICATIONS SET LASTMODIFIER = '$NEWID' WHERE LASTMODIFIER = '$OLDID';
UPDATE NOTIFICATIONS SET USERNAME = '$NEWID' WHERE USERNAME = '$OLDID';
DELETE FROM OS_PROPERTYENTRY WHERE entity_name LIKE "%-$OLDID";
DELETE FROM logininfo WHERE USERNAME = '$OLDID';
EOF

This may or may not be useful to other Confluence admins. This user had only done a handful of edits, so there may be more tables involved in your case (YMMV).

And don’t forget your backups.

GStreamer pipeline for RTSP stream

A simple gstreamer pipeline to display an RTSP stream (from an Aircam)…

gst-launch -m rtspsrc location=rtsp://172.16.2.251/live/ch00_0 ! rtph264depay ! ffdec_h264 ! ffmpegcolorspace ! autovideosink

Harvest e-mail addresses from stdin

A little python script to do this:

#!/usr/bin/env python3

import sys
import re

bulkemails = sys.stdin.read()

# regex = whoEver@wHerever.xxx
r = re.compile(r"[-a-zA-Z0-9._]+@[-a-zA-Z0-9_]+\.[a-zA-Z0-9_.]+")
results = r.findall(bulkemails)

for x in results:
    print(x)


Magento API problem setting additional_attributes

For some reason, it’s damn hard to get the additional_attributes set via Magento’s v2 API. Even the example code in their API docs doesn’t cover it. After trying many permutations, I finally managed to get it working with the following snippet of code:

<?php

$soapopts = array('trace' => 1, 'exceptions' => 1, 'features' => SOAP_SINGLE_ELEMENT_ARRAYS);
$client = new SoapClient ( 'http://www.yourmagentosite.com/api/v2_soap/?wsdl', $soapopts);
$session = $client->login ( 'apiuser', 'apipassword' ); 

$productData = (object)array(
	'additional_attributes' => (object)array(
		'single_data' => array(
			(object)array(
				'key' => 'custom_image_url',
				'value' => 'http://www.yourmagentosite.com/nicepic.jpg',
			),
		),
	),
);
$result = $client->catalogProductUpdate($session, 'abjb91', $productData);
print $client->__getLastRequest();
var_dump($result);