OpenHardware MUC statistics

Why?

One of my users, kikuchiyo, approached me with the idea of determining how the user count in his MUC room is distributed over time. His hypothesis was that the user count peaks during the weekend. As my general monitoring does not track the user count per room, I was fairly interested in the data. Unfortunately, ejabberd is not able to export these statistics, so some custom steps were required.

This was a temporary statistics gathering process only. No user data was exposed; all data points can be acquired from any participant’s local history. Using server-side tools only reduced the time required for counting.

data aggregation take 1

The first, rather rough, data collection was done by a small bash script appending the collected data points to a CSV file. Because the script relies on the sleep command, I wasn’t counting on its timing being very precise.

#!/bin/bash
DATAFILE=./openhardware.csv
MAX_POINTS=20160
START=0

until [[ $START -eq $MAX_POINTS ]]
do
  (
    PRESENCES=$(ejabberdctl get_room_occupants_number openhardware conference.magicbroccoli.de)
    UNIQUE=$(ejabberdctl get_room_occupants openhardware conference.magicbroccoli.de | cut -d / -f1 | sort | uniq | wc -l)
    DATE=$(date -I"seconds")

    echo "$PRESENCES,$UNIQUE,$DATE"
  ) >> $DATAFILE

  (( START=START+1 ))

  sleep 30
done

After a week, the resulting data set was about as good as the simplicity of the approach allowed. The average time between measurements was 30.7149 seconds with a standard deviation of 2.9799 seconds.

As assumed, the timing precision isn’t great, but it is not as bad as one might have guessed. The main issue with this approach, though, was that the data collection process kept getting killed by the system for various reasons, which only added to the highly precise nature of the approach.
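Figures like these can be derived from the timestamp column alone. The following is a minimal sketch, not the script that was actually used: the function name `interval_stats` and the sample values are made up for illustration, and the use of the population standard deviation is an assumption.

```python
from datetime import datetime
from statistics import mean, pstdev


def interval_stats(iso_timestamps):
    """Return (mean, standard deviation) of the gaps between samples, in seconds."""
    # date -I"seconds" writes e.g. 2020-09-16T08:16:27+02:00, which
    # datetime.fromisoformat() parses on Python 3.7 and newer.
    times = [datetime.fromisoformat(ts) for ts in iso_timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    # pstdev is the population standard deviation (an assumption).
    return mean(gaps), pstdev(gaps)


# Hypothetical sample values, not taken from the real data set.
sample = [
    "2020-09-16T08:16:27+02:00",
    "2020-09-16T08:16:58+02:00",
    "2020-09-16T08:17:28+02:00",
]
avg, dev = interval_stats(sample)
print(f"mean={avg:.4f}s stdev={dev:.4f}s")
```

Gaps caused by the collection script being killed would show up as outliers in `gaps` and inflate both values, so it may be worth filtering them before comparing runs.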

timestamps are a bitch and a half

Because I am known as foresight personified, I gathered the timestamps in the ISO 8601 format including the timezone. As I am probably not the only human being interested in looking at these values, NumPy fortunately was able to help me with my genius. To convert the ISO 8601 timestamps to a numeric format, I had to parse and update the whole column.

I am a NumPy rookie, so please tell me if I did something terribly wrong.

import numpy as np
from dateutil.parser import parse as dtp

data = np.genfromtxt('openhardware_week.csv', delimiter=",", skip_header=True)

with open('openhardware_week.csv', 'r', encoding='utf-8') as f:
    # skip header line
    next(f)

    # the header is already consumed, so idy == 0 is the first data row
    for idy, line in enumerate(f):
        ts = line.strip().split(sep=",")[-1]
        data[idy][2] = dtp(ts).timestamp()
>>> data[:: ,2]
array([1600236987, 1600237018, 1600237048, ..., 1600856077, 1600856107,
       1600856138], dtype=int32)
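The same conversion can also be done in a single pass, without reading the file twice. This is only a sketch of an alternative, assuming Python 3.7 or newer so that `datetime.fromisoformat` can stand in for dateutil; the helper name `load_samples`, the header names and the two demo rows are hypothetical.

```python
import csv
import io
from datetime import datetime

import numpy as np


def load_samples(fileobj):
    """Read presences,unique,ISO-timestamp rows into an (n, 3) float array."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header line
    rows = [
        (float(presences), float(unique), datetime.fromisoformat(ts).timestamp())
        for presences, unique, ts in reader
    ]
    return np.array(rows, dtype=np.float64)


# Hypothetical two-row file standing in for openhardware_week.csv.
demo = io.StringIO(
    "presences,unique,date\n"
    "12,10,2020-09-16T08:16:27+02:00\n"
    "13,10,2020-09-16T08:16:58+02:00\n"
)
data = load_samples(demo)
print(data[:, 2].astype(np.int64))
```

Because every row is converted as it is read, there is no index bookkeeping between the array and the file, which removes one source of off-by-one mistakes.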

Thankfully I learned my lesson and started using proper UNIX timestamps from here on out.

data aggregation take 2

We quickly realized that a single week wasn’t enough for a reasonable assessment. Luckily I had an idea to further optimize the data gathering process: Telegraf. Telegraf can run custom exec scripts alongside its core metrics collection.

By removing the pesky until loop and the sleep command, the script took shape.

#!/bin/bash
set -e

PRESENCES=$(ejabberdctl get_room_occupants_number openhardware conference.magicbroccoli.de)
USERCOUNT=$(ejabberdctl get_room_occupants openhardware conference.magicbroccoli.de | cut -d / -f1 | sort | uniq | wc -l)

echo "OpenHardware presence=${PRESENCES}i,usercount=${USERCOUNT}i"

The resulting data set turned out to be quite good: the average time between measurements was 10.0351 seconds with a standard deviation of 1.1147 seconds.
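For readers who prefer Python over a shell pipeline, the counting step can be sketched as follows. This is an illustration only: it assumes each occupant is given as a full JID with the resource after the first slash, and the function names and example JIDs are made up.

```python
def count_occupants(occupant_jids):
    """Return (presences, unique_users) for a list of full JIDs."""
    # Drop everything after the first "/" (the resource), like `cut -d / -f1`.
    bare_jids = {jid.split("/", 1)[0] for jid in occupant_jids}
    return len(occupant_jids), len(bare_jids)


def to_line_protocol(measurement, presences, usercount):
    """Format one sample the same way the shell script's echo does."""
    return f"{measurement} presence={presences}i,usercount={usercount}i"


# Hypothetical occupants: alice is online with two devices.
occupants = [
    "alice@example.org/phone",
    "alice@example.org/laptop",
    "bob@example.net/desktop",
]
presences, users = count_occupants(occupants)
print(to_line_protocol("OpenHardware", presences, users))
```

The trailing `i` marks both fields as integers in the line protocol that Telegraf reads, which is why the shell script appends it as well.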

plot.py

I was mostly interested in the distribution of users using only a single device versus those using multiple devices. Due to some routing issues at the start of the month, it is also possible to see how the MUC room repopulated after most of its users got disconnected.

[Figure: presence-unique]
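To check the original weekend hypothesis against such a data set, the samples can be grouped by weekday and hour. The sketch below is one possible way to do that, not the code behind any of the graphics here; the function name, the sample values and the choice of UTC are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mean_by_weekday_hour(samples):
    """Average the user count per (weekday, hour) bucket.

    samples: iterable of (unix_timestamp, usercount) pairs.
    Returns a dict mapping (weekday, hour) to the mean, Monday being 0.
    """
    buckets = defaultdict(list)
    for ts, usercount in samples:
        # UTC is an assumption; pass another tzinfo for local time.
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        buckets[(moment.weekday(), moment.hour)].append(usercount)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Hypothetical samples: two on a Wednesday morning, one on a Saturday at noon.
samples = [
    (1600236987, 40),
    (1600237018, 42),
    (1600516800, 55),
]
table = mean_by_weekday_hour(samples)
for (weekday, hour), value in sorted(table.items()):
    print(weekday, hour, value)
```

The resulting dictionary has at most 7 x 24 entries and can be fed into any plotting tool to draw a heat map.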

kikuchiyo also provided me with some beautiful graphics to look at and an invitation for everyone interested in open hardware. In addition to the distribution of presence against unique user counts on a daily basis, he also graphed the values by hour and day of the week as a heat map.

It is a magical place where hackers and makers can dive into the world of open source hardware, DIY projects, RISC V processors, mechanics and handheld devices. While many XMPP MUCs discuss freedom, privacy, politics and security from a software perspective, we aim to be the place to focus on hardware.

Does Open Hardware already exist? There are different degrees of freedom and maybe “real” OpenHardware is just a utopian dream, so we welcome all dreamers and critics. To join, just open this easy onboarding link: Join OpenHardware MUC. You can also take a look at our Etherpad, where we collect links. If you have a PinePhone or Librem5 (both are shipping right now), tell us about their usability and share your experience!

kikuchiyo