Making a Discord bot more production-ready
It has been 3 years since I last posted. I wonder if I (still) have any human readers. Many things have changed in 3 years, and one of them is a renewed focus on software. I wrote software for decades before following a path that led me to machining for a while. More on that later perhaps.
Even after I stopped working in software, I continued to write software for my own pleasure, including a bot for a friend’s Discord server. That bot has been running for about a year, and I have been adding features to it over time. The bot helps people moderate their server, visualize the current traffic, and act on it immediately.
I would like to make it more widely available. To do that, I need to feel more confident in my ability to monitor the bot’s health and to deploy changes quickly.
Metrics
I landed on Grafana for metrics. I am using the open source Prometheus client to expose metrics on a web page, and on the same host I run Alloy, which scrapes that page and pushes the metrics into Grafana. I get the sense that the observability world leans more towards a push model for metrics these days, but exposing a page to be scraped is a pattern I am familiar with from my time at Google.
The way the Prometheus client code works is completely familiar to me from my Google days, and the way Grafana does queries is pretty familiar too, although early in the setup I had to lean on the AI that Grafana provides to get Alloy configured and a dashboard set up.
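To make the scrape pattern concrete, here is a minimal sketch using the Python prometheus_client package. The metric names, the label, and the port are made up for illustration and are not the bot's real ones.

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    generate_latest,
    start_http_server,
)

# A dedicated registry keeps this sketch isolated from the global default.
registry = CollectorRegistry()

# Hypothetical metrics; the real bot's metrics are not shown in this post.
MESSAGES_SEEN = Counter(
    "bot_messages_seen",
    "Messages the bot has observed",
    ["guild"],
    registry=registry,
)
GUILDS_CONNECTED = Gauge(
    "bot_guilds_connected",
    "Servers the bot is currently connected to",
    registry=registry,
)


def record_message(guild_id: str) -> None:
    """Count one message for the given server."""
    MESSAGES_SEEN.labels(guild=guild_id).inc()


def render_metrics() -> str:
    """Return the text that a scraper would receive from the metrics page."""
    return generate_latest(registry).decode("utf-8")


def serve(port: int = 8000) -> None:
    """Expose the metrics over HTTP in a background thread (port is an assumption)."""
    start_http_server(port, registry=registry)
```

The bot would call serve() once at startup and record_message() from its event handlers. A scraper such as Alloy then only needs the host and port of that page.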
Deployment
I also want to be able to deploy new versions of the bot quickly and confidently. I have cobbled together a GitHub action that runs tests and then leverages Python Semantic Release (PSR) to create a tag and push release artifacts back into GitHub. Getting this working correctly for my case was a bit of a challenge, as I am using uv, and PSR and uv don’t get along that well at this point. The main issue is that they each have separate notions about where version information should live. PSR is also slow to set up by default. While I understand why they want the repeatability of a Docker image, I didn’t want to wait for that during every release cycle. It turns out that you can list PSR as a dev dependency and then just run it with uv, so that was nice.
To actually do the deploy, I am using pyinfra. The script I have written downloads the tagged artifact to the production host, installs it to a directory based on the tag, notifies the affected Discord server about the upcoming release, sends an annotation to Grafana, pushes configuration to the production host to use the new tag, restarts the bot, and then monitors the bot for health issues.
The annotation shows up as a red vertical line in my Grafana dashboard with the deployed tag. This helps me to reason about any changes I see. The monitoring of the bot for health issues is pretty basic right now, but I am happy to have something in place that I can improve later.
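For anyone curious what sending a deploy annotation involves, here is a rough sketch that builds the request for Grafana's annotations HTTP endpoint using only the standard library. The function name, the URL, the token, and the "deploy" tag are placeholders I made up, and this is not the code from my deploy script.

```python
import json
import time
import urllib.request
from typing import Optional


def build_annotation_request(
    grafana_url: str,
    api_token: str,
    release_tag: str,
    now: Optional[float] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to Grafana's /api/annotations endpoint."""
    # Grafana expects the annotation time in epoch milliseconds.
    when_ms = int((time.time() if now is None else now) * 1000)
    payload = {
        "time": when_ms,
        "tags": ["deploy", release_tag],  # "deploy" is an assumed tag name
        "text": f"Deployed {release_tag}",
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is a matter of passing the request to urllib.request.urlopen. A dashboard can then show annotations filtered by the tags in the payload.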
Currently if I want to go back to an older version of the code, I need to change the configuration to point to an older tag, run the pyinfra code to push the configuration, and then do a restart manually. I could imagine automating this too, but I haven’t actually wanted to go back to an older version of the code yet, so maybe it isn’t worth investing time in.
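If I ever do automate going back, the post-restart health monitoring is the natural trigger. Here is a sketch of what a basic monitor could look like. The function, its parameters, and the idea of a caller-supplied check are all assumptions for illustration rather than what my script does today.

```python
import time
from typing import Callable


def monitor_after_restart(
    check: Callable[[], bool],
    startup_attempts: int = 10,
    stable_checks: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the bot comes up and then stays healthy for a short window."""

    def probe() -> bool:
        # Treat any exception from the check (e.g. connection refused) as unhealthy.
        try:
            return bool(check())
        except Exception:
            return False

    # Phase 1: give the bot some time to come up after the restart.
    for _ in range(startup_attempts):
        if probe():
            break
        sleep(delay_seconds)
    else:
        return False  # never became healthy

    # Phase 2: require it to stay healthy for several consecutive checks.
    for _ in range(stable_checks):
        sleep(delay_seconds)
        if not probe():
            return False
    return True
```

The check could be as simple as fetching the metrics page and confirming it responds. A False result would be the signal to point the configuration back at the previous tag and restart.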
I have plans that would make this bot more resilient, but I also should spend some time focusing on the Web UI first. There will be a web server that can be used to configure the bot and I haven’t started on that at all yet. I probably have a tendency to spend more time than I should on backend things while neglecting the user-facing portions of a project.
If you have a Discord server and are interested in giving this bot a try, please let me know!
You can leave me comments here