Full Content RSS Feeds With Hugo

Last week I made a test post on Mastodon linking to one of my blogs via curl. I was doing this to see how best to handle the formatting for a script I was working on that periodically checks my blog and links to any new posts from Mastodon. My friend tomasino, who I’ve known for quite a while through SDF, reached out to ask if there was an RSS feed for it. (Note that I now link to the RSS feed from the menu!)

After he subscribed, he noticed that the RSS feed only showed part of each post. It turns out that my theme, like many others, uses the default RSS template in Hugo. That template only publishes a portion of each post to the feed. The idea is undoubtedly that people will navigate to the site to finish reading, which lets whatever trackers are in place see the visit. Since I don’t have any trackers on my site and don’t care about hits, this just serves to be a pain in the butt for anyone trying to use RSS; I personally hate having to leave my own RSS reader in order to finish reading something.

With a lot of help from tomasino (who clearly knows significantly more about RSS than I do), I started trying to modify my RSS template to include the full content for each post. Since the Terminal theme uses the default template, I started by taking the base template content and placing it in /layouts/_default/ as index.xml so that I had something to modify. The first thing I did was modify the <description> tag so that it contained .Content instead of .Summary. This did cause the full content to be displayed in the RSS feed, but it also caused all sorts of HTML encoding problems. Next I tried modifying the <description> block again so that instead of:

.Content | html

It was:

.Content | safeHTML

This was… slightly better. It fixed the HTML encoding problems, but it also caused the paragraph tags to disappear, meaning each post was a wall of text. tomasino’s thought was that I needed a CDATA block, which I also saw mentioned in the Hugo support forum. The problem I quickly ran into was that the block, which I was now adding in addition to the <description> block, needed to look something like this:

<content:encoded><![CDATA[{{ .Content | safeHTML }}]]></content:encoded>

Adding that directly to the layout file caused the leading < to get HTML-encoded, which broke the entire thing. Back to hunting on DuckDuckGo, I found several people with the same issue. A few people in those threads offered solutions for how to properly escape things, but tomasino ultimately found the cleanest one. After I recompiled my site yet again, the encoding looked good, but the XML was still missing some metadata. Trying to open it in Firefox gave the following error:

XML Parsing Error: prefix not bound to a namespace

It’s worth noting, since I missed this initially, that Firefox will not render the XML when you’re using the view-source: view. This makes complete sense, but I had overlooked it. You need to actually navigate to the file normally, e.g. https://my.site/index.xml. The actual problem was that the content: prefix on <content:encoded> wasn’t bound to a namespace. I fixed that by declaring the namespace, which I did by just copying the same line from tomasino’s own XML file for one of his sites:

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">

After this addition and yet another recompile of the site, everything finally started to appear correctly in RSS readers. Suffice it to say I would’ve preferred to simply toggle something in my config.toml file to switch my RSS feed from a summary to the full text, but at least it’s possible to modify this on your own if you have to.
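Each of the failures above (no namespace declaration, no full-content elements, an HTML-encoded CDATA opener) leaves a recognizable trace in the built feed. A quick grep after each rebuild can catch them without opening Firefox every time. This is a rough sketch, not a real XML validator. It assumes the feed lands at public/index.xml (a hypothetical default) and that each <content:encoded> opener sits on a single line. `check_feed` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Rough sanity check for a built Hugo feed (sketch; grep-based, not a validator).
check_feed() {
  local feed="${1:-public/index.xml}"   # hypothetical default feed location
  local items

  if [ ! -f "$feed" ]; then
    echo "missing: $feed" >&2
    return 1
  fi

  # Without this declaration, parsers report "prefix not bound to a namespace".
  if ! grep -q 'xmlns:content="http://purl.org/rss/1.0/modules/content/"' "$feed"; then
    echo "content namespace not declared in $feed" >&2
    return 1
  fi

  # grep -c counts matching lines (not occurrences); "|| true" keeps a
  # zero-match exit status from aborting scripts that use set -e.
  items=$(grep -c '<content:encoded>' "$feed" || true)
  if [ "$items" -eq 0 ]; then
    echo "no <content:encoded> elements in $feed" >&2
    return 1
  fi

  # An HTML-encoded CDATA opener shows up literally as &lt;![CDATA[
  if grep -q '&lt;!\[CDATA\[' "$feed"; then
    echo "CDATA opener was HTML-encoded in $feed" >&2
    return 1
  fi

  echo "ok: $items line(s) with <content:encoded>"
}
```

Running `check_feed public/index.xml` after `hugo` prints an "ok" line. Otherwise it prints the first problem it finds and returns non-zero, so a build script can stop before publishing.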

Hugo and the Implausibly Old Timestamp

Management of one of my blogs is handled through a variety of shell scripts. I have a script for executing hugo to rebuild the site and copy the output of the public directory to the folder where Nginx hosts it, for example. One of my scripts creates a tarball of the site and uses rsync to copy it to another server so that, if my VPS blows up, I can easily retrieve the backup.

After composing yesterday’s post, though, I ran into an error with the backup script. It basically runs the following:

tar -zvcf /home/fail/backups/failti_me.tar.gz blog

This started throwing a warning:

tar: blog/public: implausibly old time stamp 1754-08-30 16:53:05.128654848 -0550

It claimed the public directory, where Hugo publishes the compiled site contents, was last modified in 1754… which is probably a bit older than seems plausible. My blog still published correctly; it was only tar being salty about the weird timestamp. I used stat to check the directory and confirmed that its modification timestamp was completely borked:

stat blog/public/

That told me:

 File: blog/public/
 Size: 4096        Blocks: 8          IO Block: 4096   directory
 Device: fc01h/64513d    Inode: 512063      Links: 73
 Access: (0755/drwxr-xr-x)  Uid: ( 1000/    john)   Gid: ( 1000/    john)
 Access: 2020-07-21 18:58:28.669384769 -0500
 Modify: 1754-08-30 16:53:05.128654848 -0550
 Change: 2020-07-21 18:58:28.189382080 -0500
 Birth: -

After some searches online I found a GitHub issue thread confirming that plenty of people other than me were seeing the same problem and that it was still present in the current version of Hugo. I had initially been confused about why I was suddenly seeing this, since I hadn’t upgraded Hugo or anything like that. A few comments, though, indicated that placing items in Hugo’s static directory seemed to trigger the issue. I had placed an image there for my last post, and it was the first addition to that directory in quite a while. With some additional searching and testing, I verified I could do the following to simply ignore the warning from tar:

tar -zvcf /home/fail/backups/failti_me.tar.gz blog --warning=no-timestamp

I didn’t like the idea of having such a wonky date on my filesystem, so I started searching for how I could fix it by manually adjusting the “Modify” timestamp. touch seemed like a likely candidate, and after reviewing the man page I saw that it has a -t flag for specifying the timestamp manually. I basically just wanted to set it to the current time, so I added the following to my script that recompiles the site. It goes after the build and before rsyncing the contents of the public directory to the Nginx directory:

STUPID=$(date "+%y%m%d%H%M")
touch -t "$STUPID" /home/john/blog/public/

Sure enough, after running this the resulting tar command has no qualms. Likewise, re-running the stat command from above shows the current date and time as the modified time on the directory. I really hope this bug gets fixed soon since it seems to have been around for a hot minute, but at least I have a workaround for the time being.
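You could also reset the timestamp only when it actually looks bogus, instead of on every build. Note that `touch` with no `-t` already uses the current time, so the `date` step isn't strictly required. Below is a minimal sketch that assumes GNU coreutils (`stat -c`, `touch -m`), as on the Ubuntu-style `stat` output above. The function name `fix_mtime` and the year-2000 cutoff are hypothetical choices. They are not anything defined by Hugo or tar.

```shell
#!/usr/bin/env bash
# Reset a directory's mtime to "now" only if it is older than a cutoff.
# Assumes GNU stat/touch. The default cutoff (2000-01-01 UTC) is arbitrary.
fix_mtime() {
  local target="$1"
  local cutoff="${2:-946684800}"   # seconds since the epoch for 2000-01-01T00:00:00Z
  local mtime

  # %Y = time of last modification, in seconds since the epoch.
  # A 1754 date comes back as a large negative number.
  mtime=$(stat -c %Y "$target") || return 1

  if [ "$mtime" -lt "$cutoff" ]; then
    echo "resetting mtime on $target (was $(stat -c %y "$target"))"
    touch -m "$target"   # -m: update only the modification time, to the current time
  fi
}
```

In the rebuild script, `fix_mtime /home/john/blog/public/` would take the place of the two `STUPID`/`touch` lines. A directory with a sane date is left untouched.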

Self-Hosting A Static Website

Earlier this week a friend reached out to me regarding a website. He had just finished developing his very first iOS game and was ready to submit it to Apple for approval. One of Apple’s myriad requirements, though, is a website containing the author’s privacy policy. My friend had no website and no idea how to make one, so he asked me if I could help. It seems wild to me that someone could have the chops to make an iOS app in Objective-C or Swift but not be able to make a website, but each of us has a different skill set.

We first took some early steps gathering requirements. What did he want for the site? Literally just the privacy policy. Where did he want to host it? Wherever was the cheapest. Did he have a domain name already? Yes! This was fairly straightforward; he literally just wanted the very basics. After a bit of discussion I convinced him to write up a quick “about me” type of page so that we could have more than just the privacy policy. From there I could get to work.

Hosting

The first thing I did was have him head over to Vultr and spin up their cheapest instance. I think this is running him $5 USD per month. I had him pick Ubuntu as the server operating system given that it’s the one I’m most familiar with. My friend has some familiarity with Linux but not a lot of practical knowledge; when I asked him to shoot me some SSH credentials with sudo access he literally sent me the root account from Vultr. Ick.

Configuring The Host

Accounts

My first goal was to configure the host. I started that off by creating user accounts for each of us:

adduser username
usermod -aG sudo username

After switching users and verifying my new account worked, I locked the root account’s password so it can no longer be used to log in:

sudo passwd -l root

Ports

Next I wanted to change the default SSH port since having 22 open means a million places from across the planet are going to throw garbage traffic at your server. I did this by modifying the SSH config at /etc/ssh/sshd_config, finding the line with #Port 22, uncommenting it, and changing the port to a high number of my friend’s choice. Then I restarted SSH:

sudo systemctl restart ssh
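The port edit can also be scripted, which is handy when setting up more than one box. The sketch below rewrites the `Port` line in whatever file you pass it, so you can try it on a copy of the config first. It assumes GNU sed (`-i` with no backup suffix). `set_ssh_port` is a hypothetical helper, and 2222 in the usage line is only an example port.

```shell
#!/usr/bin/env bash
# Set the Port directive in an sshd_config-style file (sketch; assumes GNU sed).
set_ssh_port() {
  local config="$1" port="$2"

  # Reject empty or non-numeric input, then anything outside the TCP port range.
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi

  if grep -Eq '^#?Port[[:space:]]+[0-9]+' "$config"; then
    # Uncomment (if needed) and rewrite every existing Port line.
    sed -E -i "s/^#?Port[[:space:]]+[0-9]+.*/Port ${port}/" "$config"
  else
    # No Port line at all: append one.
    printf 'Port %s\n' "$port" >> "$config"
  fi
}
```

On the server this would be `sudo bash -c '. ./ssh_port.sh && set_ssh_port /etc/ssh/sshd_config 2222'` or similar. Afterwards, `sudo sshd -t` (sshd's config test mode) should report nothing before you run `sudo systemctl restart ssh`. Keeping your current SSH session open until a login on the new port succeeds is cheap insurance.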

Firewall

I wanted to enable the firewall as well, so I opened the new SSH port along with 80 and 443 for our eventual website:

sudo ufw allow sshPortNumber/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
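The allow rules by themselves don't turn anything on; ufw still has to be enabled. Order matters here. If ufw is enabled before the new SSH port is allowed, its default deny on incoming traffic can lock you out. Here is a sketch of the whole sequence, assuming Ubuntu's ufw and a hypothetical SSH port. The `run` helper and `DRY_RUN` variable are illustrative names: with `DRY_RUN=1`, each command is printed instead of executed.

```shell
#!/usr/bin/env bash
# Print (DRY_RUN=1) or execute a command.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_firewall() {
  local ssh_port="$1"
  # Open SSH first so enabling the firewall can't cut off remote access.
  run sudo ufw allow "${ssh_port}/tcp"
  run sudo ufw allow 80/tcp
  run sudo ufw allow 443/tcp
  # --force skips ufw's interactive "may disrupt existing ssh connections" prompt.
  run sudo ufw --force enable
  run sudo ufw status verbose
}
```

`DRY_RUN=1 setup_firewall 2222` shows exactly what would run, and dropping `DRY_RUN=1` applies it.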

Webserver

I next needed a web server; Nginx has been my go-to choice for a long time. Rather than re-hashing all of the steps, I’ll just recommend following the excellent documentation from DigitalOcean which nicely covers the Nginx configuration. That takes you to the point where you are hosting a website. Then you just need content on it.

Certificate

I’m an advocate of using HTTPS for everything, and with free certificates from Let’s Encrypt there’s no reason not to. Given that we have shell access, using certbot is the way to go. There’s also excellent documentation on that process on Ubuntu with Nginx. I highly recommend selecting the option to redirect any HTTP traffic to HTTPS.

Website

Now for the website itself. I’m not really much of a web developer, and I dislike making anything frontend; I don’t exactly have the best design sense. So I once again opted to leverage Hugo to take care of that for me. I’ve written about the specifics of using Hugo in detail. Since we really just wanted a generic landing page with my friend’s socials and then links to the About and Privacy Policy pages, I ended up going with the Hermit theme. It has a nice, simple look. My friend’s favorite color is mint green, and once I changed the accent color to match, the theme’s default background worked nicely with it. The theme also includes an exampleSite, so I could steal its config.toml file and its “About” page to make things even easier for myself.

Backups

One of the nice things about Hugo is that, since everything is a simple text file, it’s very easy to compress your entire site and save a backup. Then if something terrible happens to your server, it’s extremely easy to get the site back up and running on a different machine. In this case, I made tarballs for both the finished, compiled site and the Hugo directory storing the configuration and Markdown.

tar -zvcf ~/temp/html_output.tar.gz /var/www/mySite.com/
tar -zvcf ~/temp/hugo_directory.tar.gz /path/to/hugoDirectory/

With the tarballs created, I used an SFTP client to copy them off the server for safekeeping.
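Putting a date in each tarball's name makes it easy to keep several generations. Listing each archive before downloading it helps catch a bad one. Here is a minimal sketch. `backup_site` is a hypothetical helper, and the paths in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Create a dated .tar.gz for each source directory and verify it (sketch).
# Usage: backup_site OUT_DIR SRC_DIR [SRC_DIR...]
backup_site() {
  local out_dir="$1"; shift
  local stamp name src
  stamp=$(date +%Y%m%d)
  mkdir -p "$out_dir" || return 1

  for src in "$@"; do
    name="$(basename "$src")_${stamp}.tar.gz"
    # -C plus the basename stores relative paths (e.g. mySite.com/...)
    # instead of absolute ones.
    tar -zcf "$out_dir/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing decompresses the whole archive, so a truncated or corrupt file fails here.
    tar -tzf "$out_dir/$name" >/dev/null || return 1
    echo "$out_dir/$name"
  done
}
```

For example, `backup_site ~/temp /var/www/mySite.com /path/to/hugoDirectory` prints the path of each verified tarball, ready to pull down over SFTP. Two sources with the same directory name would overwrite each other's tarball, so keep the names distinct.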

Wrap Up

In total it took me about an hour and a half to get everything up and running. I’ve gone through this process many times for websites of my own, so I have a decent bit of experience with it, but it shows that getting a decent website up and running doesn’t have to take a super long time. The big benefits are:

  1. The site is cheap to run. Even the smallest instance at any VPS provider will be able to handle multiple sites with ease unless they start getting really popular, so if my friend wants to create any other sites in the future he won’t need additional hosting.
  2. Backups are stupid simple. My friend isn’t beholden to a hosting provider or trying to work within the confines of something more expensive like WordPress or Squarespace.

There are downsides, though, so you have to be cool with them:

  1. Setup takes more technical chops than clicking through a Squarespace template editor. While the documentation for everything in this post is extremely good, if working in a terminal freaks you out then this likely isn’t for you.
  2. Content is authored in Markdown. This likely doesn’t matter for my friend at the moment since he’s not really posting anything new to the site, but it would be something to keep in mind if he decided to start a blog. In that scenario, I usually just SSH to the server and author my content in Vim. You could also author the Markdown elsewhere and copy it to the server, or use SFTP to open the Markdown file on the server from an editor on your local machine. It’s definitely not as simple as a WYSIWYG editor in your browser, though.
  3. Maintenance is something that will need to be done at least periodically. The server will need to be patched. That’s easy enough to do with a simple sudo apt update && sudo apt upgrade and then reboot when necessary, but it’s just another step to keep in mind. Likewise, bouncing the server means that the website will be down, even if it’s typically only for a moment or two.
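The patch-and-reboot routine from the last point can be scripted. This sketch assumes Ubuntu, where packages that need a reboot (a new kernel, for instance) create /var/run/reboot-required. `needs_reboot` and `patch_server` are hypothetical names. The reboot itself is left commented out so you can pick a quiet moment for the brief downtime.

```shell
#!/usr/bin/env bash
# True if the reboot flag file exists (path overridable for testing).
needs_reboot() {
  local flag="${1:-/var/run/reboot-required}"
  [ -f "$flag" ]
}

patch_server() {
  sudo apt update && sudo apt upgrade -y || return 1
  if needs_reboot; then
    echo "reboot required; the site will be down briefly"
    # sudo reboot   # uncomment to reboot automatically
  fi
}
```

Running `patch_server` by hand (or from cron, once you trust it) covers the update step and tells you whether a reboot is actually needed.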

Being a kind of pretentious, technical snob, I personally find it easier to author my content in Markdown in Vim than in a WYSIWYG editor in a GUI, but your mileage will vary based on your own preferences.