Grep and Sed to Modify All Files in a Directory

I recently decided to rebrand one of my websites, complete with a different domain, title, and author (that’s me!). This is part of the beauty of using a static site generator like Hugo; I updated the domain in my configuration file, and everything else just magically changed when the site was recompiled. The caveat is that I wanted to change the author attribute in each post to a different name to match my new Mastodon profile. In this particular Hugo theme the author is specified in each post rather than in the central config.toml file so that I can have different authors in a single site. This meant I needed to mass-modify all of my posts to change the author (at least the ones that have it, since I previously used a theme where the author wasn’t specified). I knew that this should be possible with sed but couldn’t remember the exact syntax. Since Hugo stores all of the Markdown files for my posts in a single directory, though, I knew it shouldn’t be too complicated.

The syntax sed uses to do find-and-replace is very familiar to me since I use the same syntax in vim all the time. It’s great for those moments when I realize I’ve given a particular variable or function an extremely poor name and need to change every instance in a particular file. So what I really needed in this case was to recall how to use the CLI to change the value in multiple files; specifically I needed to change every file in a particular directory. It didn’t take many searches for me to confirm that this would require another utility to discover each file so that sed could then update them. Ultimately I got up and running with the following commands thanks to what I found on this site:

grep -lr -e "author = \"oldName\"" . | xargs sed -i '' -e "s/author = \"oldName\"/author = \"consoleaccess\"/g"

In this case, grep discovers the files, and xargs turns grep’s output into arguments for sed. I won’t rehash the specific parameters of each command, as the original site I linked to above does a terrific job of that. However, it’s worth mentioning that I was able to swap out just the instances of my old account name in the Markdown file’s author property by matching the full line instead of just the account name; all I had to do was use \ to escape things like the double quotes that I needed to include. This keeps the work to a minimum, since grep only returns the files that contain the old name in the author property, and sed then only changes that line in each file passed to it.
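One caveat worth flagging: `sed -i ''` is the BSD/macOS spelling, while GNU sed on Linux takes `-i` with no separate argument, and plain xargs splits file names on whitespace. A minimal sketch that tries to cover both cases follows. `replace_author` is a hypothetical helper name, and it assumes the old and new names contain no regex metacharacters, slashes, or quotes.

```shell
#!/bin/sh
# replace_author DIR OLD NEW  (hypothetical helper for illustration)
# Rewrites author = "OLD" to author = "NEW" in every file under DIR
# that contains the old author line.
# Assumes OLD and NEW contain no regex metacharacters, slashes, or quotes.
replace_author() {
    dir=$1
    pattern="author = \"$2\""
    replacement="author = \"$3\""

    if sed --version >/dev/null 2>&1; then
        # GNU sed: -i takes no separate backup-suffix argument.
        # -r (GNU xargs) skips running sed when grep finds nothing.
        grep -lr --null -e "$pattern" "$dir" |
            xargs -0 -r sed -i -e "s/$pattern/$replacement/g"
    else
        # BSD/macOS sed: -i needs an explicit (here empty) backup suffix.
        grep -lr --null -e "$pattern" "$dir" |
            xargs -0 sed -i '' -e "s/$pattern/$replacement/g"
    fi
}
```

Calling `replace_author . oldName consoleaccess` should behave like the one-liner above, with the NUL-separated file list (`--null` paired with `xargs -0`) keeping file names that contain spaces intact.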

Quarantine Time Script

Are you tired of sitting at home wondering how many days you’ve been choosing to quarantine like a responsible adult? Me too! The number of times I’ve been in conversations or working on posts for blogs or social media and thought, “Wait, how long have I been at home now?” followed by wasting time doing rough calendar math in my head was enough that I finally burned some time this weekend putting together a script for it.

In the interest of full disclosure, every time this has come up before I’ve used some very simple PowerShell to calculate it, at least once I got past the point where I could just work it out off the top of my head:

$now = Get-Date
$then = Get-Date -Date "2020/03/11"
($now - $then).Days

Clearly this is extremely simple! I’ve found myself needing a few shell scripts, though, so I figured it would be a good opportunity to write this in Bash instead for a little exposure. The biggest key was to just figure out how the heck to:

  1. Create a date at a specific time.
  2. Subtract the dates.

Date at a specific time

This was pretty easy after a quick DuckDuckGo search. The date utility includes a -d parameter that allows me to give it a string that it’ll use as the date, just like -Date does in PowerShell.
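As a quick illustration of `-d`, here is a small sketch. It assumes GNU date; the BSD date that ships with macOS handles `-d` differently and, as far as I know, needs `-j -f` with an explicit input format instead. `start` is just an illustrative variable name.

```shell
#!/bin/sh
# Parse a fixed date string with GNU date's -d option and print it back
# in a chosen format. "start" is an illustrative variable name.
start="2020/03/11"
date -d "$start" "+%A, %B %d, %Y"

# Rough BSD/macOS counterpart (assumption: stock macOS date):
#   date -j -f "%Y/%m/%d" "$start" "+%A, %B %d, %Y"
```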

Subtract the dates

The second piece also ended up being much more straightforward than I expected. The date utility also includes format codes I can use to specify how I’d like the date to be printed, including %s, which gives the date in seconds since the Unix epoch. I could get both dates in seconds, subtract the date I started quarantine from the current date, and then convert the seconds to days. For those keeping score at home, there are 86,400 seconds in a day.

As an added bonus, date returns the time in seconds just like everything else in the universe that isn’t Java-based. I’m looking at you, Groovy.
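Put together, the arithmetic might look roughly like the sketch below. It assumes GNU date, and `days_between` is a hypothetical helper name rather than anything from the original script.

```shell
#!/bin/sh
# days_between START END  (hypothetical helper; assumes GNU date)
# Prints the whole number of days from START to END.
days_between() {
    start_s=$(date -d "$1" +%s) || return 1   # seconds since the Unix epoch
    end_s=$(date -d "$2" +%s) || return 1
    echo $(( ($end_s - $start_s) / 86400 ))    # 86,400 seconds per day
}

# Days in quarantine as of right now:
days_between "2020/03/11" "now"
```

Shell arithmetic is integer-only, so any partial day is simply dropped, which matches what `.Days` gives in the PowerShell version.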

Getting a starting date

The easy method would’ve been to hard-code the date when I started quarantine and leave it at that. To make it a little more extensible, though, I instead opted to pass the date as a parameter. Given that people can pass anything as a parameter, though, I put together a regex to enforce the YYYY/MM/DD format on whatever is typed. That being said, I still included an additional check after parsing the starting date regardless since it would still be possible to specify a date that matches the regex but that isn’t real (e.g. 2020/02/31.)
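The two-step check could be sketched along these lines. This is not the original script: `is_valid_start_date` is a hypothetical name, the regex is one possible way to enforce YYYY/MM/DD, and the second step assumes GNU date, which exits non-zero for a day that doesn't exist.

```shell
#!/bin/sh
# is_valid_start_date STRING  (hypothetical helper; assumes GNU date)
# Succeeds only for a real calendar date written as YYYY/MM/DD.
is_valid_start_date() {
    # Step 1: shape check. This still lets 2020/02/31 through.
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}/(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])$' || return 1

    # Step 2: let date decide whether the day actually exists.
    date -d "$1" +%s >/dev/null 2>&1
}

# Falls back to a sample date when no argument is given.
if is_valid_start_date "${1:-2020/03/11}"; then
    echo "date accepted"
else
    echo "Please pass a real date as YYYY/MM/DD" >&2
fi
```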

Code

Here’s the code in all of its janky glory.

It’s extremely simple, but it was a fun little learning experience to kill some time on a weekend when I was sitting at home… continuing to quarantine…

Hugo and the Implausibly Old Timestamp

Management of one of my blogs is handled through a variety of shell scripts. I have a script for executing hugo to rebuild the site and copy the output of the public directory to the folder where Nginx hosts it, for example. One of my scripts creates a tarball of the site and uses rsync to copy it to another server so that, if my VPS blows up, I can easily retrieve the backup.

After composing yesterday’s post, though, I ran into an error with the backup script. It basically runs the following:

tar -zvcf /home/fail/backups/failti_me.tar.gz blog

This started throwing a warning:

tar: blog/public: implausibly old time stamp 1754-08-30 16:53:05.128654848 -0550

It claimed the public directory where Hugo publishes the compiled site contents dated back to 1754… which is probably a bit older than seems plausible. My blog still published correctly; it was only tar being salty about the weird timestamp. I used stat to check the directory and confirmed that the timestamp for when it was modified was completely borked:

stat blog/public/

That told me:

 File: blog/public/
 Size: 4096        Blocks: 8          IO Block: 4096   directory
 Device: fc01h/64513d    Inode: 512063      Links: 73
 Access: (0755/drwxr-xr-x)  Uid: ( 1000/    john)   Gid: ( 1000/    john)
 Access: 2020-07-21 18:58:28.669384769 -0500
 Modify: 1754-08-30 16:53:05.128654848 -0550
 Change: 2020-07-21 18:58:28.189382080 -0500
 Birth: -
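For a scripted version of the same check, something like the sketch below might do. It assumes GNU stat, whose `-c %Y` prints the modification time in epoch seconds. `mtime_before` is a hypothetical helper, and treating anything before 1970 as suspicious is an arbitrary choice for this sketch, not the rule tar itself applies.

```shell
#!/bin/sh
# mtime_before PATH [CUTOFF]  (hypothetical helper; assumes GNU stat)
# Succeeds when PATH's modification time, in epoch seconds, is earlier
# than CUTOFF. CUTOFF defaults to 0 (1970-01-01).
mtime_before() {
    mtime=$(stat -c %Y "$1") || return 2
    [ "$mtime" -lt "${2:-0}" ]
}
```

A date in 1754 comes out as a large negative number of seconds, so `mtime_before blog/public && echo "suspicious mtime"` would flag the directory shown above.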

After some searches online I found the following GitHub issue thread confirming that plenty of people other than me were seeing the same problem and that it was still present in the current version of Hugo. While I had initially been confused as to why I suddenly started seeing this now since I hadn’t upgraded Hugo or anything like that, I saw a few comments indicating that placing items in Hugo’s static directory seemed to trigger the issue; I had placed an image there from my last post, and it was the first addition to that directory in quite a while. With some additional searching and testing, I verified I could do the following to simply ignore the warning from tar:

tar -zvcf /home/fail/backups/failti_me.tar.gz blog --warning=no-timestamp

I didn’t like the idea of having such a wonky date on my filesystem; as a result, I started searching for how I could fix it by manually adjusting the “Modify” timestamp. touch seemed like a likely candidate, and after reviewing the man page I saw that there was a -t flag for it which would allow me to manually specify the timestamp. I basically just wanted to set it to the current time, so I added the following to my script that recompiles the site, placing it after the build and before rsyncing the contents of the public directory to the Nginx directory:

# touch -t expects [[CC]YY]MMDDhhmm; stamping the directory with "now" replaces the bogus mtime.
STUPID=$(date "+%y%m%d%H%M")
touch -t "$STUPID" /home/john/blog/public/

Sure enough, after running this the resulting tar command has no qualms. Likewise, re-running the stat command from above shows the current date and time as the modified time on the directory. I really hope this bug gets fixed soon since it seems to have been around for a hot minute, but at least I have a workaround for the time being.
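Since the goal is just "now", a shorter variant may also work: when touch is given neither -t nor -d, it sets the access and modification times to the current time. A minimal sketch, with `reset_mtime` as a hypothetical helper name:

```shell
#!/bin/sh
# reset_mtime PATH  (hypothetical helper)
# With no -t or -d option, touch sets the access and modification
# times to the current time, so no formatted timestamp is needed.
reset_mtime() {
    touch "$1"
}
```

In the rebuild script that would amount to `reset_mtime /home/john/blog/public/` in place of the two lines above.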