Python’s Beautiful Soup

In my last post, I more or less just complained about what a dumpster fire developing anything for Twitter is. Originally, though, the post was intended to be about what I was developing for Twitter. It’s nothing amazing or even complicated, but it was a fun learning experience… Twitter itself notwithstanding. I made a Twitter bot that tweets a different shade of pink each day. Since that’ll most likely seem nonsensical to most people, let me explain.

Background

A little over a year ago, a friend and I started a podcast. I won’t go into the backstory of why we named it what we did, but the name of the podcast revolved around the color pink due to an inside joke between my friend and me. We ended up publishing 21 episodes in the span of a year before we decided to stop it. I had moved about an hour away from where I previously lived for a new job, so recording in person involved a decent bit of travel for one of us. Then the coronavirus pandemic really started to take off in my country, and given what a dumpster fire trying to record a podcast remotely is, my friend and I jointly decided to shutter the podcast. It was a fun experience, but nothing either of us really wanted to keep putting time and money into. As is typically the case, we reached this conclusion just a month after the hosting for both the podcast and our website renewed. Go figure.

That being said, we had set up social media for the podcast, and that social media was now doing exactly nothing. While I didn’t want to do anything with the Facebook or Instagram accounts that my co-host ran (you couldn’t pay me to touch a Facebook property), I thought about what I could do with the lingering Twitter account. I eventually decided to make a simple bot that would tweet a different shade of the color pink each day.

Python and Beautiful Soup

As I mentioned previously, the actual code to post to Twitter ended up being extremely simple. I just used the Twython library to do the heavy lifting. What ended up being more interesting was how to create the database of colors I would use. After all, I don’t personally know that many different shades of pink, and I wanted to include the RGB and hex color codes for each shade in the daily post. I basically needed a repository of shades of pink. After some DuckDuckGo-fu, I eventually found a page that included not just the RGB and hex color codes, but also a name. It was exactly what I needed.

The only problem was how to get the information from that page into something I could use in my script for the bot. My immediate thought was to copy and paste all of the information, but along with being error-prone over hundreds of shades, that’s also insanely tedious. In a shell script, something like xmllint would fit the bill. Since I was already working in Python, though, I decided to use Beautiful Soup. I had actually used Beautiful Soup one time before on a project years ago where I admittedly didn’t really know Python and most definitely didn’t understand what I was doing with Beautiful Soup; I just ended up copying and pasting a bunch of code from the Internet until things worked the way I wanted.

This time, I took just a little time to read the documentation for Beautiful Soup and understand what I was actually doing. The crux of my script comes down to:

divisions = soup.find_all("div", {"class": "color-inner"})

This gets me each of the div groupings for a color. With each of those groupings defined, it was then simple to get the name, hex, and RGB information I needed:

for division in divisions:
    color_name = division.find("span", {"class": "color-sub"}).get_text()
    color_hex = division.find("span", {"class": "color-id"}).get_text()
    color_rgb = division.find("span", {"class": "color-rgb"}).get_text()

Instead of trying to copy everything by hand, I had a working script to get all of the colors without needing to worry about human error. Plus, if the source website adds any new colors, it’s trivial to re-run the script and get an updated list. I ended up making a map (a Python dictionary) for each color and adding all of the maps to a list:

all_colors.append({"name": color_name, "hex": color_hex, "rgb": color_rgb})

Then I wrapped it all up at the end by exporting the list of maps to a JSON file.

import json

# Write the list of color maps out as a single JSON array.
with open("pinks.json", "w") as outfile:
    json.dump(all_colors, outfile)
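Put together, the scraping steps above might look something like the sketch below. It is not the original script: the sample markup is invented to match the class names quoted in this post, `extract_colors` is a hypothetical helper name, and the choice of the built-in `html.parser` backend and the check for missing fields are my own assumptions.

```python
import json

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Invented sample markup. The real page is only known here through the
# class names quoted in the post, so this structure is an assumption.
SAMPLE_HTML = """
<div class="color-inner">
  <span class="color-sub">Hot Pink</span>
  <span class="color-id">#FF69B4</span>
  <span class="color-rgb">rgb(255, 105, 180)</span>
</div>
<div class="color-inner">
  <span class="color-sub">Deep Pink</span>
  <span class="color-id">#FF1493</span>
  <span class="color-rgb">rgb(255, 20, 147)</span>
</div>
"""


def extract_colors(html):
    """Return one dict per color block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    all_colors = []
    for division in soup.find_all("div", {"class": "color-inner"}):
        name = division.find("span", {"class": "color-sub"})
        hex_code = division.find("span", {"class": "color-id"})
        rgb = division.find("span", {"class": "color-rgb"})
        # find() returns None when nothing matches, so skip incomplete
        # blocks instead of crashing on None.get_text().
        if name is None or hex_code is None or rgb is None:
            continue
        all_colors.append({
            "name": name.get_text(strip=True),
            "hex": hex_code.get_text(strip=True),
            "rgb": rgb.get_text(strip=True),
        })
    return all_colors


if __name__ == "__main__":
    # Print the result instead of writing pinks.json so the sketch has
    # no side effects.
    print(json.dumps(extract_colors(SAMPLE_HTML), indent=2))
```

Against the real page, the HTML would come from a file or an HTTP request rather than a string constant, and the result would be passed to `json.dump` as shown above.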

My other script which actually pushes the post to Twitter ingests this JSON file and then selects a random shade from it.
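For a rough idea of what that second script could look like, here is a minimal sketch. The function names, the wording of the status text, and the environment variable names are all made up for illustration; the only Twython calls used are the `Twython(...)` constructor with the four credentials and `update_status(status=...)`.

```python
import json
import os
import random


def load_colors(path="pinks.json"):
    """Read the list of color dicts written by the scraping script."""
    with open(path) as infile:
        return json.load(infile)


def build_status(color):
    """Format one color dict as the text of a post (wording is invented)."""
    return f"Today's shade of pink: {color['name']} | {color['hex']} | {color['rgb']}"


def post_random_color(client, colors):
    """Pick a random shade and post it via any client with update_status()."""
    color = random.choice(colors)
    client.update_status(status=build_status(color))
    return color


def main():
    # Imported here so the helpers above work without Twython installed.
    from twython import Twython  # third-party: pip install twython

    # Hypothetical environment variable names for the four credentials.
    twitter = Twython(
        os.environ["TWITTER_APP_KEY"],
        os.environ["TWITTER_APP_SECRET"],
        os.environ["TWITTER_OAUTH_TOKEN"],
        os.environ["TWITTER_OAUTH_TOKEN_SECRET"],
    )
    post_random_color(twitter, load_colors())
```

Calling `main()` once a day from cron or a similar scheduler would produce the daily post. Passing the client in as an argument keeps the selection and formatting logic testable without touching Twitter at all.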

Twitter Still Sucks

As if further proof were needed that Twitter is garbage, though, I found myself simultaneously amused and irritated just a few days ago when I saw that a daily post had not gone out. When logging into the account for the bot, I received a notification that the account had been flagged for “suspicious activity,” and I had to walk through a verification process before the account could post again. It’s amazing to me that a platform which tolerates the most hateful and dangerous rhetoric chooses to flag a clear bot that makes a single post each day with details on a different shade of the color pink as “suspicious.” It’s just further proof that Twitter really isn’t worth anyone’s time at this point.

My latest project, though, involves pushing data to Mastodon instead of Twitter. This post will serve as the first test of it, so assuming everything works, look for a post on that in the near future.

Twitter Development Impressions

I recently had an idea to turn the Twitter account for my defunct podcast into a Twitter bot posting a new shade of pink every day; it makes sense because the podcast and Twitter account were centered around the color pink. I didn’t really think anyone would care about this particular bot, but it seemed like a fun project idea to work on. It ended up being an interesting learning experience, but not at all for the reasons I actually expected.

Developer Account

Getting a Twitter developer account can be either really simple or really irritating, and there’s no discernible difference that dictates what experience you’ll get. I went to the Developer portal and registered for a developer account with my normal Twitter account. This account is clearly me IRL; my name is in it, I have a photo of myself on the account, it links to my personal website, and the post history clearly indicates that the account is a person rather than a bot. As part of registering for the account, I had to describe what I was going to create. I honestly stated that I was just going to create a bot that would tweet a shade of pink each day. Twitter asks questions such as whether you plan to export data out of the service, whether you plan to display information posted to Twitter outside of Twitter, etc. I answered “no” to all of these questions since I wasn’t pulling any data out of the service. I just needed to post.

After I completed the registration form, I received a message that my account was under review and I would receive a notification when that review was completed. I was a little bummed since it happened to be a long weekend for me, and I was hoping that this project would give me something to fill the time. I was hopeful that maybe the review would be completed quickly. I was wrong. It took just shy of two weeks before the review was completed. I had almost forgotten about the whole thing since I’ve been trying to stay off of Twitter as of late, but then I got an email telling me I was allowed in. Wild.

Authentication

Handling authentication with Mastodon is a relatively simple, straightforward process if you’ve done this sort of thing with… pretty much any API. It’s a little different than the type of things I do for work since creating a client means people other than the person writing the code can be authenticating, but it still makes sense and is well documented. On the other hand, authenticating through Twitter is a complete nightmare. Outside of the specifics of authentication, everything in Twitter’s documentation seems aimed at keeping each individual page as short as possible. As a result, every page links to numerous other pages, and you end up having dozens of browser tabs open just to have some clue as to what your complete workflow looks like. For the OAuth 1.0a option, which is the option to use if you need an account other than the registered developer account to leverage an application, they recommend strongly against making the JWT yourself in favor of leveraging a library… but they don’t actually share any of the particulars about their JWT setup… or even call it a JWT. You very clearly get the impression that Twitter doesn’t want anyone actually using their API. Crazy.

After seeing the poor documentation, I abandoned my ideas of making my bot in Rust or maybe Bash, and instead just decided to use Python with the Twython library. I’ve manually pieced together enough JWTs that I didn’t think I cared enough to do it again for this. Seeing the workflow for Twython showcased the next bit of crazy, though, which is that the authentication workflow sends the OAuth token to the callback URL. I basically needed to set up something completely different with an HTTP listener for the OAuth token so that I could move forward with authentication. That was entirely more than I wanted to put into this simple bot.

Note: The craziness of this makes me still think I’m not actually understanding the setup properly. I did verify, though, that the response received from where I was running the code included just the HTTP status, so the information is not coming back to the sender by default. Likewise, I couldn’t open my application up to other users without giving a callback URL, so omitting that wasn’t an option, either. Hit me up on Mastodon if I’m just dumb and there’s a reasonable way to handle this that I’m misunderstanding.
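For anyone curious what that HTTP listener might involve, below is a minimal standard-library sketch. It is not something I ran against Twitter. It assumes the callback arrives as a GET request carrying `oauth_token` and `oauth_verifier` query parameters (the standard OAuth 1.0a redirect), and that the redirect can reach the machine running the script. The names `parse_callback`, `CallbackHandler`, and `wait_for_callback`, along with the host and port, are placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_callback(path):
    """Pull oauth_token and oauth_verifier out of a callback request path."""
    query = parse_qs(urlparse(path).query)
    token = query.get("oauth_token", [None])[0]
    verifier = query.get("oauth_verifier", [None])[0]
    return token, verifier


class CallbackHandler(BaseHTTPRequestHandler):
    """Handle the redirect that arrives after the user approves the app."""

    def do_GET(self):
        # Stash the values on the server object so the caller can read them.
        self.server.oauth_result = parse_callback(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Authorization received. You can close this tab.\n")

    def log_message(self, format, *args):
        # Suppress the default access log so tokens do not land on stderr.
        pass


def wait_for_callback(host="127.0.0.1", port=8080):
    """Block until one request arrives, then return (token, verifier)."""
    server = HTTPServer((host, port), CallbackHandler)
    server.oauth_result = (None, None)
    try:
        server.handle_request()  # serve exactly one request, then stop
    finally:
        server.server_close()
    return server.oauth_result
```

The script would start the authorization flow, call `wait_for_callback()`, and then exchange the returned values for the final tokens. Even in this stripped-down form, it is clearly a second moving part that a once-a-day bot should not need.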

Registering Another Developer Account

At this point I realized that I really should have just registered for a developer account with the account I was planning to use with the bot. I started that process again fully expecting to wait another two weeks. I filled out all of the same information during the registration process, but this time when I completed the form I was immediately kicked over to the developer portal to start working on whatever I needed.

I’m still amazed that I had to wait two weeks for developer access when I registered with my actual account, the one I use as a human being. When I registered with the account that had been used for my podcast, though, I was allowed access sans review. The podcast Twitter account generally had nothing posted to it other than the automatic posts from Squarespace, had the podcast logo as the profile picture, had no website linked, and quite clearly was not a person… yet that’s the account that got in right away. Okie dokie!

Wrap-Up

Since I now had the account I was planning to post from in the developer portal, I could spin up my OAuth token directly from there in my browser as opposed to having to leverage 3-legged OAuth to a callback URL. As I started working on the code, though, I quickly realized that the script for the bot to post was going to be insanely easy. Instead, the much more interesting part of the code was the script I wrote to make a little local repository of information on shades of pink that was completely unrelated to Twitter. I had originally planned to cover that in this post, but since this ended up being longer than I expected, that’s what I’ll cover next time around.