Python’s Beautiful Soup

In my last post, I more or less just complained about what a dumpster fire developing anything for Twitter is. Originally, though, the post was intended to be about what I was developing for Twitter. It’s nothing amazing or even complicated, but it was a fun learning experience… Twitter itself notwithstanding. I made a Twitter bot that tweets a different shade of pink each day. Since that will probably seem nonsensical to most people, let me explain.

Background

A little over a year ago, a friend and I started a podcast. I won’t go into the backstory of why we named it what we did, but the name of the podcast revolved around the color pink due to an inside joke between my friend and me. We ended up publishing 21 episodes in the span of a year before we decided to stop. I had moved about an hour away from where I previously lived for a new job, so recording in person involved a decent bit of travel for one of us. Then the coronavirus pandemic really started to take off in my country, and given what a dumpster fire trying to record a podcast remotely is, my friend and I jointly decided to shutter the podcast. It was a fun experience, but nothing either of us really wanted to keep putting time and money into. As is typically the case, we reached this conclusion just a month after the hosting for both the podcast and our website renewed. Go figure.

That being said, we had set up social media for the podcast, and that social media was now doing exactly nothing. While I didn’t want to do anything with the Facebook or Instagram accounts that my co-host ran (you couldn’t pay me to touch a Facebook property), I thought about what I could do with the lingering Twitter account. I eventually decided to make a simple bot that would tweet a different shade of the color pink each day.

Python and Beautiful Soup

As I mentioned previously, the actual code to post to Twitter ended up being extremely simple. I just used the Twython library to do the heavy lifting. What ended up being more interesting was how to create the database of colors I would use. After all, I don’t personally know that many different shades of pink, and I wanted to include the RGB and hex color codes for each shade in the daily post. I basically needed a repository of shades of pink. After some DuckDuckGo-fu, I eventually found a page that included not just the RGB and hex color codes, but also a name. It was exactly what I needed.

The only problem was how to get the information from that page into something I could use in my script for the bot. My immediate thought was to copy and paste all of the information, but along with being error-prone over hundreds of shades, that’s also insanely tedious. In a shell script, something like xmllint would fit the bill. Since I was already working in Python, though, I decided to use Beautiful Soup. I had actually used Beautiful Soup one time before on a project years ago where I admittedly didn’t really know Python and most definitely didn’t understand what I was doing with Beautiful Soup; I just ended up copying and pasting a bunch of code from the Internet until things worked the way I wanted.

This time, I took just a little time to read the documentation for Beautiful Soup and understand what I was actually doing. The crux of my script comes down to:

divisions = soup.find_all("div", {"class": "color-inner"})

This gets me every div grouping, one per color. With those groupings in hand, it was then simple to get the name, hex, and RGB information I needed:

for division in divisions:
    color_name = division.find("span", {"class": "color-sub"}).get_text()
    color_hex = division.find("span", {"class": "color-id"}).get_text()
    color_rgb = division.find("span", {"class": "color-rgb"}).get_text()
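To show how those calls fit together end to end, here’s a minimal sketch that parses a small, hand-written HTML fragment instead of the real page. The markup (a div with class color-inner wrapping spans with classes color-sub, color-id, and color-rgb) is an assumption reconstructed from the class names above, not a copy of the source site, and parse_colors is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in the post;
# the real page's structure may differ.
SAMPLE_HTML = """
<div class="color-inner">
  <span class="color-sub">Pink</span>
  <span class="color-id">#FFC0CB</span>
  <span class="color-rgb">255, 192, 203</span>
</div>
<div class="color-inner">
  <span class="color-sub">Hot Pink</span>
  <span class="color-id">#FF69B4</span>
</div>
"""

FIELDS = (("name", "color-sub"), ("hex", "color-id"), ("rgb", "color-rgb"))


def parse_colors(html):
    """Return one dict per color block, skipping blocks that miss a field."""
    soup = BeautifulSoup(html, "html.parser")
    colors = []
    for division in soup.find_all("div", class_="color-inner"):
        fields = {}
        for key, css_class in FIELDS:
            span = division.find("span", class_=css_class)
            if span is None:
                # find() returns None when nothing matches, so calling
                # get_text() on it directly would raise AttributeError
                break
            fields[key] = span.get_text(strip=True)
        else:
            # for/else: this runs only when no field was missing (no break)
            colors.append(fields)
    return colors


print(parse_colors(SAMPLE_HTML))
```

The class_ keyword is Beautiful Soup’s shorthand for the {"class": ...} attribute dictionary used above; both forms match the same elements. The second block in the sample deliberately lacks an RGB span to show the None check at work.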

Instead of copying everything by hand, I had a working script that pulled all of the colors without any risk of human error. Plus, if the source website adds new colors, it’s trivial to re-run the script and get an updated list. I ended up making a map (a Python dictionary) for each color and adding all of the maps to a list.

rows.append({"name": color_name, "hex": color_hex, "rgb": color_rgb})

Then I wrapped it all up at the end by exporting the list of maps to a JSON file.

import json

with open("pinks.json", "w") as outfile:
    json.dump(rows, outfile)
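Because each entry carries both a hex code and an RGB value for the same color, a quick consistency check on the scraped list can catch typos on the source page itself. A minimal sketch, assuming the RGB text contains three integers in R, G, B order (e.g. 255, 192, 203 or rgb(255, 192, 203)); hex_to_rgb and hex_matches_rgb are hypothetical helpers.

```python
import re


def hex_to_rgb(hex_code):
    """Convert '#RRGGBB' (leading '#' optional) to an (r, g, b) tuple."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def hex_matches_rgb(hex_code, rgb_text):
    """True if the three integers found in rgb_text equal the hex value."""
    # Assumption: the RGB text holds exactly three decimal numbers
    numbers = tuple(int(n) for n in re.findall(r"\d+", rgb_text))
    return numbers == hex_to_rgb(hex_code)


print(hex_matches_rgb("#FFC0CB", "255, 192, 203"))  # True
```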

My other script, which actually pushes the post to Twitter, ingests this JSON file and then selects a random shade from it.
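Here’s a rough sketch of what that second script could look like; it is not the bot’s actual code. It loads pinks.json, picks a random entry, and formats the post. The tweet wording and the helper names (load_colors, format_tweet, post_color) are hypothetical; the posting step uses Twython’s update_status method, and the four credential arguments are placeholders for your own app’s keys.

```python
import json
import random


def load_colors(path="pinks.json"):
    """Read the list of color dictionaries written by the scraper."""
    with open(path) as infile:
        return json.load(infile)


def format_tweet(color):
    """Build the daily post text (the wording here is illustrative)."""
    return f"Today's pink: {color['name']}\nHex: {color['hex']}\nRGB: {color['rgb']}"


def post_color(colors, app_key, app_secret, oauth_token, oauth_token_secret):
    """Pick a random shade and post it with Twython."""
    # Imported here so the helpers above work without Twython installed
    from twython import Twython

    twitter = Twython(app_key, app_secret, oauth_token, oauth_token_secret)
    color = random.choice(colors)
    twitter.update_status(status=format_tweet(color))
    return color
```

Keeping the text formatting separate from the network call makes it easy to check the output locally before anything is posted.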

Twitter Still Sucks

As if further proof were needed that Twitter is garbage, I found myself simultaneously amused and irritated just a few days ago when I saw that a daily post hadn’t gone out. When I logged into the bot’s account, I received a notification that it had been flagged for “suspicious activity,” and I had to walk through a verification process before the account could post again. It’s amazing to me that a platform which tolerates the most hateful and dangerous rhetoric chooses to flag as “suspicious” a clear bot that makes a single post each day with details on a different shade of pink. Twitter really isn’t worth anyone’s time at this point.

My latest project, though, involves pushing data to Mastodon instead of Twitter. This post will serve as the first test of it, so, assuming everything works, look for a post on that in the near future.

Deciphering the Office 365 PSTN Usage Report

Background

Last week I received a bit of a mystifying alert from Office 365 letting me know that 80% of our pool of PSTN dial-out minutes for the month had been consumed. It was mystifying because, despite having managed O365 for nearly a decade, I didn’t realize there was a pool of minutes. PSTN (Public Switched Telephone Network) had to be related in some way to the capability my company purchased from Office 365 to add a dial-in option to our Microsoft Teams meetings. Every meeting we create has a dial-in number, so anyone with a poor data connection can still dial a telephone and at least join the audio portion of the meeting.

While the main function of the Audioconferencing license is to provide that dial-in functionality, dial-out options also exist. For example, if you realize during a Teams meeting that you need someone else to participate, you can have Teams dial that person by providing a phone number. Likewise, if you’re struggling in a low-coverage area, you can tap a button in the Teams mobile app to have the service call you.

Sure enough (say what you want about Microsoft, but their documentation for Office 365 tends to be extremely good), I found the official confirmation. It just seems like a bit of a forgotten caveat considering you have to go to the legacy Skype portal to see it; the numbers are completely absent from the Teams portal:

You can monitor the usage against your dial-out minute pool in the “legacy” Skype for Business Admin Center. In the Microsoft Teams Admin Center, navigate to Legacy portal > Reports > PSTN Minute Pools. The Zone A dial-out minute pool will be labeled in the report as “Outbound Calls to Zone A Countries.”

Digging In

When I checked the total number of minutes we had in our pool, I did some quick math to confirm that we were allotted 60 dial-out minutes per Audioconferencing license that was assigned; purchased licenses that were not yet assigned to a user didn’t factor into the equation.
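The arithmetic is simple enough to script. A sketch, using the 60-minutes-per-assigned-license figure above; the license count of 25 is a made-up example, and the 80% threshold mirrors the alert I received.

```python
MINUTES_PER_ASSIGNED_LICENSE = 60  # per the pool math described above


def dial_out_pool(assigned_licenses):
    """Monthly dial-out pool in minutes; unassigned licenses don't count."""
    return assigned_licenses * MINUTES_PER_ASSIGNED_LICENSE


def alert_threshold(pool_minutes, percent=80):
    """Minutes consumed at which an 80%-style usage alert would fire."""
    return pool_minutes * percent / 100


pool = dial_out_pool(25)  # hypothetical: 25 assigned licenses
print(pool, alert_threshold(pool))  # 1500 1200.0
```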

I was still a bit stumped as to who would be eating through our minutes, though, since I figured dial-out to be a pretty niche feature. Luckily, the reporting section of the legacy Skype portal gives a report of how the minutes are used, and there’s even an option to export it for a particular time range. I exported the month-to-date data for my tenant, which saved a .csv file. Opening it up showed me that there was still some work to do. The report listed the date of each call, the UPN (userPrincipalName) of the user, a source number, a destination number, and the duration of the call… in seconds. It also mixed the dial-out entries I wanted with dial-in entries I don’t care about, since dial-in is unlimited. Ick.

I first fired up my text editor and put together a Python script. It weeded out the dial-in entries and checked each dial-out entry against a list of dictionaries, one per user, each holding a single key-value pair: the user’s UPN as the key and their total call seconds as the value. If the user already had a dictionary in the list, the call’s duration was added to that user’s existing total; if not, the script appended a new dictionary for that user, with the current call’s duration as the starting point.
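The bookkeeping described above (one running total per user) is simpler with a single dictionary than with a list of one-entry dictionaries. A minimal sketch using collections.defaultdict; the key_column parameter lets the same function total by whichever column you choose. The sample rows are hypothetical: "Call Type", "Duration Seconds", and "Destination Number" match the column names the script below uses, while "UPN" as a column name and "conf_in" as a dial-in label are assumptions.

```python
from collections import defaultdict


def total_seconds_by(rows, key_column, call_type="conf_out"):
    """Sum call durations per value of key_column, for one call type only."""
    totals = defaultdict(int)
    for row in rows:
        if row["Call Type"] == call_type:
            totals[row[key_column]] += int(row["Duration Seconds"])
    return dict(totals)


# Hypothetical rows shaped like csv.DictReader output
sample_rows = [
    {"UPN": "organizer@example.com", "Call Type": "conf_out",
     "Destination Number": "+15555550100", "Duration Seconds": "600"},
    {"UPN": "organizer@example.com", "Call Type": "conf_out",
     "Destination Number": "+15555550123", "Duration Seconds": "120"},
    {"UPN": "someone@example.com", "Call Type": "conf_in",
     "Destination Number": "+15555550199", "Duration Seconds": "900"},
]

print(total_seconds_by(sample_rows, "UPN"))
print(total_seconds_by(sample_rows, "Destination Number"))
```

Grouping by UPN credits all 720 dial-out seconds to the meeting organizer, while grouping by destination number splits them across the two phones that were actually called.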

I’m not sharing this initial version of the script because, while the code worked perfectly, the logic was flawed. The report showed that Craft Brew Geek was using significantly more minutes than anyone else. I knew he had frequently been using his phone for Teams meetings during the quarantine because his fixed-line provider was having issues, but I thought he was doing that over LTE rather than PSTN. He described his method of joining meetings to me, and he was literally tapping the “Join” button in the Teams app on his phone. To verify this was using LTE data, I noted how much LTE data the Teams app on my phone had consumed for the month, did a 10-minute audio-only Teams call with Brandi by joining the same way Craft Brew Geek was, and then checked the data usage again. Sure enough, it went up by 10 MB; this process wasn’t touching PSTN at all. I made doubly sure by assigning a Teams policy that blocks dial-out to just my O365 account. Tapping the “Join” button on my phone still worked fine, while opting for the explicit “Dial me” option ended in an error.

The report also struck me as odd because it only listed 2 days for Craft Brew Geek, even though I know he’s been joining Teams calls the same way for 2 months now while we all work from home. I went back to the data, this time paying attention to the source and destination numbers. For all of Craft Brew Geek’s entries, the source number was the Microsoft number we selected as the “default” for our tenant because it’s closest to the majority of our employees. The destination number was the same every time as well, but it wasn’t Craft Brew Geek’s number. It belonged to a different employee in our company.

This is when I finally realized that the UPN listed in the report is not necessarily that of the user who consumed the minutes. That wouldn’t even make sense, considering a meeting could dial out to an external participant (e.g., if you were using Teams to conduct an interview), or you may not have any telephone information stored for your users in Office 365. Instead, the UPN in the report belongs to whoever scheduled the meeting.

Realizing that my logic was flawed due to the ambiguity of the report, I modified my script to key on the destination telephone number instead of the UPN. What I ended up with is this Python script. It’s quick and dirty; I just wanted a sorted listing of our dial-out usage.

import csv
from collections import defaultdict

# Running total of dial-out seconds, keyed by destination telephone number
usage_tracker = defaultdict(int)

report_file = "./pstn_report.csv"
with open(report_file, "r", newline="") as csvfile:
    reader = csv.DictReader(csvfile)

    for row in reader:
        # Only dial-out ("conf_out") calls draw from the minute pool;
        # dial-in calls are unlimited, so they are skipped
        if row["Call Type"] == "conf_out":
            usage_tracker[row["Destination Number"]] += int(row["Duration Seconds"])

# Sort numbers by total usage, heaviest first
usage_tuple_list = sorted(usage_tracker.items(), key=lambda x: x[1], reverse=True)

for number, seconds in usage_tuple_list:
    print(f"{number}: {round(seconds / 60)}")

print("============================")
# Convert the grand total once, so per-number rounding doesn't skew it
print(round(sum(usage_tracker.values()) / 60))

At this point, all I needed to do was figure out who owned the telephone numbers my script spit out. Our company maintains a listing of numbers for each employee, so it was simple to search it for each entry. The result: one employee, who had been using dial-out heavily for himself as a matter of convenience during the quarantine, was consuming basically all of our minutes.
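That lookup could also be scripted if the employee listing is available as data. A sketch, assuming a hypothetical mapping of phone numbers to names; because the report and a directory may format numbers differently (+1 (555) 555-0100 vs. +15555550100), both sides are reduced to digits before comparing. digits_only and label_usage are hypothetical helpers.

```python
import re


def digits_only(number):
    """Strip everything but digits so differently formatted numbers match."""
    return re.sub(r"\D", "", number)


def label_usage(usage_seconds, directory):
    """Return (owner, number, minutes) tuples, heaviest usage first.

    usage_seconds: {destination number: seconds}, like the script's usage_tracker
    directory: {phone number: employee name}, a hypothetical export of the listing
    """
    lookup = {digits_only(phone): name for phone, name in directory.items()}
    labeled = []
    for number, seconds in sorted(usage_seconds.items(), key=lambda item: item[1], reverse=True):
        owner = lookup.get(digits_only(number), "unknown")
        labeled.append((owner, number, round(seconds / 60)))
    return labeled


directory = {"+1 (555) 555-0100": "Employee A"}  # hypothetical entry
print(label_usage({"+15555550100": 5400, "+15555550123": 120}, directory))
```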

Aftermath

The main takeaways from this incident are:

  1. Be mindful of the PSTN dial-out pool in Office 365 if you’re doing audioconferencing. I hadn’t thought of it because the O365 instance I managed previously was absolutely massive, and since the size of the pool is based on license assignment, we had far more minutes than we needed for a feature most people never use. Keep tabs on it in a smaller organization, especially when everyone is suddenly working remotely.
  2. The report from the legacy Skype admin center is pretty confusing, and it’s easy to misidentify who is consuming dial-out minutes. Check the destination number(s) in the report to find out who is actually using them.

That’s it! Stay pink (from a socially acceptable distance, of course).