Salvaging Images From Squarespace

I wrote previously about moving this blog from Squarespace to WordPress. One of my cited concerns with Squarespace was being locked into that particular platform without a lot of options for moving somewhere else. So how did I move my content to WordPress? I was able to export the written content for the posts themselves from within Squarespace, fortunately. Inside of Settings > Advanced is an Import / Export option. The only export offering is WordPress, so I guess it was lucky that’s where I was moving. This gives an XML file with the written content and metadata for each post. Unfortunately, there is no option to export the images that I’ve uploaded over the past year of creating content over at Squarespace; within the XML file the images show up as <div> tags with a link to the Squarespace CDN for the actual image. For example, this is what I see where the image is for the last post I authored over on Squarespace:

<div style="padding-bottom:45.903255462646484%;" class=" image-block-wrapper has-aspect-ratio " data-animation-role="image" > <noscript><img src="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" alt="html.png" /></noscript><img class="thumb-image" data-src="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" data-image="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" data-image-dimensions="1013x465" data-image-focal-point="0.5,0.5" alt="html.png" data-load="false" data-image-id="5f175ce6cb20a366ea6f4d62" data-type="image" /> </div>

If you think that looks disgusting, that’s because it is. When I imported the XML file into WordPress, I saw an option to download any attachments on each post. I checked that box, but since the images are linked to the Squarespace CDN they’re considered to be HTML content rather than attachments. As a result, WordPress simply embeds the <div> in each post as a custom HTML block that doesn’t actually render the image.

Set on not going through 50 posts to manually save the images out of them, I started looking at the XML to see if I could do anything useful with the image URLs. One thing that immediately concerned me was that, when I wasn’t sure what I was going to do with the unusually.pink domain but knew that I didn’t want to keep it at Squarespace, I marked the Squarespace site as Private, meaning the only way to view the content was to log in. I assumed this meant the image content on the Squarespace CDN would be inaccessible until I made the site public again. After copying an image URL from my XML file, though, I saw that it was still publicly available. Flagging a Squarespace site as private means you can’t load the site directly, but content on Squarespace’s CDN is still accessible. That in itself seems like a problem to me and a very good reason to leave the platform, but in this one case it was working to my benefit. I realized that I could parse all of the images files out of the XML file with a script and download them programmatically.

As you can see from the XML snippet above, images on the Squarespace CDN have URLs like this:

https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png

There’s a whole lot of CDN nonsense, followed by a forward slash and the original file name at the very end. While this would be handy for getting the original file name, I didn’t want to end up with dozens of images in a folder where I had no idea what post they belonged to, and I definitely didn’t want to manually correlate the file name with the CDN link in each of the HTML blocks in WordPress.

The XML, though, also includes the title of each post, and I realized that if I was scanning each line of the XML for image tags, I could also check for the title tag and keep a variable constantly updated with that. With this idea, I would just start each file name with the post title so that they would be grouped together. Once the dots were connected, it was simple to come up with the following short PowerShell script:

It downloads all of the images referenced in the XML file in the format of:

postTitle_fileName.extension

I added some extra checks to remove unsavory characters from the file names; while it’s a valid character in most modern filesystems, for example, have you ever tried to actually programmatically do things from a Bash shell with a file that has a [ in the name? It’s not pretty.

While this saved me from having to manually download each image from Squarespace, I still had to manually go through each post in WordPress, remove the custom HTML block where each image should have been, and then upload the appropriate image. With the way I downloaded the images, though, I just started at the top of the directory and worked my way through the images alphabetically since each post was grouped together. It sucked, but it could have been a lot worse. If nothing else it made me glad that I moved forward with migrating the site now rather than waiting a few more months for the Squarespace subscription to lapse; I didn’t want to deal with this for any more posts than was strictly necessary.

Sorting IP Addresses With PowerShell

I was recently attempting to sort a list of IP addresses that I had in a text file. PowerShell is awesome at sorting, so I figured I would use it given that I had nearly 2000 of them. I initially just tried to pipe the contents of the file to the Sort-Object cmdlet:

Get-Content -Path .\ipListFixed.txt | Sort-Object

The results were, suffice to say, lackluster. I ended up with something like this:

10.1.69.1
10.1.69.12
10.1.69.148
10.1.69.149
10.1.69.17

Gross, right? Obviously 10.1.69.17 shouldn’t be after 10.1.69.148 or .149. The issue is that the whole octet isn’t being considered. It’s being sorted like they’re just strings rather than IP addresses; 7 is bigger than 4 so they’re sorted accordingly. PowerShell isn’t really comparing the numbers of 17 to 148, for example, to realize that 17 is actually smaller.

It makes sense; PowerShell can’t assume that the value is an IP address so it’s treating the value like a string instead. This means that the key is to tell PowerShell it’s an IP address so that PowerShell can adjust the way it does sorting. This is possible by casting the values as the .NET Version class. Since I happened to be pulling the IPs from a file, I did the casting within the -Property parameter that I added to the Sort-Object cmdlet:

Get-Content -Path .\ipListFixed.txt | Sort-Object -Property { [System.Version]$_ }

This yields significantly better results:

10.1.69.1
10.1.69.2
10.1.69.12
10.1.69.17
10.1.69.29

Stay pink!

PowerShell: Export Non-Standard and Multi-Value AD Attributes to CSV

I’ve recently been spending a lot of my time at work doing PowerShell scripting for a project that’s currently underway. While working on a bit of code late last week I ran into two interesting problems that I had yet to really stumble across despite doing heavy PowerShell scripting against Active Directory for nearly a decade now.

Exporting Non-standard Attributes To CSV

My first issue was that I needed to export some non-standard attributes from contact objects in AD to a .csv file. When I say “non-standard” I don’t mean that we’ve done a custom schema extension or anything like that. Instead, I simply mean that the AD PowerShell module only wants to include certain attributes when exporting AD objects to a .csv file. For example, if I run Get-ADObject -Properties * against a particular contact object and simply let the output dump to my shell, I see all of the attributes I expect including stuff like givenName and sn. However, if I run the exact same cmdlet but pipe the result to Export-Csv rather than sending it to the shell I will not have all the same attributes; ones like givenName and sn will now be missing. That’s a problem.

Storing Mutli-Value Attributes In a CSV

The other problem was that one of the attributes I needed is proxyAddresses. This is a multi-value attribute as contacts could very likely have multiple proxy addresses associated with a particular mailbox. They may have an alias, x500 addresses, etc. If you use PowerShell to pull this information from AD and dump it to the screen you’ll see the expected proxy address information. If you call GetType() on the proxyAddresses attribute you’ll see it listed as an array. That’s why if you export a contact to a .csv file and include proxyAddresses, the value for each contact in that particular column will simply read:

Microsoft.ActiveDirectory.Management.ADPropertyValueCollection

PowerShell isn’t sure what to do with a multi-value attribute as far as the .csv file goes so it stores the data type. This is also a problem!

The Solution

My solution to this was the following. Note that it’ll be easier to view the Gist for it.

# Get the objects with the required attributes and storing proxyAddresses properly in the .csv file.
Get-ADObject -Filter { objectClass -eq "contact" } -Server myserver.mydomain.net -SearchBase "OU=Contacts,DC=mydomain,DC=net" -SearchScope OneLevel -Properties name,givenName,sn,displayName,telephoneNumber,proxyAddresses,targetAddress,mail,mailNickname,company,department,l,physicalDeliveryOfficeName,postalCode,st,streetAddress,title | Select-Object name,givenName,sn,displayName,telephoneNumber,targetAddress,mail,mailNickname,company,department,l,physicalDeliveryOfficeName,postalCode,st,streetAddress,title,@{name="proxyAddresses"; expression={$_.proxyAddresses -join ";"}} | Export-Csv -Path .\sample.csv -Encoding ASCII -Append -NoClobber -NoTypeInformation

# Shows how to then use the proxyAddresses attribute in the future.
$allContacts = Import-Csv -Path .\sample.csv
foreach($singleContact in $allContacts) {
    $proxyAddresses = $singleContact.proxyAddresses.Split(";")
}

The first thing I needed was to get the properties I actually needed as opposed to just the ones PowerShell felt like including due to the object type. The solution was to:

  1. Specify all of the attributes I wanted in -Properties to ensure they would be available.

  2. Pipe the AD object to Select-Object and then re-specify each of the properties I wanted.

This works because PowerShell is selecting the attributes to include in the .csv file based on the type of object. Select-Object takes whatever comes through the pipeline, though, and makes a brand new, generic object from it. That object will include any properties from that original object that you specify, hence why I tell it the exact listing of properties that I need again.

This leads to the solution to the second problem which is what to do for proxyAddresses. PowerShell allows for the use of an expression to parse together a new property. In this instance the expression is to take the existing proxyAddresses array and -join it into a string. Each of the items in the array are separated by a semicolon. If a contact has the following proxy addresses, for example:

SMTP:pretty@unusually.pink
smtp:awesome@unusually.pink

Then the property in the .csv file will be stored as:

SMTP:pretty@unusually.pink;smtp:awesome@unusually.pink

You could pick any character for the delimiter that you want as long as it isn’t used in other data that you’re storing.

In this instance my goal was to take the exported data, re-import it, and then create contacts in a different AD forest. When creating a new object AD is expecting an array for the proxyAddresses attribute. A string with a random delimiter won’t do it. That’s where the second part of the example comes into play. If I’m looping through all of the data stored in the .csv file I can recreate the array of proxy addresses by splitting the string on my delimiter (in this case a semicolon) and assigning the result to a variable. That variable will be an array which can then be used to popular proxyAddresses when running New-ADObject