Salvaging Images From Squarespace

I wrote previously about moving this blog from Squarespace to WordPress. One of my cited concerns with Squarespace was being locked into that particular platform without a lot of options for moving somewhere else. So how did I move my content to WordPress? I was able to export the written content for the posts themselves from within Squarespace, fortunately. Inside of Settings > Advanced is an Import / Export option. The only export offering is WordPress, so I guess it was lucky that’s where I was moving. This gives an XML file with the written content and metadata for each post. Unfortunately, there is no option to export the images that I’ve uploaded over the past year of creating content over at Squarespace; within the XML file the images show up as <div> tags with a link to the Squarespace CDN for the actual image. For example, this is what I see where the image is for the last post I authored over on Squarespace:

<div style="padding-bottom:45.903255462646484%;" class=" image-block-wrapper has-aspect-ratio " data-animation-role="image" > <noscript><img src="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" alt="html.png" /></noscript><img class="thumb-image" data-src="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" data-image="https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png" data-image-dimensions="1013x465" data-image-focal-point="0.5,0.5" alt="html.png" data-load="false" data-image-id="5f175ce6cb20a366ea6f4d62" data-type="image" /> </div>

If you think that looks disgusting, that’s because it is. When I imported the XML file into WordPress, I saw an option to download any attachments on each post. I checked that box, but since the images are linked to the Squarespace CDN they’re considered to be HTML content rather than attachments. As a result, WordPress simply embeds the <div> in each post as a custom HTML block that doesn’t actually render the image.

Determined not to go through 50 posts and manually save the images out of them, I started looking at the XML to see if I could do anything useful with the image URLs. One thing immediately concerned me. Back when I wasn’t sure what I was going to do with the unusually.pink domain but knew that I didn’t want to keep it at Squarespace, I had marked the Squarespace site as Private, meaning the only way to view the content was to log in. I assumed this meant the image content on the Squarespace CDN would be inaccessible until I made the site public again. After copying an image URL from my XML file, though, I saw that it was still publicly available. Flagging a Squarespace site as private means you can’t load the site directly, but content on Squarespace’s CDN is still accessible. That in itself seems like a problem to me and a very good reason to leave the platform, but in this one case it was working to my benefit. I realized that I could parse all of the image URLs out of the XML file with a script and download them programmatically.

As you can see from the XML snippet above, images on the Squarespace CDN have URLs like this:

https://images.squarespace-cdn.com/content/v1/5cabd40b755be258403ccb99/1595366630913-VD12AA9IHURXLT6CWQ66/ke17ZwdGBToddI8pDm48kKEtlFlv-8yggmb8KJA0a9wUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcpEap199WJ5tA07nqy9HB7RsfdGE2RUqSBzw535kCng92V_tkyiZ3FgjXcK6wugnz/html.png

There’s a whole lot of CDN nonsense, followed by a forward slash and the original file name at the very end. While this would be handy for getting the original file name, I didn’t want to end up with dozens of images in a folder where I had no idea what post they belonged to, and I definitely didn’t want to manually correlate the file name with the CDN link in each of the HTML blocks in WordPress.

The XML, though, also includes the title of each post, and I realized that if I was scanning each line of the XML for image tags, I could also check for the title tag and keep a variable constantly updated with that. With this idea, I would just start each file name with the post title so that they would be grouped together. Once the dots were connected, it was simple to come up with the following short PowerShell script:

It downloads all of the images referenced in the XML file in the format of:

postTitle_fileName.extension

I added some extra checks to remove unsavory characters from the file names. A [ is a valid character in most modern filesystems, for example, but have you ever tried to actually do things programmatically from a Bash shell with a file that has a [ in the name? It’s not pretty.
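For readers who want to try the same idea, here is a rough Python sketch of the approach described above. It is not the PowerShell script from this post. It assumes the export keeps each post title inside a plain `<title>` element on a single line and that every image URL starts with `https://images.squarespace-cdn.com/`; the names `plan_downloads`, `sanitize`, and `download_all` are made up for illustration.

```python
import posixpath
import re
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlsplit

# Assumption: titles appear as <title>...</title> on one line of the export.
TITLE_RE = re.compile(r"<title>(.*?)</title>")
# Assumption: every image lives on the Squarespace CDN host seen in the export.
IMAGE_RE = re.compile(r'https://images\.squarespace-cdn\.com/[^"\s<>]+')
# Anything outside this conservative set is treated as "unsavory".
UNSAFE_RE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize(text):
    """Collapse runs of awkward characters (spaces, brackets, ...) into '-'."""
    return UNSAFE_RE.sub("-", text).strip("-")


def plan_downloads(lines):
    """Scan the export line by line and return (url, file_name) pairs.

    The most recent title seen is remembered, so every image is named
    postTitle_fileName.extension and images from one post sort together.
    """
    current_title = "untitled"
    seen = set()
    plan = []
    for line in lines:
        title_match = TITLE_RE.search(line)
        if title_match:
            current_title = title_match.group(1)
        for url in IMAGE_RE.findall(line):
            if url in seen:  # the same URL shows up in src, data-src, data-image
                continue
            seen.add(url)
            original = unquote(posixpath.basename(urlsplit(url).path))
            plan.append((url, f"{sanitize(current_title)}_{sanitize(original)}"))
    return plan


def download_all(plan, target_dir):
    """Fetch every planned image into target_dir (this performs network requests)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for url, file_name in plan:
        urllib.request.urlretrieve(url, str(target / file_name))
```

Typical use would be to open the export, pass the file object to `plan_downloads`, review the planned names, and only then call `download_all`. One limitation of this sketch: two different images with the same original file name in the same post would overwrite each other.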

While this saved me from having to manually download each image from Squarespace, I still had to manually go through each post in WordPress, remove the custom HTML block where each image should have been, and then upload the appropriate image. With the way I downloaded the images, though, I just started at the top of the directory and worked my way through the images alphabetically since each post was grouped together. It sucked, but it could have been a lot worse. If nothing else it made me glad that I moved forward with migrating the site now rather than waiting a few more months for the Squarespace subscription to lapse; I didn’t want to deal with this for any more posts than was strictly necessary.

Unusually Pink Migration

So Long, Squarespace!

If anyone stumbles across this site who was previously an Unusually Pink reader, then you might notice that the site looks a bit different after a few months of hiatus. In the short, just under 2 year lifespan of the site it has now moved to its 3rd host. Originally it was hosted on a Vultr VPS that I had been hosting a few other things on, back when I originally bought the domain because I loved the name but had no idea what to do with it. Then Brandi, my former co-host, and I decided to start a podcast; it quickly became apparent that my web development skills weren’t exactly up to par with what we wanted to accomplish. As a result, we moved the site over to Squarespace.

Our podcast lived just long enough for the Squarespace hosting to renew before Brandi and I both decided that things had run their course. It was unfortunate that I had just forked over another year’s worth of money to Squarespace for hosting before reaching that decision. With that being said, you might be wondering why on Earth I’d be re-hosting the site somewhere else if I still have time left on the Squarespace subscription; more on that will come a little later on. With this being my first time using Squarespace, though, I thought I would first share some thoughts after running a site there for a year.

The Good

When I initially decided to move the site from my VPS to Squarespace, it was mainly because I knew I needed hosting somewhere, and it seemed like a good chance to mess around with something new. I had run numerous blogs on a free WordPress.com account along with compiling many of my own blogs with Hugo as I tend to discuss frequently. With us wanting to have a presence online that made us look like we knew what we were doing, though, I figured this was a worthwhile opportunity to justify spending the money on hosting with Squarespace.

Squarespace offers, hands down, the nicest management interface I’ve ever seen. Everything is very slick and inviting, without being overly cluttered and complicated. It’s simple to add new pages to your site or even branches to your site. For example, I originally migrated the blog I had been running under the Unusually Pink domain to Squarespace, but I quickly realized that the best way to handle the show notes for each podcast episode would also be basically a blog. It was trivial to literally add another blog to the site; I just had to tell Squarespace what directory I wanted to host that under and which of the two would be the “main” page of the site. The two were then independent of one another.

Squarespace doesn’t offer nearly as many themes as you’ll find with something like WordPress, but all of the Squarespace themes are highly customizable without having to wander into the realm of HTML and CSS. For example, for any theme I can change literally every color by simply using the menus presented to me. On the flip side, the WordPress theme you see right now only offered a handful of elements for color modification. Even worse, this theme offered more options than many of the others I looked at, where changing anything beyond the text color would’ve involved modifying the CSS.

Finally, Squarespace gives you an absurd amount of information about the traffic to your site, all without the need for any type of plugins. You can simply link up Google credentials to integrate with Google Analytics, for example, and see what people are searching for to reach your site, what position you’re in for the search results, the click percentage, how many impressions you get, etc. It also offers a very slick, interactive map if you want to drill down to the specifics of where your hits stem from.

The Bad

The main purpose for the previous site on Squarespace was blogging. Case in point, there were two blogs hosted on it; one for my own random posts and one for the show notes that went along with each podcast episode. Easily the single biggest nail in the Squarespace coffin is that the service is in no way designed for blogging. That might seem contradictory considering I just said that I hosted not one but two blogs on a single site there, but allow me to elaborate.

Adding a blog to Squarespace just means that when you go to edit the site, you have two different streams of posts you can choose from. You pick the blog, say you want to make a new post, and start to edit the content. This is where things immediately get murky. The editor for authoring content in Squarespace is pretty bad. It tries to break the content of each post down into blocks the way the current WordPress editor does, but it does so in an extremely clunky, unintuitive way. Simple things like handling the appearance of media you upload are often not possible, meaning that I had to resize every photo prior to uploading since I knew there would be no good options for scaling it after the fact. Likewise, trying to embed any sort of content was frequently gated behind a paywall; I couldn’t embed the player for each episode into the post with the show notes because they wanted me to pay more for that privilege. I couldn’t embed tweets but had to just link to them. That may not have been a big deal were it not for the fact that the Squarespace plan I was on was already more than double what I’m paying for hosting now.

As another blow to blogging, Squarespace doesn’t provide any real outlet for managing the posts on the site. While in the management interface, for example, going to one of the two blogs I had added would simply show me a list of posts on the left in chronological order. If the post I needed to modify was at the very bottom of the list because it was old, then I had to just keep scrolling until I got to it, letting the clusters of posts incrementally load the further I scrolled. There weren’t any options to just search for the post I wanted. This may have been a limitation of the theme I selected, but I was equally disappointed that I couldn’t search the blog itself for specific content, either. I frequently author blog posts that I know will help me in the future; they live on a blog as opposed to just in my personal notes because they might also be beneficial to someone else. If I can’t easily get back to that content, though, without mindlessly clicking a “Next” button, that’s a problem. This WordPress blog offers both a search box and sane pagination; neither was an option for my Squarespace deployment. I’d frequently have to search the web for what I wanted to find with the URL of my own site to reach it. That’s a problem.

The last thing I’ll mention is portability. Admittedly, WordPress might be just as bad at this, but it’s extremely difficult to take content from Squarespace and move it somewhere else. This was the big reason why I didn’t want to continue creating content on Squarespace even though I’ve already paid for the hosting there; I knew that I didn’t want to stick with Squarespace once the current hosting expired, but anything new I posted there would just be more work to move to somewhere else later on. Squarespace offers you the ability to export your content, but it’s to an XML file. While this will get the written content for each post and the metadata about it, it will not include any media. I managed to throw together a bit of a workaround that’ll most likely be the topic of my next post, but it was still a large amount of work to move everything from one host to another.

An obvious question at this point would be:

But aren’t you just in the same position regarding portability after moving to WordPress?

The answer is… maybe. As long as I don’t become disenchanted with the platform as a whole, there are many different WordPress hosting platforms out there. If I want to move from one to another, I can easily export my site or take a backup of it and move the content somewhere else. I had initially tried moving a lot of the content from Squarespace to a Hugo site I already ran, but I very quickly ran into many of the same issues I described with Squarespace regarding management and discoverability; while being lightweight is nice, sometimes having a CMS is beneficial.

Wrap-Up

Despite the vibe you may get, I don’t dislike Squarespace at all. I feel like their business is really tailored to users who want a professional, mostly static website but who don’t have the skills to create that themselves. For a hobbyist like myself with a focus on blogging, the premium you pay for Squarespace gets you essentially nothing. Any WordPress instance is going to be a better blogging platform, and one that is significantly cheaper at that. On the other hand, if you need to have firm divisions in your site (e.g. a blog for the sake of shitposting and a blog for podcast show notes), you can’t easily do that within WordPress. While you can create multiple pages, such as the About page here, you can’t set up an entirely separate blog.

At least for the moment, what I did with Squarespace for both a blog and podcast repository wouldn’t be possible with WordPress. For a standalone blog, though, the experience is significantly better on WordPress. It’s important to understand what the goal of your site is and what you need out of your platform. When that goal changes, moving platforms might be the best move. Hopefully my next post on how I migrated my images between Squarespace and WordPress can help with that.

Idiot’s Guide To Figuring Out How A Website Was Hacked

Full Disclosure: This won’t tell you exactly what was wrong with a website. This will just give you a pretty good, quick idea. I’m not in DFIR or even InfoSec. I’m just a sysadmin who has some familiarity with a decent number of systems. It’s also worth mentioning that I did all of these actions from my Linux machine. The same would be possible from macOS or from Windows 10 with the Windows Subsystem for Linux.

Last night, my good buddy Craft Brew Geek shot me a message because a website we both had something of an interest in (I won’t go into more specifics than that to protect the guilty… I’ll just say that it doesn’t belong to either of us) had suddenly exhibited weird behavior. Navigating to the website, either via directly typing their URL into my browser or by searching for them on Google and clicking the link, took me not to the expected website but to a super shady online pharmacy; there’s not enough booze in the world to get me drunk enough to type my credit card information into this site. Since we’re all stuck at home under quarantine, though, I figured I’d kill a little time digging into what, exactly, was going on.

The initial problem is that I navigate to desiredsite.com and it takes me to shadysite.com instead. A common way this type of thing happens without any degree of technical compromise is if someone allows their domain to expire rather than having it automatically renew. When that happens, it’s possible for an attacker to swoop in, buy the domain, and then change the DNS information to point to their desired site. This is pretty uncommon, since most domain registrars will park an expired domain for a month, giving the original owner time to renew. Failing that, expired domains often go to auction rather than back into the pool. Additionally, under this scenario there would be no reason to redirect to shadysite.com. Still, it doesn’t hurt to check the DNS history through something like SecurityTrails. This showed the last DNS change was 3 months ago for the site in question; there’s no way the site had been redirected for 3 months so I could rule that out.

My next thought was to see if the sites were on the same server. If they were, that would tell me the entire server was compromised: it was receiving my request and had been configured to load a different site instead. This was easy enough to check through dig:

dig +short desiredsite.com. a
dig +short shadysite.com. a
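The same comparison can be sketched in Python for anyone who would rather script it. This is only an illustration: it asks the system resolver through `socket.gethostbyname_ex` instead of calling dig, the function names are made up, and the hostnames are the same placeholders used in this post.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver reports for hostname."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def shared_addresses(host_a, host_b, resolver=resolve_ipv4):
    """Return the addresses both hostnames resolve to (empty set if none)."""
    return resolver(host_a) & resolver(host_b)


# Example (performs real DNS lookups, so it is left commented out):
# print(shared_addresses("desiredsite.com", "shadysite.com"))
```

An empty result only suggests the two names do not point at the same address. Sites behind a shared CDN or shared hosting can overlap without being related, so treat the answer as a hint rather than proof.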

This gave me two different IP addresses. This tells me the sites aren’t hosted on the same server, which means that desiredsite.com is redirecting me to shadysite.com. For that to be the case, I have to be hitting desiredsite.com first, but then I’m redirected before I see anything. I needed to see what was up with the site before being redirected. Scripts on the web are most commonly executed not on the server side, but locally in the browser. As a result, I used wget to just try to snag the file living at desiredsite.com, which for most websites will be index.html:

wget http://desiredsite.com

This simply downloads the file to my local machine. Nothing is actually executing any scripts it might reference. Sure enough, this gives me an HTML file for desiredsite.com I can open in a text editor. I figured JavaScript was likely being used to handle the redirect. To test this, I turned off JavaScript in Google Chrome and once again navigated to http://desiredsite.com. This caused the expected website to load, albeit kind of broken since JavaScript wasn’t running.
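Once the page is saved locally, one way to see which scripts it would have loaded is to parse the HTML without running any of it. The following is a minimal Python sketch using the standard library’s `html.parser`; the class and function names are invented for this example, and the base URL is just the placeholder domain from this post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collect external script URLs and count inline <script> blocks."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.external = []
        self.inline_count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page the file came from.
            self.external.append(urljoin(self.base_url, src))
        else:
            self.inline_count += 1


def list_scripts(html_text, base_url):
    """Return (external_script_urls, inline_script_count) for a saved page."""
    collector = ScriptCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.external, collector.inline_count


# Example with the file wget saved:
# with open("index.html", encoding="utf-8", errors="replace") as handle:
#     urls, inline = list_scripts(handle.read(), "http://desiredsite.com/")
```

Nothing here executes the scripts; it only lists where they live, which is also a quick way to spot paths that belong to a CMS or its plugins.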

Diving back into the index.html file, a quick search showed me that there were nearly 60 .js files for JavaScript. Ick. JavaScript can be written to be fairly easy to consume if you’ve got a passing familiarity with computer programming, but most JavaScript on the web is designed to be 1.) minified and 2.) obfuscated to make this nearly impossible. Seriously, this is what a typical JavaScript file looks like. Note how my editor is showcasing the fact that it’s all one line:

Clearly trying to read through 60 files of that isn’t going to happen; this isn’t my job, I’m doing it for fun. However, I still had some options for trying to quickly look for something flagrant. I saved down local copies of all 60 JavaScript files in the same directory, and then navigated to that directory from my terminal. I then used grep -R to recursively search through every JavaScript file at the same time.

cd /path/to/javascript
grep -R "search term here" .

What did I search for? I started off by searching for shadysite.com. No dice. Then I searched for the IP address I got for the site from my previous dig command. Also no dice. I didn’t think it would be anything that overt, but it was worth a shot. I decided to look at the source code for shadysite.com to see if there were any clues. I immediately noticed that the entire site was coded around the IP address for the site rather than the domain. For example, links in the source code of most sites are going to look like:

http://mydomain.com/folder/page.html

The links on this particular page were done like this:

http://192.168.254.254/folder/page.html

Obviously that wasn’t the IP address in use, but you get the point. This tells me that, unsurprisingly, they run into a lot of problems with their domain getting shut down. So they design the site to be domain agnostic, buy a new domain when the old one is shuttered, and then point it to the same IP address they’ve been using. Some quick searches online showed me a few tools I could use to plug in an IP address and get a historical list of domains tied to that IP. I used ViewDNS.info. This showed me 6 total domains that had been pointing to the same IP address, one of which was what I saw now. I repeated my grep search above with the others to see if there were any hits, but sadly there was still no luck.
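Repeating the search once per domain gets tedious, so here is a small Python sketch that checks a whole list of indicators in one pass over a directory. It is an illustration rather than a replacement for grep: the function name is made up, the matching is a case-insensitive substring test (closer to grep -Ri than plain grep -R), and the indicators in the example are placeholders.

```python
from pathlib import Path


def find_indicators(root, indicators):
    """Return {indicator: [(path, line_number), ...]} for every hit under root."""
    hits = {indicator: [] for indicator in indicators}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Minified files are often one huge line; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for indicator in indicators:
                if indicator.lower() in lowered:
                    hits[indicator].append((str(path), line_number))
    return hits


# Example (placeholder values):
# find_indicators("/path/to/javascript", ["shadysite.com", "192.168.254.254"])
```

A plain string search like this only catches indicators that appear literally. If a script builds the address from encoded pieces, neither this nor grep will find it, which may be why searches like these can come up empty.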

At this point, though, I still had a pretty good idea of what was happening. Out of the 60 JavaScript files referenced by the source code for desiredsite.com, most of them were in a sub-directory for WordPress, including some directories that noted they were for WordPress plugins. Having looked at enough compromised websites over the past 15 years, it’s a definite trend that WordPress (and especially WordPress plugins) tend to be Swiss cheese. WordPress plugins are a frequent target for attackers, and most people never think to update them. At this point, if I were determined to get to the bottom of things, it would be much quicker to just point some kind of vulnerability scanner like Nessus at the site and just let it find the vulnerable plugin(s) rather than tracking them down through obfuscated JavaScript.

All told, though, it was a fun exercise to dig into how the site was compromised and come away after only about 30 minutes of work with a pretty good guess.