Remember, remember, the 5th of November, the gunpowder, treason and plot. I know of no reason why the gunpowder treason should ever be forgot.
So for this month I had lots of really cool topics I wanted to share with you, but every time I started writing I quickly realised that I'm not really an expert on that stuff. I decided I should spend some time learning about those topics before bringing them to you, so that I can unpick them piece by piece and do them justice by showing you how cool they really are. So instead I decided to go back a little bit and look at something I do know a lot about, and something that I don't think gets enough airtime in the modern CyberSec world.
WSdLo8m90AE=
decipher it.
Pretty basic, right? That's all the information I gave to my team of hackers a year and four months ago. Go ahead and see if you can figure it out before reading on if you like. Don't worry if you can't, though: in good time I posted the answer and explained it all for them, after disappointingly not receiving a single correct answer in 5 days. So it must be a real toughie then, right? Not really, no. It's actually fairly trivial; you just have to look at it in the right way. And no, the team of hackers aren't stupid. On the contrary, there's rarely a puzzle I can come up with that they can't solve, so I have the utmost respect for the community. This one, though, seemed to completely stump them. The reason, I suspect, is where it comes from: it's a bit of an obscure field compared to the circles hackers travel in.
Enough chitchat, let's rip this thing apart and analyse the components. Anyone familiar with this sort of puzzle will immediately recognise it as a Base64 string, so they'll open up something like CyberChef to decode it back into ASCII (UTF-8 these days, really, but it's useful to know where it comes from). If you do that you should get something like Y'K£É½Ð (followed by one unprintable control character), which is clearly not English, so there's something a bit cleverer going on here.
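If you'd rather not lean on CyberChef, here's a minimal Python sketch of the same first step using only the standard library's base64 module. Decoding the bytes as Latin-1 (one character per byte) is my own choice to reproduce the gibberish above; it isn't the only way to view them.

```python
import base64

# The puzzle string, exactly as posted
encoded = "WSdLo8m90AE="

# Base64 -> the eight raw bytes hiding underneath
raw = base64.b64decode(encoded)

# Viewing each byte as a Latin-1 character gives the "not English" result
as_text = raw.decode("latin-1")

print(len(raw))  # 8
print(repr(as_text))
```

The last byte is 0x01, a control character, which is why it doesn't show up when the text is displayed.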
In previous puzzles I'd used tricks like writing something in a different language or using different character sets to obscure the solution, but this time that's not the case. After a little bit of faffing around, a hacker usually realises it's gibberish, stops trying to translate it, and instead looks at it as a form of data. Base64 isn't really a character encoding at all, after all: it's a way of representing arbitrary binary data as text, used for storing and sending all manner of information, including but not limited to images.
Sure enough, if you strip it down to its base parts you'll uncover this in hexadecimal:
59 27 4b a3 c9 bd d0 01
which looks like this in binary:
01011001 00100111 01001011 10100011 11001001 10111101 11010000 00000001
Which you use is usually a matter of personal preference, depending on which you're better at spotting patterns with. I personally find hex more descriptive and shorter, but it can make you miss patterns that don't line up with its 4-bit groupings. As before, it's trivial for CyberChef to make these conversions for you.
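The same conversions are a couple of lines of Python if you want to script them. This is just one way to do it; the thing that matters is zero-padding every byte to 8 binary digits (the `08b` format spec), otherwise leading zeros vanish and every later bit position shifts.

```python
import base64

raw = base64.b64decode("WSdLo8m90AE=")

# Hexadecimal: two hex digits per byte, separated by spaces
hex_view = raw.hex(" ")

# Binary: exactly eight digits per byte so leading zeros aren't lost
bin_view = " ".join(f"{byte:08b}" for byte in raw)

print(hex_view)
print(bin_view)
```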
This is where most people seem to get stuck: the layout of the digits is alien to them and they seem to lack the fundamental knowledge to crack the code. The bit that matters here is the binary string, as it's much trickier to spot the pattern in hex. A note on formatting, though: a binary string is really an unbroken stream of binary digits. I've asked CyberChef to add a space after every byte to make it easier to read, but it's actually a contiguous stream of ones and zeroes, and the spacing could be arbitrary, especially if there are offsets.
So, what next? Well, there's a subtle giveaway in that it's exactly 8 bytes, or 64 bits, long. Also, notice how the end is mostly a string of zeroes? That's what's called padding. It means what we're looking at here is something slightly shorter than 64 bits that has been expanded to 64 bits for some reason. But why, and what is it? Well, first let's look at the very last bit: why is that not a 0? The most obvious explanation is that it's a checksum, or more specifically a parity bit, and that should be all the clue really needed to solve this.
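A quick sanity check on the parity idea, sketched in Python: count the ones. This assumes even parity (the usual convention, though the puzzle itself doesn't say). If the last bit exists to make the total number of ones even, then the count without it should be odd.

```python
import base64

raw = base64.b64decode("WSdLo8m90AE=")
bits = "".join(f"{byte:08b}" for byte in raw)  # one contiguous 64-bit stream

ones_without_last = bits[:-1].count("1")
ones_total = bits.count("1")

print(ones_without_last)  # 29: odd, so an even-parity bit would have to be 1
print(ones_total)         # 30: even once the final bit is included
```

An even total doesn't prove it's a parity bit, of course, but it's exactly what one would produce.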
Parity bits are rarely used on their own, and if there are more of them mixed into the data, that would explain why this string looks like gibberish. So, upon finding one, the next thing you should do is start looking through parity schemes to find one that makes the data make sense. It shouldn't take too long a search to find something about Hamming Codes, and that is the key you need to unlock this string.
Hamming Codes don't just let the other end check that data arrived intact: when errors are rare (a single flipped bit per block), the receiver can even rebuild the stream exactly as it was originally intended despite only receiving corrupted data. Now, if you're a data and communication geek like me, that will sound really cool. If not, that's a shame, because it's this sort of code (forward error correction) that means you only need to send data once and the other end still receives the instructions correctly, which is really important if you're trying to tell the Voyager probes what to do, because you can't just fly out there and fix them.
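To make the "rebuild corrupted data" claim concrete, here's a minimal Python sketch of single-error correction. It assumes the classic layout for the first 63 bits (1-indexed positions, parity at the powers of two) and treats bit 64 as a separate overall parity bit that it leaves out of the calculation. `syndrome` is a hypothetical helper name, not a library function. The trick is that XORing together the positions of every 1 bit gives 0 for a valid codeword, and gives the position of the culprit after a single bit flip.

```python
import base64
from functools import reduce
from operator import xor


def syndrome(bits: list[int]) -> int:
    """Hypothetical helper: XOR the 1-indexed positions of all set bits in
    positions 1..63. Zero means every parity check passes; after exactly one
    flipped bit, the result is that bit's position."""
    return reduce(xor, (pos for pos, bit in enumerate(bits[:63], start=1) if bit), 0)


raw = base64.b64decode("WSdLo8m90AE=")
codeword = [int(b) for byte in raw for b in f"{byte:08b}"]
print(syndrome(codeword))  # 0: no errors detected

# Simulate a single bit flipped in transit (position 37, i.e. list index 36)
received = codeword.copy()
received[36] ^= 1

error_pos = syndrome(received)
print(error_pos)               # 37: the syndrome points straight at the damaged bit
received[error_pos - 1] ^= 1   # flip it back
print(received == codeword)    # True
```

A real decoder would also use the overall parity bit to tell a single error (which it can fix) from a double error (which it can only detect); this sketch skips that step.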
Now let's unpick this string. The parity bits are spread throughout the stream, one at every power-of-two position, which is what lets a single flipped bit be located, and once you know the scheme you can predict exactly where they sit. Each parity bit is just the XOR of the data bits it covers, so adding or removing them never touches the data itself: the original data is still there in its unadulterated form, assuming you have all of it and there are no errors to worry about. If that's the case, you can perform a neat trick and simply remove the offending bits. Delete the first bit, the second bit, the fourth bit, the eighth bit, the sixteenth bit and so on, i.e. all the powers of 2, because those are the parity bits. Technically you should check for errors first, but that's something I'll let you worry about once you know how to interpret the code; just trust me that here there are no errors.
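Here's the "delete the powers of 2" trick as a short Python sketch. Like the manual walkthrough, it assumes there are no errors, and it drops position 64 as well; I'm assuming that final bit is the overall parity bit of an extended Hamming block, and 64 conveniently also happens to be a power of two.

```python
import base64

raw = base64.b64decode("WSdLo8m90AE=")
bits = "".join(f"{byte:08b}" for byte in raw)


def is_power_of_two(n: int) -> bool:
    # n & (n - 1) clears the lowest set bit, leaving 0 only for powers of two
    return n > 0 and n & (n - 1) == 0


# Keep every bit whose 1-indexed position is NOT a power of two
data_bits = "".join(
    bit for pos, bit in enumerate(bits, start=1) if not is_power_of_two(pos)
)
print(len(data_bits))  # 57
print(data_bits)
```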
Once they're deleted you're left with
0100 0010011 01001011 1010001 11001001 10111101 11010000 0000000
I've left the spaces where they were for ease of comparison, but they're now misaligned. Sure enough, put that into CyberChef, convert it back to ASCII, and you'll get the answer hidden in the spoiler below, with the null characters being the padding. It's usually advisable not to strip the padding yourself, as the last data bit may also be a 0, and removing it by accident would render the data unreadable, whereas null characters are easy to ignore.
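And the final step in Python: regrouping the data bits into bytes. Right-padding the last partial byte with zeros is my assumption about how to handle the leftover bit, and it's also why the result ends in null characters instead of having its tail chopped off. Run it yourself if you want to check your answer before opening the spoiler.

```python
import base64

raw = base64.b64decode("WSdLo8m90AE=")
bits = "".join(f"{byte:08b}" for byte in raw)

# Drop the parity bits: keep 1-indexed positions that are NOT powers of two
data_bits = "".join(b for pos, b in enumerate(bits, start=1) if pos & (pos - 1))

# Regroup into 8-bit chunks, right-padding the final partial chunk with zeros
chunks = [data_bits[i:i + 8].ljust(8, "0") for i in range(0, len(data_bits), 8)]
decoded = bytes(int(chunk, 2) for chunk in chunks).decode("ascii")

print(repr(decoded))           # trailing '\x00' characters are the padding
print(decoded.rstrip("\x00"))  # the answer, with the nulls ignored
```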
Answer
Bitrot
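To close the loop, here's a minimal Python sketch going the other way: encoding the answer with what I'm assuming is an extended Hamming (64,57) layout, i.e. data at the non-power-of-two positions from 3 to 63, parity bits at 1, 2, 4, 8, 16 and 32, and an overall even-parity bit at position 64. `encode_hamming_64` is a hypothetical helper for illustration, not a library function. If that layout assumption is right, it should reproduce the puzzle string exactly.

```python
import base64


def encode_hamming_64(data_bits: str) -> bytes:
    """Hypothetical helper: build a 64-bit extended Hamming codeword from 57 data bits.

    Positions are 1-indexed. Data fills every non-power-of-two position from 3 to 63,
    parity bits sit at 1, 2, 4, 8, 16 and 32, and position 64 holds an overall
    even-parity bit.
    """
    if len(data_bits) != 57:
        raise ValueError("expected exactly 57 data bits")
    code = [0] * 65  # index 0 unused so list indices match positions
    data = iter(int(b) for b in data_bits)
    for pos in range(1, 64):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = next(data)
    for k in range(6):
        p = 1 << k
        # Parity bit p is the XOR (sum mod 2) of every other position with bit k set
        code[p] = sum(code[pos] for pos in range(1, 64) if pos & p and pos != p) % 2
    code[64] = sum(code[1:64]) % 2  # make the total number of ones even
    return int("".join(map(str, code[1:])), 2).to_bytes(8, "big")


message = "".join(f"{ord(c):08b}" for c in "Bitrot").ljust(57, "0")
print(base64.b64encode(encode_hamming_64(message)).decode())  # WSdLo8m90AE=
```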
So yeah, there you go. You can hopefully see what I mean now when I say it's a simple puzzle. This type of encoding is often completely overlooked by CyberSecurity as "mathematical theory", when in fact codes like these are used a great deal, particularly in spaceflight and other channels where information is easily corrupted. In fact, if you use 5G, it relies on something called Polar Codes (for its control channels), a more modern relative of exactly this type of technology, and yet notice how the article is still labelled "coding theory". Don't underestimate these technologies: they're brilliant, powerful, and in use every day. If we hope to match the Chinese for technological know-how, we're going to need to know how this sort of stuff works, and where better to start than their cutting-edge technology that we're currently working to develop?
There's also a wider problem in CyberSec I feel obliged to address: there's a widespread attitude among practitioners that if it's not in CyberChef (or equivalent), it isn't getting figured out. I've definitely fallen into that trap myself, so I implore you to think outside the box, discover the unknown, break the unbroken. I find these codes particularly interesting because of the mathematical element, which I know many are not fond of, but having scoured online, I can't find any publicly developed or available implementation of them, which seems weird. Maybe I'm just not looking hard enough, or maybe it's because they're kind of niche, and when they're required the software engineers involved tend to just build them on the fly as part of their software, but their absence seems rather troubling to me. From what I can tell, Hamming Codes largely get shoved into the same category as Huffman coding and other compression algorithms. Not only is that not where they belong, none of this should be ignored by our all-seeing security eyes. So I think it is important to demand better: of ourselves, of our companies, and above all of our industry.