Everything is a Number

It is difficult to write blog posts in AI times. The majority of people, when interested in something, refer to their favorite LLM (large language model) and simply discuss the topic, ask specific questions and go straight to the point.

In a certain sense, the only good reason left to write a blog post is for yourself, and this is probably a good thing.

I am writing more and more blog posts about parallels between numbers, philosophy and life, and the more I think about it, the more interesting I find it.

In today's blog post, I'd like to talk about a simple but powerful concept: everything is a number.

No, it's not just binary

It's easy to fall into the oversimplification trap of "everything is digital > computers work in binary > everything is a number".

While this is true, it is not my main point today. It's important to keep in mind that human beings perceive decimal (base-10) notation as the most understandable form of numbers. In fact:

  • a 5-year-old knows that this is a seven: "7"
  • the majority of educated people struggle to see this as a seven: "111"
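For the technically curious, here is a tiny sketch of why those two strings are the same value. It only uses Python built-ins; the variable names are mine.

```python
# "111" read as a base-2 numeral is the value seven.
value_from_binary = int("111", 2)

# Going the other way: seven written in base 2.
seven_in_binary = format(7, "b")

# Read as a decimal numeral, the very same characters mean something else.
misread_as_decimal = int("111")

print(value_from_binary, seven_in_binary, misread_as_decimal)
```

Same symbols, different number: the meaning depends entirely on the base the reader assumes.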

And because decimal numbers are so close to "our world", we use them to measure basically everything: time, dates, money, and any other kind of quantity.

As simple as they are, decimal numbers can be used to represent, store and transport very complex pieces of information as well.

Illegal Primes

One of the most interesting stories on the topic is the cracking of DVD encryption. Back in the days before we watched movies on Netflix, production houses used to sell DVDs (discs with a movie on them, in case you are very young), and to maximize their profit they protected them from unauthorized reproduction by encrypting their content.

In 1999, a programmer wrote a legendary program called DeCSS, capable of breaking the DVD encryption and allowing the content to be copied to any medium, and cloned, of course. It was not meant as a criminal act, but rather as an action in the name of freedom and fair use, because people who bought DVDs couldn't legally play them on unlicensed or open-source systems.

Movie studios very quickly sued people who distributed the program under the DMCA (Digital Millennium Copyright Act), and were able to ban the distribution of the software and take down many of the websites hosting it. So people got creative and encoded the program into a number, thinking that a number cannot be banned... well, they were wrong!

Apparently:

distributing certain numeric encodings as a way of sharing DeCSS code was challenged under anti-circumvention laws and forbidden

At that point, the programming community went one step further and found a way to encode that computer program into a very large prime number. The idea was that a prime this large is scientifically relevant in its own right (yes, prime numbers are scientifically interesting!), so it could hardly be banned or censored. This was it:

48565078965739782930984189469428613770744208735135792401965207366869851340104723744696879743992611751097377770102744752804905883138403754970998790965395522701171215702597466699324022683459661960603485174249773584685188556745702571254749996482194184655710084119086259716947970799152004866709975923596061320725973797993618860631691447358830024533697278181391479795551339994939488289984691783610018259789010316019618350343448956870538452085380458424156548248893338047475871128339598968522325446084089711197712769412079586244054716132100500645982017696177180947811362200272344827224932325954723468800292777649790614812984042834572014634896854716908235473783566197218622496943162271666393905543024156473292485524899122573946654862714048211713812438821771760298412552446474450558346281448833563190272531959043928387376407391689125792405501562088978716337599910788708490815909754801928576845198859630532382349055809203299960323447114077601984716353116171307857608486223637028357010496125956818467859653331007701799161467447254927283348691600064758591746278121269007351830924153010630289329566584366200080047677896798438209079761985949364630938058633672146969597502796877120572499666698056145338207412031593377030994915274691835659376210222006812679827344576093802030447912277498091795593838712100058876668925844870047077255249706044465212713040432118261010359118647666296385849508744849737347686142088052944

Yes, the number above contains a computer program capable of descrambling DVD encryption, you got it right! And by doing that, it earned the name "illegal prime"!
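How might a program end up inside a prime? Here is a minimal sketch, assuming a simple approach that is not necessarily the one used for the number above: treat the payload as the leading bytes of an integer, reserve a couple of padding bytes at the end, and try padding values until a probabilistic primality test passes. The helper names `is_probable_prime` and `prime_with_prefix`, and the tiny payload, are made up for illustration.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin test: False means composite, True means almost surely prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this base proves n is composite
    return True

def prime_with_prefix(payload, pad_bytes=2):
    """Search for a probable prime whose big-endian bytes start with `payload`."""
    base = int.from_bytes(payload, "big") << (8 * pad_bytes)
    for suffix in range(1, 1 << (8 * pad_bytes), 2):  # odd candidates only
        candidate = base + suffix
        if is_probable_prime(candidate):
            return candidate
    return None  # no prime found with this much padding

payload = b"print('hi')"  # stand-in for a real program
p = prime_with_prefix(payload)
raw = p.to_bytes((p.bit_length() + 7) // 8, "big")
recovered = raw[:len(payload)]  # drop the padding to get the payload back
print(p, recovered)
```

The padding bytes are what give the search room to land on a prime; the payload itself is never altered.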

And the astonishing thing is that any computer program can be encoded into a decimal number, even a complex one that interacts with other files, other systems or the internet. As we said, if it's compiled down to binary, it can be translated into a decimal number.
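To make this concrete, here is a small sketch in which a one-line program becomes a decimal number and is then rebuilt and executed. The one-line "program" and the variable names are placeholders of mine.

```python
# A tiny program, as source text.
source = "result = 6 * 7"

# Program -> bytes -> one (big) integer, printed in decimal.
number = int.from_bytes(source.encode("utf-8"), "big")
print(number)

# Integer -> bytes -> the same program text.
length = (number.bit_length() + 7) // 8
decoded = number.to_bytes(length, "big").decode("utf-8")

# Run the recovered program in an isolated namespace.
namespace = {}
exec(decoded, namespace)
print(namespace["result"])
```

Never `exec` a number you got from someone you don't trust: it is a program, after all.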

Beauty hidden in plain sight!

Let's code this down!

To give shape to this idea, I decided to code a small program that takes any binary program as input and turns it into the most readable form of number: a decimal. That number can also be reverted to a program and executed, with no information lost. Apart from being a fun coding exercise, it was a meaningful way of representing the very same object in a simple or complex way, depending on the observer (be it a machine or a human).

* non-technical readers: please skip the code block!

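import sys
import zlib
import base64

# Recent Python versions cap int <-> str conversion at 4300 digits by default;
# lift the cap so large files can still be printed as decimal numbers.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)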

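# An integer has no leading zero bytes, so a file starting with 0x00 would
# come back shorter. A sentinel byte in front keeps the round trip lossless.
SENTINEL = b"\x01"

def binary_file_to_number(path):
    with open(path, "rb") as f:
        data = f.read()
    return int.from_bytes(SENTINEL + data, "big")

def number_to_binary_file(number, output_path):
    length = (number.bit_length() + 7) // 8
    data = number.to_bytes(length, "big")
    if not data.startswith(SENTINEL):
        raise ValueError("Number was not produced by binary_file_to_number")
    with open(output_path, "wb") as f:
        f.write(data[len(SENTINEL):])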

def binary_file_to_number_compressed(path):
    with open(path, "rb") as f:
        data = f.read()
    compressed = zlib.compress(data)
    return int.from_bytes(compressed, "big")

def number_to_binary_file_decompressed(number, output_path):
    length = (number.bit_length() + 7) // 8
    compressed = number.to_bytes(length, "big")
    original = zlib.decompress(compressed)
    with open(output_path, "wb") as f:
        f.write(original)

def encode_number(n, fmt):
    """
    Convert integer to chosen textual representation.
    fmt = 'dec' | 'hex' | 'b64'
    """
    if fmt == "dec":
        return str(n)

    length = (n.bit_length() + 7) // 8
    raw = n.to_bytes(length, "big")

    if fmt == "hex":
        return raw.hex()

    if fmt == "b64":
        return base64.b64encode(raw).decode("ascii")

    raise ValueError("Unknown format")

def decode_number(s, fmt):
    """
    Convert textual representation back to integer.
    """
    if fmt == "dec":
        return int(s)

    if fmt == "hex":
        return int.from_bytes(bytes.fromhex(s), "big")

    if fmt == "b64":
        raw = base64.b64decode(s)
        return int.from_bytes(raw, "big")

    raise ValueError("Unknown format")

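def read_text_from_file_or_arg(arg):
    # Treat the argument as a file path when it can be opened; otherwise
    # (missing file, or a number too long to be a valid path) use it as-is.
    try:
        with open(arg, "r") as f:
            return f.read().strip()
    except OSError:
        return arg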

if __name__ == "__main__":

    if len(sys.argv) < 3:
        print("Usage:")
        print("  python e.py [-c] [--hex|--b64] to_number <binary_file>")
        print("  python e.py [-c] [--hex|--b64] to_binary <text_or_file> <output_file>")
        sys.exit(1)

    # Flags
    compress = False
    fmt = "dec"
    i = 1

    while i < len(sys.argv) and sys.argv[i].startswith("-"):
        if sys.argv[i] == "-c":
            compress = True
        elif sys.argv[i] == "--hex":
            fmt = "hex"
        elif sys.argv[i] == "--b64":
            fmt = "b64"
        else:
            break
        i += 1

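    # Only flags were given: there is no mode left to read.
    if i >= len(sys.argv):
        print("Missing mode: use to_number or to_binary")
        sys.exit(1)

    mode = sys.argv[i]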

    # ========= to_number =========
    if mode == "to_number":
        path = sys.argv[i+1]

        if compress:
            n = binary_file_to_number_compressed(path)
        else:
            n = binary_file_to_number(path)

        print(encode_number(n, fmt))

    # ========= to_binary =========
    elif mode == "to_binary":
        text_arg = sys.argv[i+1]
        output_path = sys.argv[i+2]

        text = read_text_from_file_or_arg(text_arg)
        n = decode_number(text, fmt)

        if compress:
            number_to_binary_file_decompressed(n, output_path)
        else:
            number_to_binary_file(n, output_path)

        print(f"Wrote {output_path}")

    else:
        print("Unknown mode")
        sys.exit(1)
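One subtlety worth knowing about any bytes-to-integer round trip: an integer has no notion of leading zero bytes, so a file that starts with `0x00` can come back shorter than it went in. Below is a minimal sketch of the pitfall and of one common guard, a sentinel byte placed in front of the data. The function names `to_number` and `from_number` are hypothetical and exist only in this example.

```python
def to_number(data: bytes) -> int:
    # Prefix a non-zero sentinel byte so leading zeros survive.
    return int.from_bytes(b"\x01" + data, "big")

def from_number(n: int) -> bytes:
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if raw[:1] != b"\x01":
        raise ValueError("missing sentinel byte")
    return raw[1:]  # drop the sentinel

data = b"\x00\x00ELF"

# Naive round trip: the two leading zero bytes are lost.
naive = int.from_bytes(data, "big")
naive_back = naive.to_bytes((naive.bit_length() + 7) // 8, "big")

# Guarded round trip: the bytes come back exactly as they went in.
guarded_back = from_number(to_number(data))

print(naive_back, guarded_back)
```

Compressed input is less exposed to this, since a zlib stream starts with a non-zero header byte.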

The program is just a quick proof of concept, but I still thought it deserved a GitHub repo. I may update it in the future or add some more fun, such as accepting images, running as an API, or supporting custom encodings. And you are most welcome to experiment with it if it sparks your curiosity.

This paradoxical pairing of simple/complex is even more relevant in AI times, where very complex objects (words, images, videos, code) are translated into simple ones (matrices of decimal numbers) and then processed with highly sophisticated algorithms. Simplicity and complexity alternate to produce an output comparable to what the human brain produces — which isn’t too surprising when you consider that our brains also process information as patterns of electrical activity.
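As a toy illustration of "words become matrices of numbers", here is a deliberately simplified sketch. The vocabulary, the vector values and the `encode` helper are all invented for this example and do not reflect how any particular model works.

```python
# Hypothetical vocabulary: each known word gets an integer ID.
vocab = {"everything": 0, "is": 1, "a": 2, "number": 3}

# Hypothetical lookup table: one small vector of decimals per word ID.
embeddings = [
    [0.1, 0.3],
    [0.0, 0.5],
    [0.2, 0.1],
    [0.9, 0.7],
]

def encode(sentence):
    """Turn a sentence into a matrix: one row of numbers per word."""
    return [embeddings[vocab[word]] for word in sentence.lower().split()]

matrix = encode("Everything is a number")
for row in matrix:
    print(row)
```

Real systems use far larger tables and learn the values from data, but the shape of the idea is the same: text in, grid of numbers out.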

Mathematicians have designed many ways to measure complexity, with one of the most popular today being Big O notation (used to describe how the cost of an algorithm grows with the size of its input). But I wonder: since the form of representation is so mutable, how can we measure it in a truly appropriate way?
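One modest way to see how mutable representation is: the very same bytes take a different number of characters depending on the notation. A quick sketch, using 100 arbitrary bytes of my own choosing as sample data:

```python
import base64

# 100 sample bytes; the first one is non-zero so nothing is lost as an integer.
data = bytes(range(1, 101))
n = int.from_bytes(data, "big")

sizes = {
    "raw bytes": len(data),
    "decimal digits": len(str(n)),
    "hex characters": len(data.hex()),
    "base64 characters": len(base64.b64encode(data)),
}

for name, size in sizes.items():
    print(f"{name}: {size}")
```

The information content is identical in all four rows; only the alphabet changes, and with it the length. Decimal is the friendliest to read and the longest to write.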

The next time you see a big random number... please start wondering what incredible things it may contain ;)

Until the next one,
Francesco