How many Signatures are there?

I enjoy thinking about numbers. It is one of those things that has always made me fewer friends, because not everyone enjoys thinking about them. The numbers I am considering here are not like the ones in your checkbook or on your car’s odometer. When we consider large numbers, we might think of our national debt, which is measured in trillions. Words like “trillion” denote the number of zeros in the number in question. Bored yet?

Programs are numbers. A program is also a small database that contains some very esoteric arrangements of data and code. If you consider a number line in base-2, the program is a position on that line. Adding one to this number increases the last byte of the program by one, and overflowing the first byte adds a byte to the program. This infinite number line in base-2 contains all programs in every language and every architecture that has been or will be invented.
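Here is a minimal sketch of that "add one" step, assuming the program's bytes are read as one big-endian unsigned integer. The name `program_increment` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Treat buf[0..len) as one big-endian unsigned integer and add one to it.
 * Returns 1 if the carry ran off the front (every byte was 0xFF), which
 * means the number now needs one more byte; returns 0 otherwise. */
int program_increment(uint8_t *buf, size_t len)
{
    for (size_t i = len; i > 0; i--) {
        buf[i - 1]++;
        if (buf[i - 1] != 0) {
            return 0; /* this byte did not wrap, so the carry stops here */
        }
        /* this byte wrapped from 0xFF to 0x00: carry into the byte before it */
    }
    return 1;
}
```

Calling it on the two bytes `00 FF` leaves `01 00` and returns 0. Calling it on `FF FF` leaves `00 00` and returns 1, which is the point where the program would have to grow by a byte.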

One of the things I enjoy about this number line is that it contained every program ever written before we even had a language for writing them. All the programs we will write are on this number line too. The numbers that are programs that will actually run are also very sparse. Only some numbers will run and, like primes, we can’t predict where the running programs are on the number line. Some alterations to a program, such as shifting or replacing bits, don’t alter the program’s logic. Other alterations affect the logic and may make the program either fail (segfault) or produce incorrect or invalid output. I believe that there are many more runnable programs than there are correct ones, and that for each correct program there are probably many equivalent variants.

Programs begin as source code, and we can consider all text and source as a single number too. These source programs are translated from one number to another by a compiler. This is where I quickly get above my pay grade and start to consider the commutative and associative properties of compilers applied to source code.

This is where it becomes difficult to follow: every program is already written in every computer language we ever will write. We could randomly select programs and try to run them, but this takes time and we won’t know if a program runs until we run it. While the universe has time, I do not, so maybe there is an optimization we can make to tell if these programs might run.


/* Hello World program */

#include <stdio.h>

int main(void)
{
    /* Print the greeting and report success to the caller. */
    printf("Hello World");
    return 0;
}

The above program compiles to a program near 2^67456, which is a large number. For me to show you the number, which is the same thing as the program, I would need to include the program’s object code. I could disassemble the program and include the opcode instructions, or I could include the hex of those instructions. Neither would make much sense to you because the codes are only understood by a CPU. All of these options just add to the eye-rolling boredom of large numbers, so I’ve decided not to include the examples as hex or assembly mnemonics.

There is another way that works for me to think about large numbers and, as many of you might have guessed, I like to take big numbers and convert them to pictures. The Hello World program compiled to 8,432 bytes, or 67,456 bits, on my Mac with gcc, which is where the 2^67456 figure comes from. The image below is larger than 8K, so this isn’t a great way to store things, but it is a fantastic way to look at really large numbers. Consider how they change as you move up and down a theoretical number line. There are no real-number programs, in the sense that programs are only integers. I also haven’t figured out how to make complex programs, but I’m interested in working on that too.
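One way to turn a program into a picture can be sketched as follows. It assumes one byte per grayscale pixel, packed into the smallest square that fits and written as a binary PGM file. This layout is a guess for illustration, not necessarily how the image here was made, and `square_side` and `write_pgm` are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Smallest width w such that w * w >= n, so n bytes fit in a w-by-w square. */
size_t square_side(size_t n)
{
    size_t w = 0;
    while (w * w < n) {
        w++;
    }
    return w;
}

/* Write data[0..n) as a binary PGM (P5) grayscale image, one byte per pixel,
 * rows filled left to right. Pixels left over at the end are padded with 0.
 * Returns 0 on success and -1 on a write error. */
int write_pgm(FILE *out, const uint8_t *data, size_t n)
{
    size_t w = square_side(n);
    if (w == 0) {
        w = 1; /* an empty input still gets a valid 1x1 image */
    }
    if (fprintf(out, "P5\n%zu %zu\n255\n", w, w) < 0) {
        return -1;
    }
    if (n > 0 && fwrite(data, 1, n, out) != n) {
        return -1;
    }
    for (size_t i = n; i < w * w; i++) {
        if (fputc(0, out) == EOF) {
            return -1;
        }
    }
    return 0;
}
```

Under this layout an 8,432-byte program becomes a 92 by 92 image, since 92 × 92 = 8,464 is the first square large enough, with the last 32 pixels left black.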

[Image: helloworld]

Hello World

Just because we can convert numbers to images does not necessarily make the image interesting. Many versions of “hello world” exist in binary form that will still run. It turns out that we can alter many of the pixels of this image and the program will still run. There are only a few pixels that change the output of the program and a few that, if changed, will make the program unable to run. They are all numbers and their distance from each other can be measured.
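Measuring that distance can be sketched like this, assuming both variants have the same length and are read as big-endian unsigned integers. The name `number_line_distance` is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compute out = |a - b|, where a, b and out are big-endian unsigned integers
 * of exactly len bytes each. */
void number_line_distance(const uint8_t *a, const uint8_t *b,
                          uint8_t *out, size_t len)
{
    const uint8_t *hi = a;
    const uint8_t *lo = b;

    /* For equal-length big-endian buffers, byte order is numeric order. */
    if (memcmp(a, b, len) < 0) {
        hi = b;
        lo = a;
    }

    /* Schoolbook subtraction from the last byte to the first, with borrow. */
    int borrow = 0;
    for (size_t i = len; i > 0; i--) {
        int d = (int)hi[i - 1] - (int)lo[i - 1] - borrow;
        borrow = (d < 0);
        if (d < 0) {
            d += 256;
        }
        out[i - 1] = (uint8_t)d;
    }
}
```

Two variants that differ only in their last pixel are close neighbors by this measure, while a change to the first byte moves the program an enormous distance along the line.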

Can the same program exist on the number line more than once? Sure. When we compress programs to a more compact form, the program can exist in many compressed or encrypted forms. The compression program is also a number. The thing that hurts my head is that our hello world program has many possible positions on the number line. I suppose I need to go take a math class so I can understand if there are a finite number of representations of “hello world.” If we just take the ones that are “runnable,” meaning that they are in a form ready for execution like a PE for Windows, an ELF for Linux or a Mach-O for Mac, how many signatures can we write for it?

Most signatures are written by taking a found string. In this case we could use “hello world” as our string to detect this program. We would detect any program that had this string. We could also use a hashing function to hash this set of characters to exactly match any program that contained the words “hello world” exactly.
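Both kinds of signature can be sketched in a few lines. The helper names are made up, and the hash is 64-bit FNV-1a, used here only as a small stand-in because it fits in a few lines. It is not cryptographic, and real signatures typically rely on hashes such as MD5 or SHA-256.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* String signature: returns 1 if needle[0..nlen) occurs anywhere in
 * hay[0..hlen), and 0 otherwise. The match is case-sensitive. */
int contains_bytes(const uint8_t *hay, size_t hlen,
                   const char *needle, size_t nlen)
{
    if (nlen == 0 || nlen > hlen) {
        return 0;
    }
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Hash signature: 64-bit FNV-1a over the bytes (a stand-in hash, see above).
 * Two inputs with the same bytes always give the same value. */
uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL; /* FNV 64-bit prime */
    }
    return h;
}
```

The string check fires on every program that carries the bytes, while the hash only matches an exact copy of whatever was hashed. Note that the compiled program above contains “Hello World” with capitals, so a case-sensitive search for “hello world” would miss it.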

Most of the security world tells you that signatures are dead, except they are still widely in use. All of the engines in VirusTotal use signatures, and some boldly state that they use machine learning or AI. What they really mean is that they use statistics, but that is so boring no one would buy their stuff if they told you they used statistics to find malware, so it is called AI because you don’t understand what AI is.

I have advocated against giving programs names, but all the marketing departments know humans work best with names. Most cyber security companies scour compiled programs for strings that may be pronounceable. This is how all viruses are named. McAfee picks one name and Symantec another. In this way each can take some credit for defending against the same adversary without having to reference each other’s work.

[Image: 59c4d05e59a38a02083e8f87de012196]

Some malware

I started a project to publish signatures written as YARA rules. So far the project has published over one thousand signatures. Each signature leverages the “hash” module and has metadata that identifies the cluster of malware. The collection of rules is covered by the RIL (Rick’s Internet License), and each rule is named after the cluster and image of the data on our little base-2 number line.

The rules live at Icewater’s Free YARA Rules GitHub repo. Let me know what you think of them. My goal is to publish as many as I can, thus answering the question in the title of this post: how many signatures are there? If I am able to cover most of the threats, the internet becomes a safer place. I’m not sure how long it will take me, so I will update this blog with my progress.

Do not ask if all the programs are already written or, if we live in a simulation, whether all the programs would need to be written. As subroutines or functions? Some rich folks think we live in a simulation and have hired some programmers to break us out, much like those from the film “The Matrix.” Relax, just because all the programs existed before we had silicon does not mean we are living within a quantum computer. All of the things that bring wealth in being are free and their only cost is time. Nature’s designs allow for the storage of sunshine, the comfort of another’s touch and the limits of mind.

For anyone who had not done hallucinogenics, the total eclipse was enough to make some feel physically ill. An eclipse out in nature provides for the sudden rhythms of nightfall. The wind dies down, the fish rise to the surface to eat. Within minutes the difference between five nines and 100% becomes obvious. I suspect I had never seen 100% of anything. What I like about this is that there are few words that can convey the difference between a fraction and one. I had just never seen that, even on hallucinogenics. I’d never seen something so big and able to eclipse something so very big. Why the fuck is the moon the same size as the fucking sun? Fractals.

I may continue to approach 100% for some time and I will choose those parallel, cellular autonomous and statistically backed by the networks of algorithms genetically encoded in our cells. During business meetings do not challenge your potential customers to teach every one of their cells a new algorithm. They will not like learning this feat is impossible.

Do not bring the Flu with you to a future that is viral.


