When I started this blog, I wanted to write about computer security. It turns out I preferred to write about my distractions from computer security. I am now working on writing more about what Icewater is. Several years ago I started on a project that I thought could help, but as with any research, I had no clue if it would actually be helpful. Like most ideas, it wasn’t originally mine, but through lots of research, programming, analytics and time I can call it mine, or so the US PTO says.
I am fascinated with nature and the natural world. I see patterns, and I noticed one called space-filling curves that kept showing up in nature. I saw it used over and over in computer science too. Once I realized that it was at work in DNA, I decided that was a worthy signal-to-noise ratio and began to take the idea seriously.
About 2.4 billion years ago we (as in Earth) got a software upgrade. The first branch of the Tree of Life leveraged a space-filling curve to organize the DNA inside the nucleus of a cell. When I realized this I thought: that is a good indicator of a successful algorithm. I wondered if using it would help with understanding code without running it.
Programs are linear. We first wrote them for single CPUs and then scaled them to multi-core CPUs, all the while networking those together with a glue called protocols. I like how nature simplifies things to scale them. Think “slow down to go faster.” I realized that I could perform scalable analysis on many things if I could leverage a different kind of programming, parallel coding, which requires a different kind of thinking.
This description is an oversimplification: take some data (a program) and make a picture out of it. There are many ways to do this, and mapping the bytes of a program to pixels is simpler if we can do it with an algorithm that works well on a GPU.
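To make “bytes to pixels” concrete, here is a minimal sketch of the most naive mapping: lay the bytes out left to right, top to bottom, one byte per grayscale pixel. This is only a baseline for illustration and is not how Icewater does it; the function name `bytes_to_rows` and the zero padding are my own assumptions.

```python
import math


def bytes_to_rows(data: bytes) -> list[list[int]]:
    """Lay bytes out row by row as a roughly square grayscale image.

    Each byte (0-255) becomes one pixel. The last row is padded with
    zeros. Hypothetical helper, for illustration only.
    """
    if not data:
        return [[0]]
    # Smallest width whose square can hold every byte.
    width = math.isqrt(len(data) - 1) + 1
    height = -(-len(data) // width)  # ceiling division
    padded = data + bytes(width * height - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    for row in bytes_to_rows(b"\x01\x02\x03\x04\x05"):
        print(row)
```

The weakness of this layout is that two bytes sitting next to each other in the file can end up on opposite edges of the picture whenever a row wraps, which is one reason to look for a better way to fold the data.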
I wanted to find similar executable code without running it. I call this the “Stopping Problem”, after Turing’s Halting Problem. Finding new bad stuff is hard if you need to run the code. Since I was poor and couldn’t run the code, I had to figure out a way to “look” at the code. I want to understand whether code should run without having to run it.
Today it takes about five minutes (worst case) to run a sample in a sandbox. I wanted to bring that down nine orders of magnitude. In half a millisecond per core with a GPU, Icewater can give you a really good hint as to whether you should sandbox the sample, or whether the sample is like something you have already analyzed. Everyone loves binary decisions.
By folding the code into a 2-dimensional Hilbert curve (which is super fast), it’s easy to get an image. The rest of the process should be naturally intuitive. If you are interested in more details, look at some of the patents.
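For readers who have not met a Hilbert curve before, here is a minimal sketch of the folding step in plain Python. It uses the standard textbook index-to-coordinate conversion and is not Icewater’s actual implementation (that lives in the patents). The names `hilbert_d2xy` and `bytes_to_hilbert_image`, the power-of-two sizing and the zero padding are all my own assumptions.

```python
def hilbert_d2xy(side: int, d: int) -> tuple[int, int]:
    """Convert position d along a Hilbert curve to (x, y).

    side must be a power of two; d runs from 0 to side*side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so the sub-curves join up.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def bytes_to_hilbert_image(data: bytes) -> list[list[int]]:
    """Fold bytes into a square grayscale image along a Hilbert curve.

    Picks the smallest power-of-two side that fits the data and leaves
    unused pixels at zero. Hypothetical helper, for illustration only.
    """
    side = 1
    while side * side < len(data):
        side *= 2
    image = [[0] * side for _ in range(side)]
    for d, value in enumerate(data):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = value
    return image


if __name__ == "__main__":
    for row in bytes_to_hilbert_image(bytes(range(16))):
        print(row)
```

The property that matters is locality: bytes that are neighbors in the file always land on neighboring pixels, so similar runs of code tend to produce similar patches in the image. Each byte’s position depends only on its own offset, which is why this kind of mapping parallelizes so naturally.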
In the past few years I have indexed something like 700 million pieces of malware. I’m working on exposing this stuff to developers, and I’m looking for folks from the computer security field to give me ideas on how they might be able to leverage it.
I digest about 400K samples a day, find the “interesting samples” and sandbox those. After a few years’ worth of sandbox reports, the world of good and bad begins to resolve itself. Average analysis time is half a millisecond using COTS servers and a mid-range GPU. It’s no more magical than life. What would you expect from a 2-billion-year-old algorithm?
If you are thirsty for solutions that approach the problem of computer security and file safety, reach out to firstname.lastname@example.org. I’d like to see if Icewater could help you. I have customers using this stuff, but they won’t let me tell you that they use it.