Turing showed at the dawn of computing that it is impossible to write a computer program that can determine whether another program, given its inputs, will stop or run forever. In security we think deeply about bugs and whether those bugs can be exploited remotely by other software.
I’ve become interested in something I am calling the Starting Problem, which is defined as: should a given program be allowed to run? Back when Turing was working on computers there were very few programs. Today almost every webpage contains some executable code, and many devices can interact with their surroundings. The idea of an Internet of Things (IoT), where “all devices” have an IP stack and can communicate, is just dawning. Today your phone has an IP stack, but your body doesn’t.
We have seen that modern IPv4-enabled security cameras, vulnerable to a specific attack delivered over the network, can be converted into a swarm that disables very large websites. Mirai, a botnet made out of IPv4-enabled security cameras, was responsible for an extremely large DDoS attack on Dyn.
Understanding that every system is vulnerable, most security solutions have relied on one of two methods to defend against malicious software. The first method leverages hashing algorithms, which create a practically unique descriptor for a piece of data. Security companies would hash either parts of the program or the entire program, and if the program was later found to be malicious they would blacklist it. Application firewalls could then simply refuse to run anything on the blacklist.
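To make the first method concrete, here is a minimal sketch of a hash blacklist. The names `sha256_hex`, `BLACKLIST`, and `allowed_to_run` are made up for illustration, and the "malicious program" is just a placeholder byte string; real products hash files on disk and ship far larger lists.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical sample standing in for a program later judged malicious.
known_bad = b"pretend this is a malicious program"

# The blacklist stores digests, not the programs themselves.
BLACKLIST = {sha256_hex(known_bad)}

def allowed_to_run(program: bytes) -> bool:
    """Allow a program unless its exact digest is on the blacklist."""
    return sha256_hex(program) not in BLACKLIST

print(allowed_to_run(known_bad))          # blocked: exact match
print(allowed_to_run(b"benign program"))  # allowed: digest not listed
```

The check is exact by design: a program is blocked only if every byte matches something already seen.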
Malware authors soon realized that if they “packed” or hid their application inside a new envelope, they could create an infinite number of applications and the security community couldn’t possibly keep up.
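A toy illustration of why packing defeats exact hashes: the same payload wrapped with different one-byte XOR keys yields a different digest every time. The `xor_pack` and `xor_unpack` helpers are hypothetical stand-ins for this sketch; real packers are far more elaborate.

```python
import hashlib

def xor_pack(payload: bytes, key: int) -> bytes:
    """Toy packer: store the key, then every payload byte XORed with it."""
    return bytes([key]) + bytes(b ^ key for b in payload)

def xor_unpack(packed: bytes) -> bytes:
    """Recover the original payload from a toy-packed blob."""
    key = packed[0]
    return bytes(b ^ key for b in packed[1:])

payload = b"same malicious logic every time"

# Five envelopes around one payload, each with a different key.
variants = [xor_pack(payload, key) for key in range(1, 6)]
digests = {hashlib.sha256(v).hexdigest() for v in variants}

print(len(digests))                                    # number of distinct digests
print(all(xor_unpack(v) == payload for v in variants)) # same payload inside each
```

Every variant unpacks to identical behavior, yet a blacklist would need one entry per envelope.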
The arms race against malicious software began as the next logical step from the cold war
The security community created its second prevention method, which is to understand the behavior of software. The solution is to run the software and see what it does. Remember, Turing showed that, given a program and its inputs, you can’t write a program to tell whether the first one will stop or run indefinitely. I’m not suggesting either method; however, I do believe that you can determine if you should not start a program.
Building mental images of things is an art called visualization. This is an important aspect of the proposal I will make about a third attempt to deal with which programs should be allowed to run. DNA is a single strand of code that runs biological computers. The code for a computer program is also executed serially, meaning that computer code targeted at silicon CPUs runs logically from start to end. I thought that if we can view computer code in 2D, then we might be able to manipulate it in such a way as to make decisions from the pictures, just as we will with carrots.
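One simple way to view code in 2D, offered here only as a sketch, is to lay the program's bytes out row by row and treat each byte value as a pixel intensity. The function names, the row width of 16, and the ASCII shading are assumptions for illustration, not a description of any particular tool.

```python
def bytes_to_grid(data: bytes, width: int = 16) -> list:
    """Lay bytes out row by row as a 2D grid of values 0-255.

    The last row is padded with zeros so every row has `width` cells.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

def render(grid: list, shades: str = " .:-=+*#%@") -> str:
    """Render the grid as ASCII art: denser characters for larger bytes."""
    return "\n".join(
        "".join(shades[value * len(shades) // 256] for value in row)
        for row in grid
    )

# Any byte string works; a real use would read a program file in binary mode.
sample = bytes(range(0, 256, 2))
print(render(bytes_to_grid(sample)))
```

Regions of a file with different content, such as text, zero padding, or compressed data, tend to show up as visibly different textures in a picture like this.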
The starting problem proposal is: programs that look alike are probably alike. Think about it, we use this technique all day on the farm. A field of garlic looks like a field of garlic because all the garlic looks the same. The weeds are the things that don’t look like garlic. You can even have two kinds of garlic and still figure out which are the weeds and which is the garlic.
The idea is really simple, but you need lots of data. Most programs look the way they work. A family of malware has the same features, because it has the same code. Code from a repository of software that isn’t malicious will have its own patterns, but will be different from everything else, including the malicious software.
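The "look alike, probably alike" idea can be sketched as a nearest-neighbor lookup: compare an unknown sample against labeled examples and borrow the label of the closest one. Jaccard similarity over byte 4-grams is one possible likeness measure chosen for this sketch, and the labeled "programs" below are invented strings, not real samples.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Shared n-grams divided by total distinct n-grams (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def most_similar_label(sample: bytes, labeled: list) -> tuple:
    """Return (label, score) of the labeled example closest to sample."""
    scored = ((label, jaccard(sample, ref)) for label, ref in labeled)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical training data: (label, program bytes).
labeled = [
    ("malware-family-a", b"connect; download payload; encrypt files; demand ransom"),
    ("benign-editor", b"open file; read text; render window; save file"),
]
sample = b"connect; download payload; encrypt disks; demand ransom"

label, score = most_similar_label(sample, labeled)
print(label, round(score, 2))
```

With only two examples this proves little; the approach depends on having lots of labeled data, as noted above.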
Form Follows Function
It is difficult to escape, in physics or in bits, the fact that expressing information takes up space. A program that has many different functions will be larger than a program with a single function, even though the output may be the same. Functions may be obscured, but obscuring is just another function that encodes. Since a program may not perform a function that it does not have code for [see magic], code will look like itself.
Some have attempted to categorize programs by their function call graphs, which involves tackling an NP-hard problem and takes lots of time. Executing a program in a sandbox is another tactic for understanding the true nature of a program’s functionality.
One of the things I’ve enjoyed learning about is how math defines truth in a world full of deception, indirection and value. Most of the world is value and learning to ignore deception is valuable. Understanding a function has a cost, but once paid for up front, the cost of determining a nearly identical function is almost zero. Investments of time pay off exponentially when leveled at the unknown unknowns.
An example might be helpful. There are two plants that grow on the farm. One is native and happens to be the most deadly plant in North America. The other is not native and is called the carrot. White carrots appear nearly the same as the deadly poisonous Water Hemlock.
Understanding the difference is really important. An untrained, enthusiastic programmer might not think twice before vaporizing a bunch of weeds with a huge blow torch. My wife decided that eating some was the way to go; she can tell you what it tastes like. Few live to tell. It was a numbers game, and she tasted only a very small amount.
There are a number of clues that become obvious when you live around the plant. My point is that of two things that look alike to the untrained eye, one might kill you. Training is important, and without it all Machine Learning would be lost. Google’s TensorFlow requires training, just like any young farmer or newly minted foodie. Understanding the attributes of plants is how you partition the nutrients from the long slumbers of Socrates, who died by a tea of Poison Hemlock.
Today’s internet provides many opportunities to run code presented over the network. A human’s interaction is often required to initiate this process. It turns out humans do not have the tools required to understand whether something they click on will harm their computer, data, finances, company, etc. The kinetic component of code just hasn’t begun to be written about… Back to my thesis: let’s not run code that looks like white carrots. Yes, there is a white carrot that won’t kill you, but there appear to be no orange or purple or yellow variants of Conium maculatum.
Visualizing code is one way of comparing its textures. We can compare hashes (exactness), but comparing alikeness, textures, requires something from nature. Ever look up at the sky and see the clouds? As a child I thought you could see things in the clouds. I am proud to say that as an adult I do get to see the sky every day, and when it has clouds I revel in its vastness of differentness. Even if it is grey, the grey is different. Comparing likeness is something humans are good at.
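The contrast between exactness and alikeness can be shown in a few lines. In this sketch, two inputs that differ by a single byte get unrelated hashes, while a crude texture measure (cosine similarity of byte-frequency histograms, an assumption chosen for brevity) still reports them as nearly identical.

```python
import hashlib
import math
from collections import Counter

def cosine_similarity(a: bytes, b: bytes) -> float:
    """Cosine similarity between the byte-frequency histograms of a and b."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

original = b"print the report and save it to disk" * 4
tweaked = original.replace(b"report", b"Report", 1)  # change a single byte

# Exactness: the hashes either match or they do not.
same_hash = hashlib.sha256(original).digest() == hashlib.sha256(tweaked).digest()

# Alikeness: a score between 0 and 1.
likeness = cosine_similarity(original, tweaked)

print(same_hash)
print(likeness)
```

A byte histogram ignores ordering, so it is a very coarse texture; it is used here only to show that likeness is a graded measure where a hash is all or nothing.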
Some call it racism when we compare ourselves by skin color, language, or religion. Humans are just attempting to short circuit decisions by choosing to accept things that are like themselves. It’s cognitively cheaper. Artificial Intelligence will need to be trained in these choices, which are learned responses. We only need a few test subjects to teach which are the edible carrots; it is how learning occurs, iteratively.
I won’t grow white carrots. No shirt, no shoes, no service.