An Embedded Mini-Interpreter
Originally published on: Mon, 16 Nov 2009 03:20:44 +0000
Several years ago, when discussing software protection schemes others had spoken of dynamically modifying code itself on the fly. In years past, this technique may not have been cause for concern, but the fact that the processor cache may pre-load blocks of code left me curious as to how one would flush the cache and reload the altered code. I later found some Windows API functions that helped in this endeavor, but I pondered a slightly different route to accomplish lower-level code obfuscation.
My solution was certainly influenced by the September 1989 Dr. Dobbs Journal article Roll Your Own Minilanguages With Mini-Interpreters by Michael Abrash and Dan Illowsky. You can read this article here: http://www.ddj.com/article/printableArticle.jhtml?articleID=184408206.
The article presents an interpreter written in assembly language that uses the assembler's ability to define pointers to implement a simple interpreter. I took a similar approach and defined a mini-interpreter in C that consisted of ten very specific instructions. My interpreter did not use function-pointers; I chose to use a switch/case construct so that the code could more easily be ported to other languages.
My goal was to be able to first create some sort of sample program that could be broken down into a handful of core, top-level functions. These core functions and a few supplemental functions would comprise the instruction set for the mini-interpreter. By encoding the instructions and their parameters into an array of integers, I could then save the array to the filesystem. Using this technique, I could use a C program that could create any external file for the mini-interpreter that I wanted. Once externalized, these files could be encrypted and / or obfuscated. My hope was that standard techniques to trace 80x86 object code would be of little value during the execution of these pseudo-instructions.
Consider the code for file game1.c
// License: MIT / X11
// Copyright (c) 2009 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
//
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
int howLarge;
int theNumber;
int tries;
void clearCounter();
void genRandomNumber();
void askHowLarge();
void playTheGame();
void howDidTheUserDo();
void wannaPlayAgain();
main() {
printf("Registered version.\n");
askHowLarge();
for(;;) {
genRandomNumber();
playTheGame();
howDidTheUserDo();
wannaPlayAgain();
}
}
void clearCounter()
{
tries=0;
}
void genRandomNumber()
{
time_t t;
srand((unsigned int) time(&t));
theNumber=( rand() % howLarge ) + 1;
}
void askHowLarge()
{
char buff[12];
printf("What's the largest number I should think of?");
fgets(buff,11,stdin);
howLarge=atoi(buff);
}
void playTheGame()
{
char buff[12];
int num;
for(;;) {
tries++;
printf("Take a guess: what number am I thinking of? ");
fgets(buff,11,stdin);
num=atoi(buff);
if( num == theNumber ) {
printf("Correct!\n");
return;
}
if( num < theNumber )
printf("Too low.\n");
else
printf("Too high.\n");
}
}
void howDidTheUserDo()
{
printf("It took you %d ",tries );
if( tries == 1 )
printf("try.\n");
else
printf("tries.\n");
}
void wannaPlayAgain()
{
char buff[5];
printf("Would you like to play again?");
fgets(buff,4,stdin);
if( (*buff != 'y')&&(*buff!='Y')) {
exit(0);
}
}
game1.c is intended to model a fully functional registered copy of a game where the computer chooses a random number and allows the user to continue entering guesses until the number is reached. With each incorrect answer, the program will respond with "Too high." or "Too low."
The main() function issues calls to a series of other functions. While each of these functions will intentionally be used as an instruction in the mini-interpreter, the program will need a few extra support functions ( such as a function that can display a string ). We'll also need a primitive flow-control instruction that the game can repeat forever. Finally, we'll add an instruction that will terminate execution.
Embedding the Interpreter
In the source for gameinterpreter.c, the top-level functions from game1.c have been kept and the support functions have been added. These functions have been separated into a library that will be used in the remaining programs.
gameinterpreter.c
// License: MIT / X11
// Copyright (c) 2009 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
//
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include "gameinterpreter.h"
static int howLarge;
static int theNumber;
static int tries;
static int insp; // virtual instruction pointer
static int labels[10];
extern int pcode[300];
void clearCounter();
void genRandomNumber();
void askHowLarge();
void playTheGame();
void howDidTheUserDo();
void wannaPlayAgain();
void branch();
void printString();
void setLabel();
void end();
static void clearCounter()
{
tries=0;
}
static void genRandomNumber()
{
time_t t;
srand((unsigned int) time(&t));
theNumber=( rand() % howLarge ) + 1;
}
static void askHowLarge()
{
char buff[12];
printf("What's the largest number I should think of?");
fgets(buff,11,stdin);
howLarge=atoi(buff);
}
void playTheGame()
{
char buff[12];
int num;
for(;;) {
tries++;
printf("Take a guess: what number am I thinking of? ");
fgets(buff,11,stdin);
num=atoi(buff);
if( num == theNumber ) {
printf("Correct!\n");
return;
}
if( num < theNumber )
printf("Too low.\n");
else
printf("Too high.\n");
}
}
static void howDidTheUserDo()
{
printf("It took you %d ",tries );
if( tries == 1 )
printf("try.\n");
else
printf("tries.\n");
}
static void wannaPlayAgain()
{
char buff[5];
printf("Would you like to play again?");
fgets(buff,4,stdin);
if( (*buff != 'y')&&(*buff!='Y')) {
exit(0);
}
}
static void branch()
{
insp=labels[ pcode[insp+1] ];
}
static void printString()
{
int c;
do {
insp++;
c=pcode[insp];
if(c)
printf("%c",c);
} while(c);
}
static void setLabel()
{
labels[ pcode[insp+1] ] = insp+2 ;
}
static void end()
{
exit(0);
}
void interpret(int position)
{
insp=position;
for(;;) {
switch( pcode[insp] ) {
case CLEAR_COUNTER:
clearCounter();
insp++;
break;
case GEN_RANDOM_NUMBER:
genRandomNumber();
insp++;
break;
case ASK_HOW_LARGE:
askHowLarge();
insp++;
break;
case PLAY_THE_GAME:
playTheGame();
insp++;
break;
case HOW_DID_THE_USER_DO:
howDidTheUserDo();
insp++;
break;
case WANNA_PLAY_AGAIN:
wannaPlayAgain();
insp++;
break;
case BRANCH:
branch();
break;
case PRINT_STRING:
printString();
insp++;
break;
case SET_LABEL:
setLabel();
insp=insp+2;
break;
case END:
end();
break;
}
}
}
Most of the functions have been made static with the exception of the interpret() function. The counterpart header file appears below:
gameinterpreter.h
#ifndef GAME_INTERPRETER_H
#define GAME_INTERPRETER_H
enum {
CLEAR_COUNTER=0, // 0
GEN_RANDOM_NUMBER, // 1
ASK_HOW_LARGE, // 2
PLAY_THE_GAME, // 3
HOW_DID_THE_USER_DO, // 4
WANNA_PLAY_AGAIN, // 5
BRANCH, // 6
PRINT_STRING, // 7
SET_LABEL , // 8
END // 9
} ;
void interpret(int);
#endif
//
The enum construct defines defines 10 symbols, each having a value in the range of 0 through 9. Those numbers will be used as virtual opcodes.
The additional support functions are as follows:
- printString() - This function displays a string of characters that occurs immediately after the PRINT_STRING opcode until we hit a 0. Each character in the string will occupy one int. That's certainly not the best way to implement this mechanism, but it will suffice.
- setLabel() - An integer after the opcode is read and is used to mark the offset of the next instruction for the target of a BRANCH instruction. SET_LABEL,0 would place the position of the next instruction into label 0 in the array labels.
- branch() - This function reads the next int in the stream, looks up that entry in the label table, and branches to it my manipulating the virtual instruction-pointer insp.
- end() - Terminate exection.
- interpret(int) - This function uses a switch-case to interpret our virtual instruction set.
The program game2.c will use the interpreter to implement the registered copy of the game.
#include "gameinterpreter.h"
int pcode[300]={
PRINT_STRING,'R','e','g','i','s','t','e','r','e','d',' ',
'v','e','r','s','i','o','n','.','\n', 0,
ASK_HOW_LARGE,
SET_LABEL,0,
GEN_RANDOM_NUMBER,
PLAY_THE_GAME,
HOW_DID_THE_USER_DO,
WANNA_PLAY_AGAIN,
BRANCH,0
} ;
main() {
interpret(0);
}
The actual p-code program is contained in an integer array named "pcode" which contains 300 entries. ( 300 is just an arbitrary number that I used for sake of example. )
int pcode[300]={
PRINT_STRING,'R','e','g','i','s','t','e','r','e','d',' ',
'v','e','r','s','i','o','n','.','\n', 0,
ASK_HOW_LARGE,
SET_LABEL,0,
GEN_RANDOM_NUMBER,
PLAY_THE_GAME,
HOW_DID_THE_USER_DO,
WANNA_PLAY_AGAIN,
BRANCH,0
} ;
When the interpet function begins execution, the PRINT_STRING opcode will be found, causing the printString() function to execute. printString() will continuously display the characters remaining in the pcode array until it reaches an int with the value zero.
ASK_HOW_LARGE causes the like-named function to execute.
SET_LABEL,0 marks the next instruction (GEN_RANDOM_NUMBER) as label number zero. Later in the code when the program encounters BRANCH,0, control will transfer to the GEN_RANDOM_NUMBER entry.
The next several opcodes ( GEN_RANDOM_NUMBER, PLAY_THE_GAME, HOW_DID_THE_USER_DO, WANNA_PLAY_AGAIN ) cause their counterpart functions to be executed.
The BRANCH,0 opcode ( assuming that wannaPlayAgain() hasn't terminated execution ) will branch back to the GEN_RANDOM_NUMBER section, simulating the never-ending for-loop from GAME1.C.
Creating the Trial Version
game3.c implements the trial version by simply changing the pcode array:
#include "gameinterpreter.h"
int pcode[300]={
PRINT_STRING,'U','n','r','e','g','i','s','t','e','r','e','d',' ',
't','r','i','a','l',' ','v','e','r','s','i','o','n','.','\n', 0,
ASK_HOW_LARGE,
GEN_RANDOM_NUMBER,
PLAY_THE_GAME,
HOW_DID_THE_USER_DO,
END
} ;
main() {
interpret(0);
}
The trial version displays "Unregistered trial version." and allows one iteration of gameplay. The END opcode terminates the program after the user plays one round.
Externalizing the P-Code
game4.c and game5.c write files REG.DAT and TRIAL.DAT respectively. These files are copies of the pcode[] array from the registered and unregistered versions of the game.
game4.c
#include <stdio.h>
#include "gameinterpreter.h"
int pcode[300]={
PRINT_STRING,'R','e','g','i','s','t','e','r','e','d',' ',
'v','e','r','s','i','o','n','.','\n', 0,
ASK_HOW_LARGE,
SET_LABEL,0,
GEN_RANDOM_NUMBER,
PLAY_THE_GAME,
HOW_DID_THE_USER_DO,
WANNA_PLAY_AGAIN,
BRANCH,0
} ;
main() {
FILE *fp;
fp=fopen("REG.DAT","wb");
fwrite(pcode,1,sizeof(pcode),fp);
fclose(fp);
}
game5.c
#include <stdio.h>
#include "gameinterpreter.h"
int pcode[300]={
PRINT_STRING,'U','n','r','e','g','i','s','t','e','r','e','d',' ',
't','r','i','a','l',' ','v','e','r','s','i','o','n','.','\n', 0,
ASK_HOW_LARGE,
GEN_RANDOM_NUMBER,
PLAY_THE_GAME,
HOW_DID_THE_USER_DO,
END
} ;
main() {
FILE *fp;
fp=fopen("TRIAL.DAT","wb");
fwrite(pcode,1,sizeof(pcode),fp);
fclose(fp);
}
game6.c is intended to be the final game runtime in the evolution of these game programs. It accepts one file name command-line parameter when executed. Game6 can be used to execute either the file REG.DAT or the file TRIAL.DAT.
To play the registered version, type the following:
game6 reg.dat
To play the trial version, type the following:
game6 trial.dat
This demonstrates that game6 contains the game engine, while the game-flow is controlled by an oversimplified script that we enter in an array of integers.
Since the array is now external ( and is data ), it can be encrypted. We can also add a CRC or a message-digest to the pcode data to detect tampering.
This source code and post were intended to be used as a model. There are certainly many improvements that can be made to the embedded interpreter. The intent was to show how to externalize control-flow without modifying native code on the fly.
Unless otherwise noted, all code and text entries are Copyright ©2009 by James K. Lawless
Views expressed in this blog are those of the author and do not necessary reflect those of the author's employer. Views expressed in the comments are those of the responding individual.
Save to StumbleUpon
Digg it
Save to Reddit
Share on Facebook
Share on Twitter
More bookmarks