Jim Lawless' Blog


A Simple Parser for a Small Command Line Interface

Originally published on: Sat, 02 Jan 2010 15:58:55 +0000

I like to create and tinker with little programming languages. Before I began to write more complex lexical analyzer functions, I used the strtok() function to retrieve tokens for a given miniature language.

Unfortunately, strtok() can be difficult to use once you introduce modest scanning rules.
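To make the difficulty concrete, here is a minimal sketch of whitespace-only splitting with strtok(). The helper name split_naive and its out/max parameters are made up for this illustration and are not part of the original source.

```c
#include <string.h>

/* Hypothetical helper: split a writable line on whitespace only,
   storing up to max token pointers in out[]. Returns the token count.
   strtok() modifies the line in place, so it must not be a string literal. */
int split_naive(char *line, char **out, int max) {
   int n = 0;
   char *tok = strtok(line, " \t\r\n");
   while (tok != NULL && n < max) {
      out[n++] = tok;
      tok = strtok(NULL, " \t\r\n");
   }
   return n;
}
```

Given the line println "Hello there, " a, this yields five tokens rather than three, because the quoted string is cut at each space: println, "Hello, there,, a lone dquote, and a.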

I needed a function that would retrieve the next token from a line of text. Tokens are separated by whitespace unless a token begins with a double quotation mark (dquote). If the token begins with a dquote, I want to retrieve all characters, including spaces, up to (but not including) the next dquote.

If a token begins with a '#' symbol, I want the parser to return NULL. I treat this symbol as a comment character.

Consider this code snippet (the MIT / X11 license from the full source applies):


char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

The above is my function that obeys my simple rules. The function accepts a starting position and an ending position. The starting position may be NULL as it is simply passed on to strtok(). The ending position is used when reconstructing part of the string that strtok() has overwritten.

Initially, get_token() scans the input string delimited by whitespace. If the result is NULL, that value is returned.

If the result begins with our comment character '#', a NULL is returned, signaling the end of processing for that line.

If the first character of the result is a dquote, the NUL characters that strtok() wrote into the string are replaced with spaces until either no more are found or the original end of the line of text is reached. In practice this restores the single delimiter that strtok() overwrote at the end of the token. get_token() needs the end parameter to ensure that the original end-of-line position is not exceeded.
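The repair step can be looked at in isolation. The sketch below is only an illustration: restore_delimiter is a hypothetical name, and it handles the single NUL that one strtok() call writes after a token.

```c
#include <string.h>

/* Hypothetical helper mirroring the repair step: put a space back where
   strtok() wrote the NUL that ends tok, unless tok already runs up to
   lineend (the original end of the line). */
void restore_delimiter(char *tok, char *lineend) {
   char *p = tok + strlen(tok);
   if (p < lineend && *p == '\0')
      *p = ' ';
}
```

For the line "Hello there" a, the first strtok() call returns "Hello. After restore_delimiter() the same pointer again reads "Hello there" a, so the string can be rescanned with a dquote as the delimiter. Note that a tab that was overwritten comes back as a space.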

Once the string is reconstructed, strtok() is called again, beginning at the character after the first dquote and using a dquote as the only separator character. If that result is not NULL, the pointer is moved back one character before it is returned. The token the caller sees therefore keeps its opening dquote but not its closing one, which lets the caller tell a string literal from a bare word.
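For comparison, the same rules can be sketched without strtok() at all, which avoids the repair step. This is an alternative technique, not the approach used in this post; the name next_token and its cursor parameter are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strtok()-free variant. *cursor tracks the scan position
   in a writable line and is advanced past each token that is returned.
   A quoted token keeps its opening dquote and loses its closing one. */
char *next_token(char **cursor) {
   char *p = *cursor;
   char *start;
   p += strspn(p, " \t\r\n");          /* skip leading whitespace */
   if (*p == '\0' || *p == '#') {      /* end of line or comment */
      *cursor = p + strlen(p);         /* later calls also return NULL */
      return NULL;
   }
   start = p;
   if (*p == '"') {
      p++;
      p += strcspn(p, "\"");           /* run to closing dquote or end */
   } else {
      p += strcspn(p, " \t\r\n");      /* run to the next whitespace */
   }
   if (*p != '\0')
      *p++ = '\0';                     /* end the token, step past delimiter */
   *cursor = p;
   return start;
}
```

A caller would set a char * to the start of the line and pass its address on every call. Unlike the strtok() version, this sketch keeps no hidden state between calls and returns a lone dquote for an empty string literal.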

Please consider this test script file: script.txt


println "What is your name?"
# Read data into variable a
inputa


# Display a greeting.
println "Hello there, " a " ... "

# Implement our own "PAUSE" feature
println "Press ENTER to continue."
inputa

# execute something from the shell
println "Here's a list of your current directory,"
sys "dir /w"

# drop out
exit
println "You won't get here."

The following program will display the tokens for each line of text in the file specified on the command-line:

parser.c


// A parser for a small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

#include <stdio.h>
#include <string.h>


char *get_token(char *linestart,char *lineend);

int main(int argc,char **argv) {
   FILE *fp;
   char buff[1024];
   char *end;
   char *token;
   int i;
   if(argc<2) {
      fprintf(stderr,"Usage: parser scriptfile\n");
      return 1;
   }
   fp=fopen(argv[1],"r");
   if(fp==NULL) {
      fprintf(stderr,"Cannot open input file %s\n",argv[1]);
      return 1;
   }
   while(fgets(buff,sizeof(buff),fp)!=NULL) {
         // strip the line ending, if one is present
      buff[strcspn(buff,"\r\n")]=0;
      end=buff+strlen(buff);
      
      token=get_token(buff,end);
      for(i=1;token!=NULL;i++) {
         printf("%-3d %s\n",i,token);
         token=get_token(NULL,end);
      }
      printf("\n");
   }
   fclose(fp);
}

char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

After compilation, you may invoke the above program against the file script.txt by issuing the following command-line:


parser script.txt

The output should appear as follows:


1 println
2 "What is your name?


1 inputa




1 println
2 "Hello there,
3 a
4 " ...



1 println
2 "Press ENTER to continue.

1 inputa



1 println
2 "Here's a list of your current directory,

1 sys
2 "dir /w



1 exit

1 println
2 "You won't get here.

Here's a small command-line interpreter that allows for one variable (a) and understands the following commands:

  • println - Display any number of literal strings or the variable a followed by a newline on the output console
  • inputa - Retrieve a string from the console standard input device and leave it in variable a.
  • sys - Execute a command from the shell using the first argument token only. All other tokens are ignored.
  • exit - Exit the script
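The four verbs could also be recognized through a small lookup table. The sketch below is an assumption-laden illustration rather than the code used in cli.c, which compares the token against each verb with stricmp(); the enum, classify_command, and equals_ignore_case are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

enum command { CMD_PRINTLN, CMD_INPUTA, CMD_SYS, CMD_EXIT, CMD_UNKNOWN };

/* Portable case-insensitive equality test (stricmp() is not standard C). */
static int equals_ignore_case(const char *a, const char *b) {
   while (*a != '\0' && *b != '\0') {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return 0;
      a++;
      b++;
   }
   return *a == *b;   /* equal only if both strings ended together */
}

/* Map the first token of a line to a command code. */
enum command classify_command(const char *token) {
   static const struct { const char *name; enum command cmd; } table[] = {
      { "println", CMD_PRINTLN },
      { "inputa",  CMD_INPUTA  },
      { "sys",     CMD_SYS     },
      { "exit",    CMD_EXIT    },
   };
   size_t i;
   for (i = 0; i < sizeof table / sizeof table[0]; i++) {
      if (equals_ignore_case(token, table[i].name))
         return table[i].cmd;
   }
   return CMD_UNKNOWN;
}
```

With a table like this, adding a verb means adding one row and one case in the dispatching switch, at the cost of a little more setup than an if/else chain.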

cli.c


// A small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

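#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// stricmp() is not standard C; on non-Windows systems
// use the POSIX strcasecmp() instead.
#ifndef _WIN32
#include <strings.h>
#define stricmp strcasecmp
#endif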

char *get_token(char *linestart,char *lineend);
char _var_a[256];

int main(int argc,char **argv) {
   FILE *fp;
   char buff[1024];
   char *end;
   char *token;
   int i;
   if(argc<2) {
      fprintf(stderr,"Usage: cli scriptfile\n");
      return 1;
   }
   fp=fopen(argv[1],"r");
   if(fp==NULL) {
      fprintf(stderr,"Cannot open input file %s\n",argv[1]);
      return 1;
   }
   while(fgets(buff,sizeof(buff),fp)!=NULL) {
         // strip the line ending, if one is present
      buff[strcspn(buff,"\r\n")]=0;
      end=buff+strlen(buff);
      
      token=get_token(buff,end);
      if(token==NULL)
         continue;
      if(!stricmp(token,"inputa")) {
         if((fgets(_var_a,sizeof(_var_a),stdin))!=NULL) {
               // strip the line ending, if one is present
            _var_a[strcspn(_var_a,"\r\n")]=0;
         }
         else {
            *_var_a=0;
         }
      }
      else
      if(!stricmp(token,"println")) {
         for(;;) {
            token=get_token(NULL,end);
            if(token==NULL)
               break;
            if(*token=='\"')
               printf("%s",token+1);
            else
            if(!stricmp(token,"a"))
               printf("%s",_var_a);
            else
               printf("%s",token);
         }
         printf("\n");
      }
      else
      if(!stricmp(token,"exit")) {
         break;
      }
      else
      if(!stricmp(token,"sys")) {
         token=get_token(NULL,end);
         if(token!=NULL)
            system(token);
      }
      else {
         fprintf(stderr,"Unknown command %s\n",token);
         break;
      }
   }
   fclose(fp);
}

char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

To execute this program after compilation, enter the following:


cli script.txt

You should be prompted for your name. The script should then greet you and wait for you to hit ENTER. Finally, it should display a listing of your current directory if you're running in Windows. If you're running in Linux, you might want to change the dir command to ls.

Note that since the script encounters an exit verb before the last line, the last println is never executed.

Blank lines and lines beginning with '#' are automatically ignored.

The source code, sample script file, and Windows EXE files from this post can be found here:

http://www.mailsend-online.com/wp/cli.zip

Unless otherwise noted, all code and text entries are Copyright ©2010 by James K. Lawless
