Jim Lawless' Blog


A Simple Parser for a Small Command Line Interface

Originally published on: Sat, 02 Jan 2010 15:58:55 +0000

I like to create and tinker with little programming languages. Before I began to write more complex lexical analyzer functions, I used the strtok() function to retrieve tokens for a given miniature language.

Unfortunately, strtok() can be difficult to use once you introduce modest scanning rules.
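To make the difficulty concrete, here is a minimal sketch of what a plain whitespace split does to a quoted string. The helper name count_whitespace_tokens is hypothetical and does not appear in the full source.

```c
#include <string.h>

// Hypothetical helper for illustration: counts the tokens that a plain
// whitespace split with strtok() finds. Note that strtok() modifies
// the buffer it is given.
int count_whitespace_tokens(char *line) {
   int count = 0;
   char *tok;
   for (tok = strtok(line, " \t\r\n"); tok != NULL;
        tok = strtok(NULL, " \t\r\n")) {
      count++;
   }
   return count;
}
```

Called on a writable buffer holding println "What is your name?", this returns 5, because strtok() knows nothing about quotes and splits the quoted text into four pieces.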

I needed a function that would allow me to retrieve the next token from a line of text. Tokens are separated by whitespace unless a token begins with a double quotation mark (dquote). If the token begins with a dquote, I want to retrieve all characters, including spaces, up to (but not including) the next dquote.

If a token begins with a '#' symbol, I want the parser to return NULL. I will treat this symbol as a comment character.

Consider this code snippet (the MIT / X11 license from the full source applies):


char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

The above is my function that obeys my simple rules. The function accepts a starting position and an ending position. The starting position may be NULL as it is simply passed on to strtok(). The ending position is used when reconstructing part of the string that strtok() has overwritten.
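The calling convention mirrors strtok(): pass the buffer on the first call and NULL on later calls, with the same end pointer each time. The sketch below illustrates this with a hypothetical helper named collect_tokens, which is not part of the full source. get_token() is repeated from the listing above so that the block compiles on its own.

```c
#include <string.h>

// Repeated from the listing above.
char *get_token(char *linestart, char *lineend) {
   char *p, *q;
   p = strtok(linestart, " \t\r\n");
   if (p == NULL)
      return NULL;
   if (*p == '#')
      return NULL;
   if (*p == '\"') {
      q = p;
      p += strlen(q);
      // reconstruct the string mangled by strtok()
      while (((*p) == 0) && (p < lineend))
         *p = ' ';
      p = strtok(q + 1, "\"");
      if (p != NULL)
         p--;
   }
   return p;
}

// Hypothetical helper: stores up to max token pointers from line in
// out and returns how many were stored. The pointers refer into line,
// which get_token() modifies.
int collect_tokens(char *line, char **out, int max) {
   char *end = line + strlen(line);   // original end of the text
   int n = 0;
   char *tok = get_token(line, end);  // first call: pass the buffer
   while (tok != NULL && n < max) {
      out[n++] = tok;
      tok = get_token(NULL, end);     // later calls: pass NULL
   }
   return n;
}
```

For a buffer holding println "Hello there, " a " ... " this stores four tokens. The quoted ones keep their opening dquote and lose the closing one.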

Initially, get_token() scans the input string delimited by whitespace. If the result is NULL, that value is returned.

If the result begins with our comment character '#', a NULL is returned, signaling the end of processing for that line.

If the first character of the result is a dquote, the NUL characters written by strtok() are replaced with spaces until either no more are found or the original end of the line of text is reached. get_token() needs the end parameter to ensure that the original end-of-line position is not exceeded.

Once the string is reconstructed, strtok() is called again, beginning at the character after the first dquote and using a dquote as the only separator character. If that result is not NULL, the pointer is moved back one character before it is returned to the caller, so the token includes the opening dquote but not the closing one.
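For comparison, the same rules can also be written as a hand-rolled scanner that keeps its own cursor instead of relying on the hidden state inside strtok(). This is a sketch of an alternative technique, not the approach used in this post. The name next_token and its char ** cursor parameter are assumptions made for illustration.

```c
#include <string.h>

// Hypothetical alternative to get_token(): *cursor points at the
// unscanned text and is advanced past each token. Returns NULL at the
// end of the line or at a '#' comment. A quoted token keeps its
// opening dquote and drops the closing one, like get_token().
char *next_token(char **cursor) {
   char *p = *cursor;
   char *start;
   p += strspn(p, " \t\r\n");          // skip leading whitespace
   if (*p == 0 || *p == '#') {
      *cursor = p;
      return NULL;
   }
   start = p;
   if (*p == '\"')
      p += 1 + strcspn(p + 1, "\"");   // stop at closing dquote or end
   else
      p += strcspn(p, " \t\r\n");      // stop at whitespace or end
   if (*p != 0)
      *p++ = 0;                        // terminate token, step past it
   *cursor = p;
   return start;
}
```

A caller would set a char * to the start of the line and pass its address on every call. Because nothing needs to be reconstructed, this version does not need the end-of-line pointer.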

Please consider this test script file: script.txt


println "What is your name?"
# Read data into variable a
inputa


# Display a greeting.
println "Hello there, " a " ... "

# Implement our own "PAUSE" feature
println "Press ENTER to continue."
inputa

# execute something from the shell
println "Here's a list of your current directory,"
sys "dir /w"

# drop out
exit
println "You won't get here."

The following program will display the tokens for each line of text in the file specified on the command-line:

parser.c


// A parser for a small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

#include <stdio.h>
#include <string.h>


char *get_token(char *linestart,char *lineend);

int main(int argc,char **argv) {
   FILE *fp;
   char buff[1024];
   char *end;
   char *token;
   int i;
   if(argc<2) {
      fprintf(stderr,"Usage: parser scriptfile\n");
      return 1;
   }
   fp=fopen(argv[1],"r");
   if(fp==NULL) {
      fprintf(stderr,"Cannot open input file %s\n",argv[1]);
      return 1;
   }
   while(fgets(buff,sizeof(buff)-1,fp)!=NULL) {
      buff[strlen(buff)-1]=0;
      end=buff+strlen(buff);
      
      token=get_token(buff,end);
      for(i=1;token!=NULL;i++) {
         printf("%-3d %s\n",i,token);
         token=get_token(NULL,end);
      }
      printf("\n");
   }
   fclose(fp);
}

char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

After compilation, you may invoke the above program against the file script.txt by issuing the following command-line:


parser script.txt

The output should appear as follows:


1   println
2   "What is your name?


1   inputa




1   println
2   "Hello there,
3   a
4   " ...



1   println
2   "Press ENTER to continue.

1   inputa



1   println
2   "Here's a list of your current directory,

1   sys
2   "dir /w



1   exit

1   println
2   "You won't get here.
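One caveat about the parser.c listing: buff[strlen(buff)-1]=0 removes the last character whether or not it is a newline, so a final line that lacks a trailing newline loses a real character. A common alternative, sketched here under a hypothetical helper name, uses the standard strcspn() function to cut the string only where a line-ending character is actually present.

```c
#include <string.h>

// Hypothetical helper: cut the string at the first '\r' or '\n', if
// any. Leaves the string unchanged when no line ending is present.
void strip_newline(char *s) {
   s[strcspn(s, "\r\n")] = 0;
}
```

In the read loop, strip_newline(buff) would take the place of the buff[strlen(buff)-1]=0 statement.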

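Here's a small command-line interpreter that allows for one variable (a) and understands the following commands: println (print its arguments followed by a newline), inputa (read a line from standard input into a), sys (pass a string to the shell), and exit (stop the script).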

cli.c


// A small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *get_token(char *linestart,char *lineend);
char _var_a[256];

int main(int argc,char **argv) {
   FILE *fp;
   char buff[1024];
   char *end;
   char *token;
   int i;
   if(argc<2) {
      fprintf(stderr,"Usage: cli scriptfile\n");
      return 1;
   }
   fp=fopen(argv[1],"r");
   if(fp==NULL) {
      fprintf(stderr,"Cannot open input file %s\n",argv[1]);
      return 1;
   }
   while(fgets(buff,sizeof(buff)-1,fp)!=NULL) {
      buff[strlen(buff)-1]=0;
      end=buff+strlen(buff);
      
      token=get_token(buff,end);
      if(token==NULL)
         continue;
      if(!stricmp(token,"inputa")) {
         if((fgets(_var_a,sizeof(_var_a)-1,stdin))!=NULL) {
            _var_a[strlen(_var_a)-1]=0;
         }
         else {
            *_var_a=0;
         }
      }
      else
      if(!stricmp(token,"println")) {
         for(;;) {
            token=get_token(NULL,end);
            if(token==NULL)
               break;
            if(*token=='\"')
               printf("%s",token+1);
            else
            if(!stricmp(token,"a"))
               printf("%s",_var_a);
            else
               printf("%s",token);
         }
         printf("\n");
      }
      else
      if(!stricmp(token,"exit")) {
         break;
      }
      else
      if(!stricmp(token,"sys")) {
         token=get_token(NULL,end);
         if(token!=NULL)
            system(token);
      }
      else {
         fprintf(stderr,"Unknown command %s\n",token);
         break;
      }
   }
   fclose(fp);
}

char *get_token(char *linestart,char *lineend) {
   char *p,*q;
   p=strtok(linestart," \t\r\n");
   if(p==NULL)
      return NULL;
   if(*p=='#')
      return NULL;
   if(*p=='\"') {
      q=p;
      p+=strlen(q);
         // reconstruct the string mangled by
         // strtok()
      while( ((*p)==0)&&(p<lineend))
         *p=' ';
      p=strtok(q+1,"\"");
      if(p!=NULL)
         p--;
   }
   return p;
}

To execute this program after compilation, enter the following:


cli script.txt

You should be prompted for your name. The script should then greet you. It should then wait for you to hit ENTER. Finally, it should display a listing of your current directory (if you're running on Windows; you might want to change the dir command to ls if you're running on Linux).

Note that since the script encounters an exit verb before the last line, the last println is never executed.

Blank lines and lines beginning with '#' are automatically ignored.
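A portability note: stricmp() is not part of standard C. It is common in Windows compilers, while POSIX systems offer strcasecmp() in <strings.h> instead. Where neither is available, a small comparison helper is easy to write. The sketch below uses a hypothetical name, str_ieq, and returns nonzero when the strings match, so a call site would read if(str_ieq(token,"println")) rather than if(!stricmp(token,"println")).

```c
#include <ctype.h>

// Hypothetical helper: returns 1 if a and b are equal when letter case
// is ignored (as judged by tolower() in the current locale), else 0.
int str_ieq(const char *a, const char *b) {
   while (*a != 0 && *b != 0) {
      // Cast to unsigned char: tolower() is undefined for negative
      // values other than EOF.
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return 0;
      a++;
      b++;
   }
   return *a == *b;   // equal only if both strings ended together
}
```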

The source code, sample script file, and Windows EXE files from this post can be found here:

http://www.mailsend-online.com/wp/cli.zip

Unless otherwise noted, all code and text entries are Copyright ©2010 by James K. Lawless



Views expressed in this blog are those of the author and do not necessarily reflect those of the author's employer. Views expressed in the comments are those of the responding individual.
