A Simple Parser for a Small Command Line Interface
Originally published on: Sat, 02 Jan 2010 15:58:55 +0000
I like to create and tinker with little programming languages. Before I began to write more complex lexical analyzer functions, I used the strtok() function to retrieve tokens for a given miniature language.
Unfortunately, strtok() can be difficult to use once you introduce modest scanning rules.
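As a minimal sketch of the problem, the following helper splits a line on whitespace using nothing but strtok(). The name split_on_whitespace and its out/max parameters are hypothetical and do not appear in the original source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: split a writable line on whitespace using only
   strtok(), storing up to max token pointers in out.
   Returns the number of tokens stored. */
int split_on_whitespace(char *line, char **out, int max) {
    int n = 0;
    char *tok;

    assert(line != NULL && out != NULL);
    tok = strtok(line, " \t\r\n");
    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Given the line println "Hello there", this yields three tokens (println, "Hello and there") because strtok() knows nothing about quotation marks.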
I needed a function that would retrieve the next token from a line of text. Tokens are separated by whitespace unless a token begins with a double quotation mark (dquote). If the token begins with a dquote, I want to retrieve all characters, spaces included, up to (but not including) the next dquote.
If a token begins with a '#' symbol, I want the parser to return NULL. I will treat this symbol as a comment character.
Consider this code snippet (the MIT / X11 license from the full source applies):
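char *get_token(char *linestart,char *lineend) {
    char *p,*q;

    // next whitespace-delimited token ( NULL continues the previous line )
    p=strtok(linestart," \t\r\n");
    if(p==NULL)
        return NULL;
    // a token that starts with '#' begins a comment
    if(*p=='#')
        return NULL;
    if(*p=='\"') {
        q=p;
        p+=strlen(q);
        // reconstruct the string mangled by
        // strtok()
        while( ((*p)==0)&&(p<lineend))
            *p=' ';
        // rescan from just past the opening dquote up to the closing dquote
        p=strtok(q+1,"\"");
        // step back so the returned token starts with the opening dquote
        if(p!=NULL)
            p--;
    }
    return p;
}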
The function above implements these rules. It accepts a starting position and an ending position. The starting position is passed straight on to strtok(), so it may be NULL to continue scanning the same line after the first call. The ending position is used when reconstructing the part of the string that strtok() has overwritten.
Initially, get_token() asks strtok() for the next whitespace-delimited token. If the result is NULL, that value is returned.
If the result begins with our comment character '#', a NULL is returned, signaling the end of processing for that line.
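If the first character of the result is a dquote, strtok() has overwritten the delimiter that followed the token with a NUL, which would hide the rest of a quoted string that contains spaces. get_token() replaces that NUL with a space, provided it lies before the original end of the line of text. get_token() needs the end parameter to ensure that the original end-of-line position is not exceeded. Because strtok() writes only one NUL per token, the while loop in this step runs at most once.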
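The repair step can be looked at in isolation. The sketch below is not taken from the original source: restore_delimiter is a hypothetical name, and it assumes that tok was just returned by strtok() and that lineend points at the terminating NUL of the original line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring the repair step described above.
   tok is a token just returned by strtok(); lineend points at the
   terminating NUL of the original, unmodified line. */
void restore_delimiter(char *tok, char *lineend) {
    char *p;

    assert(tok != NULL && lineend != NULL);
    p = tok + strlen(tok);   /* the NUL that ends the token */
    if (p < lineend)
        *p = ' ';            /* strtok() wrote this NUL; put a space back */
}
```

A space is written back even if the delimiter that strtok() overwrote was a tab, which is also how get_token() behaves.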
Once the string is reconstructed, strtok() is called again, beginning at the character after the first dquote and using a dquote as the only separator character. If that result is not NULL, the pointer is moved back one position before it is returned to the caller, which causes the first dquote to appear in the token, but not the last. The leading dquote lets the caller tell a quoted string from a bare word.
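To watch these steps work on a single line held in memory, here is a small sketch. get_token() is repeated from the snippet above so that the block is self-contained; collect_tokens and its out/max parameters are hypothetical additions for illustration only.

```c
#include <assert.h>
#include <string.h>

/* get_token() as shown above, repeated so this sketch stands alone. */
char *get_token(char *linestart, char *lineend) {
    char *p, *q;

    p = strtok(linestart, " \t\r\n");
    if (p == NULL)
        return NULL;
    if (*p == '#')
        return NULL;
    if (*p == '\"') {
        q = p;
        p += strlen(q);
        while ((*p == 0) && (p < lineend))
            *p = ' ';
        p = strtok(q + 1, "\"");
        if (p != NULL)
            p--;
    }
    return p;
}

/* Hypothetical helper: collect up to max tokens from a writable line.
   The first call passes the line itself; later calls pass NULL. */
int collect_tokens(char *line, char **out, int max) {
    int n = 0;
    char *end, *tok;

    assert(line != NULL && out != NULL);
    end = line + strlen(line);
    tok = get_token(line, end);
    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = get_token(NULL, end);
    }
    return n;
}
```

For the line println "Hello there, " a " ... " this collects four tokens: println, "Hello there, (with its trailing space), a, and " ... (again without the closing dquote). A line that starts with '#' produces no tokens.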
Please consider this test script file:
script.txt
println "What is your name?"
# Read data into variable a
inputa
# Display a greeting.
println "Hello there, " a " ... "
# Implement our own "PAUSE" feature
println "Press ENTER to continue."
inputa
# execute something from the shell
println "Here's a list of your current directory,"
sys "dir /w"
# drop out
exit
println "You won't get here."
The following program will display the tokens for each line of text in the file specified on the command-line:
parser.c
// A parser for a small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
#include <stdio.h>
#include <string.h>
char *get_token(char *linestart,char *lineend);
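int main(int argc,char **argv) {
    FILE *fp;
    char buff[1024];
    char *end;
    char *token;
    int i;

    if(argc<2) {
        fprintf(stderr,"Usage: parser scriptfile\n");
        return 1;
    }
    fp=fopen(argv[1],"r");
    if(fp==NULL) {
        fprintf(stderr,"Cannot open input file %s\n",argv[1]);
        return 1;
    }
    while(fgets(buff,sizeof(buff),fp)!=NULL) {
        // strip the line terminator, if any, without assuming
        // that the last character is a newline
        buff[strcspn(buff,"\r\n")]=0;
        end=buff+strlen(buff);
        token=get_token(buff,end);
        // print each token of the line with its position
        for(i=1;token!=NULL;i++) {
            printf("%-3d %s\n",i,token);
            token=get_token(NULL,end);
        }
        printf("\n");
    }
    fclose(fp);
    return 0;
}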
char *get_token(char *linestart,char *lineend) {
    char *p,*q;
    p=strtok(linestart," \t\r\n");
    if(p==NULL)
        return NULL;
    if(*p=='#')
        return NULL;
    if(*p=='\"') {
        q=p;
        p+=strlen(q);
        // reconstruct the string mangled by
        // strtok()
        while( ((*p)==0)&&(p<lineend))
            *p=' ';
        p=strtok(q+1,"\"");
        if(p!=NULL)
            p--;
    }
    return p;
}
After compilation, you may invoke the above program against the file script.txt by issuing the following command-line:
parser script.txt
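The output should appear as follows. (The program also prints a blank line after the tokens of each script line, comment lines included, and pads each token number to three columns. That spacing is not shown here.)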
1 println
2 "What is your name?
1 inputa
1 println
2 "Hello there,
3 a
4 " ...
1 println
2 "Press ENTER to continue.
1 inputa
1 println
2 "Here's a list of your current directory,
1 sys
2 "dir /w
1 exit
1 println
2 "You won't get here.
Here's a small command-line interpreter that allows for one variable (a) and understands the following commands:
- println - Display any number of literal strings and/or the variable a on the console, followed by a newline. The items are printed one after another with no separator.
- inputa - Read a line from standard input and store it in variable a.
- sys - Pass the first argument token to the shell for execution. All other tokens are ignored.
- exit - Stop executing the script.
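The four verbs above can also be recognized with a small lookup table. This is only a sketch of one possible arrangement, not how cli.c is written: the enum command, equals_ignore_case, and lookup_command are all hypothetical names. It compares verbs without regard to case using only standard C, since cli.c relies on stricmp(), which is not part of the standard library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical command codes for the four verbs listed above. */
enum command { CMD_UNKNOWN, CMD_PRINTLN, CMD_INPUTA, CMD_SYS, CMD_EXIT };

/* Case-insensitive equality test built on standard tolower(),
   standing in for non-standard stricmp() or strcasecmp(). */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Map a verb token to its command code, or CMD_UNKNOWN. */
enum command lookup_command(const char *verb) {
    static const struct {
        const char *name;
        enum command code;
    } table[] = {
        { "println", CMD_PRINTLN },
        { "inputa",  CMD_INPUTA  },
        { "sys",     CMD_SYS     },
        { "exit",    CMD_EXIT    }
    };
    size_t i;

    assert(verb != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (equals_ignore_case(verb, table[i].name))
            return table[i].code;
    }
    return CMD_UNKNOWN;
}
```

A main loop could then switch on the value returned by lookup_command(token) instead of chaining string comparisons, and adding a verb would mean adding one table row.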
cli.c
// A small command-line interpreter.
//
// License: MIT / X11
// Copyright (c) 2010 by James K. Lawless
// jimbo@radiks.net http://www.radiks.net/~jimbo
// http://www.mailsend-online.com
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.
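#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef _WIN32
// stricmp() is not standard C; when not building for Windows,
// assume a POSIX system and use strcasecmp() instead
#include <strings.h>
#define stricmp strcasecmp
#endif
char *get_token(char *linestart,char *lineend);
char _var_a[256];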
int main(int argc,char **argv) {
    FILE *fp;
    char buff[1024];
    char *end;
    char *token;

    if(argc<2) {
        fprintf(stderr,"Usage: cli scriptfile\n");
        return 1;
    }
    fp=fopen(argv[1],"r");
    if(fp==NULL) {
        fprintf(stderr,"Cannot open input file %s\n",argv[1]);
        return 1;
    }
    while(fgets(buff,sizeof(buff),fp)!=NULL) {
        // strip the line terminator, if any
        buff[strcspn(buff,"\r\n")]=0;
        end=buff+strlen(buff);
        token=get_token(buff,end);
        // skip blank lines and comments
        if(token==NULL)
            continue;
        if(!stricmp(token,"inputa")) {
            // read one line from the console into variable a
            if((fgets(_var_a,sizeof(_var_a),stdin))!=NULL) {
                _var_a[strcspn(_var_a,"\r\n")]=0;
            }
            else {
                *_var_a=0;
            }
        }
        else if(!stricmp(token,"println")) {
            // print every remaining token on the line
            for(;;) {
                token=get_token(NULL,end);
                if(token==NULL)
                    break;
                if(*token=='\"')
                    printf("%s",token+1);
                else if(!stricmp(token,"a"))
                    printf("%s",_var_a);
                else
                    printf("%s",token);
            }
            printf("\n");
        }
        else if(!stricmp(token,"exit")) {
            break;
        }
        else if(!stricmp(token,"sys")) {
            // hand the first argument to the shell
            token=get_token(NULL,end);
            if(token!=NULL)
                system(token);
        }
        else {
            fprintf(stderr,"Unknown command %s\n",token);
            break;
        }
    }
    fclose(fp);
    return 0;
}
char *get_token(char *linestart,char *lineend) {
    char *p,*q;
    p=strtok(linestart," \t\r\n");
    if(p==NULL)
        return NULL;
    if(*p=='#')
        return NULL;
    if(*p=='\"') {
        q=p;
        p+=strlen(q);
        // reconstruct the string mangled by
        // strtok()
        while( ((*p)==0)&&(p<lineend))
            *p=' ';
        p=strtok(q+1,"\"");
        if(p!=NULL)
            p--;
    }
    return p;
}
To execute this program after compilation, enter the following:
cli script.txt
You should be prompted for your name. The script should then greet you and wait for you to hit ENTER. Finally, it should display a listing of your current directory if you're running in Windows. If you're running in Linux, you might want to change the dir command in the script to ls.
Note that since the script encounters an exit verb before the last line, the last println is never executed.
Blank lines and lines beginning with '#' are automatically ignored.
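The same check can be made without touching the buffer. The helper below is a hypothetical sketch, not part of the original source: it reports whether a line would produce no tokens at all, either because it is blank or because its first non-whitespace character is '#'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return nonzero if the line is blank or if its
   first non-whitespace character is '#'. Unlike get_token(), this
   does not modify the line. */
int is_ignorable_line(const char *line) {
    const char *p;

    assert(line != NULL);
    p = line + strspn(line, " \t\r\n");   /* skip leading whitespace */
    return *p == '\0' || *p == '#';
}
```

A '#' inside a quoted string does not make a line ignorable, because only the first non-whitespace character is examined.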
The source code, sample script file, and Windows EXE files from this post can be found here:
http://www.mailsend-online.com/wp/cli.zip
Unless otherwise noted, all code and text entries are Copyright ©2010 by James K. Lawless