Sunday, March 27, 2011

Command line parsing :)

There is a nice saying: "do not re-invent the wheel!" Yet many times we don't realize that a wheel built exactly to our specifications already exists.

Command line parsing in any programming language is a good example of this.
I have a simple scenario:
  • Accept a few command line options for the program I am writing; the meaning of the arguments that follow depends on the option given.
  • Each of the options can have sub-options (not compulsory though!)
  • Sub-options have a different meaning based on the command line switch (if we can call each option a switch)
Until now, I used to write a painful while loop covering the various combinations of inputs. So you guessed it right, the code for command line parsing was often bigger than the actual application code. I am pretty sure many of you have faced this situation.

This was the first time the sheer number of combinations I had to consider before the entire command line was parsed became a real challenge. It seemed herculean.

Now what??

Hail Lord Google :D :D

I came across something called getopt() in the GNU C Library (it is also part of POSIX). Sorry Windows folks, I am not sure how to solve this problem on Windows yet. getopt() is a neat little function to parse the command line. It has a big brother, the getopt_long() function, which parses long options. I have not yet explored getopt_long(); getopt() served my purpose, so I didn't go beyond that page.

The GNU page on the same is quite exhaustive. However, I found the example not so intuitive and easy. So here goes my example.

#define _XOPEN_SOURCE 700   /* lets <stdlib.h> declare getsubopt() on glibc */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

struct cmd_opts {
    char *option;
    int caseIndex;
    int edge;
    int interior;
};

struct cmd_opts *co;

enum {CASE = 0, EDGE, INTERIOR};

/* Names of the sub-options accepted by -x; the list must end with NULL. */
const char *indices[] = { [CASE] = "ci", [EDGE] = "sub1", [INTERIOR] = "sub2", NULL };

void printUsage(void)
{
    printf ("USAGE: \n"
            "-h -- prints this help \n"
            "-t -- specify a tetrahedron case \n"
            "-x -- specify a hexahedron case, \n"
            "\t a comma separated value list. ci=<caseindex>, \n"
            "\t sub1=<sub case index for face ambiguity resolution> \n"
            "\t sub2=<sub case index for interior ambiguity resolution>\n");
}

int parseCmd(int argc, char **argv)
{
    char *casevalue = NULL;
    char *subopts;
    int c;
    int key;

    co = (struct cmd_opts*) calloc(1, sizeof(struct cmd_opts));
    co->option = (char *) calloc (64, sizeof(char));
    opterr = 0;   /* we print our own diagnostics */

    while ((c = getopt (argc, argv, "hx:t:")) != -1) {
        switch (c) {
        case 'h':
            printUsage();
            break;
        case 't':
            strcpy(co->option, "tetrahedron");
            casevalue = optarg;
            printf("\nTetrahedron : Case = %s", casevalue);
            break;
        case 'x':
            strcpy(co->option, "hexahedron");
            printf("\nHexahedron\n");
            subopts = optarg;
            while (*subopts != '\0') {
                /* getsubopt() expects char *const *, hence the cast */
                key = getsubopt (&subopts, (char *const *) indices, &casevalue);
                /* a known sub-option without "=value" leaves casevalue NULL */
                if (key != -1 && casevalue == NULL) {
                    printf("\nUnspecified suboption!! -- Quitting\n\n");
                    return 1;
                }
                switch (key) {
                case CASE:
                    printf("Value of : %s is : %s\n", indices[CASE], casevalue);
                    break;
                case EDGE:
                    printf("Value of : %s is : %s\n", indices[EDGE], casevalue);
                    break;
                case INTERIOR:
                    printf("Value of : %s is : %s\n", indices[INTERIOR], casevalue);
                    break;
                default:
                    printf("Unknown suboption!!\n\n");
                    break;
                }
            }
            break;
        case '?':
            /* optopt holds the option character that caused the error */
            switch (optopt) {
            case 'x':
            case 't':
                printf("Option -%c requires an argument\n", optopt);
                break;
            default:
                printf("Unknown option '-%c'\n", optopt);
                break;
            }
            return 1;
        default:
            abort ();
        }
    }
    return 0;
}

int main(int argc, char **argv)
{
    return parseCmd(argc, argv);
}

The above listing was obtained using this link

Let me take you through this example.
The possible values of the sub-options are listed in the following array (or a vector if you may call it) :
const char *indices[] =  { [CASE] = "ci", [EDGE] = "sub1", [INTERIOR] = "sub2", NULL };

The call getopt (argc, argv, "hx:t:") returns an int: the character of the next option it finds ("h", "x" or "t" here), or -1 once all the options have been consumed. This value drives the switch statement that follows.
To elaborate on the option string provided above,
"h" has no colon following it, indicating it is an argument-less option
"x" and "t" are each followed by a colon, indicating they need an argument, something like :
-x ci=3


When an option that takes an argument is encountered, getopt() points the global variable optarg (declared in <unistd.h>) at that argument.
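To see the option string and optarg in isolation, here is a minimal sketch. The helper name scan_options and its output parameters are made up for illustration and are not part of the program above; the optind reset is only there so the helper can be called more than once.

```c
#define _XOPEN_SOURCE 700   /* assumption: glibc-style feature macro so <unistd.h> declares getopt() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: scans argv with the option string "hx:t:".
 * Stores the argument of the last -x and -t seen (NULL if absent) and
 * returns the number of options recognised, or -1 on the first error. */
int scan_options(int argc, char **argv, const char **x_arg, const char **t_arg)
{
    int c;
    int seen = 0;

    *x_arg = NULL;
    *t_arg = NULL;
    optind = 1;   /* start scanning from argv[1] again */
    opterr = 0;   /* keep getopt() from printing its own messages */

    while ((c = getopt(argc, argv, "hx:t:")) != -1) {
        switch (c) {
        case 'h':             /* no colon after 'h': no argument expected */
            break;
        case 'x':             /* colon after 'x': optarg points at its argument */
            *x_arg = optarg;
            break;
        case 't':             /* colon after 't': optarg points at its argument */
            *t_arg = optarg;
            break;
        default:              /* '?' : unknown option or missing argument */
            return -1;
        }
        seen++;
    }
    return seen;
}
```

Called with the arguments -h -x ci=3 -t 4, this sketch would report three options and hand back "ci=3" and "4" through the two pointers. The argument may also be glued to the option, as in -t4.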

We can further use this optarg to process the sub-options. (Consider the case for x in the code)

The call getsubopt (&subopts, indices, &casevalue) takes a pointer to the sub-option string that followed the option parsed above, the NULL-terminated array of valid sub-option names (listed above), and a pointer to a string that receives the value part of the current key=value pair. It returns the index of the matching name in the array, or -1 if the name is not recognised. If a sub-option is given without a value, the value pointer is set to NULL.
This has some restrictions which are not necessarily pleasing to the programmer/user, but I guess for now we have to live with them. For instance, the sub-options are always delimited by commas and each key is separated from its value by an equals sign ( = ); we can't change that.
Each call handles one comma-delimited sub-option and moves subopts past it. Hence the call sits in a while loop within the case for "x", which runs until the whole string has been consumed.
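The loop described above can be tried on its own, without getopt(). The sketch below is only an illustration: parse_subopts, the keys array and NKEYS are names I made up, and converting the values with atoi() is my assumption about how they might be used. Note that getsubopt() writes into the string it parses, so the input must be a writable buffer.

```c
#define _XOPEN_SOURCE 700   /* assumption: glibc-style feature macro so <stdlib.h> declares getsubopt() */
#include <stdio.h>
#include <stdlib.h>

enum { CASE = 0, EDGE, INTERIOR, NKEYS };

/* Valid sub-option names, NULL-terminated as getsubopt() requires. */
static char *const keys[] = {
    [CASE] = "ci", [EDGE] = "sub1", [INTERIOR] = "sub2", NULL
};

/* Hypothetical helper: splits a writable "key=value,key=value" string
 * and stores each value, converted with atoi(), in out[index of key].
 * Returns 0 on success, -1 on an unknown key or a missing value. */
int parse_subopts(char *subopts, int out[NKEYS])
{
    char *value;

    while (*subopts != '\0') {
        /* Each call consumes one comma-delimited item and advances subopts. */
        int key = getsubopt(&subopts, keys, &value);

        if (key == -1 || value == NULL)
            return -1;
        out[key] = atoi(value);
    }
    return 0;
}
```

Fed a buffer holding "ci=3,sub1=1,sub2=2", the sketch fills the three slots with 3, 1 and 2. A string such as "ci=3,bogus=1" or a bare "ci" makes it return -1.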

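When getopt() meets an option that is not listed in the option string, or an option whose required argument is missing, it returns "?" and stores the offending option character in optopt. This can be further used to provide diagnostic messages or optionally terminate the program (as in my example).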
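Here is a small sketch of how the two error cases can be told apart through optopt. The enum and the helper check_options are hypothetical names used only for this illustration; the option string is the same "hx:t:" as in my program.

```c
#define _XOPEN_SOURCE 700   /* assumption: glibc-style feature macro so <unistd.h> declares getopt() */
#include <stdio.h>
#include <unistd.h>

enum parse_status { PARSE_OK, MISSING_ARGUMENT, UNKNOWN_OPTION };

/* Hypothetical helper: reports the first problem found on the command
 * line. On error, *bad_opt receives the offending option character. */
enum parse_status check_options(int argc, char **argv, int *bad_opt)
{
    int c;

    optind = 1;   /* start scanning from argv[1] again */
    opterr = 0;   /* we decide ourselves what to report */

    while ((c = getopt(argc, argv, "hx:t:")) != -1) {
        if (c == '?') {
            *bad_opt = optopt;
            /* -x and -t are known options, so '?' for them can only
             * mean that their argument is missing. */
            if (optopt == 'x' || optopt == 't')
                return MISSING_ARGUMENT;
            return UNKNOWN_OPTION;
        }
    }
    return PARSE_OK;
}
```

With this sketch, a lone -x would be classified as a missing argument and -q as an unknown option. As an alternative, starting the option string with a colon (":hx:t:") makes getopt() return ":" instead of "?" for a missing argument, which removes the need to inspect optopt for that case.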

I am including some running examples below, to help appreciate the library capabilities.

Examples:

$ ./opt -x ci=3,sub1=1,sub2=2
Hexahedron
Value of : ci is : 3
Value of : sub1 is : 1
Value of : sub2 is : 2

$ ./opt -h
USAGE:
-h -- prints this help
-t -- specify a tetrahedron case
-x -- specify a hexahedron case,
     a comma separated value list. ci=<caseindex>,
     sub1=<sub case index for face ambiguity resolution>
     sub2=<sub case index for interior ambiguity resolution>

$ ./opt -t 4
Tetrahedron : Case = 4

$


I hope this is useful when you are faced with a similar problem :)

Signing off,
Aks
