30 Oct 2016
Generating Lexical Analyser and Parser with JavaCC
Posted in Java Programming by Code Guru on October 30, 2016
JavaCC is a lexer and parser generator for LL(k) grammars. You specify a language’s lexical and syntactic description in a JJ file, then run javacc on the JJ file. You will get seven Java files as output, including a lexer and a parser.
There are three things you can do with JavaCC:
- Perform a simple syntax check only (lexical and syntactic analysis, with no further processing)
- Build an actual interpreter (evaluate the input while it is being parsed)
- Generate code (Java bytecode or machine code)
JavaCC Installation Guide:
- Download JavaCC from here
- Extract the JavaCC files
- Add the directory containing the javacc script to your PATH so that the script can be run from any location
- Test it by running javacc to ensure that JavaCC is configured properly
- Set the Java path
- Test by running the java and javac commands to ensure that the Java path is set properly and working correctly
The JavaCC input format is:
- Header
- Token specifications for lexical analysis
- Grammar
Try out the following first basic example:

    /* Basic.jj  Performing the basic operations +, -, / and *. */

    // Header
    options {
        STATIC = false ;
    }

    PARSER_BEGIN(Basic)
    public class Basic {
        // Reads an expression from standard input and checks its syntax.
        public static void main( String[] args ) throws ParseException, TokenMgrError {
            Basic parser = new Basic( System.in ) ;
            parser.Start() ;
        }
    }
    PARSER_END(Basic)

    // Tokens
    SKIP : { " " }
    SKIP : { "\n" | "\r" | "\r\n" }
    TOKEN : { < PLUS : "+" > }
    TOKEN : { < SUB : "-" > }
    TOKEN : { < DIV : "/" > }
    TOKEN : { < MUL : "*" > }
    TOKEN : { < NUMBER : (["0"-"9"])+ > }

    // Grammar
    // The token references in this production were lost from the listing;
    // the form below is a reconstruction based on the tokens declared above.
    void Start() :
    {}
    {
        <NUMBER>
        ( ( <PLUS> | <SUB> | <DIV> | <MUL> ) <NUMBER> )*
        <EOF>
    }

Run the following command from the command prompt:
C:\> javacc Basic.jj
JavaCC will generate seven classes, each in a separate file:
- TokenMgrError is a simple error class; it is used for errors detected by the lexical analyser and is a subclass of Error and hence of Throwable
- ParseException is another error class; it is used for errors detected by the parser and is a subclass of Exception and hence of Throwable
- Token is a class representing tokens. Each Token object has an integer field kind that represents the kind of the token (PLUS, SUB, DIV, MUL, NUMBER, or EOF) and a String field image, which holds the sequence of characters from the input that the token represents
- SimpleCharStream is an adapter class that delivers characters to the lexical analyser
- BasicConstants is an interface that defines a number of constants (such as the integer codes for the token kinds) used in both the lexical analyser and the parser
- BasicTokenManager is the lexical analyser
- Basic is the parser
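To make the kind and image fields more concrete, here is a small hand-written sketch of what a lexical analyser does with the token specifications from Basic.jj. This is not the code that JavaCC generates: the class TokenSketch, the Kind enum and the Tok class are hypothetical names made up for this illustration, and the real BasicTokenManager is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration only; NOT the output of JavaCC. */
public class TokenSketch {

    // Hypothetical stand-in for the integer token kinds in BasicConstants.
    enum Kind { EOF, PLUS, SUB, DIV, MUL, NUMBER }

    // Hypothetical stand-in for Token: a kind plus the matched text (image).
    static final class Tok {
        final Kind kind;
        final String image;

        Tok(Kind kind, String image) {
            this.kind = kind;
            this.image = image;
        }

        @Override
        public String toString() {
            return kind + "(" + image + ")";
        }
    }

    /** Splits the input into tokens, following the SKIP and TOKEN rules of Basic.jj. */
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\n' || c == '\r') {   // SKIP rules
                i++;
                continue;
            }
            if (c >= '0' && c <= '9') {                 // NUMBER : (["0"-"9"])+
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add(new Tok(Kind.NUMBER, input.substring(start, i)));
                continue;
            }
            Kind kind;
            switch (c) {
                case '+': kind = Kind.PLUS; break;
                case '-': kind = Kind.SUB;  break;
                case '/': kind = Kind.DIV;  break;
                case '*': kind = Kind.MUL;  break;
                default:
                    // Plays the role that TokenMgrError plays in the generated lexer.
                    throw new IllegalArgumentException(
                            "Lexical error at position " + i + ": '" + c + "'");
            }
            tokens.add(new Tok(kind, String.valueOf(c)));
            i++;
        }
        tokens.add(new Tok(Kind.EOF, ""));              // end-of-input marker
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [NUMBER(20), PLUS(+), NUMBER(40), PLUS(+), NUMBER(100), DIV(/), NUMBER(50), SUB(-), NUMBER(30), EOF()]
        System.out.println(tokenize("20+40+100/50-30"));
    }
}
```

Each entry in the printed list pairs a token kind with its image, which is the same information the parser reads from the Token objects handed to it by the lexical analyser.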
The next step is to compile these classes with the javac command. For example:
C:\> javac Basic.java
Next, run the parser, redirecting example.txt to its standard input:
C:\> java Basic < example.txt
The text inside the example.txt file is:
20+40+100/50-30
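To see what such a syntax check amounts to, here is a rough hand-written equivalent in plain Java. It assumes that the Start production accepts a number, followed by zero or more operator and number pairs, followed by end of input. The class SyntaxCheckSketch and its methods are hypothetical names used only for this sketch; the parser generated by JavaCC works on Token objects and reports failures by throwing ParseException rather than by returning false.

```java
/** Hand-written illustration of a syntax check; NOT the parser generated by JavaCC. */
public class SyntaxCheckSketch {
    private final String in;
    private int pos = 0;

    SyntaxCheckSketch(String in) {
        this.in = in;
    }

    // Skips the characters listed in the SKIP rules: space, newline, carriage return.
    private void skip() {
        while (pos < in.length() && " \n\r".indexOf(in.charAt(pos)) >= 0) {
            pos++;
        }
    }

    // Consumes one NUMBER (one or more digits); returns false if none is present.
    private boolean number() {
        skip();
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
            pos++;
        }
        return pos > start;
    }

    // Consumes one of the operators +, -, / or *; returns false if none is present.
    private boolean operator() {
        skip();
        if (pos < in.length() && "+-/*".indexOf(in.charAt(pos)) >= 0) {
            pos++;
            return true;
        }
        return false;
    }

    // Assumed grammar: NUMBER ( operator NUMBER )* EOF
    boolean start() {
        if (!number()) {
            return false;
        }
        while (operator()) {
            if (!number()) {
                return false;   // an operator must be followed by a number
            }
        }
        skip();
        return pos == in.length();  // nothing may remain before end of input
    }

    static boolean accepts(String s) {
        return new SyntaxCheckSketch(s).start();
    }

    public static void main(String[] args) {
        System.out.println(accepts("20+40+100/50-30")); // true
        System.out.println(accepts("20+*40"));          // false: two operators in a row
        System.out.println(accepts("20+40+"));          // false: trailing operator
    }
}
```

Note that the check only decides whether the input is well formed. It does not compute a result, and it gives all four operators the same treatment, so operator precedence plays no role at this stage.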