Answer by user52292 for Lexer+Parser code for my "Reedoo" programming language

This looks like it will be quite a big project. I will try to give you some hints (I have written something like an ECMA-262 / JavaScript compiler and interpreter with custom bytecode as a school project).

If you really plan to develop a big language, you should probably learn about Lex, Yacc and Bison, but for that you would also have to learn about formal grammars for parsers.

If you are fine with learning by trial and error, your lexer is not bad, but I would advise you to start writing some custom classes and enums instead of all those strings:

std::string reserved[14] = { "print", "string", "sc", "variable", "eq", "undefined","nl", "num", "expr", "eof", "if", "else", "and", "or" };

You should use some constants to make your code readable and maintainable:

enum LaxarWords {
    lxPrint, lxString, lxSemiColon, /* ... */
};
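A minimal sketch of that idea, assuming C++11 or later. The names TokenType, Token, kKeywords and classify are hypothetical, and which words count as source-level keywords is a guess based on the reserved list; the point is only the pattern of mapping strings to enum values once, in the lexer.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical token kinds mirroring the strings in the `reserved` array.
enum class TokenType {
    Print, String, SemiColon, Variable, Eq, Undefined, NewLine,
    Num, Expr, Eof, If, Else, And, Or, Identifier
};

// A token keeps its kind plus the original text (e.g. a variable name).
struct Token {
    TokenType type;
    std::string text;
};

// Keyword spellings are mapped to kinds in one place; the parser then
// compares enum values instead of strings.
const std::unordered_map<std::string, TokenType> kKeywords = {
    {"print", TokenType::Print}, {"if", TokenType::If},
    {"else", TokenType::Else},   {"and", TokenType::And},
    {"or", TokenType::Or},
};

// Returns a keyword token if `word` is reserved, otherwise an identifier.
Token classify(const std::string& word) {
    auto it = kKeywords.find(word);
    if (it != kKeywords.end()) return Token{it->second, word};
    return Token{TokenType::Identifier, word};
}
```

The parser can then test tok.type == TokenType::Print instead of comparing strings. A misspelled enum name is a compile error, while a misspelled string literal silently never matches.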

You can then use these immediately to produce some bytecode instead of the text representation. It is not a requirement, just a personal hint: I actually started with a small calculator, infix-to-postfix and infix-to-prefix conversion algorithms and expression evaluation, and finally ended up with ECMA-262. The bytecode I created used prefix notation, something like this:

while(greater(var("x"),const("0")), code(decrement(var("x"))))

This can be turned into bytecode and executed recursively (using, e.g., an array of function pointers). You don't even need the bytecode; good classes/structures linked with pointers will do.
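A minimal sketch of the "structures with pointers" approach for the while example, assuming C++14 (for std::make_unique). The Op values, the Env map and the builder names (var, num, unary, binary) are hypothetical; num stands in for const, which is a C++ keyword. Each node is evaluated recursively, and dispatch goes through an array of function pointers indexed by the node's Op.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds for while(greater(var("x"), const("0")), code(decrement(var("x")))).
enum Op { Var, Const, Greater, Decrement, Code, While, OpCount };

struct Node {
    Op op = Var;
    std::string name;                        // variable name (Var nodes)
    long value = 0;                          // literal value (Const nodes)
    std::vector<std::unique_ptr<Node>> kids; // operands / sub-statements
};

using Env = std::map<std::string, long>;     // variable storage
using Handler = long (*)(const Node&, Env&); // one evaluator per Op

long eval(const Node& n, Env& env);          // dispatches via kHandlers

long evalVar(const Node& n, Env& env) { return env[n.name]; }
long evalConst(const Node& n, Env&) { return n.value; }
long evalGreater(const Node& n, Env& env) {
    return eval(*n.kids[0], env) > eval(*n.kids[1], env) ? 1 : 0;
}
long evalDecrement(const Node& n, Env& env) {
    assert(n.kids[0]->op == Var);            // only variables can be decremented
    return --env[n.kids[0]->name];
}
long evalCode(const Node& n, Env& env) {     // a block: run children in order
    long last = 0;
    for (const auto& k : n.kids) last = eval(*k, env);
    return last;
}
long evalWhile(const Node& n, Env& env) {    // kids[0] = condition, kids[1] = body
    while (eval(*n.kids[0], env) != 0) eval(*n.kids[1], env);
    return 0;
}

// The array of function pointers: entry i handles Op value i.
const std::array<Handler, OpCount> kHandlers = {
    evalVar, evalConst, evalGreater, evalDecrement, evalCode, evalWhile};

long eval(const Node& n, Env& env) { return kHandlers[n.op](n, env); }

// Builders that mirror the prefix notation.
using P = std::unique_ptr<Node>;
P node(Op op) { P n = std::make_unique<Node>(); n->op = op; return n; }
P var(const std::string& name) { P n = node(Var); n->name = name; return n; }
P num(long v) { P n = node(Const); n->value = v; return n; }
P unary(Op op, P a) { P n = node(op); n->kids.push_back(std::move(a)); return n; }
P binary(Op op, P a, P b) {
    P n = unary(op, std::move(a));
    n->kids.push_back(std::move(b));
    return n;
}
```

Building binary(While, binary(Greater, var("x"), num(0)), unary(Code, unary(Decrement, var("x")))) and evaluating it with x = 3 leaves x at 0. Adding a node kind means one new enum value plus one handler at the matching position in kHandlers.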

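When reading your code, I noticed using namespace std; alongside std::string and std::vector. You should decide whether you want to use the std:: prefix or add individual using-declarations for the names you need, such as using std::string. You can run into unexpected trouble with using namespace std, because it makes every standard library name visible unqualified, where it can clash with your own names.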
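A short sketch of the selective alternative. The global count and the function countWord are hypothetical, chosen because <algorithm> also declares std::count: with using namespace std; an unqualified count inside countWord would be ambiguous, while here the global and the qualified std::count coexist.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Only these two names become usable without the std:: prefix.
using std::string;
using std::vector;

// Global counter whose name also exists in <algorithm> (std::count).
// With `using namespace std;` an unqualified `count` below would be ambiguous.
int count = 0;

// Counts occurrences of `word` and adds them to the global counter.
int countWord(const vector<string>& tokens, const string& word) {
    // std::count is still available, just qualified.
    int n = static_cast<int>(std::count(tokens.begin(), tokens.end(), word));
    count += n;  // unambiguously the global above
    return n;
}
```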

Parser

void parse(vector<string> tokens) {
  .
  .
  .
  while (i < tokens.size()) {
    TOP:if (tokens[i] +""+ tokens[i+1] == "print sc") {

This does not look good:

  1. First, the whole vector is copied on every call. Taking a const vector<string>& would be better if you don't need to change the tokens inside the parser.
  2. You are accessing the vector past the end: when i is the last valid index, tokens[i+1] is out of bounds, which is undefined behavior.
  3. Please place the label TOP: on a separate line; as written it looks quite ugly.
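A minimal sketch addressing the first two points, assuming the parser only needs read access to the tokens. peek is a hypothetical helper that returns the "eof" tag (taken from your reserved list) instead of reading past the end; the loop recognizes only print sc, as a stand-in for real parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the token at `pos`, or "eof" when `pos` is past the end,
// so lookahead such as peek(tokens, i + 1) never reads out of range.
const std::string& peek(const std::vector<std::string>& tokens, std::size_t pos) {
    static const std::string eof = "eof";
    return pos < tokens.size() ? tokens[pos] : eof;
}

// Taking the vector by const reference avoids copying it on every call.
// Returns how many "print sc" pairs were found.
int parse(const std::vector<std::string>& tokens) {
    int prints = 0;
    std::size_t i = 0;
    while (i < tokens.size()) {
        if (peek(tokens, i) == "print" && peek(tokens, i + 1) == "sc") {
            ++prints;
            i += 2;
            continue;  // go to the next statement
        }
        ++i;
    }
    return prints;
}
```

A trailing print with nothing after it is now compared against "eof" rather than read out of bounds.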

This if statement definitely needs some work: use a parsing helper (like get_word) and a separate if (current_token == "print") { ... } check, fetching the token into a variable first.

if (tokens[i] +""+ tokens[i+1].substr(0,6) +""+ tokens[i+2] == "print string sc" or
    tokens[i] +""+ tokens[i+1].substr(0,3) +""+ tokens[i+2] == "print num sc" or
    tokens[i] +""+ tokens[i+1].substr(0,4) +""+ tokens[i+2] == "print expr sc" or
    tokens[i] +""+ tokens[i+1].substr(0,8) +""+ tokens[i+2] == "print variable sc") {
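A minimal sketch of the helper-based structure, assuming value tokens start with their kind (e.g. "string:hi" or "num:42"), which is what the substr checks above suggest. The exact token format, the Parser class and the helpers get_word and expect are hypothetical. Each token is fetched once into a variable, and each step checks one thing.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser covering only `print <value> sc`.
// It stores a reference, so the token vector must outlive the Parser.
class Parser {
public:
    explicit Parser(const std::vector<std::string>& tokens) : tokens_(tokens) {}

    // Parses all statements and returns the printed operands
    // (a stand-in for emitting bytecode or executing them).
    std::vector<std::string> parseAll() {
        std::vector<std::string> printed;
        while (pos_ < tokens_.size()) {
            const std::string word = get_word();  // fetch the token once
            if (word == "print") {
                printed.push_back(parsePrintOperand());
                expect("sc");
            } else {
                throw std::runtime_error("unexpected token: " + word);
            }
        }
        return printed;
    }

private:
    // Returns the current token and advances; yields "eof" past the end.
    std::string get_word() {
        return pos_ < tokens_.size() ? tokens_[pos_++] : std::string("eof");
    }

    static bool startsWith(const std::string& s, const std::string& prefix) {
        return s.compare(0, prefix.size(), prefix) == 0;
    }

    // print accepts a string, num, expr or variable token.
    std::string parsePrintOperand() {
        const std::string operand = get_word();
        for (const char* kind : {"string", "num", "expr", "variable"}) {
            if (startsWith(operand, kind)) return operand;
        }
        throw std::runtime_error("print expects a value, got: " + operand);
    }

    // Consumes the next token and fails unless it equals `want`.
    void expect(const std::string& want) {
        const std::string got = get_word();
        if (got != want) throw std::runtime_error("expected " + want + ", got " + got);
    }

    const std::vector<std::string>& tokens_;
    std::size_t pos_ = 0;
};
```

Errors are reported with std::runtime_error here for brevity; a real parser would probably also carry line numbers in its messages.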
