This looks like it will be quite a big project. I will try to give you some hints (I once wrote an ECMA-262 / JavaScript compiler and interpreter with custom bytecode as a school project).
If you really plan to develop a big language, you should probably learn about LEX, YACC and BISON, which also means learning about formal grammars for parsers.
If you are fine with learning by trial and error, your lexer is not bad, but I would advise you to start writing some custom classes and enums instead of all those strings:
std::string reserved[14] = { "print", "string", "sc", "variable", "eq", "undefined","nl", "num", "expr", "eof", "if", "else", "and", "or" };
You should use some constants to make your code readable and maintainable:
enum LaxarWords { lxPrint, lxString, lxSemiColon, ...
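To make that a bit more concrete, here is a minimal sketch assuming C++11 or later. The names TokenKind, Token and classify are made up for illustration, and only some of your reserved words are listed:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds; only a subset of the reserved words is shown.
enum class TokenKind { Print, String, SemiColon, Variable, Eq, Num, Eof, Unknown };

// A token carries its kind plus the original text (e.g. the literal's value).
struct Token {
    TokenKind kind;
    std::string text;
};

// Map a reserved word to its kind; anything else becomes Unknown.
inline TokenKind classify(const std::string& word) {
    static const std::unordered_map<std::string, TokenKind> table = {
        {"print", TokenKind::Print},       {"string", TokenKind::String},
        {"sc", TokenKind::SemiColon},      {"variable", TokenKind::Variable},
        {"eq", TokenKind::Eq},             {"num", TokenKind::Num},
        {"eof", TokenKind::Eof},
    };
    auto it = table.find(word);
    return it == table.end() ? TokenKind::Unknown : it->second;
}
```

The parser can then compare TokenKind values instead of concatenating and comparing strings, and the compiler will catch a misspelled kind.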
You can then immediately use these to produce some bytecode instead of the text representation. It is not a requirement, just a personal hint: I actually started with a small calculator, then infix-to-postfix and infix-to-prefix conversion algorithms, then expression evaluation, and finally ended up with ECMA-262. The bytecode I created used prefix notation, something like this:
while(greater(var("x"),const("0")), code(decrement(var("x"))))
This can be encoded as some bytecode and executed recursively (using e.g. an array of function pointers). You do not even need the bytecode; good classes/structures with pointers should do.
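As a rough illustration of the "classes with pointers" variant, the loop above could be represented and executed like this. It is only a sketch: the node types (Const, Var, Greater, Decrement, While), the Env map and run_countdown are names I made up here, values are plain ints, and virtual functions stand in for the array of function pointers:

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Variables of the interpreted program (assumed to be ints for this sketch).
using Env = std::map<std::string, int>;

// Every node of the prefix-notation tree evaluates itself recursively.
struct Node {
    virtual ~Node() = default;
    virtual int eval(Env& env) const = 0;
};
using NodePtr = std::unique_ptr<Node>;

struct Const : Node {
    int value;
    explicit Const(int v) : value(v) {}
    int eval(Env&) const override { return value; }
};

struct Var : Node {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    int eval(Env& env) const override { return env[name]; }
};

struct Greater : Node {
    NodePtr lhs, rhs;
    Greater(NodePtr l, NodePtr r) : lhs(std::move(l)), rhs(std::move(r)) {}
    int eval(Env& env) const override { return lhs->eval(env) > rhs->eval(env); }
};

struct Decrement : Node {
    std::string name;
    explicit Decrement(std::string n) : name(std::move(n)) {}
    int eval(Env& env) const override { return --env[name]; }
};

struct While : Node {
    NodePtr cond, body;
    While(NodePtr c, NodePtr b) : cond(std::move(c)), body(std::move(b)) {}
    // Runs the body while the condition is non-zero; returns the iteration count.
    int eval(Env& env) const override {
        int iterations = 0;
        while (cond->eval(env)) { body->eval(env); ++iterations; }
        return iterations;
    }
};

// Builds while(greater(var("x"), const(0)), decrement(var("x"))) and runs it.
inline int run_countdown(Env& env) {
    While loop(NodePtr(new Greater(NodePtr(new Var("x")), NodePtr(new Const(0)))),
               NodePtr(new Decrement("x")));
    return loop.eval(env);
}
```

The parser would build such a tree once, and executing the program is then a single eval call on the root node.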
When I read your code, I noticed using namespace std; but also std::string and std::vector. You should probably choose whether you want to use the std:: prefix or place some using std::string and such. You can run into unexpected troubles with using namespace std.
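A small sketch of the using-declaration style, where only the names you actually need are imported. The function count_word is a hypothetical helper, not something from your code:

```cpp
#include <string>
#include <vector>

// Import only the two names that are used, not the whole namespace.
using std::string;
using std::vector;

// Everything else from the standard library still needs the std:: prefix,
// so it cannot silently collide with names you define yourself.
inline int count_word(const vector<string>& words, const string& wanted) {
    int n = 0;
    for (const string& w : words) {
        if (w == wanted) ++n;
    }
    return n;
}
```

Whichever style you pick, using it consistently makes the code easier to read.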
Parser
void parse(vector<string> tokens) {
    . . .
    while (i < tokens.size()) {
        TOP:if (tokens[i] +""+ tokens[i+1] == "print sc") {
This does not look good:
- First, you copy the whole vector on every call; using const vector<string>& should be better if you don't need to change the tokens inside the parser.
- You are accessing the vector past the end: the loop only guarantees i < tokens.size(), so tokens[i+1] is out of range on the last token.
- Place the label TOP: on a separate line, please; this looks quite ugly.
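The first two points could be addressed roughly like this. It is a sketch only: count_empty_prints is a made-up stand-in for your parse function, and it assumes the tokens are the plain words "print" and "sc":

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Takes the tokens by const reference, so the vector is not copied.
// Returns how many "print sc" pairs were seen (a stand-in for real parsing).
inline int count_empty_prints(const std::vector<std::string>& tokens) {
    int found = 0;
    std::size_t i = 0;
    while (i < tokens.size()) {
        // Make sure tokens[i + 1] exists before reading it.
        if (i + 1 < tokens.size() && tokens[i] == "print" && tokens[i + 1] == "sc") {
            ++found;
            i += 2;  // consume both tokens
        } else {
            ++i;
        }
    }
    return found;
}
```

Comparing the tokens one by one also avoids building a temporary string for every check.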
This IF definitely needs some work: some parsing helper (like get_word) and a separate if(current_token == "print") { ... (fetch the string to some variable first).
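Here is one way such a helper might look. Treat it as a sketch under assumptions: TokenReader and parse_prints are hypothetical names, get_word returning "eof" at the end of input is my own convention, and the argument tokens in the usage are placeholders rather than your real token format:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper that hands out tokens one at a time
// and never reads past the end of the vector.
class TokenReader {
public:
    explicit TokenReader(const std::vector<std::string>& tokens) : tokens_(tokens) {}

    bool at_end() const { return pos_ >= tokens_.size(); }

    // Returns the current token and advances; returns "eof" once input is exhausted.
    std::string get_word() {
        return at_end() ? std::string("eof") : tokens_[pos_++];
    }

private:
    const std::vector<std::string>& tokens_;
    std::size_t pos_ = 0;
};

// Handles "print sc" and "print <argument> sc"; returns the arguments in order.
inline std::vector<std::string> parse_prints(const std::vector<std::string>& tokens) {
    std::vector<std::string> printed;
    TokenReader reader(tokens);
    while (!reader.at_end()) {
        const std::string current_token = reader.get_word();
        if (current_token == "print") {
            const std::string argument = reader.get_word();
            if (argument == "sc") continue;  // bare "print sc": nothing to print
            if (reader.get_word() == "sc") printed.push_back(argument);
        }
    }
    return printed;
}
```

For example, parse_prints({"print", "hello", "sc", "print", "sc"}) collects just "hello". Each statement form gets its own small branch, so there is no need to glue tokens together into one long string.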
if (tokens[i] +""+ tokens[i+1].substr(0,6) +""+ tokens[i+2] == "print string sc" or tokens[i] +""+ tokens[i+1].substr(0,3) +""+ tokens[i+2] == "print num sc" or tokens[i] +""+ tokens[i+1].substr(0,4) +""+ tokens[i+2] == "print expr sc" or tokens[i] +""+ tokens[i+1].substr(0,8) +""+ tokens[i+2] == "print variable sc") {