Adding OOP to the Egg Parser

Introduction

In this lab, we want to increase the expressiveness of our Egg language.

The following example shows some of the extensions that we want to introduce:

examples/object-colon-selector.egg

cat examples/object-colon-selector.egg 
do (
  def(x, { # object literals!
    c: [1, 2, 3], # array literals!
    gc:  fun(
           element(self, "c") # old way works
         ), 
    sc:  fun(value, # look at the left side of the assignment!
           =(self.c[0], value)
         ),
    inc: fun( 
           =(self.c[0], +(self.c[0], 1)) 
         ) 
  }),
  print(x),
  print(x.gc()),    # [1, 2, 3]
  x.sc(4),
  print(x.gc()),    # [4,2,3]
  x.inc(),
  print(x.gc()),    # [5,2,3]
  print(x.c.pop()), # 3
  print(x.c)        # [5,2]
)

Take a look at some of the features introduced:

Added braces {} to refer to object literals: def(x, { ... })
Note the appearance of the colon : token to separate the attribute name from the value in an object
Added brackets [] to refer to array literals [1, 2, 3]
It is possible to access the properties of an object using the dot as in x.c
In this version of the Egg language, self denotes the object itself, playing the role that this plays in JS
It is possible to access the properties of any object using square brackets as in self.c[0]

Download evm with OOP extensions

During the development of this lab, you can execute the ASTs generated by your parser using one of the interpreters in this release:

Egg Virtual Machine with OOP extensions for Windows/Linux/Mac OS. This release was built using vercel/pkg.

Download the version you need for the development of this lab and make a symbolic link to have it at hand:

~/campus-virtual/2223/pl2223/practicas/egg-oop-parser/egg-oop-parser-solution/(master)

✗ cd bin
✗ ln -s ~/campus-virtual/shared/egg/oop-evm-releases/evm-2122-macos ./evm
✗ ls -l bin 
-rwxr-xr-x  1 casianorodriguezleon  staff   215 28 abr  2023 egg
-rwxr-xr-x  1 casianorodriguezleon  staff  1678 28 abr  2023 eggc.js
lrwxr-xr-x  1 casianorodriguezleon  staff    85 19 abr  2023 evm -> ~/campus-virtual/shared/egg/oop-evm-releases/evm-2122-macos
✗ # Also make a symbolic link on node_modules/.bin so that you can execute it with npx
✗ ln -s /Users/casianorodriguezleon/campus-virtual/shared/egg/oop-evm-releases/evm-2122-macos node_modules/.bin/evm

and try with some example:

examples/object-colon-selector.json

✗ bin/evm examples/object-colon-selector.json 
{"c":[1,2,3]}
[1,2,3]
[4,2,3]
[5,2,3]
3
[5,2]

Here is a proposal for the auxiliary script bin/egg that uses the evm executable:

bin/egg

➜  egg-oop-parser-solution git:(master) ✗ cat bin/egg
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "Example of usage \"bin/egg examples/array-dot.egg\""
    exit 1
fi
# Strip a trailing .egg (if any) so both "foo" and "foo.egg" are accepted
BASENAME="${1%.egg}"
bin/eggc.js "$BASENAME".egg
if [ $? -ne 0 ]
then
    exit 1
fi
bin/evm "$BASENAME".json

Multiple Attribute Indexation

You can make multiple indexation of an object so that a[0,2] means a[0][2]:

examples/multiple-properties.egg

✗ cat examples/multiple-properties.egg 
do(
    def(a, [[4,5,6], 1,2,3]),
    def(b, a[0,2]),
    print(b) # 6
)                                                                                                           
✗ bin/eggc.js examples/multiple-properties.egg
✗ npx evm examples/multiple-properties.json   
6

Same for objects a["p", "q", "r"] means a.p.q.r or a["p"]["q"]["r"]:

examples/multiple-properties-object-dot.egg

✗ cat examples/multiple-properties-object-dot.egg        
do(
    def(a, { p : { q : { r : 1 } } }),
    def(b, a["p", "q", "r"]),
    print(b),      # 1
    print(a.p.q.r) # Same
)     
✗ bin/eggc.js examples/multiple-properties-object-dot.egg
✗ npx evm examples/multiple-properties-object-dot.json   
1
1
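
The equivalence between a[0,2] and a[0][2] can be pictured in plain JavaScript as a left-to-right fold over the list of indices. This is only a sketch of the intended semantics: getPath is a hypothetical helper name, not part of the lab code, and it ignores currying and negative indices.

```javascript
// Hypothetical helper (not part of the lab code): apply a list of indices
// one after another, so that getPath(a, [0, 2]) behaves like a[0][2].
function getPath(obj, keys) {
  return keys.reduce((current, key) => current[key], obj);
}

const a = [[4, 5, 6], 1, 2, 3];
console.log(getPath(a, [0, 2])); // 6

const o = { p: { q: { r: 1 } } };
console.log(getPath(o, ["p", "q", "r"])); // 1
```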

This is the section of the grammar that allows the use of property indexation:

src/egg.ne

expression -> ...
    | %WORD applies
 
applies -> calls
    | properties
    | null
properties ->  bracketExp  applies
 
bracketExp -> "["  commaExp "]"
 
commaExp -> null
   | expression ("," expression):*

Property indexation and commaExp is nullable

Notice that commaExp is nullable, so the grammar accepts an empty indexation expression like a[], which at first sight makes no sense (but read the next section). To fix the problem, we can either change the grammar by introducing a new category nonEmptyBracketExp or simply check for the presence of an expression at compile time and complain if the index list is empty:

examples/empty-bracket.egg

➜  egg-oop-parser-solution git:(empty-bracket) ✗ cat examples/empty-bracket.egg
do(
    def(a, [1,2,3]),
    print(a[])
)

➜  egg-oop-parser-solution git:(empty-bracket) ✗ bin/eggc.js examples/empty-bracket.egg
There was an error: Syntax error accesing property at line 3 col 12.
Specify at least one property.
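
The second option (checking at compile time) could be sketched as follows. The function name buildProperty, its signature and the { line, col } shape of the bracket token are assumptions made for this illustration; your build-ast.js may be organized differently.

```javascript
// Hypothetical AST builder for a bracketed index list. The token shape
// ({ line, col }) and the function name are assumptions of this sketch.
function buildProperty(operator, leftBracket, indices) {
  if (indices.length === 0) {
    throw new SyntaxError(
      `Syntax error accessing property at line ${leftBracket.line} col ${leftBracket.col}.\n` +
      "Specify at least one property."
    );
  }
  return { type: "property", operator, args: indices };
}

const a = { type: "word", name: "a" };
const lb = { line: 3, col: 12 };

console.log(buildProperty(a, lb, [{ type: "value", value: 0 }]).type); // property
try {
  buildProperty(a, lb, []); // a[] has no indices
} catch (e) {
  console.log(e.message);
}
```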

The Syntactically Correct, Semantically Absurd Language Design Pattern

Whenever a phrase is syntactically correct but seems semantically absurd, as is the case with x[], I usually stop for a moment and consider 🤔 whether there is some non-obvious meaning we can give to it.

Noam Chomsky. 1957. Syntactic Structures

Maybe we can give x[] the meaning "return a deep copy of x"? (See structuredClone, available in node v17+, or the npm package realistic-structured-clone.)
For instance, all arrays, objects and maps have the length property in common. Maybe we can give x[] the meaning "return x.length"?
Maybe we can introduce the concept of a default value for each of the entities in our language, so that x[] means x["default"]? For instance:
- The default of a function is what is returned if the underlying JS function throws or returns undefined
- The default of an array is what is returned if the element at the index is undefined or the index is out of bounds
- The default of an object is what is returned if the property is not defined

Natural Language and Psycholinguistics

Psycholinguistics is the discipline that investigates and describes the psychological processes that make it possible for humans to master and use language. Psycholinguists conduct research on

speech development and
language development and
how individuals of all ages comprehend and produce language.

For descriptions of language, the field relies on the findings of linguistics, which is the discipline that describes the structure of language.

Although the acquisition, comprehension, and production of language have been at the core of psycholinguistic research, the field has expanded considerably since its inception:

The neurology of language functioning is of current interest to psycholinguists, particularly to those studying

sex differences,
aphasia,
language after congenital or acquired injury to the immature brain, and
developmental disorders of language (dysphasia).

Some psycholinguists have also extended their interests to experiments in nonhuman language learning (e.g., gorillas, chimpanzees, orcas, ...) to discover if language as we know it is a uniquely human phenomenon.

Here is the Lex Fridman interview with psycholinguistics professor Edward Gibson of MIT.
See also A Cognitive Approach to Syntax: Dependency Grammar #1, by Prof. Edward Gibson.

Currying in Egg

When the argument used to index a function object is not an attribute of the function

someFun[arg1, ... ] # and "arg1" is not a property of "someFun"

then we want arg1, ... to be interpreted as arguments for someFun, and the expression to return the function curried on arg1, ....

For instance:

examples/curry-no-method.egg

✗ cat examples/curry-no-method.egg        
print(+[4](2))

In this version of the Egg interpreter + is a function that takes an arbitrary number of numbers:

$$+: \bigcup_{i=1}^{\infty}\mathbb{R}^i \longrightarrow \mathbb{R}$$

and returns their sum. The curried

$$+[4]: \bigcup_{i=1}^{\infty}\mathbb{R}^i \longrightarrow \mathbb{R}$$

is the function defined by

$$+[4](x_2, \cdots, x_n) = +(4, x_2, \cdots, x_n)$$

Here is the implementation of the arithmetic operations in this version of the Egg interpreter, each of which takes an arbitrary number of numbers:

~/campus-virtual/shared/egg/eloquentjsegg/lib/eggvm.js branch=2223

// arithmetics
[
  '+', 
  '-', 
  '*', 
  '/', 
  '**',
].forEach(op => {
  topEnv[op] = new Function('...s', `return s.reduce((a,b) => a ${op} b);`);
});

Execution:

➜  egg-oop-parser-solution git:(master) ✗ bin/eggc.js examples/curry-no-method.egg 
➜  egg-oop-parser-solution git:(master) ✗ bin/evm examples/curry-no-method.json 
6
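
The meaning of +[4](2) can be reproduced in plain JavaScript. In this sketch plus imitates the variadic + shown above, and curry is a hypothetical helper introduced only for illustration; it is not claimed to be how the interpreter implements it.

```javascript
// Variadic sum, in the spirit of the '+' entry installed in topEnv above.
const plus = (...s) => s.reduce((a, b) => a + b);

// Hypothetical helper: fix the leading arguments of f.
const curry = (f, ...fixed) => (...rest) => f(...fixed, ...rest);

const plus4 = curry(plus, 4); // plays the role of +[4]
console.log(plus4(2));        // 6
console.log(plus4(2, 10));    // 16
```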

However, if the attribute exists we want an ordinary property evaluation, as in this example:

examples/function-length-property.egg

➜  egg-oop-parser-solution git:(master) cat examples/function-length-property.egg
do(
    def(f, fun(x, y, +(x,y))),
    print(f["numParams"]) # JS length property is not supported
)
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/function-length-property
2

We have added an attribute numParams to the Egg Function objects that returns the number of parameters in its declaration.

Design Consideration

The decision to overload the meaning of property access for functions is a risky one. It has few consequences for the grammar design, but it does have consequences during the interpretation phase.

In this case the idea behind the proposal is that

Any potential argument of a function can be viewed as a property of that function whose value is the function curried for that argument

which makes the design proposal consistent with the idea of property.

Currying and the dot operator

The dot operator for objects a.b is defined in such a way that a.b and a["b"] are the same thing. This is why the earlier program examples/curry-no-method.egg can be rewritten this way:

examples/curry-no-method-dot.egg

➜  egg-oop-parser-solution git:(master) ✗ cat  examples/curry-no-method-dot.egg 
print(+.4(2))

➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/curry-no-method-dot 
6

Changing the `evaluate` method

You have to add the code in the else if branch (lines 12-14 of the listing below) to return the curried function:

  evaluate(env) {
    if (this.operator.type == "word" && this.operator.name in specialForms) { 
      // ... ?
    }
 
    let theObject = this.operator.evaluate(env);
    let propsProcessed = this.args.map((arg) => arg.evaluate(env));
    let propName = checkNegativeIndex(theObject, propsProcessed[0]);
 
    if (theObject[propName] || propName in theObject) {
      // ... theObject has a property with name "propName" 
    } else if (typeof theObject === "function") {
      // theObject is a function, curry the function 
      // using propsProcessed as fixed arguments
    } else 
      throw new TypeError(`...`);
  }

Examples of currying `4["+", 5](3)`

examples/curry-multiple-indices.egg

➜  eloquentjsegg git:(main) ✗ cat test/examples/curry-multiple-indices.egg
do(
    print(4["+", 5](3)),
    print(4["+", 5, 9](3))
)

Here is the execution:

➜  eloquentjsegg git:(main) ✗ bin/egg.js test/examples/curry-multiple-indices.egg
12
21

and here is the section of AST corresponding to the sub-expression 4["+", 5](3):

campus-virtual/shared/egg/examples/curry-multiple-indices.egg.evm

{
  "type": "apply",
  "operator": {
    "type": "property",
    "operator": { "type": "value",  "value": 4  },
    "args": [
      { "type": "value", "value": "+" },
      { "type": "value", "value": 5   }
    ]
  },
  "args": [
    { "type": "value", "value": 3 }
  ]
}

Operations as methods of numbers

To keep our version of Egg symmetric and regular let us consider the arithmetic operations as methods of the number objects. This way number 4 has + as a method. This can be combined with currying to write code like the following:

examples/curry-method.egg

➜  egg-oop-parser-solution git:(master) cat examples/curry-method.egg
do (
  print(4["+"][5](3)), 
  print(4.+[5](3)),    # Same thing 12
  print(4["*"][5](3)), # 4["*"](5, 3) # 60
  print(6["/"][2](3)), # 6["/"](2, 3) # 1
  print(6["-"][2](3))  # 6["-"](2, 3) # 1
)

We say that something has a symmetry when it remains similar under transformation. Here the symmetry results from the fact that the different transformations 4["+"], +[4], +.4 and 4.+ produce the same function. A wide range of desirable properties of programming languages can be expressed as symmetry properties. See the blog and video NOETHER: symmetry in programming language design.

The ambiguities that arise in the expression 4.+ are discussed in section The Dot Ambiguity: Property dot or Mantissa dot?.

Execution:

➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/curry-method        
12
12
60
1
1
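
One way to picture "operations as methods of numbers" in plain JavaScript is to install the operators on Number.prototype. This is an assumption made for the sketch; the actual interpreter may attach the methods differently. The currying step (4["+"][5]) is then the same mechanism described in the section Currying in Egg.

```javascript
// Assumption of this sketch: operators live on Number.prototype.
const ops = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

for (const [name, op] of Object.entries(ops)) {
  Object.defineProperty(Number.prototype, name, {
    // The receiver is the first operand; the arguments are the rest.
    value: function (...rest) {
      return [this.valueOf(), ...rest].reduce((a, b) => op(a, b));
    },
    configurable: true,
    writable: true,
    enumerable: false,
  });
}

console.log((4)["+"](5, 3)); // 12
console.log((4)["*"](5, 3)); // 60
console.log((6)["/"](2, 3)); // 1
console.log((6)["-"](2, 3)); // 1
```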

Selectors: the Dot Operator

Most OOP languages allow the use of the notation x.y as a synonym of x["y"]. To add it to Egg we add the production properties -> selector applies to the grammar.

Lines 8-10 show the rules for the new syntactic variable selector:

src/egg.ne

applies -> calls
    | properties
    | null
 
properties ->  bracketExp  applies
    | selector applies            
 
selector   ->  
     "." %WORD
   | "." %NUMBER

We want to allow programs like the following:

examples/dot-chain.egg

➜  egg-oop-parser-solution git:(master) ✗ cat examples/dot-chain.egg 
print([1,4,5].join("-").length) # Same as array(1,4,5)["join"]("-")["length"]                          
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/dot-chain
5

same thing with object literals:

examples/dot-obj-literal.egg

➜  egg-oop-parser-solution git:(master) ✗ cat examples/dot-obj-literal.egg 
print({x : 3}.x) # 3
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/dot-obj-literal
3

and also:

examples/dot-num.egg

➜  egg-oop-parser-solution git:(master) ✗ cat examples/dot-num.egg 
print(4.3.toFixed(2))
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/dot-num 
4.30

and even programs like this one:

Using dot to select elements of an array

examples/array-dot.egg

➜  egg-oop-parser-solution git:(master) ✗ cat examples/array-dot.egg 
do(
    def(a, [[1,2],3]),
    print(a.0.1)
)
 
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/array-dot 
2

Think about the sub-expression a.0.1 above from the point of view of lexical analysis. A naive approach will lead to the token stream [WORD{a}, DOT, NUMBER{0.1}]

Extended ASTs Tree Grammar

Introduction to the `property` nodes

Consider the following Egg program:

examples/dot-num.egg

✗ cat examples/dot-num.egg                         
print(4.3.toFixed(2))

The AST generated has a new type of node called property to represent object property access:

examples/dot-num.json

✗ cat examples/dot-num.json 
{
  "type": "apply",
  "operator": { "type": "word", "name": "print" },
  "args": [
    {
      "type": "apply",
      "operator": {
        "type": "property",
        "operator": { "type": "value", "value": 4.3 },
        "args": [ { "type": "value", "value": "toFixed" } ]
      },
      "args": [ { "type": "value", "value": 2 } ]
    }
  ]
}

The type of the operator of the inner apply node is property, which tells us that this AST node corresponds to the operation of accessing the attributes of the object in its operator child.
The operator of the property node refers to the AST of the Egg object being described ( $obj=$ 4.3).
The args of the property node refers to the ASTs of the attributes or properties.
- The first element of args $t_0$ is the AST of a direct property $p_0$ of the object $obj$ in the operator (toFixed).
- The second $t_1$ is a property $p_1$ of the object $p_0$
- The third $t_2$ is a property $p_2$ of the object $p_1$
- ... and so on

Here is a second example, the AST for the expression a[0,2]:

examples/multiple-properties-simplified.egg

{
  "type": "property",
  "operator": { "type": "word", "name": "a" },
  "args": [
    { "type": "value", "value": 0 },
    { "type": "value", "value": 2 }
  ]
}

AST Grammar

Our parser should therefore produce an AST conforming to this tree grammar:

ast: VALUE
   | WORD 
   | APPLY(operator: ast, args: [ ast * ])
   | PROPERTY(operator: ast, args: [ ast * ])

APPLY nodes have two attributes, operator and args
The operator attribute of an APPLY node contains information about the function that interprets it (if, while, print, +, etc.)
The args attribute of an APPLY node is an ARRAY containing the ASTs that correspond to the arguments for the function associated with operator.
PROPERTY nodes have two attributes, operator and args
The operator attribute of a PROPERTY node contains information about the object (for example, in [1,2,3][0] the operator would be the AST of [1, 2, 3]; in {a: 1, b:2}.a it would be the AST of {a: 1, b:2})
The args attribute of a PROPERTY node is an ARRAY containing the ASTs that correspond to the attributes/properties of the object in operator. See the section The Shape of Property ASTs
WORD nodes are leaf nodes and have at least the attribute name.
VALUE nodes have at least the attribute value.
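
The tree grammar can be turned into a small checker, which may be handy in tests of your parser. The function isAst below is a hypothetical helper written for this illustration; it assumes the lowercase type names ("value", "word", "apply", "property") used in the JSON examples.

```javascript
// Hypothetical checker for the tree grammar above.
function isAst(node) {
  if (node === null || typeof node !== "object") return false;
  switch (node.type) {
    case "value":
      return "value" in node;
    case "word":
      return typeof node.name === "string";
    case "apply":
    case "property":
      // Both kinds of inner node have an operator AST and an array of ASTs
      return isAst(node.operator) &&
             Array.isArray(node.args) &&
             node.args.every(isAst);
    default:
      return false;
  }
}

// AST of a[0,2]
const ast = {
  type: "property",
  operator: { type: "word", name: "a" },
  args: [{ type: "value", value: 0 }, { type: "value", value: 2 }],
};
console.log(isAst(ast));                                 // true
console.log(isAst({ type: "property", operator: ast })); // false: args is missing
```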

Example `4.3.toFixed(2)`

A term is a way to describe an AST: to the right of the node type and between curly braces we write the attribute: value pairs that we want to highlight. For example, the AST for 4.3.toFixed(2) could be described by this term:

APPLY(
  operator:PROPERTY(
    operator:VALUE{value:4.3}, 
    args:VALUE{value:"toFixed"}
  ),
  args:VALUE{value:2}
)

Notice that the node for toFixed is a VALUE node, not a WORD node. This is because the second dot in 4.3.toFixed is interpreted as 4.3["toFixed"].

You can use the npm package evm2term to convert the AST to a term:

examples/dot-num.egg

➜  egg-oop-parser-solution git:(master) ✗ cat examples/dot-num.egg 
print(4.3.toFixed(2))
➜  egg-oop-parser-solution git:(master) ✗ npx evm2term -i examples/dot-num.json
apply(
  op:word{"print"},
  args:[apply(
    op:property(
      op:value{4.3},
      args:[value{"toFixed"}]),
    args:[value{2}])])

If you have difficulties, review the section Anatomy of ASTs for Egg.

The Shape of Property ASTs

The final shape of property-type generated ASTs depends on how you implement the functions in the src/build-ast.js library. Consider the following input:

➜  egg-oop-parser-solution git:(master) cat examples/ast-shape-for-property-nodes.egg 
[[1,2]][0,1]

What will be the AST of your compiler for such input? Here is a simplified notation (a term) for the AST generated by my implementation of the parser:

PROPERTY(
  op: APPLY(
    op: WORD{array},
    args: [
      APPLY(
        op: WORD{array},
        args: [ VALUE{1}, VALUE{2}]
      )
    ]
  ),
  args: [VALUE{0}, VALUE{1}]
)

Notice that the property node args array has two elements. Here is the actual JSON.

Other examples of what args contains for different property ASTs:

For the expression [[1,2],3][0,1] it would be the ASTs of [0, 1]
For [[1,2],3]["length"] it would be the AST of ["length"]
For {a: [1, 2], b:2}["a", 0] it would be the ASTs of ["a", 0]

The Dot Ambiguity: Property dot or Mantissa dot?

Introducing the dot to select object properties causes an ambiguity with the dot inside floating point numbers:

✗ cat test/examples/dot-num.egg
print(4.3.toFixed(2))

Proposal: The ambiguity is solved by giving priority to the interpretation of the dot as a decimal point if the dot is followed by a digit; otherwise we are accessing a property of the number.

Thus, the execution of the example above gives:

bin/eggc.js test/examples/dot-num.egg 
✗ npx evm test/examples/dot-num.json  
4.30

Solution: Inside the lexical analyzer, the regexp for NUMBER has to be tried before the regexp for DOT:

const NUMBER = /(?<NUMBER>[-+]?\d+(\.\d+)?(?:[eE][-+]?\d+)?)/; // \d+ to resolve ambiguity
const DOT = /(?<DOT>\.)/;
...
const tokens = [ SPACE, NUMBER, ...  DOT,  ... ];
 
...
 
let lexer = nearleyLexer(tokens, { transform: [colonTransformer, NumberToDotsTransformer] });

This is different from what JS does: JS does not allow a dot that directly follows an integer literal to act as an attribute selector. In JS the ambiguity is resolved by parenthesizing the number:

> 4.toFixed(2)
4.toFixed(2)
^^
Uncaught SyntaxError: Invalid or unexpected token
> (4).toFixed(2)
'4.00'
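
The effect of the \d+ after the dot in the NUMBER regexp can be checked directly. The sketch below reuses the same pattern (without the named group) in sticky mode; numberAt is a hypothetical helper used only to show what the lexer would consume at a given position.

```javascript
// Same NUMBER pattern as above, made sticky so it matches at a given position.
const NUMBER = /[-+]?\d+(\.\d+)?(?:[eE][-+]?\d+)?/y;

// Hypothetical helper: the text a NUMBER token would consume at `pos`.
function numberAt(input, pos = 0) {
  NUMBER.lastIndex = pos;
  const m = NUMBER.exec(input);
  return m ? m[0] : null;
}

// Dot followed by a digit: the dot belongs to the number.
console.log(numberAt("4.3.toFixed(2)")); // 4.3
// Dot followed by a letter: the dot is left for the DOT token.
console.log(numberAt("4.toFixed(2)"));   // 4
```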

Lexical Transformations

To facilitate the task of doing this lab, it is convenient that we return to the lexer-generator module and modify its API a bit, providing it with the ability to add lexical transformations.

To do this, the nearleyLexer function will now receive an additional parameter of an object with options:

let lexer = nearleyLexer(tokens, { transform: transformerFun });

The only option we are going to add is transform. When specified, the transformerFun function is applied to the array of tokens produced by the lexer object generated by nearleyLexer, and its result replaces that array.

We may have more than one lexical transformation to apply. Thus, we allow the transform property to be an array, so that the builder nearleyLexer can be called this way:

let lexer = nearleyLexer(tokens, { transform: [colonTransformer, NumberToDotsTransformer] });

Adding transformations to the nearley compatible lexer: To achieve this goal we have to modify the reset method of our nearley compatible object:

lexer-generator-solution/src/main.js

const nearleyLexer = function (regexps, options) {
  ...
  return {
    ...
    reset: function (data, info) { 
      this.buffer = data || '';
      this.currentPos = 0;
      let line = info ? info.line : 1;
      this.tokens = lexer(data, line);
      if (options && options.transform) {
        if (typeof options.transform === 'function') {
          this.tokens = options.transform(this.tokens);
        } else if (Array.isArray(options.transform)) {
          options.transform.forEach(trans => this.tokens = trans(this.tokens))
        }
      } 
      return this;
    }
    ...
  }
}

See the code for the nearley lexer at section La función nearleyLexer of the lab Lexer Generator
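
The handling of the transform option can be tried in isolation. In the sketch below applyTransforms is a hypothetical helper that mirrors the logic inside reset, and the two toy transformations and the { type, value } token shape are assumptions made for the example.

```javascript
// Hypothetical helper mirroring the option handling in reset():
// accept a single function or an array of functions applied in order.
function applyTransforms(tokens, transform) {
  if (typeof transform === "function") return transform(tokens);
  if (Array.isArray(transform)) return transform.reduce((toks, t) => t(toks), tokens);
  return tokens;
}

// Two toy transformations over { type, value } tokens (assumed shape).
const dropSpaces = (tokens) => tokens.filter((t) => t.type !== "SPACE");
const upperWords = (tokens) =>
  tokens.map((t) => (t.type === "WORD" ? { ...t, value: t.value.toUpperCase() } : t));

const tokens = [
  { type: "WORD", value: "a" },
  { type: "SPACE", value: " " },
  { type: "NUMBER", value: 4 },
];
const result = applyTransforms(tokens, [dropSpaces, upperWords]);
console.log(result);
// [ { type: 'WORD', value: 'A' }, { type: 'NUMBER', value: 4 } ]
```

Each transformation receives the whole token array and returns a new one, so the order of the array matters.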

The Lexical Word Colon Transformation

We want to add the colon as syntactic sugar to our language. We want to transform every pair of consecutive tokens WORD, COLON into the pair STRING, COMMA, so that phrases like x: 4 are interpreted as "x", 4.

In this way we can write a program like this:

examples/colon.egg

✗ cat examples/colon.egg 
do(
  def(b, [a:4]), # The : is a "lexical" operator
  print(b)
)

so that when compiled

➜  egg-oop-parser-solution git:(master) bin/eggc.js examples/colon.egg
➜  egg-oop-parser-solution git:(master) ✗ jless examples/colon.json

and when executed produces:

✗ bin/eggc.js examples/colon.egg
✗ npx evm examples/colon.json   
["a",4]

Proposal: The idea is that inside our lexer we write a lexical transformation function:

function colonTransformer(tokens) {
  // ... s/WORD COLON/STRING COMMA/g
  return tokens;
}

This transformation is what allows us to deal with the colon syntax used to describe the object in the example examples/object-colon-selector.egg from the Introduction section:

examples/object-colon-selector.egg

def(x, { 
  c: [1, 2, 3], 
  gc:  fun(element(self, "c")), 
  sc:  fun(value, =(self.c[0], value)),
  inc: fun(=(self.c[0], +(self.c[0], 1)))
})
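
A minimal sketch of the WORD COLON to STRING COMMA rewrite described above could look like this. It assumes that tokens are objects with at least { type, value }, that the token types are named WORD, COLON, STRING and COMMA, and that whitespace has already been discarded by the lexer; adapt it to the token shape of your own lexer-generator.

```javascript
// Sketch only: assumes tokens are objects with at least { type, value }
// and that the lexer names the token types WORD, COLON, STRING and COMMA.
function colonTransformer(tokens) {
  for (let i = 0; i + 1 < tokens.length; i++) {
    if (tokens[i].type === "WORD" && tokens[i + 1].type === "COLON") {
      // x : becomes "x" ,
      tokens[i] = { ...tokens[i], type: "STRING" };
      tokens[i + 1] = { ...tokens[i + 1], type: "COMMA", value: "," };
    }
  }
  return tokens;
}

const input = [
  { type: "WORD", value: "x" },
  { type: "COLON", value: ":" },
  { type: "NUMBER", value: 4 },
];
const output = colonTransformer(input);
console.log(output.map((t) => t.type).join(" ")); // STRING COMMA NUMBER
```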

Full Grammar

The following NearleyJS grammar is unambiguous, allows the requested features and extends the previous Egg grammar we introduced in the lab egg-parser:

program -> expression %EOF
expression -> 
      %STRING  optProperties
    | %NUMBER  optProperties
    | bracketExp optProperties 
    | curlyExp   optProperties
    | %WORD applies           
 
applies -> calls
    | properties
    | null
calls ->  parenExp applies
properties ->  bracketExp  applies
    | selector applies            
 
parenExp   -> "("  commaExp ")"
bracketExp -> "["  commaExp "]"
curlyExp   -> "{"  commaExp "}"
 
selector   ->  
     "." %WORD
   | "." %NUMBER
commaExp -> null
   | expression ("," expression):*
 
optProperties -> null
   | properties

Syntax Diagram/Railroad Diagram

A new Ambiguity: Number Dot Number

Just for fun, and to go beyond what other programming languages allow, we want the dot to work with numbers as a property selector. This is something that, to my knowledge, no language allows. For instance, in JS:

➜  src git:(main) ✗ node
Welcome to Node.js v16.0.0.
Type ".help" for more information.
> a = [[1,2],3,4]
[ [ 1, 2 ], 3, 4 ]
> a[0][0]
1
> a.0.0
a.0.0
 ^^
Uncaught SyntaxError: Unexpected number

You cannot use the notation a.0.0 to select the a[0][0] element, since allowing this notation confuses the interpreter.

Even if the JS designers had taken a decision like the one we took in section The Dot Ambiguity: Property dot or Mantissa dot?, it would not suffice: the lexer would read a.0.0 as the word a, a dot and the floating point number 0.0!

This goal (the dot to work with numbers as property selector) is the reason I introduced the "." %NUMBER production in the selector rule:

selector   ->  
     "." %WORD
   | "." %NUMBER

This, if correctly implemented, will allow us to write programs like this one:

✗ cat examples/array-dot.egg 
do(
    def(a, [[1,2],3]),
    print(a.0.1)
)

that will produce this output:

➜  egg-oop-parser-solution git:(master) bin/eggc.js examples/array-dot.egg
➜  egg-oop-parser-solution git:(master) bin/evm examples/array-dot.json 
2

The key observation here is the following:

Disambiguation Rule: In an Egg program a number token corresponding to a floating point number such as 0.1 or 0.0 cannot be preceded by a dot token.

Notice that what comes before a dot token is not necessarily a word; it can be a complex expression, as in this other example (observe the first dot in f().0.1):

examples/function-returning-array-dot-number.egg

✗ cat examples/function-returning-array-dot-number.egg 
do(
    def(f, fun([[0,Math.PI],2])), # A function that returns an array
    print(f().0.1)
)

When executed we obtain:

✗ bin/eggc.js examples/function-returning-array-dot-number.egg
✗ npx evm examples/function-returning-array-dot-number.json   
3.141592653589793

Proposal: The proposed solution is to write another lexical transformation:

// Substitute DOT NUMBER{4.3} by DOT NUMBER{4} DOT NUMBER{3}
function NumberToDotsTransformer(tokens) {
    /* ... fill the code ... */
    return tokens;
}

Raw versus value

Be careful to work with the raw attribute of the NUMBER token inside the NumberToDotsTransformer, and not directly with its value attribute. If you use the value attribute and the value transformer is active, then during the first lexical pass strings like 0.0 or 1.0 will have their value transformed to numbers like 0 and 1 and the dot access information will be lost!

The following example tests the problem:

➜  egg-oop-parser-solution git:(master) ✗ cat examples/array-literal-dot-antunez.egg
print([[1,2],3].0.0)
➜  egg-oop-parser-solution git:(master) ✗ bin/egg examples/array-literal-dot-antunez.egg
1

The transformation has to substitute DOT{.} NUMBER{raw:"0.0"} by DOT{.} NUMBER{0} DOT{.} NUMBER{0} and not by DOT{.} NUMBER{0}.
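
Here is a minimal sketch of such a transformation. It assumes that tokens look like { type, value, raw }, where raw is the matched source text, and that the token types are named DOT and NUMBER. It only splits plain digits.digits numbers and does not adjust position information (line, col) of the new tokens, which your lexer may require.

```javascript
// Sketch only: assumes tokens look like { type, value, raw }, where raw is
// the matched source text, and that the token types are DOT and NUMBER.
function NumberToDotsTransformer(tokens) {
  const result = [];
  for (const token of tokens) {
    const previous = result[result.length - 1];
    const afterDot = previous !== undefined && previous.type === "DOT";
    // Use raw, not value: the value of "0.0" is the number 0 and has lost the dot.
    if (afterDot && token.type === "NUMBER" && /^\d+\.\d+$/.test(token.raw)) {
      const [left, right] = token.raw.split(".");
      result.push(
        { ...token, raw: left, value: Number(left) },
        { type: "DOT", raw: ".", value: "." },
        { ...token, raw: right, value: Number(right) }
      );
    } else {
      result.push(token);
    }
  }
  return result;
}

// Tokens for a.0.0 as a naive lexer would produce them
const tokens = [
  { type: "WORD", raw: "a", value: "a" },
  { type: "DOT", raw: ".", value: "." },
  { type: "NUMBER", raw: "0.0", value: 0 },
];
const split = NumberToDotsTransformer(tokens);
console.log(split.map((t) => t.raw).join(" ")); // a . 0 . 0
```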

The Evaluation/Interpretation of Property nodes

How should we interpret the property nodes?

We can try to mimic the structure of the evaluate method of the Apply nodes. Here is a first sketch of the body of evaluate for Property nodes:

evaluate(env) {
    let theObject = this.operator.evaluate(env);
    let propsProcessed = this.args.map((arg) => arg.evaluate(env));
    let propName = checkNegativeIndex(theObject, propsProcessed[0]);
 
    if (propName in theObject) { // theObject has a property with name "propName"
      let obj = theObject;
      for(let i = 0; i< propsProcessed.length; i++) {
        let element = propsProcessed[i];
        let oldObj = obj;
        element = checkNegativeIndex(obj, element);
        obj = obj[element];
        if (typeof obj === "function") {
          // What shall we do if we evaluate to a function during the evaluation?
        } 
      }
      return obj; // it is a property
    } else if (typeof theObject === "function") {
      return //... return currified function
    } else {
      return // What shall I return?? make it more compatible with JS semantics
    }
  }

What shall we do if we evaluate to a function during the evaluation? We can curry the rest:

  for(let i = 0; i< propsProcessed.length; i++) {
    ...
    obj = obj[element];
    if (typeof obj === "function") {
      obj = obj.bind(oldObj);  // bind the function to the former object
 
      if (i < propsProcessed.length-1) {
        let propName = checkNegativeIndex(obj, propsProcessed[i+1]);
        if (!(obj[propName] || propName in obj)) { // obj hasn't a property with name "propName"       
          let remaining = propsProcessed.slice(i+1); // Curry it!
          return (...args) => obj(...remaining, ...args); 
        }  
      } else {
        return obj;
      }
    } 
  }

In the following example:

➜  eloquentjsegg git:(private2223) ✗ cat examples/curry-multiple-indices.egg 
print(4["+", 5](3))

+ is a property of the number 4 and 4["+"] is a function so we curry the rest of the arguments:

➜  eloquentjsegg git:(private2223) ✗ bin/egg.js examples/curry-multiple-indices.egg
12

This input example is similar to the previous one, but instead of an argument such as 5 it uses length, the JS property of JS functions:

➜  eloquentjsegg git:(private2223) ✗ cat examples/curry-multiple-indices-but-property.egg
print(4["+", "length"])
➜  eloquentjsegg git:(private2223) ✗ bin/egg.js examples/curry-multiple-indices-but-property.egg
0

If the object doesn't have a property with name "propName", we return the curried function:

    if (theObject[propName] || propName in theObject) { 
      ... 
    } else if (typeof theObject === "function") {
      return (...args) => theObject(...propsProcessed, ...args); 
    } else {
      ...
    }
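
The pieces above can be combined into a standalone function that works on plain JavaScript values, which makes the intended behavior easy to test outside the interpreter. This is a simplified sketch and not the reference implementation: evaluateProperty receives the already evaluated object and indices, the body of checkNegativeIndex is a guess at what that helper does, and throwing a TypeError for a missing property is a choice made here.

```javascript
// Hypothetical helper: negative indices count from the end of an array.
function checkNegativeIndex(obj, index) {
  return Array.isArray(obj) && typeof index === "number" && index < 0
    ? obj.length + index
    : index;
}

function evaluateProperty(theObject, props) {
  let obj = theObject;
  for (let i = 0; i < props.length; i++) {
    const name = checkNegativeIndex(obj, props[i]);
    // Object(obj) lets the `in` operator work on primitives such as 4.3
    if (obj !== null && obj !== undefined && name in Object(obj)) {
      const owner = obj;
      obj = obj[name];
      if (typeof obj === "function") obj = obj.bind(owner); // keep `this`
    } else if (typeof obj === "function") {
      const f = obj;
      const fixed = props.slice(i); // remaining indices become arguments
      return (...args) => f(...fixed, ...args);
    } else {
      throw new TypeError(`Property "${String(name)}" not found`);
    }
  }
  return obj;
}

const plus = (...s) => s.reduce((a, b) => a + b);
console.log(evaluateProperty(plus, [4])(2));                            // 6
console.log(evaluateProperty([[1, 2], 3], [0, -1]));                    // 2
console.log(evaluateProperty({ p: { q: { r: 1 } } }, ["p", "q", "r"])); // 1
console.log(evaluateProperty([1, 4, 5], ["join"])("-"));                // 1-4-5
```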

Translate self to this

Remember to translate self to this

Array Literals

Let us study now the support for Array Literals. The involved rules are:

expression ->  ...
    | bracketExp optProperties
bracketExp -> "["  commaExp "]"
 
optProperties -> null
   | properties

The idea is that the transformer associated to the bracketExp rule builds an apply node like

APPLY(operator: WORD{name:array}, args: commaexp)

where commaexp is the AST forest associated with the appearance of commaExp in the production bracketExp -> "[" commaExp "]".

Object Literals

The production rules for object literals are:

expression -> ...
    | curlyExp   optProperties
curlyExp   -> "{"  commaExp "}"
 
optProperties -> null
   | properties

As for array literals, the idea is that the transformer associated to the curlyExp rule builds an apply node like

APPLY(operator: WORD{name:object}, args: commaexp)
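
Both literals lead to the same kind of node, so the two builders can share their code. In this sketch the names buildLiteral, buildArray and buildObject and their signatures are assumptions; in a real Nearley postprocessor the commaExp ASTs would be extracted from the data array of the rule.

```javascript
// Hypothetical build functions; the names and signatures are assumptions.
// `commaExp` is the array of ASTs produced for the commaExp symbol.
function buildLiteral(name) {
  return (commaExp) => ({
    type: "apply",
    operator: { type: "word", name },
    args: commaExp,
  });
}

const buildArray = buildLiteral("array");   // for bracketExp
const buildObject = buildLiteral("object"); // for curlyExp

const one = { type: "value", value: 1 };
const two = { type: "value", value: 2 };
const arrayAst = buildArray([one, two]);    // AST of [1, 2]
console.log(arrayAst.operator.name, arrayAst.args.length); // array 2
```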

The Begin End Something Language Design Pattern

The solution we have used in the two previous sections, Array Literals and Object Literals, follows a pattern I will call the Begin-End-Something Pattern:

The "Begin End Something" Language Design Pattern

Add a couple of tokens to the language to signal the beginning and the end of the new specialized category of expressions: for instance add [ to begin array literals and ] to end array literals
- Introduce the new tokens in the lexer (be careful with conflicts, especially with "expansive" tokens. Don't trample on existing "reserved words")
- Modify the grammar adding the new rule(s) for the new kind of expression
Build an AST for the new category by adding a function buildCategory to your build-ast.js library.
- The function buildCategory returns in fact a specialized case of an already existing kind of AST
- Remember to export the new function and import it in your grammar file

Following these instructions it is trivial to extend Egg with a family of constructs such as

( ... ) as a synonym of do( ...): See an example in the branch doendo of the solution repo

➜  egg-oop-parser-solution git:(doendo) ✗ cat examples/do-endo.egg 
(
  def(a,4),
  print(a)
)
➜  egg-oop-parser-solution git:(doendo) ✗ bin/egg examples/do-endo
4

loop ... end loop or While ... end While as a synonym of while(...). Do not use while ... end while as the delimiter tokens, or you will trample on the already existing word while
etc.

Error Management

The errors produced by Nearley.JS are quite verbose:

➜  egg-oop-parser-solution git:(b2bc2de) cat test/errors/unexpected-token.egg
+{2,3}

test/errors/unexpected-token.egg

➜  egg-oop-parser-solution git:(b2bc2de) bin/eggc.js test/errors/unexpected-token.egg
There was an error: Error near "{" in line 1
Unexpected LCB token: "{". Instead, I was expecting to see one of the following:
 
A "(" based on:
    parenExp →  ● "(" commaExp ")"
    calls →  ● parenExp applies
    applies →  ● calls
    expression → %WORD ● applies
    program →  ● expression %EOF
A "[" based on:
    bracketExp →  ● "[" commaExp "]"
    properties →  ● bracketExp applies
    applies →  ● properties
    expression → %WORD ● applies
    program →  ● expression %EOF
A "." based on:
    selector →  ● "." %WORD
    properties →  ● selector applies
    applies →  ● properties
    expression → %WORD ● applies
    program →  ● expression %EOF
A "." based on:
    selector →  ● "." %NUMBER
    properties →  ● selector applies
    applies →  ● properties
    expression → %WORD ● applies
    program →  ● expression %EOF
A EOF token based on:
    program → expression ● %EOF

In version 2.20.1 of Nearley, the Error object has an attribute token that can be used to simplify the error message.

The example below uses a RegExp to scan the message attribute of the error and append the expected tokens to a shorter message. A Nearley.js error message repeats the pattern A "<something>" based on: many times; for named tokens the pattern becomes A <something> token based on:

src/parse.js

const fs = require('fs');
const nearley = require('nearley');
// grammar: the compiled Nearley grammar module, required elsewhere in this file

function parseFromFile(origin) {
  try {
    const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
    const source = fs.readFileSync(origin, 'utf8');
    parser.feed(source);
    let results = parser.results;
    
    if (results.length > 1) throw new Error(`Language Design Error: Ambiguous Grammar! Generated ${results.length} ASTs`);
    if (results.length == 0) {
      console.error("Unexpected end of Input error. Incomplete Egg program. Expected more input");
      process.exit(1);
    }
    const ast = results[0];
    return ast;
  }
  catch(e) {
    let token = e.token;
    // Errors without a token (ambiguous grammar, unreadable file, ...) are rethrown untouched
    if (!token) throw e;
    let message = e.message;
    // Keep what lies between 'A ' and ' based on:'; match returns null when nothing is found
    let expected = (message.match(/(?<=A ).*(?= based on:)/g) || []).map(s => s.replace(/\s+token/i,''));
    let newMessage = `Unexpected ${token.type} token "${token.value}" `+
    `at line ${token.line} col ${token.col}.`;
    if (expected.length) newMessage += ` Tokens expected: ${[...new Set(expected)]}`;  
 
    throw new Error(newMessage)
  }
}

When executed with an erroneous input the message is simplified to:

➜  egg-oop-parser-solution git:(master) ✗ bin/eggc.js test/errors/unexpected-token.egg
Unexpected LCB token "{" at line 1 col 2. Tokens expected: "(","[",".",EOF
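To see the extraction step in isolation, here is a small self-contained sketch that applies the same regular expression to an abridged copy of the verbose message. The helper name expectedTokens and the sample string are only for illustration.

```javascript
// Extract the distinct expected tokens from a Nearley-style error message.
// expectedTokens is a hypothetical helper name, not part of Nearley.
function expectedTokens(message) {
  // match returns null when the pattern never appears, hence the || []
  const found = message.match(/(?<=A ).*(?= based on:)/g) || [];
  // Named tokens appear as 'EOF token': drop the word 'token', then deduplicate
  return [...new Set(found.map(s => s.replace(/\s+token/i, '')))];
}

// Abridged copy of the verbose message shown earlier
const sample = [
  'Unexpected LCB token: "{". Instead, I was expecting to see one of the following:',
  '',
  'A "(" based on:',
  '    parenExp →  ● "(" commaExp ")"',
  'A "." based on:',
  '    selector →  ● "." %WORD',
  'A "." based on:',
  '    selector →  ● "." %NUMBER',
  'A EOF token based on:',
  '    program → expression ● %EOF'
].join('\n');

console.log(expectedTokens(sample).join(',')); // "(",".",EOF
```

The Set removes the repeated "." entry that Nearley reports once per production that could continue with a dot.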

A related idea for error management is to add to your grammar production rules for specific error situations, each with a semantic action that deals with the error. For instance, the rule expression -> %EOF (line 8 below) is added to detect an unexpected end of file in the middle of parsing:

expression -> 
      %STRING  optProperties   {% buildStringValue %}
    | %NUMBER  optProperties   {% buildNumberValue %}
    | bracketExp optProperties {% buildArray %}
    | curlyExp   optProperties {% buildObject %}
    | "(" commaExp ")"         {% buildDo %}
    | %WORD applies            {% buildWordApplies %}
    | %EOF                     {% dealWithError %}

➜  egg-oop-parser-solution git:(master) ✗ bin/eggc.js test/errors/unexpected-eof.egg 
Unexpected EOF token near line 1, col 4. Found EOF
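One possible shape for such a semantic action is sketched below. It assumes the lexer token carries type, line and col fields (as the token used in src/parse.js does). The body is a guess at what dealWithError might do, not the solution's actual code.

```javascript
// Hypothetical semantic action for the rule: expression -> %EOF
// Nearley passes the matched symbols as an array; here it is just the EOF token.
// Assumption: the token has type, line and col fields.
function dealWithError([token]) {
  throw new Error(
    `Unexpected ${token.type} token near line ${token.line}, col ${token.col}. ` +
    `Found ${token.type}`
  );
}

// Usage with a hand-made token standing in for the lexer's EOF token
try {
  dealWithError([{ type: 'EOF', value: '', line: 1, col: 4 }]);
} catch (e) {
  console.log(e.message); // Unexpected EOF token near line 1, col 4. Found EOF
}
```

Throwing from the semantic action stops the parse at the point where the premature end of file was recognized, so the position in the message comes straight from the offending token.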

Regexps

See Regexp interpretation

Videos

2024/04/22

Class of 2024/04/22. Student survey. End of the course. Egg Interpreter release. Extending the Egg grammar with OOP.

2024/04/23

Lexical transformations: the WORD COLON transformation. Currying.

2024/04/24

Satisfaction questionnaires, gorace, teaching-guide chatbot. The sign in numbers. Is do a metafunction? Building the trees. Inherited and synthesized attributes. The dot ambiguity. Array and object literals. Error management.

2023/04/26

Class of 2023/04/26. Student survey. End of the course. Egg Interpreter. Egg OOP parser. Lab egg-oop-parser.

