Introduction to Programming Languages
Phases of a Translator

Phases of a Translator

Resources

En el Repo ULL-ESIT-GRADOII-PL/esprima-pegjs-jsconfeu-talk (opens in a new tab) encontrará el material de esta lección. Clone este repo.

The examples in this repo use a couple of JavaScript compiler frameworks: Esprima (opens in a new tab) and Espree (opens in a new tab).

Introducción a Espree

REPL example

Espree (opens in a new tab) is a JavaScript parser that is designed for use in static code analysis and linting tools. It is a fast and lightweight alternative to other popular JavaScript parsers such as Esprima.

Espree is designed to be compatible with the latest ECMAScript standards, and it can parse most of the features introduced in ECMAScript 2015 (ES6) and later versions. It is also designed to produce a syntax tree format that is similar to Esprima, which makes it easy to integrate into existing code analysis tools that rely on the Esprima API.

It is maintained by the eslint team, which is a popular code linting tool for JavaScript. Many other tools also use Espree (opens in a new tab) under the hood to parse JavaScript code, such as Babel (opens in a new tab) and webpack (opens in a new tab).

It started out as a fork of Esprima (opens in a new tab) v1.2.2, the last stable published released of Esprima before work on ECMAScript 6 began. Espree (opens in a new tab) is now built on top of Acorn (opens in a new tab), which has a modular architecture that allows extension of core functionality.

Una vez clonado el repo ULL-ESIT-GRADOII-PL/esprima-pegjs-jsconfeu-talk (opens in a new tab), instalamos las dependencias:

➜  esprima-pegjs-jsconfeu-talk git:(master) npm i

y arrancamos el bucle REPL de Node.JS:

➜  esprima-pegjs-jsconfeu-talk git:(master) node
Welcome to Node.js v14.4.0.
Type ".help" for more information.

Espree supportedEcmaVersions

Cargamos espree:

> const espree = require('espree')
undefined
> espree.version
'7.3.1'
> espree.latestEcmaVersion
12
> espree.supportedEcmaVersions
[
  3,  5,  6,  7, 8,
  9, 10, 11, 12
]

Análisis léxico

Hagamos un análisis léxico:

> espree.tokenize('answer = /* comment*/ 42', { range: true })
[
  Token {
    type: 'Identifier',
    value: 'answer',
    start: 0,
    end: 6,
    range: [ 0, 6 ]
  },
  Token {
    type: 'Punctuator',
    value: '=',
    start: 7,
    end: 8,
    range: [ 7, 8 ]
  },
  Token {
    type: 'Numeric',
    value: '42',
    start: 22,
    end: 24,
    range: [ 22, 24 ]
  }
]

Análisis sintáctico con Espree

Hagamos ahora un análisis sintáctico:

> espree.parse('const answer = 42', { tokens: true })
Uncaught [SyntaxError: The keyword 'const' is reserved
] {
  index: 0,
  lineNumber: 1,
  column: 1
}

La versión ECMA de JS usada por defecto por espree es la 5 y esta no admite const

Especifiquemos la versión ECMA que queremos:

> espree.parse('const answer = 42', 
              { ecmaVersion: espree.latestEcmaVersion, 
                tokens: true }
              )
Node {
  type: 'Program',
  start: 0,
  end: 17,
  body: [
    Node {
      type: 'VariableDeclaration',
      start: 0,
      end: 17,
      declarations: [Array],
      kind: 'const'
    }
  ],
  sourceType: 'script',
  tokens: [
    Token { type: 'Keyword', value: 'const', start: 0, end: 5 },
    Token { type: 'Identifier', value: 'answer', start: 6, end: 12 },
    Token { type: 'Punctuator', value: '=', start: 13, end: 14 },
    Token { type: 'Numeric', value: '42', start: 15, end: 17 }
  ]
}

La opción comment nos permite obtener los comentarios:

> espree.parse('a = /* comment */ 32;', { tokens: true, comment: true })
Node {
  type: 'Program',
  start: 0,
  end: 21,
  body: [ ... ],
  sourceType: 'script',
  comments: [
    {
      type: 'Block',
      value: ' comment ',
      start: 4,
      end: 17,
      range: [Array]
    }
  ],
  tokens: [ ...  ]
}

See the documentation deployed by the teacher at ull-esit-pl.github.io/espree (opens in a new tab)

util.inspect

Observe que el Árbol no aparece completo. El log que usa el bucle REPL de Node lo trunca en el hijo declarations (sólo nos muestra que es un array [Array] sin expandirlo) para que la salida no sea excesivamente larga.

Para que nos muestre el árbol vamos a usar el método util.inspect del módulo util que convierte un objeto en una string:

> const util = require('util')
undefined
> console.log(
    util.inspect(
        espree.parse('const answer = 42',{ecmaVersion: 6}), 
        {depth: null}
    )
 )
Node {
  type: 'Program',
  start: 0,
  end: 17,
  body: [
    Node {
      type: 'VariableDeclaration',
      start: 0,
      end: 17,
      declarations: [
        Node {
          type: 'VariableDeclarator',
          start: 6,
          end: 17,
          id: Node {
            type: 'Identifier',
            start: 6,
            end: 12,
            name: 'answer'
          },
          init: Node {
            type: 'Literal',
            start: 15,
            end: 17,
            value: 42,
            raw: '42'
          }
        }
      ],
      kind: 'const'
    }
  ],
  sourceType: 'script'
}
undefined

El Objeto AST generado por el parser de Espree

Ves que el objeto está compuesto de objetos de la clase Node. Si te concentras sólo en los campos type del objeto queda mas evidente como el objeto describe la jerarquía AST construída para la frase answer = 42. Puedes instalar el script compast (opens in a new tab) en ULL-ESIT-PL/compact-js-ast (opens in a new tab) para ver un resumen del AST:

➜  esprima-pegjs-jsconfeu-talk git:(master) ✗ compast -p 'const answer = 42' 
[
  'Program',
  [
    'VariableDeclaration',
    [
      'VariableDeclarator',
      [ 'Identifier', 'answer' ],
      [ 'Literal', 42 ]
    ]
  ]
]

que se corresponde con el siguiente diagrama:

Tipos de Nodos y nombres de los hijos

Navegar en el árbol AST es complicado. El atributo espree.visitorKeys nos proporciona la lista de nodos y los nombres de los atributos de sus hijos

> const typesOfNodes = Object.keys(espree.VisitorKeys)
undefined
> typesOfNodes.slice(0,4)
[
  'AssignmentExpression',
  'AssignmentPattern',
  'ArrayExpression',
  'ArrayPattern'
]

El valor nos da los nombres de los atributos que define los hijos:

> espree.VisitorKeys.AssignmentExpression
[ 'left', 'right' ]
> espree.VisitorKeys.IfStatement
[ 'test', 'consequent', 'alternate' ]

El web site ASTExplorer.net

Usando la herramienta web https://astexplorer.net (opens in a new tab) podemos navegar el AST producido por varios compiladores JS:

Traversing the AST

Traversing with estraverse

The file idgrep.js (opens in a new tab) is a very simple example of using Esprima to do static analysis on JavaScript code.

It provides a function idgrep that finds the appearances of identifiers matching a search string inside the input code.

/shared/esprima-pegjs-jsconfeu-talk-labs/esprima-pegjs-jsconfeu-talk/examples/idgrep/idgrep.js
const esprima = require("espree");
const program = require("commander");
const { version } = require("../../package.json");
const estraverse = require("estraverse");
 
const idgrep = function (pattern, code, filename) {
  const lines = code.split("\n");
  if (/^#!/.test(lines[0])) code = code.replace(/^.*/, ""); // Avoid line "#!/usr/bin/env node"
  const ast = esprima.parse(code, {
    ecmaVersion: 6,
    loc: true,
    range: true,
  });
  estraverse.traverse(ast, {
    enter: function (node, parent) {
      if (node.type === "Identifier" && pattern.test(node.name)) {
        let loc = node.loc.start;
        let line = loc.line - 1;
        console.log(
          `file ${filename}: line ${loc.line}: col: ${loc.column} text: ${lines[line]}`
        );
      }
    },
  });
};
 
program
  .version(version)
  .description('Searches for IDs in a list of programs')
  .option("-p --pattern [regexp]", "regexp to use in the search", "hack")
  .usage("[options] <filename> ...");
 
program.parse(process.argv);
const options = program.opts();
const pattern = new RegExp(options.pattern);
 
if (program.args.length == 0) program.help();
 
for (const inputFilename of program.args) {
  try {
    fs.readFile(inputFilename, "utf8", (err, input) => {
      debugger;
      if (err) throw `Error reading '${inputFilename}':${err}`;
      idgrep(pattern, input, inputFilename);
    });
  } catch (e) {
    console.log(`Errores! ${e}`);
  }
}

To know more about Estraverse see the Estraverse Usage (opens in a new tab) and Estraverse README.md (opens in a new tab)

Call estraverse.traverse or estraverse.replace with an object that has the following methods:

  • enter - Called when entering a node
  • leave - Called when leaving a node

Both of these methods have the following signature: function(node, parent). Note that parent can be null in some situations.

The enter function may control the traversal by returning the following values (or calling corresponding methods):

  • estraverse.VisitorOption.Skip / this.skip() - Skips walking child nodes of this node. The leave function will still be called. See test/replace.js (opens in a new tab)
  • estraverse.VisitorOption.Break / this.break() - Ends it all

The leave function can also control the traversal by returning the following values:

  • estraverse.VisitorOption.Break / this.break() - Ends it all

In estraverse.replace you can also return one of:

  • new node to replace old one with

  • estraverse.VisitorOption.Remove / this.remove() - Removes current node from parent array or replaces with null if not element of array.

See an example of estraverse.replace (opens in a new tab)

Examples of executions

With two input files:

➜ (private) ✗ ./idgrep.js  espree-logging-solution.js hello-ast-espree.js -p ast
file espree-logging-solution.js: line 13: col: 10 text:     estraverse.traverse(ast, {
file espree-logging-solution.js: line 14: col: 24 text:         enter: function(node) {
file espree-logging-solution.js: line 23: col: 30 text: }
file hello-ast-espree.js: line 3: col: 6 text: function getAnswer() {
file hello-ast-espree.js: line 8: col: 25 text: undefined

With a single file and testing hacky.js (opens in a new tab) (Observe how the appearances of hack inside the comment or the string aren't shown)

➜  esprima-pegjs-jsconfeu-talk git:(private) ✗ ./idgrep.js -p hac hacky.js                                       
file hacky.js: line 2: col: 6 text:     /* This hack does not count */
file hacky.js: line 4: col: 8 text:     let another = 9;

When the file doesn't exist:

➜  esprima-pegjs-jsconfeu-talk git:(private) ✗ ./idgrep.js fhjdfjhdsj     

/Users/casianorodriguezleon/campus-virtual/shared/esprima-pegjs-jsconfeu-talk-labs/esprima-pegjs-jsconfeu-talk/idgrep.js:45
      if (err) throw `Error reading '${inputFilename}':${err}`;
               ^
Error reading 'fhjdfjhdsj':Error: ENOENT: no such file or directory, open 'fhjdfjhdsj'
(Use `node --trace-uncaught ...` to show where the exception was thrown)

How to build a Parser

First Steps on Building a Parser with Jison

See the examples in the repo crguezl/hello-jison (opens in a new tab)

This repo (opens in a new tab) contains two examples:

  • The first one is a simple interpreter for infix arithmetic expressions with the minus operator only
    • See files minus.jison, minus.l and use_minus.js
  • The second is a translator from infix arithmetic expressions to JavaScript
    • minus-ast.jison builds a Espree compatible AST using minus.l and the helpers in ast-build.js
    • The lexical analyzer minus.l is reused
  • The ast-*.json files contain examples of Espree ASTs

Calculator example with PEG.js from the talk Parsing, Compiling, and Static Metaprogramming

altjs.js (opens in a new tab) is the code for the "AltJS language in 5 minutes" section presented in the second half of the talk Parsing, Compiling, and Static Metaprogramming (opens in a new tab) by Patrick Dubroy

References