Pretty Diff - Processing the JSX format from Facebook's React

Introduction

React.js is a JavaScript template framework from Facebook. Perhaps the most exciting feature of this technology is its data format for defining templates, called JSX. Essentially, JSX allows HTML template code to be written directly into a JavaScript function using XML syntax. Version 0.12 of React brought significant changes to JSX, including:

Elimination of the previously required identifying pragma comment
JavaScript comments may occur freely within the XML body outside of curly braces
The ability to move back and forth between JavaScript syntax and XML

Overview of Pretty Diff support

Pretty Diff claims to fully integrate v0.12 JSX support into all of its modes and features. Supporting JSX properly means giving the JavaScript parser access to an XML parser, which in turn can call back into that JavaScript parser from within JSX's XML tags. A good markup parser also uses a minifier to normalize white space within an XML element, apart from JavaScript comments, which do not otherwise occur in XML. Pretty Diff makes use of these two parsers:

jspretty, the JavaScript parser
markuppretty, the markup parser

To preserve existing Pretty Diff features, namely command-line processing of multiple files, JSX language detection is automatic rather than a user-selectable option.

Accounting for this novel complexity introduces the potential for parsing problems that would otherwise be entirely absent. For example, Pretty Diff does not claim support for intermixing the tag nesting features of JSTL syntax with the capabilities of JSX, since each of these languages is a superset whose syntax conflicts with the other's. JSX parsing is also complex because control passes between parsers frequently and in several directions, which makes debugging harder than with regular JavaScript or HTML.

How Pretty Diff parses JSX

Parsing with jspretty

JSX is detected automatically from a JavaScript code sample. This means the code sample must begin as JavaScript syntax; a document that begins as XML syntax is parsed only as HTML or XML. JSX is detected by looking for less-than characters, <, that start an XML tag. These characters are also operators in JavaScript syntax, so I check whether the less-than character is preceded by either the return keyword or an operator, optionally with white space characters or opening parentheses in between. Once an XML body is detected, it is followed all the way to its end. The whole XML body is represented as a single token in the jspretty parser and handed off directly to markuppretty. Along with the code sample, the current indentation depth is also handed over, so the markup can be indented to match the wrapping JavaScript.
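The detection rule above can be sketched in TypeScript as a backwards scan from the less-than character. This is a minimal sketch that follows the wording literally; the function name `startsJsx` and the exact set of operator characters are assumptions for illustration, not Pretty Diff's own code.

```typescript
// Characters treated as "any operator" (an assumption, not Pretty Diff's list).
const OPERATOR_CHARS = "=+-*/%&|!?:,;[{~^<>";

function isIdentifierChar(ch: string): boolean {
  return /[A-Za-z0-9_$]/.test(ch);
}

// True when the "<" at `index` looks like the start of a JSX tag rather than
// the less-than operator.
function startsJsx(code: string, index: number): boolean {
  if (code.charAt(index) !== "<") {
    return false;
  }
  // Step back over optional white space and opening parentheses.
  let i = index - 1;
  while (i >= 0 && (/\s/.test(code.charAt(i)) || code.charAt(i) === "(")) {
    i -= 1;
  }
  if (i < 0) {
    // Nothing precedes the "<": the sample starts as XML, not JavaScript.
    return false;
  }
  // Preceded by the whole word `return`?
  const before = code.slice(0, i + 1);
  if (before.slice(-6) === "return") {
    const prev = before.charAt(before.length - 7); // "" when nothing precedes
    if (prev === "" || !isIdentifierChar(prev)) {
      return true;
    }
  }
  // Otherwise, preceded by an operator character?
  return OPERATOR_CHARS.indexOf(code.charAt(i)) !== -1;
}
```

Read literally, the rule skips an opening parenthesis, so in this sketch `x = (<p/>)` is detected while a call such as `render(<p/>)` is not, because the identifier `render` is neither an operator nor the return keyword.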

Parsing with markuppretty

The markuppretty parser treats the XML-like portions of JSX exactly as XML, with added support for three forms of JavaScript within XML tags:

Block comments that begin with /* and end with */
Line comments that begin with // and end with \n
All other JavaScript code may freely occur inside or outside an XML tag so long as it is wrapped in curly braces

Curly-brace JavaScript, whether inside or outside an XML tag, is packaged and handed back to jspretty. This allows beautified or minified JavaScript to be included within the larger XML code sample, which is itself handed back to jspretty in a later step. If this JavaScript sub-sample contains XML tags, jspretty interprets them correctly, passing them back through this same process as individual parsing tokens for precise interpretation.

JavaScript comments are not passed back to jspretty. Line breaks are properly maintained for block comments and after line comments. White space preceding lines of code inside block comments may not be preserved precisely by the full Pretty Diff process. This is a non-critical edge case caused by several different parsers touching this code at different times, sometimes more than once.
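The treatment of the three JavaScript forms can be sketched as a scanner that splits a JSX body into markup, comment, and curly-brace segments, then hands only the curly-brace segments to a callback standing in for the recursive jspretty call. The names `splitJsxMarkup` and `handOffScripts` are hypothetical, and for brevity the sketch does not track quoted attribute values or string literals (so `http://` inside an attribute would be misread as a line comment), which a real parser must do.

```typescript
// Kinds of segment found in a JSX body (names are hypothetical).
type SegmentKind = "markup" | "blockComment" | "lineComment" | "script";

interface Segment {
  kind: SegmentKind;
  text: string;
}

// Split a JSX body into markup, JavaScript comments, and curly-brace script.
// Simplification: quotes and string literals are not tracked.
function splitJsxMarkup(source: string): Segment[] {
  const segments: Segment[] = [];
  let markupStart = 0;
  let i = 0;
  while (i < source.length) {
    let end = -1;
    let kind: SegmentKind | null = null;
    if (source.startsWith("/*", i)) {
      const close = source.indexOf("*/", i + 2);
      end = close === -1 ? source.length : close + 2;
      kind = "blockComment";
    } else if (source.startsWith("//", i)) {
      const newline = source.indexOf("\n", i + 2);
      end = newline === -1 ? source.length : newline + 1; // keep the line break
      kind = "lineComment";
    } else if (source.charAt(i) === "{") {
      // Follow nested braces to the matching close.
      let depth = 0;
      let j = i;
      for (; j < source.length; j += 1) {
        if (source.charAt(j) === "{") {
          depth += 1;
        } else if (source.charAt(j) === "}") {
          depth -= 1;
          if (depth === 0) {
            break;
          }
        }
      }
      end = Math.min(j + 1, source.length);
      kind = "script";
    }
    if (kind === null) {
      i += 1;
      continue;
    }
    if (i > markupStart) {
      segments.push({ kind: "markup", text: source.slice(markupStart, i) });
    }
    segments.push({ kind, text: source.slice(i, end) });
    i = end;
    markupStart = end;
  }
  if (source.length > markupStart) {
    segments.push({ kind: "markup", text: source.slice(markupStart) });
  }
  return segments;
}

// Rebuild the body, handing only script segments to `beautifyScript`, a
// stand-in for the recursive call back into jspretty. Comments stay in place.
function handOffScripts(
  source: string,
  depth: number,
  beautifyScript: (code: string, depth: number) => string,
): string {
  return splitJsxMarkup(source)
    .map((segment) =>
      segment.kind === "script" ? beautifyScript(segment.text, depth) : segment.text,
    )
    .join("");
}
```

For example, `handOffScripts("<p>/* {no} */{y}</p>", 0, (code) => code.toUpperCase())` hands only `{y}` to the callback, leaving the braces inside the block comment untouched. Any XML inside a script segment would be the callback's concern, mirroring how jspretty sends nested tags back through the same process.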

This process handles embedded JavaScript separately while treating the XML syntax correctly. At this point the JSX code is fully parsed; apart from the intermixing of XML and JavaScript syntax, it is no more complex than regular HTML/XML.

Extending JSX support to jsscope

Intention

The jspretty library includes a feature called jsscope. This feature colors variables and the background of functions to associate each variable with the scope depth at which it is declared. The idea is to help teach closure and functional programming while also providing a quick and easy form of code analysis. The intention here is to extend the benefits of this feature to JavaScript variables expressed within JSX markup. Read more about jsscope in the guide about closures and jsscope.

A deeper understanding of the problem

The jsscope feature works by building lists of variable names, each stored at the array index that represents the closing of the function where those variables are declared. This lets the application know exactly which variables are declared in which functions. Unfortunately, this array is bound to the current execution context of the jspretty library and cannot be efficiently passed around.
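A minimal model of this bookkeeping, as a sketch: the class `ScopeTable` and its methods are hypothetical names, and the closing-brace indexes are plain numbers rather than real jspretty tokens. Because the table is a private field of one instance, a child instance created for JSX would start with its own empty table, which is the limitation described above.

```typescript
// Hypothetical model of jsscope's bookkeeping: a sparse array whose index is
// the token index of a function's closing brace and whose value lists the
// variable names declared in that function.
class ScopeTable {
  private readonly byClose: string[][] = [];

  // Record `name` as declared by the function that closes at `closeIndex`.
  declare(closeIndex: number, name: string): void {
    const list = this.byClose[closeIndex] ?? [];
    list.push(name);
    this.byClose[closeIndex] = list;
  }

  // Names declared by the function that closes at `closeIndex`.
  namesAt(closeIndex: number): string[] {
    return this.byClose[closeIndex] ?? [];
  }

  // Closing indexes of every function that declares `name`, in token order.
  declaringFunctions(name: string): number[] {
    const result: number[] = [];
    // forEach skips the holes of a sparse array.
    this.byClose.forEach((list, index) => {
      if (list.indexOf(name) !== -1) {
        result.push(index);
      }
    });
    return result;
  }
}
```

A sparse array keyed by closing index keeps lookups direct: once the parser reaches a function's closing brace, it already knows where that function's declarations live.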

The jspretty library can identify when JSX markup is present in JavaScript, but it does not parse that markup to know whether it contains JavaScript variables. The markuppretty library does identify JavaScript within JSX markup and passes this code recursively back to jspretty. Once a child instance of jspretty is launched, it has no access to the variable array of the previous execution instance, but it can still identify word tokens from the JavaScript it parses.

Providing a solution

To solve this problem, jspretty must know both that it is operating in a recursive instance and that the jsscope feature is requested. This requires passing the jsscope option to the markup_beauty library, which passes it to the markuppretty library, which in turn passes it back to jspretty. The jspretty library considers the proper conditions met when the jsscope option is enabled, the code sample starts with a curly brace, and the internal jsx option is set to true.
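The three conditions reduce to a single check. In this sketch the `JsprettyOptions` interface and the function name `shouldWrapJsxWords` are assumptions; only the conditions themselves come from the description above.

```typescript
// Hypothetical option shape: only the three conditions come from the text.
interface JsprettyOptions {
  jsscope: boolean; // user-facing option, passed down via markup_beauty and markuppretty
  jsx: boolean; // internal flag, set for a recursive call from markuppretty
}

// True when a jspretty instance should wrap word tokens for jsscope.
function shouldWrapJsxWords(code: string, options: JsprettyOptions): boolean {
  return options.jsscope && options.jsx && code.charAt(0) === "{";
}
```

Checking the leading curly brace distinguishes a sample handed back from markuppretty (always a brace-wrapped fragment) from an ordinary top-level JavaScript sample.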

Once word tokens are identified in the child instance of jspretty, they are wrapped in pseudo tags: [pdjsxscope]. This identifies the word tokens in a way that persists through several parsing libraries without being prematurely parsed as HTML. Because the output of jsscope is HTML, all HTML tags contained in the output must be properly escaped.
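The wrapping step can be sketched as a map over the child instance's tokens. This assumes, hypothetically, parallel arrays of tokens and token types with word tokens typed `"word"`, and a closing pseudo tag of `[/pdjsxscope]`, which the text does not specify. Every token is HTML-escaped; the square-bracket pseudo tags contain no HTML-special characters, so they survive escaping and later HTML parsing untouched.

```typescript
// Escape the characters that would otherwise be read as HTML.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap word tokens in pseudo tags; every token is HTML-escaped first.
// `types` is a hypothetical parallel array with "word" marking word tokens,
// and "[/pdjsxscope]" is an assumed closing form.
function wrapWordTokens(tokens: string[], types: string[]): string[] {
  return tokens.map((token, index) => {
    const escaped = escapeHtml(token);
    return types[index] === "word" ? `[pdjsxscope]${escaped}[/pdjsxscope]` : escaped;
  });
}
```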

Once jspretty has analyzed the markup for beautification, the pseudo tags are identified for processing. Each contained word token is compared to the array of identified variables; if a match is found, the pseudo tags are converted to em tags that color the variable according to its functional depth. If there is no match, the pseudo tags are simply removed.
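The final substitution can be sketched with a single regular expression. The `Map` from variable name to scope depth, the `class="sN"` naming of the em tags, and the `[/pdjsxscope]` closing form are assumptions for illustration; in Pretty Diff the comparison is made against jsscope's own array of identified variables.

```typescript
// Replace each [pdjsxscope]word[/pdjsxscope] pair: known variables become em
// tags carrying their scope depth; unknown words simply lose the pseudo tags.
// `declared` (name -> depth) and the class="sN" naming are hypothetical.
function resolvePseudoTags(html: string, declared: Map<string, number>): string {
  return html.replace(/\[pdjsxscope\](.*?)\[\/pdjsxscope\]/g, (_match: string, word: string) => {
    const depth = declared.get(word);
    return depth === undefined ? word : `<em class="s${depth}">${word}</em>`;
  });
}
```

For instance, with only `a` declared at depth 2, `{[pdjsxscope]a[/pdjsxscope] + [pdjsxscope]b[/pdjsxscope]}` becomes `{<em class="s2">a</em> + b}`: the declared variable is colored and the unknown word is left bare.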
