Pretty Diff - Guide, Processing the JSX format from Facebook's React

Introduction

React.js is a JavaScript template framework coming out of Facebook. Perhaps the most exciting feature of this technology is JSX, its data format for defining templates. Essentially, JSX allows writing HTML template code directly into a JavaScript function using XML syntax. Version 0.12 of React brought big changes to JSX, including:

Elimination of the previously required identifying pragma comment
JavaScript comments may occur freely within the XML body outside of curly braces
The means to move back and forth between JavaScript syntax and XML

Overview of Pretty Diff support

Pretty Diff claims to fully integrate v0.12 JSX support into all its modes and features. Supporting JSX properly means giving the JavaScript parser access to an XML parser that can, in turn, call back into the JavaScript parser from within JSX's XML tags. A good markup parser also makes use of a minifier to normalize white space within an XML element, aside from JavaScript comments, which do not otherwise occur in XML. Pretty Diff makes use of three parsers: jspretty for JavaScript, markupmin for markup minification, and markup_beauty for markup beautification.

To support existing features of Pretty Diff, namely command line access to multiple files of various types, JSX language detection is automatic and is not offered as a user-selectable option.

Accounting for such novel complexity opens the potential for parsing problems that are otherwise entirely absent. For example, Pretty Diff does not claim support for intermixing the tag nesting features of JSTL syntax with the capabilities of JSX, since each of these languages is a superset whose syntax conflicts with the other. JSX parsing is also complex in that movement between the various parsers is frequent and multidimensional, which makes debugging harder than with regular JavaScript or HTML.

How Pretty Diff parses JSX

Parsing with jspretty

JSX is automatically detected from a JavaScript code sample. This means the code sample must begin as JavaScript syntax. If a document begins as XML syntax it will be parsed only as HTML or XML. JSX is detected by looking for less than characters, <, that start an XML tag. These characters are operators in JavaScript syntax, so I look to see if the less than character is immediately preceded, optionally, by any white space characters or opening parentheses, and before those by either a return keyword or any operator. Once an XML body is detected it is followed all the way to its end. This XML body is represented as a single token in the jspretty parser for hand off directly to markupmin, if in minify mode, or otherwise to the markup_beauty parser. The depth of indentation is handed over along with the code sample so that the markup can be indented according to the wrapping JavaScript.
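The detection rule described above could be sketched roughly as follows. This is only an illustration of the idea and not the jspretty source: the function name looksLikeJsxStart, the list of operator characters, and the extra check that a tag name follows the less than character are all assumptions made for the example.

```javascript
// Hypothetical sketch of the detection rule; not the actual jspretty code.
// Characters assumed, for this example only, to count as the end of an operator.
const OPERATOR_END = "=,:?[&|!+-*%~^";

function looksLikeJsxStart(source, index) {
    if (source.charAt(index) !== "<") {
        return false;
    }
    // assumption: the character after "<" must be able to start a tag name
    if (!/[A-Za-z_]/.test(source.charAt(index + 1))) {
        return false;
    }
    // walk backwards over optional white space and opening parentheses
    let a = index - 1;
    while (a >= 0 && /[\s(]/.test(source.charAt(a))) {
        a -= 1;
    }
    if (a < 0) {
        // nothing but white space or parentheses precedes the tag,
        // so the sample does not begin as JavaScript
        return false;
    }
    if (OPERATOR_END.indexOf(source.charAt(a)) > -1) {
        return true;
    }
    // otherwise only the return keyword qualifies
    return /(^|[^\w$])return$/.test(source.slice(0, a + 1));
}

const assignment = "var a = <div/>;";
const comparison = "if (a <b) {}";
console.log(looksLikeJsxStart(assignment, assignment.indexOf("<"))); // true
console.log(looksLikeJsxStart(comparison, comparison.indexOf("<"))); // false
```

In the second sample the less than character follows a word that is not the return keyword, so it is treated as an ordinary comparison operator.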

Parsing with markupmin

The markup_beauty parser always makes use of the markupmin parser to normalize extraneous characters out of its syntax.

An enhancement to the markupmin parser does all the heavy lifting for JSX. Three forms of JavaScript syntax are recognized within a JSX body:

Block comments that begin with /* and end with */
Line comments that begin with // and end with \n
All other JavaScript code may freely occur inside or outside an XML tag so long as it is wrapped in curly braces

Curly brace JavaScript, whether inside or outside of an XML tag, is packaged for handing back off to jspretty. This allows beautified or minified JavaScript to be included within the larger XML code sample that is handed back off to jspretty in a later step. If this sub-sample of JavaScript contains XML tags, jspretty will interpret the tags correctly, passing them back through this process as individual parsing tokens for precise interpretation.

JavaScript comments are not passed back to jspretty. Line breaks are properly maintained for block comments and after line comments. White space preceding lines of code in block comments may not be precisely preserved by the entire Pretty Diff process. This is a non-critical edge case that occurs because several different parsers touch this code at different times, sometimes more than once.
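To make the three recognized forms concrete, here is a minimal sketch of how a JSX body might be separated into markup, comments, and curly brace JavaScript. It is not markupmin's code: the function name splitJsxBody and the token type names are invented for the example, and the sketch deliberately ignores quoted strings, so an attribute value or string literal containing //, /*, or a brace would be split incorrectly.

```javascript
// Hypothetical tokenizer sketch; names and token types are assumptions.
// Limitation: quoted strings are not recognized, so "//" inside an
// attribute value such as a URL would be mistaken for a line comment.
function splitJsxBody(body) {
    const tokens = [];
    let markup = "";
    let a = 0;
    const flush = function () {
        if (markup !== "") {
            tokens.push({type: "markup", text: markup});
            markup = "";
        }
    };
    while (a < body.length) {
        let end = -1;
        let type = "";
        if (body.startsWith("/*", a)) {
            end = body.indexOf("*/", a + 2);
            end = (end < 0) ? body.length : end + 2;
            type = "block-comment";
        } else if (body.startsWith("//", a)) {
            // the line break itself is left to the surrounding markup
            end = body.indexOf("\n", a);
            end = (end < 0) ? body.length : end;
            type = "line-comment";
        } else if (body.charAt(a) === "{") {
            // follow nested braces to the matching close
            let depth = 0;
            end = a;
            do {
                if (body.charAt(end) === "{") {
                    depth += 1;
                } else if (body.charAt(end) === "}") {
                    depth -= 1;
                }
                end += 1;
            } while (depth > 0 && end < body.length);
            type = "script";
        }
        if (end > -1) {
            flush();
            tokens.push({type: type, text: body.slice(a, end)});
            a = end;
        } else {
            markup += body.charAt(a);
            a += 1;
        }
    }
    flush();
    return tokens;
}

console.log(splitJsxBody("<div>{a.b}/* c */<b/></div>"));
```

In this sketch only the tokens of type "script" would be candidates for handing back to a JavaScript parser; the comment tokens stay with the markup.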

If in minify mode, markupmin will pass its output back to jspretty.

This process allows for separate handling of embedded JavaScript and proper handling of XML syntax. markupmin expects to return a string and markup_beauty only expects to receive a string. JSX code is fully parsed at this point and features minimal complexity, aside from the intermixing of XML and JavaScript syntax, compared to regular HTML/XML. To streamline processing, markupmin will join the array of tokens with an internal separator, pdjsxSep, and hand the code off to markup_beauty flagged as JSX code.
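Because one library returns a string and the other accepts only a string, the token array has to travel as text. A minimal sketch of that round trip follows; pdjsxSep is the separator named above, while the function names packTokens and unpackTokens are hypothetical.

```javascript
// "pdjsxSep" comes from the text above; the function names are assumptions.
const SEPARATOR = "pdjsxSep";

// producing side: flatten the parsed tokens into a single string
function packTokens(tokens) {
    return tokens.join(SEPARATOR);
}

// receiving side: recover the token array from the string
function unpackTokens(packed) {
    return packed.split(SEPARATOR);
}

const packed = packTokens(["<div>", "{this.props.name}", "</div>"]);
console.log(packed);
console.log(unpackTokens(packed));
```

A scheme like this is only safe when no token contains the separator text itself, which is presumably why an unusual string is used.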

Parsing with markup_beauty

Once markup_beauty receives output from markupmin flagged as JSX, it parses the input by splitting the source string on pdjsxSep. The resulting array is presumed to be fully parsed and is passed through regular XML beautification. Alphabetic sorting of tag attributes continues to occur. JavaScript comments embedded within a tag are treated as attributes and alphabetized accordingly. The beautified JSX is padded to match the indentation of the supplying JavaScript and then passed back to jspretty as a beautified XML body.
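The effect of treating an embedded comment as an attribute can be seen in a small sketch. The helper sortAttributes is hypothetical and assumes the attribute tokens have already been separated; it uses JavaScript's default string ordering, which may not match the exact ordering markup_beauty applies.

```javascript
// Hypothetical helper; attribute tokens are assumed to be already separated.
function sortAttributes(tagName, attributes) {
    // copy first so the caller's array is not reordered;
    // the default sort compares strings by UTF-16 code unit
    const sorted = attributes.slice().sort();
    const tail = (sorted.length > 0) ? " " + sorted.join(" ") : "";
    return "<" + tagName + tail + ">";
}

console.log(sortAttributes("div", ['id="main"', "/* note */", 'className="box"']));
// <div /* note */ className="box" id="main">
```

Under this ordering a comment sorts ahead of any attribute that begins with a letter, because the slash character has a lower code unit value than letters do.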

The markup_beauty and markupmin libraries continue to support the ignore attribute. When the attribute is present the corresponding code will not be beautified.

Return to jspretty

The JSX code returned to jspretty from markup_beauty is beautified as a single code block. This code block may span multiple lines whose indentation may differ slightly from that of the containing JavaScript. This is especially true of jspretty's jsscope feature, where each line of code is formatted as HTML for output. The jspretty library will supply the finishing touches to the indentation of the XML block to ensure each line of code is properly padded to match the indentation of the JavaScript code while still maintaining the beautification provided by markup_beauty.
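The padding step could look something like the sketch below. The function padToDepth and its parameters are hypothetical, and the choice to leave the first line untouched (on the assumption that it continues an existing JavaScript line) is a guess at reasonable behavior rather than a description of jspretty.

```javascript
// Hypothetical sketch of padding a beautified XML block to a JavaScript depth.
function padToDepth(block, depth, indentUnit) {
    const pad = indentUnit.repeat(depth);
    return block.split("\n").map(function (line, index) {
        // assumption: the first line continues the JavaScript statement,
        // and empty lines should not gain trailing white space
        if (index === 0 || line === "") {
            return line;
        }
        return pad + line;
    }).join("\n");
}

const beautified = "<ul>\n    <li/>\n</ul>";
console.log("    return " + padToDepth(beautified, 1, "    ") + ";");
```

The relative indentation produced by the markup beautifier is kept intact because the same prefix is added to every padded line.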

Extending JSX support to jsscope

Intention

The jspretty library includes a feature called jsscope. This feature colors variables and the background of functions to associate each variable with the scope depth of its declaration. The idea is to help teach the concepts of closure and functional programming while also providing a quick and easy form of code analysis. The intention here is to extend the benefits of this feature to JavaScript variables expressed within JSX markup code. Read more about jsscope in the guide about closures and jsscope.

A deeper understanding of the problem

The jsscope feature operates by forming a list of variable names stored in an array index that represents the closing of a function where those variables are declared. This allows the application to know exactly what variables are declared in which functions. Unfortunately, this array is bound to the current execution context of the jspretty library and cannot be efficiently passed around.
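The shape of that data structure might be pictured as in the following sketch. It is not the jspretty implementation: the function collectScopes is hypothetical, it assumes the code is already split into tokens, and it only records the first name after each var keyword, ignoring function parameters, function names, and comma separated declarations.

```javascript
// Hypothetical sketch: tokens are assumed to be already produced by a parser.
// Limitation: only "var name" is recorded; parameters are ignored.
function collectScopes(tokens) {
    const scopes = [];          // sparse: index of a function's closing brace -> names
    const stack = [];           // one entry per open curly brace
    let pendingFunction = false;
    tokens.forEach(function (token, index) {
        if (token === "function") {
            pendingFunction = true;
        } else if (token === "{") {
            stack.push({isFunction: pendingFunction, names: []});
            pendingFunction = false;
        } else if (token === "}") {
            const closed = stack.pop();
            if (closed !== undefined && closed.isFunction) {
                scopes[index] = closed.names;
            }
        } else if (tokens[index - 1] === "var") {
            // var is function scoped: attach to the nearest enclosing function
            for (let a = stack.length - 1; a >= 0; a -= 1) {
                if (stack[a].isFunction) {
                    stack[a].names.push(token);
                    break;
                }
            }
        }
    });
    return scopes;
}

const sample = [
    "function", "outer", "(", ")", "{", "var", "a", ";",
    "function", "inner", "(", ")", "{", "var", "b", ";", "}",
    "}"
];
const scopes = collectScopes(sample);
console.log(scopes[16]); // [ 'b' ]
console.log(scopes[17]); // [ 'a' ]
```

An array like this is tied to the token positions of one particular parse, which illustrates why it cannot simply be handed to a separate parsing instance.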

The jspretty library can identify when JSX markup is present in JavaScript, but does not parse that markup to know if it contains JavaScript variables. The markupmin library does identify JavaScript within JSX markup and passes this code recursively back to jspretty. Once a child instance of jspretty is launched there is no access to the variable-identifying array from the previous execution instance, but word tokens can still be identified from the JavaScript parsing.

Providing a solution

In order to solve this problem jspretty must know that it is operating in a recursive instance and that the jsscope feature is requested. This requires that the jsscope option be identified to the markup_beauty library so that it can be identified to the markupmin library and then passed back to jspretty. The jspretty library considers the proper conditions met when the jsscope option is enabled, the code instance starts with a curly brace, and the jsxstatus variable is set to true.

Once word tokens are identified in the child instance of jspretty they are wrapped in pseudo tags: [pdjsxscope]. This is necessary to identify the word tokens in a way that will persist through several parsing libraries, but not be prematurely parsed as HTML. The output of jsscope is an HTML format, so all HTML tags contained in the output must be properly escaped.

Once the markup is analyzed for beautification by jspretty the pseudo tags are identified for processing. The contained word token is compared to the array of identified variables, and if a match is found the pseudo tags are converted to em tags to properly color the variable to its respective functional depth. If there is no match the pseudo tags are simply removed.
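The pseudo tag round trip can be sketched as below. Only the [pdjsxscope] tag name comes from the text; the closing form [/pdjsxscope], the function names, the use of a Map in place of the variable array, and the em class names are all assumptions made for illustration.

```javascript
// Hypothetical sketch; only the "[pdjsxscope]" name is taken from the text.
const OPEN_TAG = "[pdjsxscope]";
const CLOSE_TAG = "[/pdjsxscope]";   // assumed closing form

// child instance: mark a word token so it survives the markup libraries
function wrapWord(word) {
    return OPEN_TAG + word + CLOSE_TAG;
}

// the final output is HTML, so real markup characters are escaped first;
// square brackets are untouched, which keeps the pseudo tags intact
function escapeHtml(text) {
    return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// parent instance: turn known variables into em tags, unwrap everything else
function renderScoped(markup, depthByName) {
    const pattern = /\[pdjsxscope\]([\s\S]*?)\[\/pdjsxscope\]/g;
    return escapeHtml(markup).replace(pattern, function (match, word) {
        if (depthByName.has(word)) {
            return '<em class="s' + depthByName.get(word) + '">' + word + "</em>";
        }
        return word;
    });
}

const marked = "<b>{" + wrapWord("item") + "." + wrapWord("name") + "}</b>";
console.log(renderScoped(marked, new Map([["item", 2]])));
// &lt;b&gt;{<em class="s2">item</em>.name}&lt;/b&gt;
```

Here item is a known variable at depth 2 and receives an em tag, while name has no match and its pseudo tags are simply dropped.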

Edge cases that I am not currently aware of are likely to arise. When a bug is encountered please announce it to the world by opening an issue on GitHub.

For additional options check out the documentation. I take bug reports and suggestions for enhancements via email and at GitHub.