This document was last updated on 7 October 2015 for version 1.14.5.
This document intends to fully describe how the code is organized by module.
Pretty Diff is written in a functional imperative style. Before going further, these terms need to be defined.
Functional programming is about isolation and decomposition. It starts with large ideas and breaks them into small composable fragments that may or may not work together. This is a fully inverted approach compared with the more familiar Object Oriented Programming (OOP) approach.
Functional programming is not about passing things around. To be useful, a function needs access to input and must provide some form of contribution. In a language with lexical scope, a function does not need to be handed input, such as arguments, in order to receive input. Nor does a function need to explicitly return anything in order to make a contribution. When instructions within a function can access references declared outside its immediate scope, which is what closures provide, the function has an implicit input state. When those instructions can also write to such outer references, an implicit output state is present as well.
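A minimal sketch of implicit input and output through closure, written for this explanation and not taken from Pretty Diff. The function increment takes no arguments and returns nothing, yet it reads and writes variables declared in its enclosing scope. All names here are hypothetical.

```javascript
// Hypothetical example: `increment` has no parameters and no return value,
// but it still receives input and contributes output through closure.
const counter = (function () {
    let count = 0;   // implicit input and output state for `increment`
    const log = [];  // implicit output state for `increment`

    const increment = function () {
        count += 1;                          // reads and writes `count`
        log.push("count is now " + count);   // writes to `log`
    };

    increment();
    increment();
    return {count: count, log: log};
}());

console.log(counter.count); // 2
```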
Imperative programming is about writing code as directly and bluntly as possible. It is the opposite of the declarative approach to programming, which seeks to wrap a collection of instructions in a named interface. Proper naming is important in the declarative approach because the name of the interface declares the intention, or use case, of the code it contains. Imperative code reads like a stereo instruction manual or a car owner's manual, while declarative code reads more like elegant poetry written in incomplete sentences.
The primary difference between these two approaches is the behavior of the code author. Some developers prefer to think about application code as a modular abstraction that is best described by reading a given unit of code, or not described at all. Other developers may prefer to think about application code as a representational abstraction best described by a descriptive reference. These preferences result from differences in formative cognitive functions and require practice to develop adequately. Imperative developers may describe declarative code as hidden, circular, or spaghetti code. Declarative developers may describe imperative code as too complex, too terse, or overwhelming.
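To make the contrast between the two approaches concrete, here is the same small task written both ways. This is an illustrative sketch only. Neither version comes from Pretty Diff, and the helper names isLong and countWhere are hypothetical.

```javascript
// Illustrative contrast only; neither version is taken from Pretty Diff.
const words = ["diff", "css", "markup", "js"];

// Imperative: the steps are written out directly, in order.
let longCount = 0;
for (let index = 0; index < words.length; index += 1) {
    if (words[index].length > 3) {
        longCount += 1;
    }
}

// Declarative: the steps are wrapped behind named interfaces whose names
// state the intent.
const isLong = function (word) {
    return word.length > 3;
};
const countWhere = function (list, predicate) {
    return list.filter(predicate).length;
};
const longCountDeclarative = countWhere(words, isLong);

console.log(longCount, longCountDeclarative); // 2 2
```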
Order of Declaration
Pretty Diff is written to comply with the default rules of JSLint. JSLint requires that a reference be declared before it is used, which influences the order in which things are declared. A function that references a variable in the same scope must be declared after that variable.
The Pretty Diff application logic is contained in a single function named prettydiff, which resides in a file named prettydiff.js. The remaining supporting code consists of the APIs provided in the /api directory.
The prettydiff function is composed of a function named core and a small collection of libraries. The names of all the library functions are declared as empty functions at the top of the application. When two libraries reference each other there is no way to determine which should be declared first, so all libraries are declared as empty functions in order to establish a named reference to each. The last thing to be declared is the core function, which references all the library functions. Each library function's real body is assigned after the core function is declared.
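The declaration order described above can be sketched roughly as follows. Only the names prettydiff, core, csspretty, and jspretty come from this document. The parameters, the lang and source option names, and the library bodies are placeholders assumed for illustration. The sketch relies on core looking up each library name when it is called rather than when it is declared.

```javascript
// Hypothetical skeleton of the declaration order; bodies are placeholders.
const prettydiff = function (api) {
    // 1. Declare every library name first as an empty function, so any
    //    library (or core) can reference any other regardless of order.
    let csspretty = function () {},
        jspretty = function () {};

    // 2. Declare core last. It only captures the names above; their values
    //    are looked up when core runs, not when it is declared.
    const core = function (options) {
        return options.lang === "css"
            ? csspretty(options.source)
            : jspretty(options.source);
    };

    // 3. Assign the real library bodies after core is declared.
    csspretty = function (source) {
        return "css:" + source;
    };
    jspretty = function (source) {
        return "js:" + source;
    };

    return core(api);
};

console.log(prettydiff({lang: "css", source: "a{}"})); // css:a{}
```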
The core function determines which libraries to access and how to format the output. The first thing this function does is set up the options. Internal references are created to values from submitted options, with defaults provided where options are missing or invalid. After setting up options from the API, core interprets the prettydiff comment, if present. Options specified in the prettydiff comment override user-submitted options.
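A minimal sketch of that order of precedence: defaults first, then validated user options, then overrides from the prettydiff comment. The option names mode and lang, their allowed values, the defaults, and the comment syntax /*prettydiff.com key:value, key:value*/ are all assumptions made for this example. They do not describe Pretty Diff's actual option list or comment grammar.

```javascript
// Hypothetical option names, allowed values, defaults, and comment syntax.
const defaults = {mode: "diff", lang: "auto"},
    allowed = {
        mode: ["diff", "beautify", "minify"],
        lang: ["auto", "css", "csv", "javascript", "markup"]
    };

const setupOptions = function (userOptions, source) {
    const options = {};

    // Keep a submitted value only if it is allowed; otherwise use the default.
    Object.keys(defaults).forEach(function (key) {
        const value = userOptions[key];
        options[key] = allowed[key].indexOf(value) > -1
            ? value
            : defaults[key];
    });

    // Options in the (hypothetical) prettydiff comment override user options.
    const comment = (/\/\*prettydiff\.com\s+([^*]*)\*\//).exec(source);
    if (comment !== null) {
        comment[1].split(",").forEach(function (pair) {
            const parts = pair.split(":").map(function (part) {
                return part.trim();
            });
            const key = parts[0],
                value = parts[1];
            if (Object.prototype.hasOwnProperty.call(allowed, key) && allowed[key].indexOf(value) > -1) {
                options[key] = value;
            }
        });
    }
    return options;
};

const result = setupOptions({mode: "minify", lang: "klingon"}, "/*prettydiff.com mode:beautify*/ a{}");
console.log(result.mode, result.lang); // beautify auto
```

Here the invalid lang value falls back to its default, and the comment's mode value replaces the user-submitted one.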
The default language value is auto. When the language value is auto, a complex series of regular expressions is used to guess the closest supported language.
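The general idea of regex-driven detection can be sketched as an ordered list of patterns where the first match wins. The patterns, the rule order, and the "text" fallback below are illustrative assumptions. They are far cruder than Pretty Diff's actual rules; for example, a CSS value such as var(--x) would be misread as JavaScript.

```javascript
// Illustrative patterns only; the first matching rule wins.
const rules = [
    {lang: "markup", pattern: /^\s*<[!?a-zA-Z]/},
    {lang: "javascript", pattern: /\b(function|var|let|const|return)\b|=>/},
    {lang: "css", pattern: /^\s*[^{}()]+\{\s*[\w-]+\s*:[^{}]*\}/}
];

const guessLanguage = function (source) {
    const match = rules.find(function (rule) {
        return rule.pattern.test(source);
    });
    return match === undefined
        ? "text"
        : match.lang;
};

console.log(guessLanguage("a { color: red; }")); // css
```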
After language detection, the relevant options are supplied to the appropriate libraries, selected first by mode (diff, beautify, minify) and then by language. The diff mode accesses multiple libraries for every operation. The minify and beautify modes return an informative analysis summary in addition to the altered code. The beautify summary differs by language, while the minify summary is uniform for all languages. Operations that output formatted HTML, namely the diff mode and the jsscope option, do not produce these summaries.
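The shape of that selection, first by mode and then by language, can be sketched with placeholder libraries. Everything here is hypothetical: the library bodies are trivial string edits, the summary fields are invented, and the diff branch assumes that both samples pass through a language library before comparison.

```javascript
// Placeholder "libraries": trivial string edits, not Pretty Diff code.
const libraries = {
    css: function (source, mode) {
        return mode === "minify"
            ? source.replace(/\s+/g, "")
            : source.replace(/;\s*/g, ";\n");
    },
    javascript: function (source, mode) {
        return mode === "minify"
            ? source.replace(/\s+/g, " ").trim()
            : source.replace(/;\s*/g, ";\n");
    }
};

// Beautify summaries differ by language (invented fields).
const beautifySummary = {
    css: function (output) {
        return {rules: (output.match(/\{/g) || []).length};
    },
    javascript: function (output) {
        return {statements: (output.match(/;/g) || []).length};
    }
};

// Choose by mode first, then by language.
const run = function (options) {
    const library = libraries[options.lang];
    if (options.mode === "diff") {
        // A real diff returns an HTML report, so no summary is produced.
        const same = library(options.source, "beautify") === library(options.diff, "beautify");
        return {output: same ? "no differences" : "differences found", summary: null};
    }
    const output = library(options.source, options.mode);
    // The minify summary has the same shape for every language.
    const summary = options.mode === "minify"
        ? {inputSize: options.source.length, outputSize: output.length}
        : beautifySummary[options.lang](output);
    return {output: output, summary: summary};
};

const minified = run({mode: "minify", lang: "css", source: "a { color: red; }"});
console.log(minified.output, minified.summary.outputSize); // a{color:red;} 13
```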
The libraries are primarily written as parsers. They parse code input, but they make no attempt to interpret or execute that code. Unlike many other parsers, these parsers produce a list of code tokens and make no attempt to build an Abstract Syntax Tree. This is intentional, so as to focus on analyzing and formatting the code directly instead of an abstract representation of the code.
The larger libraries primarily store data in parallel arrays, which various child functions access through closure. This allows for simple data structures that are highly optimized for performance without requiring creative optimization effort. Fast performance is achieved by allowing access to only a particular classification of data at any time and by avoiding passing large data objects around, which is slow. This style of coding is extremely imperative and challenging for many programmers to understand at first, but it substantially reduces maintenance over the life of the code.
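A minimal sketch of the parallel-array style: token[i] and types[i] always describe the same item, and child functions reach both arrays through closure instead of receiving them as arguments. The tokenizer rules and the type names number, word, and operator are hypothetical and much simpler than any Pretty Diff parser.

```javascript
// Hypothetical tokenizer showing the parallel-array style.
const tokenize = function (source) {
    const token = [],
        types = [],
        // Numbers, words, or any other single non-space character.
        pattern = /\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))/g;

    // Child function: writes to both arrays at the same index.
    const push = function (value, type) {
        token.push(value);
        types.push(type);
    };

    // Child function: reads only the types array.
    const count = function (type) {
        return types.filter(function (item) {
            return item === type;
        }).length;
    };

    let match = pattern.exec(source);
    while (match !== null) {
        if (match[1] !== undefined) {
            push(match[1], "number");
        } else if (match[2] !== undefined) {
            push(match[2], "word");
        } else {
            push(match[3], "operator");
        }
        match = pattern.exec(source);
    }
    return {token: token, types: types, words: count("word")};
};

const parsed = tokenize("x = 42;");
console.log(JSON.stringify(parsed.token)); // ["x","=","42",";"]
console.log(JSON.stringify(parsed.types)); // ["word","operator","number","operator"]
```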
The csspretty library is the parser used for Cascading Style Sheets and similar languages, like LESS and SCSS (Sass). This library receives a mode option, which determines whether code is beautified or minified.
There are two parallel arrays principally used for storing parsed data: token and types. The token array stores parsed code tokens, and the types array stores a category name identifying the type of code at the given index of the token array.
The library is divided into three major sections: a parser (csspretty__tokenize), a formatter (csspretty__beautify), and an analysis summary (csspretty__summary). Unlike many of the other libraries in Pretty Diff, the vast majority of the complexity and logic is in the parser. The parser is particularly complex because it provides logic to sort properties, vertically align the colons used for property assignment, and condense some values. The formatter is used only to beautify code, while minified code is little more than a join of the token array.
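A rough sketch of why minification can be little more than a join once parsing is done: after the parser has stored the tokens without surrounding white space, minify only concatenates them, while beautify has to decide spacing and indentation from the types array. The tokenizer, the type names, and the four-space indent are hypothetical. The sketch handles only flat "selector { property: value; }" rules, with no comments, strings, at-rules, or nesting.

```javascript
// Hypothetical mini CSS parser in the token/types style.
const tokenizeCss = function (source) {
    const token = [],
        types = [];
    let inBlock = false,
        afterColon = false;
    source.split(/([{}:;])/).forEach(function (piece) {
        const text = piece.trim();
        if (text === "") {
            return;
        }
        if (text === "{") {
            inBlock = true;
            types.push("start");
        } else if (text === "}") {
            inBlock = false;
            afterColon = false;
            types.push("end");
        } else if (!inBlock) {
            types.push("selector");  // includes pseudo-class colons
        } else if (text === ":") {
            afterColon = true;
            types.push("colon");
        } else if (text === ";") {
            afterColon = false;
            types.push("semi");
        } else {
            types.push(afterColon ? "value" : "property");
        }
        token.push(text);
    });
    return {token: token, types: types};
};

// Beautify decides spacing and indentation from each token's type.
const beautifyCss = function (parsed) {
    let output = "";
    parsed.token.forEach(function (text, index) {
        const type = parsed.types[index];
        if (type === "start") {
            output += " {\n";
        } else if (type === "end") {
            // Add a missing final semicolon before closing the block.
            output += (parsed.types[index - 1] === "value" ? ";\n" : "") + "}\n";
        } else if (type === "property") {
            output += "    " + text;
        } else if (type === "colon") {
            output += ": ";
        } else if (type === "semi") {
            output += ";\n";
        } else {
            output += text;
        }
    });
    return output;
};

// Minify is little more than joining the token array.
const minifyCss = function (parsed) {
    return parsed.token.join("");
};

const parsed = tokenizeCss("a { color : red; margin:0 }");
console.log(minifyCss(parsed)); // a{color:red;margin:0}
```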
The csvpretty library is a tiny parser for the CSV format. It generates an array of arrays where each child array is a data record and each of its indexes is a data cell.
The diffview library differs slightly in code style from the other libraries, as it is the only remaining unoriginal code in the Pretty Diff project. Code comparisons are performed with this library.
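csvpretty's own code is not shown in this document, so the following is a separately written minimal CSV parser that produces the same shape of output: an array of records, each an array of cells. The quoting rules (double quotes around a cell, with a doubled quote as an escape) follow the common RFC 4180 convention and are an assumption about the input.

```javascript
// Minimal CSV parser sketch, not csvpretty's code.
const parseCsv = function (source) {
    const records = [];
    let record = [],
        cell = "",
        quoted = false,
        index = 0;
    while (index < source.length) {
        const char = source.charAt(index);
        if (quoted) {
            if (char === "\"" && source.charAt(index + 1) === "\"") {
                cell += "\"";    // doubled quote is an escaped quote
                index += 1;
            } else if (char === "\"") {
                quoted = false;  // closing quote
            } else {
                cell += char;    // commas and line breaks are literal here
            }
        } else if (char === "\"") {
            quoted = true;
        } else if (char === ",") {
            record.push(cell);
            cell = "";
        } else if (char === "\n" || char === "\r") {
            if (char === "\r" && source.charAt(index + 1) === "\n") {
                index += 1;      // treat CRLF as one line break
            }
            record.push(cell);
            records.push(record);
            record = [];
            cell = "";
        } else {
            cell += char;
        }
        index += 1;
    }
    // Flush the final record unless the input ended with a line break.
    if (cell !== "" || record.length > 0) {
        record.push(cell);
        records.push(record);
    }
    return records;
};

console.log(JSON.stringify(parseCsv("name,quote\nAda,\"Hello, \"\"world\"\"\"\n")));
// [["name","quote"],["Ada","Hello, \"world\""]]
```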
This library contains two primary sections: a line-by-line difference analysis (diffview__opcodes) and report generation (diffview__report). The first section produces an array of arrays where each child array contains four numbers and a type label. This data determines where the comparison type changes and how many lines each run of that type spans in the submitted source sample and in the diff sample.
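The following sketch produces opcodes in the shape this paragraph describes, one type label and four numbers per child array. The label names equal, replace, delete, and insert and the [label, sourceStart, sourceEnd, diffStart, diffEnd] ordering follow the convention of Python's difflib and are assumptions here. The code also uses a textbook longest-common-subsequence table rather than diffview's own matching, so its output can differ from Pretty Diff's on ambiguous inputs.

```javascript
// LCS-based sketch, not diffview's algorithm. Each opcode is
// [label, sourceStart, sourceEnd, diffStart, diffEnd] (end indexes exclusive).
const opcodes = function (source, diff) {
    const n = source.length,
        m = diff.length,
        lcs = [];
    // lcs[row][col] is the length of the longest common subsequence of
    // source.slice(row) and diff.slice(col).
    for (let row = n; row >= 0; row -= 1) {
        lcs[row] = [];
        for (let col = m; col >= 0; col -= 1) {
            if (row === n || col === m) {
                lcs[row][col] = 0;
            } else if (source[row] === diff[col]) {
                lcs[row][col] = lcs[row + 1][col + 1] + 1;
            } else {
                lcs[row][col] = Math.max(lcs[row + 1][col], lcs[row][col + 1]);
            }
        }
    }

    const codes = [];
    // Extend the last opcode when the step has the same kind; else start one.
    const add = function (equal, i1, i2, j1, j2) {
        const last = codes[codes.length - 1];
        if (last !== undefined && (last[0] === "equal") === equal) {
            last[2] = i2;
            last[4] = j2;
        } else {
            codes.push([equal ? "equal" : "change", i1, i2, j1, j2]);
        }
    };

    let i = 0,
        j = 0;
    while (i < n || j < m) {
        if (i < n && j < m && source[i] === diff[j]) {
            add(true, i, i + 1, j, j + 1);
            i += 1;
            j += 1;
        } else if (j < m && (i === n || lcs[i][j + 1] >= lcs[i + 1][j])) {
            add(false, i, i, j, j + 1);  // line only in the diff sample
            j += 1;
        } else {
            add(false, i, i + 1, j, j);  // line only in the source sample
            i += 1;
        }
    }

    // Label each change run by what it contains.
    codes.forEach(function (code) {
        if (code[0] === "change") {
            code[0] = code[1] === code[2]
                ? "insert"
                : code[3] === code[4]
                    ? "delete"
                    : "replace";
        }
    });
    return codes;
};

console.log(JSON.stringify(opcodes(["a", "b", "c", "d"], ["a", "x", "c", "d", "e"])));
// [["equal",0,1,0,1],["replace",1,2,1,2],["equal",2,4,2,4],["insert",4,4,4,5]]
```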
Three forms of output are currently supported. The default output type is an HTML report of four lists that resembles a table with four columns, referred to as the side-by-side view. The second format is the inline view, which is three lists resembling a three-column layout. The side-by-side view appears to put two documents next to each other. The inline view uses one column for all code output, with differing lines vertically adjacent. These two views are toggled using the diffview option.
The third output format is command line output. This option is only available when using this library independently of Pretty Diff or with the api/node-local.js file. This format outputs only a list of results, where a result is a collection of line numbers and the code that differs. The context option defaults to a value of 2 with this format. This format is activated with the diffcli option.
It is important to note that there is a massive child function, named diffview__report_charcomp, under the report generating function. This function performs fuzzy string comparison to identify specific character differences for each sample code line that is not equal.
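diffview__report_charcomp performs fuzzy comparison, which is more involved than can be shown here. As a much simpler stand-in, this sketch trims the common prefix and suffix of two unequal lines and marks only the differing middle. The function name charDiff and the em markers are hypothetical choices for illustration.

```javascript
// Simplified stand-in, NOT the fuzzy algorithm in diffview__report_charcomp.
const charDiff = function (before, after) {
    // Length of the common prefix.
    let start = 0;
    while (start < before.length && start < after.length && before.charAt(start) === after.charAt(start)) {
        start += 1;
    }
    // Walk back over the common suffix without crossing the prefix.
    let endBefore = before.length,
        endAfter = after.length;
    while (endBefore > start && endAfter > start && before.charAt(endBefore - 1) === after.charAt(endAfter - 1)) {
        endBefore -= 1;
        endAfter -= 1;
    }
    const mark = function (text, end) {
        return text.slice(0, start) + "<em>" + text.slice(start, end) + "</em>" + text.slice(end);
    };
    return [mark(before, endBefore), mark(after, endAfter)];
};

console.log(charDiff("var a = 10;", "var a = 12;"));
```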
The jspretty library uses several parallel arrays: token, types, lines, level, and meta. The level and meta arrays are secondary. The token array stores parsed code tokens. The types array stores names of various code types. The lines array identifies which tokens are followed by line breaks in the original input. The level array stores a formatting definition for use in beautification. The meta array stores information about variable declarations relative to their scope of declaration.
The first part of jspretty is the parser (jspretty__tokenize). The parser's purpose is to populate the token, types, and lines arrays. It contains additional logic to identify missing semicolons and missing curly braces. The missing artifacts are populated as pseudo-tokens, which are either ignored during final formatting or converted to proper tokens, depending upon the value supplied for the correct option.
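The pseudo-token idea can be sketched as follows. The "x;" marker, the boolean encoding of the lines array, and the statement-ending rule are all hypothetical. The rule is deliberately crude and does not reflect jspretty's actual handling of missing semicolons.

```javascript
// Hypothetical sketch: "x;" marks a semicolon the parser inferred.
const addPseudoSemicolons = function (token, lines) {
    const output = [];
    token.forEach(function (value, index) {
        const next = token[index + 1];
        output.push(value);
        // Crude rule: a line break after a word, number, ")" or "]" that is
        // followed by another word-like token (or the end of input) ends a
        // statement that is missing its semicolon.
        if (lines[index] === true && (/[\w)\]]$/).test(value) && (next === undefined || (/^\w/).test(next))) {
            output.push("x;");
        }
    });
    return output;
};

// With `correct` on, pseudo-tokens become real semicolons; otherwise they
// are dropped during final formatting.
const finalizeTokens = function (token, correct) {
    return token
        .filter(function (value) {
            return value !== "x;" || correct;
        })
        .map(function (value) {
            return value === "x;" ? ";" : value;
        });
};

const withPseudo = addPseudoSemicolons(["a", "=", "1", "b", "=", "2"], [false, false, true, false, false, true]);
console.log(finalizeTokens(withPseudo, true).join(" "));  // a = 1 ; b = 2 ;
console.log(finalizeTokens(withPseudo, false).join(" ")); // a = 1 b = 2
```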
The second major part of jspretty is the beautification function (jspretty__algorithm). This section always populates the level array and may populate the meta array if the jsscope feature is used.
The third section of jspretty formats the output. Minification gets a unique formatting scheme named jspretty__minify. There are two forms of formatting for beautification: beautified code in text format (jspretty_result) and the colorful HTML format generated by the jsscope option (jspretty_resultScope).
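jspretty's actual level encoding is not described in this document, so the following uses an invented convention purely to show how a parallel level array can drive beautified output while minification ignores it. In this sketch, a number means "line break, then indent to that depth", "s" means a single space, and "x" means nothing follows the token.

```javascript
// Invented level convention for this sketch only:
//   a number -> line break, then indent to that depth
//   "s"      -> a single space
//   "x"      -> nothing
const beautify = function (token, level) {
    let output = "";
    token.forEach(function (value, index) {
        const after = level[index];
        output += value;
        if (typeof after === "number") {
            output += "\n" + "    ".repeat(after);
        } else if (after === "s") {
            output += " ";
        }
    });
    return output;
};

// Minify ignores level and keeps a space only where two word characters
// would otherwise fuse into one token.
const minify = function (token) {
    return token.reduce(function (output, value) {
        const needsSpace = (/\w$/).test(output) && (/^\w/).test(value);
        return output + (needsSpace ? " " : "") + value;
    }, "");
};

const token = ["function", "f", "(", ")", "{", "return", "1", ";", "}"],
    level = ["s", "x", "x", "s", 1, "s", "x", 0, "x"];
console.log(beautify(token, level));
console.log(minify(token)); // function f(){return 1;}
```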
The final major section of jspretty (jspretty__report) creates a summarized analysis of the submitted code.
The markuppretty library's first major task is to parse the string input into a list of tokens. The bulk of the library's logic is contained in the parser. The parser is extremely robust: markup syntax itself is highly predictable, and the parser also supports a huge list of syntax from various templating languages.
If the mode option is given a value of "parse", the library performs some quick white space insertion around text content and then returns an object of two arrays.
When the mode option is given a value of "beautify" or "diff", a couple of extra things happen. First, the parsed tokens and types are analyzed to determine how things should be beautified. This analysis is stored in an array named level. The next part of the library applies the beautification described by the level array.
The final section of the library produces an analytical summary of the code sample, including a more thorough accessibility analysis.
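A hypothetical sketch of the parse, level, and apply sequence. It is not markuppretty's code. The property names token and types on the returned object, the type names, and the two-space indent are assumptions. Comments, attributes containing ">", void elements written without a closing slash, and templating syntax are not handled.

```javascript
// Parse into parallel token and types arrays (the "parse" result shape).
const parseMarkup = function (source) {
    const token = [],
        types = [];
    (source.match(/<[^>]+>|[^<]+/g) || []).forEach(function (item) {
        const text = item.trim();
        if (text === "") {
            return;
        }
        token.push(text);
        if (text.indexOf("</") === 0) {
            types.push("end");
        } else if (text.charAt(0) !== "<") {
            types.push("content");
        } else if ((/\/>$/).test(text)) {
            types.push("singleton");
        } else {
            types.push("start");
        }
    });
    return {token: token, types: types};
};

// level[i] is the indentation depth at which token[i] is printed.
const computeLevel = function (types) {
    let depth = 0;
    return types.map(function (type) {
        if (type === "end") {
            depth = Math.max(depth - 1, 0);  // guard against unbalanced input
        }
        const current = depth;
        if (type === "start") {
            depth += 1;
        }
        return current;
    });
};

// Apply the beautification described by the level array.
const beautifyMarkup = function (parsed) {
    const level = computeLevel(parsed.types);
    return parsed.token.map(function (text, index) {
        return "  ".repeat(level[index]) + text;
    }).join("\n");
};

const parsed = parseMarkup("<ul><li>one</li><li>two<br/></li></ul>");
console.log(beautifyMarkup(parsed));
```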