JavaScript beautification uses the js-beautify function, which was originally written by Einars Lielmanis. I have modified this code for stricter conformance to the JSLint tool. This code is used in accordance with the licensing provided in the documentation of the original code.
JavaScript is beautified in accordance with the conventions expressed by the JSLint beautification scheme. This essentially means that spaces are added after commas, object contents are indented, and a new line is added after each statement termination and object closing.
The js_summary function is not provided a scope by js-beautify.js, so it must be provided a scope by a consuming function or it will become an implied global. The intent is to provide js_summary to js-beautify as a closure so that it can access the interior of js-beautify and still be accessed from outside js-beautify.
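The scoping arrangement described above can be pictured with a small sketch. The names `beautify` and `summary` here are hypothetical stand-ins and not the actual js-beautify internals; the only point is that a function declared inside another can read the outer function's local state while still being handed to outside callers.

```javascript
// Hypothetical sketch: a summary function that closes over the
// internals of a (greatly simplified) beautifier.
function beautify(source) {
    // Split on semicolons and drop empty fragments.
    var statements = source.split(";").filter(function (part) {
        return part.trim() !== "";
    });
    // Put each statement on its own line.
    var output = statements.map(function (part) {
        return part.trim() + ";";
    }).join("\n");

    // summary reads `statements` and `output` from the enclosing scope.
    function summary() {
        return {
            statements: statements.length,
            characters: output.length
        };
    }

    return { output: output, summary: summary };
}

var result = beautify("var a = 1; var b = 2;");
```

Calling `result.summary()` from outside still reports on values that exist only inside `beautify`, which is the behavior the closure is meant to provide.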
CSS beautification uses the cleanCSS function originally written by Anthony Lieuallen. I have modified it for better conformity with JSLint and also for some minor beautification tweaks and customized indentation. I can no longer remember any specific errors I found and corrected while modifying this function. I am using this code with permission from the original author.
CSS is beautified so that the contents of each object are indented and a new line is added after termination of each property declaration.
The css_summary function resides in the cleanCSS function but is not provided a scope by that function. It is intended to be supplied as a closure so that it may access the interior of cleanCSS but still be accessed from outside cleanCSS. This summary reports the number of HTTP requests and what those requests are.
CSV typically stands for comma separated values, but in this tool it stands for character separated values. The csvbeauty function takes a sequence of characters and splits the input upon that supplied sequence onto new lines. Line breaks that already exist in the input, if they are quoted, are converted to a space contained by braces: { }. Unquoted line breaks are converted into two consecutive line breaks. If the final character(s) match the user supplied character sequence, after charDecoder processing, then those characters are converted into {|} so that csvmin will know a character sequence must exist at the extreme end of input. Escaped double quote characters, escaped using the formal CSV method of immediately preceding the character with an extra double quote character, are converted into a single double quote character to improve legibility.
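The rules above can be sketched roughly as follows. This is a simplified illustration and not the actual csvbeauty source: the function name `csvBeautySketch` is hypothetical, only `\n` is treated as a line break, and the sketch assumes the separator is split on everywhere, including inside quoted values.

```javascript
// Simplified sketch of the csvbeauty rules; quoting edge cases are ignored.
function csvBeautySketch(input, separator) {
    var out = [];
    var inQuote = false;
    var endsWithSeparator = false;
    var i = 0;
    var ch = "";

    // Nothing to split on, so return the input unchanged.
    if (separator === "") {
        return input;
    }

    // A trailing separator is remembered and later emitted as {|}.
    if (input.slice(-separator.length) === separator) {
        input = input.slice(0, -separator.length);
        endsWithSeparator = true;
    }

    while (i < input.length) {
        ch = input.charAt(i);
        if (inQuote && ch === "\"" && input.charAt(i + 1) === "\"") {
            // An escaped quote ("") inside a quoted value becomes one quote.
            out.push("\"");
            i += 2;
        } else if (ch === "\"") {
            inQuote = !inQuote;
            out.push(ch);
            i += 1;
        } else if (ch === "\n") {
            // Quoted line breaks become { }, unquoted ones are doubled.
            out.push(inQuote ? "{ }" : "\n\n");
            i += 1;
        } else if (input.slice(i, i + separator.length) === separator) {
            // Each separator sequence becomes a new line.
            out.push("\n");
            i += separator.length;
        } else {
            out.push(ch);
            i += 1;
        }
    }
    return out.join("") + (endsWithSeparator ? "{|}" : "");
}
```

Because every transformation leaves a distinct marker (`{ }`, a doubled line break, `{|}`), a minifier can in principle reverse each step, which appears to be why the markers are chosen to be unlikely in ordinary data.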
CSV beautification uses charDecoder to decode Unicode character entities. The charDecoder function accepts any combination of HTML decimal Unicode entities and Unicode hexadecimal entities. HTML decimal entities must begin with an ampersand and pound character '&#', be immediately followed by between one and six decimal digits, and be immediately terminated by a semicolon ';'. Examples of accepted HTML entities are:
The Unicode hexadecimal entities must begin with a lowercase u and plus character 'u+', be immediately followed by a four or five digit hexadecimal value, and be immediately terminated by a plus character. Hexadecimal values shorter than four digits must be padded with as many 0 characters as necessary to reach four digits. Examples of accepted Unicode entities are:
Please be aware that charDecoder relies upon the interpreting application's HTML character rendering engine to map entity values to characters, which means that if the browser does not support the supplied entity it will return a generic character marker instead of the intended character. The content will then be separated according to the rendered sequence value, which means a generic character marker will be used in the separation instead of the character referenced by the supplied entity. In summary, if your browser has limited support for Unicode characters you must expect equally limited results when using entity references.
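The two accepted entity formats can be illustrated with a small decoder. This is only a sketch under stated assumptions: the name `decodeEntitiesSketch` is hypothetical, it uses `String.fromCodePoint` rather than the browser's HTML rendering engine that charDecoder relies on, and it assumes hexadecimal digits may be upper or lower case.

```javascript
// Sketch of the two entity formats described above.
// Decimal:      &# + 1 to 6 decimal digits + ;
// Hexadecimal:  u+ + 4 or 5 hexadecimal digits + +
function decodeEntitiesSketch(text) {
    var pattern = /&#(\d{1,6});|u\+([0-9a-fA-F]{4,5})\+/g;
    return text.replace(pattern, function (match, decimal, hex) {
        // Exactly one of the two capture groups is defined per match.
        var codePoint = (decimal !== undefined)
            ? parseInt(decimal, 10)
            : parseInt(hex, 16);
        return String.fromCodePoint(codePoint);
    });
}

var comma = decodeEntitiesSketch("&#44;");
var pipe = decodeEntitiesSketch("u+007c+");
var unpadded = decodeEntitiesSketch("u+7c+");
```

In this sketch `comma` is a comma and `pipe` is a vertical bar, while `unpadded` is left untouched because its value has fewer than four hexadecimal digits.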
Markup beautification uses the markup_beauty function of the application. This function operates upon a pattern based logic of referential integrity. This means decisions are made through exposure to the pattern as established so far. Unfortunately, this requires defined logic to consider all possible combinations of patterns. At this point the beautification appears to work for more than 99.9% of pattern combinations, but undefined combinations are continually being discovered. If you discover an error in my logic please contact me so that I can supply you with a corrected application.
The markup beautification is based upon syntax conventions only and absolutely not upon vocabulary. The two exceptions are that the contents of a script tag are presumed to be JavaScript and are beautified accordingly, and the contents of a style tag are presumed to be CSS and are beautified as CSS. The presumed CSS and JavaScript do not inherit indentation from the markup. Since the beautification is not based upon vocabulary, any language that uses angle brackets for delimiters should work, assuming the conditions of the next paragraph are met. The supplied markup does not have to be valid or well formed by any means.
Content in the markup is classified by whether or not it begins or ends with any whitespace. If content does begin and/or end with whitespace then new line characters are added and the content is indented. This means tags that butt up directly against content are treated as an extension of that content and are indented as such.
Singleton tags are expected to be terminated as XML singleton tags, which means a forward slash character immediately before the closing angle bracket. If a singleton tag is not closed this way, the beautifier treats the tag as a start tag, which expects an end tag.
PHP tags are expected to open with "<?php" and XML parsing declarations are expected to open with "<?xml". Tags that begin with only "<?" are not supported, so they are treated as start tags missing an end tag. This short form is discouraged in PHP, even where it is tolerated, and will generate errors in an XML parser. I don't support this and neither should you.
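The classification rules for singleton, PHP, and XML declaration tags can be summarized in a few lines. This is a minimal sketch assuming each tag has already been isolated as a string; `classifyTagSketch` and its return labels are hypothetical and do not come from markup_beauty.

```javascript
// Sketch: classify a single tag string by syntax only, never by tag name.
function classifyTagSketch(tag) {
    if (tag.indexOf("<?php") === 0) {
        return "php";
    }
    if (tag.indexOf("<?xml") === 0) {
        return "xml declaration";
    }
    if (tag.indexOf("</") === 0) {
        return "end";
    }
    if (tag.slice(-2) === "/>") {
        return "singleton";
    }
    // Everything else, including a bare "<?" tag or an unclosed <br>,
    // falls through and is treated as a start tag expecting an end tag.
    return "start";
}
```

Under these rules `<br/>` is a singleton while `<br>` is a start tag, which is the behavior the text warns about.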
Start tags expect to receive an end tag. End tags are indented exactly like their matching start tags unless they sit directly next to content, and the same is true for start tags. The beautification logic compensates and corrects itself for start tags or end tags that are not indented because of adjacent content.
Inner tags are currently not supported and I am aware of this problem. Inner tags are tags declared within tags. This is not allowed in XML or SGML and is a new syntax convention of tag based programming languages that evaluate dynamically expressed markup into static SGML or XML valid markup at compile time.
The markup_beauty function contains, at its end, a function called markup_summary. The markup_summary function is not provided a scope, because it is meant to be supplied as a closure to markup_beauty.
The markup_summary function creates a report of the number of parts comprising the markup, the weight of each of those parts, and a score using a math formula to compute a performance rating that reinforces reliance upon structure and elaboration of content. This function also displays each HTML element making an HTTP request.
The Indentation Size is merely a multiplier of the specified indentation character. Each component of the total application computes indentation independently by multiplying the indentation size by the indentation character, so that the components may be used without the others and with minimal or no modification.
The Indentation Characters option allows a choice of space, tab, new line, or a string of text characters from the HTML tool. The code will literally allow anything that can be expressed as a character or series of characters.
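The relationship between the two indentation options amounts to string repetition. The sketch below assumes a hypothetical helper named `buildIndent`; the `depth` parameter is an addition for illustration and represents the nesting level of the code being indented.

```javascript
// Sketch: one indentation unit is the chosen character (or string)
// repeated `size` times; nested code repeats that unit per level.
function buildIndent(size, character, depth) {
    return character.repeat(size * depth);
}

var fourSpacesTwoLevels = buildIndent(4, " ", 2);
var oneTabThreeLevels = buildIndent(1, "\t", 3);
var customString = buildIndent(2, "ab", 1);
```

Since the character argument is just a string, a multi-character sequence such as `"ab"` works the same way as a space or a tab.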
The Indent Comments option determines if comments should be indented in accordance with the neighboring code or if comments should not be indented at all.
The Indent Style/Script option is only available for markup beautification. This option determines if the contents of style or script should be indented to match the indentation of their parent tag or if their indentation should begin from the left.
The Presume HTML option is only available for the markup type of code beautification. When this option is enabled all tag names that are "presumed" as singleton tags in HTML are treated as singletons regardless of their syntax.
A heavily modified JSMin is used to perform minification of both JavaScript and CSS. This code was originally written in C by Douglas Crockford and converted to JavaScript by Franck Marcia. I am using this code in accordance with the licensing expressed in the JavaScript form of this code from Mr. Marcia. This code is modified to recognize differences in requirements for JavaScript and CSS and for better conformance to the JSLint tool, and I added a semicolon insertion mechanism for sloppy JavaScript. This function is also enhanced so that it will always minify code down to a single line, which makes the optional semicolon insertion mechanism necessary.
CSS uses the exact same modified JSMin application described for JavaScript. The minification is largely identical except that "-", ".", and "\" are recognized as string characters and not operators or comments. The "$" and "/" characters are removed from this list. Some extra whitespace is inserted to preserve naming conventions that do not exist in JavaScript.
CSV typically stands for comma separated values, but in this tool it stands for character separated values. The csvmin function reverts all changes inflicted by the csvbeauty function.
CSV minification uses charDecoder to decode Unicode character entities. The charDecoder function accepts any combination of HTML decimal Unicode entities and Unicode hexadecimal entities. HTML decimal entities must begin with an ampersand and pound character '&#', be immediately followed by between one and six decimal digits, and be immediately terminated by a semicolon ';'. Examples of accepted HTML entities are:
The Unicode hexadecimal entities must begin with a lowercase u and plus character 'u+', be immediately followed by a four or five digit hexadecimal value, and be immediately terminated by a plus character. Hexadecimal values shorter than four digits must be padded with as many 0 characters as necessary to reach four digits. Examples of accepted Unicode entities are:
Please be aware that charDecoder relies upon the interpreting application's HTML character rendering engine to map entity values to characters, which means that if the browser does not support the supplied entity it will return a generic character marker instead of the intended character. The content will then be separated according to the rendered sequence value, which means a generic character marker will be used in the separation instead of the character referenced by the supplied entity. In summary, if your browser has limited support for Unicode characters you must expect equally limited results when using entity references. csvmin does not revert any changes supplied by the charDecoder function.
Markup is minified using markupmin, which I wrote recently. This function does little more than collapse each run of whitespace characters into a single space character and strip comments. It does, however, preserve whitespace inside ASP and PHP tags and preserve SSI tags. It also assumes the contents of a script tag are JavaScript and minifies them accordingly, and assumes the contents of style tags are CSS and minifies them as such.
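The two core operations, collapsing whitespace and stripping comments, can be sketched with regular expressions. This is not the markupmin source: the name `markupMinSketch` is hypothetical, and the sketch deliberately omits the special handling of ASP, PHP, script, and style content. It assumes an SSI tag is a comment whose body begins with `#`.

```javascript
// Sketch: strip ordinary comments, keep SSI directives,
// and collapse every run of whitespace into one space.
function markupMinSketch(markup) {
    return markup
        // Remove <!-- ... --> unless the comment starts with "#" (SSI).
        .replace(/<!--(?!#)[\s\S]*?-->/g, "")
        // Collapse spaces, tabs, and line breaks into a single space.
        .replace(/\s+/g, " ");
}

var minified = markupMinSketch("<p>\n   a  <!-- note -->b</p>");
```

Comments are removed before whitespace is collapsed so that whitespace on either side of a removed comment merges into a single space.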
The semicolon insertion mechanism for JavaScript is activated using the JavaScript Toleration option from the Minify Options section of the HTML tool. The semicolon insertion mechanism has passed all tests that I have run against it using whatever faulty code I could find from Travelocity. That does not mean I have perfected this enhancement, so use it at your own risk. It is also ridiculously slow. I suggest fixing your code instead of relying upon semicolon insertion.
The diff engine uses three separate functions: difflib, diffview, and charcomp. The first two components are originally by Snowtide Informatics Systems and the third component I wrote. difflib is altered in order to achieve stricter JSLint compliance, but is otherwise not significantly altered. diffview is almost entirely rewritten from scratch so that JavaScript arrays are used to store the dynamic output instead of DOM objects. This change has resulted in a 3.5x faster response rate. charcomp is the function used to highlight per character differences.
JavaScript code is first minified using JSMin and then beautified using js-beautify. This prevents differences in comments or whitespace from interfering with the analysis of the code. It also allows beautified code to be flawlessly compared with minified code. CSV is first minified with csvmin and then beautified with csvbeauty. CSS is first minified using JSMin for CSS and then beautified using cleanCSS for the same reasons mentioned for JavaScript. Markup is first minified using markupmin and then beautified using markup_beauty. Plain text is diffed without any minification or beautification. If the code to be compared is not compatible with the other processes then use the plain text mode.
Only after the automated beautification does the diff process begin. difflib finds differences per line and sends its results, as an array of numeric values indicating where to look in the code, over to diffview. diffview takes the opcodes supplied by difflib and then builds an array where the code is pumped into HTML table cell code. Once the view is completely built it is immediately inserted into the page using innerHTML. You cannot see the output at this point because it is set to display none until charcomp finishes.
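The hand-off from opcodes to table markup might look roughly like the sketch below. The opcode shape `[tag, i1, i2, j1, j2]` is an assumption borrowed from the convention used by Python's difflib, on which the Snowtide library is modeled; the function names are hypothetical and this is not the rewritten diffview code. The point it illustrates is building the output as an array of strings and joining once, rather than creating DOM objects.

```javascript
// Escape characters that would otherwise be read as markup.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Sketch: turn assumed opcodes of the form [tag, i1, i2, j1, j2]
// into a side by side HTML table built from an array of strings.
function buildRowsSketch(baseLines, newLines, opcodes) {
    var rows = [];
    opcodes.forEach(function (op) {
        var tag = op[0];
        var i1 = op[1], i2 = op[2], j1 = op[3], j2 = op[4];
        var count = Math.max(i2 - i1, j2 - j1);
        var k, left, right;
        for (k = 0; k < count; k += 1) {
            // A side with no more lines in this range gets an empty cell.
            left = (i1 + k < i2) ? escapeHtml(baseLines[i1 + k]) : "";
            right = (j1 + k < j2) ? escapeHtml(newLines[j1 + k]) : "";
            rows.push(
                "<tr><td class=\"" + tag + "\">" + left + "</td>" +
                "<td class=\"" + tag + "\">" + right + "</td></tr>"
            );
        }
    });
    return "<table>" + rows.join("") + "</table>";
}

var html = buildRowsSketch(
    ["a", "b"],
    ["a", "c"],
    [["equal", 0, 1, 0, 1], ["replace", 1, 2, 1, 2]]
);
```

The resulting string can be assigned to `innerHTML` in one step, and the `replace` class on changed cells gives a later per character pass something to select on.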
charcomp finds the table cells with a class of "replace" and works only on those cells. Before performing any comparison it converts non-breaking space references into actual spaces to reduce processing requirements, converts angle brackets into entity references, and converts entity references for quotes into actual quotes. In my testing a single quote compared as equal to a double quote in this context even when both were string literals, so I invented character references that I could convert back to quotes within the comparison function so that they become string literals that are comparable. Once a difference is located it is wrapped in an em tag.
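A per character comparison of two changed lines can be approximated by trimming the shared prefix and suffix and wrapping what remains. This is a swapped-in simple technique for illustration and not the actual charcomp algorithm; `charCompSketch` is a hypothetical name, and the entity conversions described above are left out.

```javascript
// Sketch: highlight the differing middle of two strings with <em>.
function charCompSketch(a, b) {
    var start = 0;
    var max = Math.min(a.length, b.length);
    var endA = a.length;
    var endB = b.length;

    // Advance past the characters both strings share at the front.
    while (start < max && a.charAt(start) === b.charAt(start)) {
        start += 1;
    }
    // Back up past the characters both strings share at the end.
    while (endA > start && endB > start &&
            a.charAt(endA - 1) === b.charAt(endB - 1)) {
        endA -= 1;
        endB -= 1;
    }

    function wrap(text, from, to) {
        if (from === to) {
            return text;
        }
        return text.slice(0, from) + "<em>" +
            text.slice(from, to) + "</em>" + text.slice(to);
    }

    return [wrap(a, start, endA), wrap(b, start, endB)];
}

var pair = charCompSketch("var a = 1;", "var b = 1;");
```

This marks only one contiguous region per line, so two separate edits on the same line are highlighted as a single span covering both.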
The Print or Save Output option dumps the output into a separate window that contains only a title and the diff output. The code in this window is still dynamic, so its source cannot be viewed unless it is saved to disk first. Since the code is all dynamic, any browser plugins that are open, if they require access to the DOM, will exist in an open state in the saved output as well. Since this is a popup window, any browser that does not handle popup windows with stability likely will not function correctly with this option. If the output is to be printed please use landscape instead of portrait orientation for the paper in order to achieve the best results. The output may look slightly different, because I am wrapping the table in an HTML <pre> tag in order to keep whitespace visible and yet still allow line breaking for presentation where that break does not explicitly exist as a character. This presentation is used so that more output can fit within the constraints of paper width. This is not a perfect solution, as runs of whitespace will not break and will flow outside printable bounds.
The Context Size option pads the lines of code that differ with surrounding lines of code that do not differ. This option expects to receive a number or an empty value only. If anything else is entered it is processed as an empty value. An empty value negates this option by returning all supplied code with differences highlighted. If no differences are discovered this option is negated.
JavaScript Toleration is exactly the same as described in Minification Options above. Indentation Size, Indentation Characters, Indent Style/Script, and Presume HTML are exactly the same as described in Beautification Options above.
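The Context Size rules can be expressed as a filter over line indexes. The sketch below assumes the diff has already been reduced to an array of booleans, one per line, where `true` marks a changed line; that representation and the name `visibleLinesSketch` are hypothetical.

```javascript
// Sketch: return the indexes of the lines that should be displayed.
function visibleLinesSketch(changed, contextSize) {
    var all = changed.map(function (flag, index) {
        return index;
    });
    var text = String(contextSize).trim();
    var size;

    // An empty or non-numeric value, or a diff with no changes,
    // negates the option and shows every line.
    if (!/^\d+$/.test(text) || changed.indexOf(true) === -1) {
        return all;
    }

    size = parseInt(text, 10);
    return all.filter(function (index) {
        var from = Math.max(0, index - size);
        var to = Math.min(changed.length - 1, index + size);
        var k;
        // Keep a line if any line within `size` of it has changed.
        for (k = from; k <= to; k += 1) {
            if (changed[k]) {
                return true;
            }
        }
        return false;
    });
}

var flags = [false, false, false, true, false, false, false];
var shown = visibleLinesSketch(flags, "1");
```

With a context size of 1, only the changed line and its immediate neighbors remain, so `shown` holds the indexes 2, 3, and 4.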
The Diff View Type option provides two choices. The first choice, Side by Side View, reports the output into two columns that display a side by side comparison of the differences. The second choice, Inline View, displays the output into a single column so that the differences can be seen in a vertical comparison.
| Component | Author(s) | Summary | Revised |
|---|---|---|---|
| js-beautify.js | Einars Lielmanis; revised by Austin Cheney | Beautifies JavaScript code. | 3 Sep 10 |
| prettydiff.js | Austin Cheney | The actual Pretty Diff application code. | 3 Sep 10 |
| diffview.css | Austin Cheney | The CSS that powers everything to do with the form, diff output, and this documentation. | 2 Sep 10 |
| markup_beauty.js | Austin Cheney | Beautifies markup code. | 2 Sep 10 |
| index.php | Austin Cheney | The actual Pretty Diff tool HTML file. | 1 Sep 10 |
| cleanCSS.js | Anthony Lieuallen; revised by Austin Cheney | Beautifies CSS code. | 30 Aug 10 |
| fulljsmin.js | Original: Douglas Crockford; JavaScript adaptation: Franck Marcia; revised by Austin Cheney | Minifies JavaScript code and CSS code. | 30 Aug 10 |
| documentation.php | Austin Cheney | This documentation page. | 29 Aug 10 |
| markupmin.js | Austin Cheney | Minifies markup code. | 28 Aug 10 |
| difflib.js | Snowtide Informatics | Compares lines of code looking for differences. | 10 Aug 10 |
| charcomp.js | Austin Cheney | Compares lines of differences to show per character differences. | 9 Aug 10 |
| csvbeauty.js | Austin Cheney | The function that beautifies character sequence values. | 14 May 10 |
| csvmin.js | Austin Cheney | The function that minifies character sequence values. | 14 May 10 |
| diffview.js | Snowtide Informatics; revised by Austin Cheney | Builds the HTML diff output. | 14 May 10 |
| charDecoder.js | Austin Cheney | The function that decodes Unicode character entities for csvbeauty.js and csvmin.js. | 21 Apr 10 |
* I only claim to be a revision author where I completely rewrote or extended the functional output, as opposed to merely reorganizing the original logic of the code for JSLint compliance.
Please send comments, feedback, and requests to cheney@mailmarkup.org.