The DOM Explained, Quick and Simple
Introduction
Scope of this Document
This document focuses only upon DOM Level 2. Although DOM Level 1 is still universally supported, this document views it as an expired and purely legacy artifact. XML namespaces, event execution, and memory management will not be discussed, thereby negating the need to dwell upon DOM Level 3 and 4.
This document is an attempt to explain what the DOM is, how to use it, and why it is defined the way it is. The intended audience is anybody wanting to enrich their understanding of the DOM. Although the DOM is language agnostic, this document is written from the perspective of JavaScript.
This is not a how to article. This document does not discuss abstractions, frameworks, or other helpers. This document seeks to be educational, which is to say this document seeks to explain things. This document does not attempt to provide tools or suggestions.
What the DOM Is
The DOM serves primarily as an API for JavaScript to access and interpret any markup language based upon the XML syntax, which still includes HTML. API (Application Programming Interface) is a common term to describe how one piece of software talks to another unrelated piece of software. The DOM is language agnostic so that it can be used and defined by a variety of languages in a variety of ways. As expressed in JavaScript, the DOM is a collection of objects reached through a master object named document. Each data facet conveyed by the DOM is an object. The containment and relationships naturally demonstrated through the document structure of the markup are expressed in the DOM through the properties and methods of those objects.
It is important to recognize the DOM is the standard way that web browsers interact with HTML. Every JavaScript framework, library, and abstraction that accesses HTML or XML ultimately does so through the standard DOM methods. A proper understanding of the DOM is critical for accessibility, performance, semantics, and basic problem solving.
Every data facet expressed by the DOM is referred to as a node. Each node represents a single unit of a markup document, whether that unit comprises an HTML tag, attribute, comment, or anything else. Every node is defined as one of twelve types. Only the element, text, and attribute types are commonly used on the web, though every type is available in modern web browsers. The objects representing the nodes feature properties that define their relationships, such as the parentNode property, or inherit methods that can define their relationships.
Because DOM Level 2 and later versions are based upon XML syntax, the DOM is case-sensitive. When accessing HTML, which is not case-sensitive, case-sensitivity can vary by browser and browser version. It is recommended to always use lower case when accessing HTML from the DOM, which eliminates minor conformance differences between browsers, and to force HTML element and attribute names to lower case when interpreting names returned from the DOM.
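As a minimal sketch of that recommendation, the helper below lower-cases both the name returned by the DOM and the name being compared. The function name hasTagName is hypothetical and is not part of the DOM; it only assumes the node exposes the standard nodeName property.

```javascript
// Hypothetical helper, not part of the DOM: compares a node's name to an
// expected tag name without depending on the case the browser reports.
function hasTagName(node, expectedName) {
    // Guard against null (for example, a failed getElementById lookup)
    // and against objects that expose no string nodeName.
    if (!node || typeof node.nodeName !== "string") {
        return false;
    }
    return node.nodeName.toLowerCase() === expectedName.toLowerCase();
}
```

In a browser this might be called as hasTagName(document.getElementById("idValue"), "div"), and the comparison behaves the same whether the browser reports the name in upper or lower case.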
There is a standard alternative to the DOM with a lighter memory footprint and arguably far faster execution time called XPath. Unfortunately, XPath is not universally supported cross browser as a natively available method. This further reinforces the importance of learning the DOM as the only universally supported and standard API to access a markup document structure.
The well known innerHTML and outerHTML properties are not defined in the DOM. As such they will not be discussed here.
History of the DOM
The DOM originally came about as a standard means for accessing HTML from JavaScript. Prior to the first DOM standard IE and Netscape had separate and unrelated means of accessing HTML. The first standard is the original DOM Specification, later called DOM Level 1, from 1997. There was an immediate and vital need for conformity so this document was released earlier than other W3C technologies.
In 1998 work began on the DOM Level 2 specification that became a recommendation in late 2000. If you compare the proposed properties of the Level 2 document to the Level 1 document you can see there are wild differences. For instance DOM Level 1 identifies properties directly related to tags required in HTML while DOM Level 2 is fully language agnostic. The DOM Level 2 specification was written in parallel with the XML Schema Part 1: Structures specification. DOM Level 2 specification released revisions shortly after each XML specification revision and became a formal recommendation only after XML Schema became a recommendation. DOM Level 2 is primarily concerned with node definitions, node types, and access to nodes via their relationships.
In 2004 the DOM Level 3 specification became a recommendation on the same date DOM Level 2 received its final update. This was also just three weeks after a major revision to the XML Schema Part 1 Second Edition Edit Recommendation was released. Notice that DOM Level 3 extends DOM Level 2 but does not extend DOM Level 1. DOM Level 3 is primarily concerned with namespaces, inheritance, and extension of node names and definitions.
DOM Level 4 became a published standard in late 2015. DOM Level 4 is worthy of note in that it extends DOM Level 2 and 3 but is focused primarily upon interaction and events instead of anything related to structure or node relationships. A primary example is that garbage collection is described in the specification, where in the past these discussions were left entirely to the implementation of JavaScript interpreters. In a purely desktop driven experience DOM Level 4 might be considered unnecessary and excessive. New platforms like mobile and tablets have proven that standard definitions for events and interaction are not enough. There must also be standard guidance on implementation of those events to achieve conformance, such as when the two largest mobile platform developers cannot produce interfaces that execute to the standards they helped write.
The DOM Level 4 specification, as published by the W3C, is a snapshot of work from the WHATWG DOM specification. This snapshot allows an evolving artifact from the WHATWG to be published against a fixed date and version number such that consuming parties have a static reference to serve as a standard for conformance.
How HTML Loads in Modern Browsers
It is important to understand how the DOM is created. If a cached JavaScript file is requested at the top of a document it is quite possible, even when the code is well written, that it will request access to DOM objects before they are created, which can result in unintended errors. For performance reasons it is often suggested that JavaScript be requested from the bottom of a document, which also provides the benefit of allowing the DOM to fully render before attempts to access it are executed.
The first thing modern browsers do when encountering any markup language, such as HTML, is parse the markup. Parsing does not imply producing or rendering the DOM, which is a separate and later process. Parsing is the process of transforming strings of text, which is what is sent across the internet, into interpreted code that computers use. As an example, people parse language when they talk to each other or when they read books. All modern browsers use XML as a parsing foundation for any XML similar markup languages. In web browsers poorly formatted HTML is generally considered close enough to XML to use the same parsing techniques but with some additional rules. Because XML provides syntax rules that are terse and specific, writing an XML parser is much faster, easier, and cheaper than writing an HTML parser. This means a smaller and faster executing parser, but more importantly it means a solid foundation on which to provide additional rules necessary for parsing HTML. An example of standard parsing guidance can be found in the XML Information Set.
Once the markup is parsed, requests for additional resources are identified and those requests are executed. It is absolutely essential that the requests for additional resources execute as early as possible, as the transmission time necessary to return these resources can occur simultaneously with the DOM rendering. For performance it is considered a good idea to put requests for visual artifacts, particularly stylesheets, as high in the document as possible so that the page can render visually as the DOM renders without repainting later. Another reason the DOM is created separately from the parsing process is that it is significantly larger, and thus slower to produce and populate into memory, than the parse data.
After the DOM is fully rendered, all requests for additional resources have executed, and all initially requested JavaScript is interpreted the load of the page is complete. It is only at this time that the onload event executes. As a side note events are not formally addressed in this document, but are points of user interaction and are formally defined in DOM Level 3. It is important to take note that older versions of Internet Explorer require receipt of an HTTP response for all requested resources, which means that if an HTTP response is never returned the onload event never executes. It is also important to notice that requests for JavaScript not made from the HTML have no bearing on the execution of the onload event and neither do any JavaScript executions that occur outside of initial interpretation or through any other API.
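The timing described above can be sketched as a small guard that delays DOM access until the load of the page is complete. This is an illustration under assumptions, not a recommended tool: whenLoaded is a hypothetical name, the win parameter is assumed to be the browser's window object, and the sketch relies on document.readyState and addEventListener, which older versions of Internet Explorer do not provide.

```javascript
// Hypothetical helper: runs callback once the page load is complete.
// win is assumed to be the browser's window object.
function whenLoaded(win, callback) {
    if (win.document.readyState === "complete") {
        // The document and its requested resources have finished loading,
        // so there is nothing left to wait for.
        callback();
        return;
    }
    // Otherwise wait for the load event, then stop listening.
    win.addEventListener("load", function onLoad() {
        win.removeEventListener("load", onLoad);
        callback();
    });
}
```

In a browser this might be used as whenLoaded(window, function () { document.getElementById("idValue"); }), so that the lookup runs only after the DOM exists.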
Accessing the DOM in JavaScript
Methods for Searching the DOM
The DOM is searched using methods defined on the document object. Some methods defined in that object are inherited by other DOM objects and some methods are not. Any DOM access method executed on the document object searches from the document.documentElement object that represents the root node of the markup, typically the <html> element. The most common methods for accessing the DOM are:
-
getElementById - DOM Level 2
This method is only available from the document object or documentFragment type object and requires a string argument. The method will search the document for the first node with an id attribute whose value matches the value of the provided string argument. This method returns either null or the first DOM element node with a matching ID attribute value.
document.getElementById("idValue"); //returns null or element node
-
getElementsByTagName - DOM Level 2
This method is inherited by all element node types and requires a string argument. This method will search from the object it is executed on for all elements with a tag name matching the provided string. A node list is always returned. If there are no matching elements a node list of length 0 is returned.
document.getElementsByTagName("div"); //returns a node list
myElementNode.getElementsByTagName("div"); //returns a node list
-
getElementsByClassName - DOM Level 4
This method is inherited by all element node types and requires a string argument. This method will search from the object it is executed on for all elements whose class attribute value contains a class name matching the provided string. Class names are represented as a space separated list in the class attribute value. A node list is always returned. If there are no matching elements a node list of length 0 is returned.
document.getElementsByClassName("classValueToken"); //returns a node list
myElementNode.getElementsByClassName("classValueToken"); //returns a node list
-
getAttribute - DOM Level 2
This method is inherited by all element node types and requires a string argument. This method will return the string value of the attribute matching the attribute name supplied in the string argument. If no matching attribute exists it will return null.
myElementNode.getAttribute("attributeName"); //returns null or a string
-
cloneNode - DOM Level 2
This method is inherited by all DOM nodes. If this method is used on an element node all values, attributes, and properties are cloned. This method takes one argument of type boolean, which if true also clones all child and descendant nodes.
myNode.cloneNode(true); //returns a copy of a node including all descendants
myNode.cloneNode(); //returns a copy of a node not including descendants
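The search methods above are often used together. The following is a minimal sketch, not a definitive pattern: collectAttributeValues is a hypothetical name, and the doc parameter is assumed to be the document object. It finds one element by id, searches beneath it by tag name, and reads one attribute from each match.

```javascript
// Hypothetical helper: collects the value of one attribute from every
// element with the given tag name beneath the element with the given id.
// doc is assumed to be the document object.
function collectAttributeValues(doc, id, tagName, attributeName) {
    var values = [];
    var container = doc.getElementById(id);
    if (container === null) {
        // getElementById returns null when no element has a matching id
        return values;
    }
    // The search starts from the container, not from the whole document.
    var matches = container.getElementsByTagName(tagName);
    for (var i = 0; i < matches.length; i += 1) {
        var value = matches[i].getAttribute(attributeName);
        if (value !== null) {
            // getAttribute returns null when the attribute is absent
            values.push(value);
        }
    }
    return values;
}
```

In a browser, collectAttributeValues(document, "idValue", "a", "href") would gather the href values of the links inside the element whose id is idValue, where that id is only a placeholder.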
Walking the DOM
Another means of accessing the DOM is by starting from a DOM element node and moving from node to node using relationships defined in the properties of each element's DOM object. This is called walking the DOM. Here is a list of the most common properties to demonstrate those relationships:
-
parentNode - DOM Level 2
The parentNode property is available to every DOM node regardless of node type. This property returns the node immediately containing the current DOM node, which is usually an element node. When the parentNode property is accessed on the document.documentElement object the document object is returned.
myNode.parentNode; //returns an element node
-
childNodes - DOM Level 2
The childNodes property is available to every element node type. When executed this property will return a node list of child nodes regardless of type, but it will not return descendant nodes, which are nodes that are children of the immediate child nodes. The child nodes are always returned in the same order in which they occur in the DOM. Versions of Internet Explorer 8, and earlier, intentionally omitted text nodes containing only white space characters from the childNodes property thereby creating conformance problems. The childNodes property does not return attribute nodes.
myElementNode.childNodes; //returns a node list
-
firstChild - DOM Level 2
The firstChild property is available to all element, document, and documentFragment node types. It returns the first child node from the childNodes node list. If this property is accessed on the document object it will typically return a documentType node, provided the document declares a doctype.
myElementNode.firstChild; //returns first child node
-
lastChild - DOM Level 2
This property returns the node in position: childNodes[childNodes.length - 1], but is otherwise identical to the firstChild property.
myElementNode.lastChild; //returns last child node
-
nextSibling - DOM Level 2
The nextSibling property is available to every DOM node and returns the next child node of the parent node. If the current node is the last child of its parent this property will return null.
myNode.nextSibling; //returns next adjacent node
-
nextElementSibling - WHATWG DOM
The nextElementSibling property is available to every DOM node and returns the next child node of type element of the parent node. If the parent node contains no element child nodes following the current node this property will return null.
myNode.nextElementSibling; //returns next element sibling
-
previousSibling - DOM Level 2
The previousSibling property is available to every DOM node and returns the previous child node of the parent node. If the current node is the first child of its parent this property will return null.
myNode.previousSibling; //returns prior adjacent node
-
previousElementSibling - WHATWG DOM
The previousElementSibling property is available to every DOM node and returns the previous child node of type element of the parent node. If the parent node contains no element child nodes preceding the current node this property will return null.
myNode.previousElementSibling; //returns prior element sibling
-
attributes - DOM Level 3
The attributes property is only available to element type nodes. It returns a node list of all attributes, as well as their values, represented on that element. If an element has no attributes then a list of length 0 is returned.
myElementNode.attributes; //returns a node list of format: attributeName="value"
-
nodeType - DOM Level 2
This property is available to every DOM node. It always returns a number depending upon the node type, as described in the following list, where the number returned is shown after each type.
- ELEMENT_NODE (1)
- ATTRIBUTE_NODE (2)
- TEXT_NODE (3)
- CDATA_SECTION_NODE (4)
- ENTITY_REFERENCE_NODE (5)
- ENTITY_NODE (6)
- PROCESSING_INSTRUCTION_NODE (7)
- COMMENT_NODE (8)
- DOCUMENT_NODE (9)
- DOCUMENT_TYPE_NODE (10)
- DOCUMENT_FRAGMENT_NODE (11)
- NOTATION_NODE (12)
myNode.nodeType; //returns a number 1 - 12
-
nodeName - DOM Level 2
This property always returns either a string or null depending upon the node type. Please see this table in the specification for value definitions. In conformance with the XML specification the value returned by this property should always be case sensitive and specifically match the node it is applied against. With HTML, however, this is frequently not the case and differs by browser and version. I recommend always using the JavaScript toLowerCase method against the value returned for nodeName to safely force conformity.
myNode.nodeName; //returns a string of variant definition by node type
-
nodeValue - DOM Level 2
This property always returns either a string or null depending upon the node type. Please see this table in the specification for value definitions.
myNode.nodeValue; //returns either null or a string of variant definition by node type
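The properties above can be combined into a complete walk of a tree. The following is a minimal sketch with hypothetical names (listElementNames and visit are not part of the DOM). It assumes only that each node exposes the standard nodeType, nodeName, firstChild, and nextSibling properties, and it lower-cases each name as recommended for nodeName.

```javascript
// nodeType value for element nodes
var ELEMENT_NODE = 1;

// Hypothetical helper: walks from startNode through all of its descendants
// using firstChild and nextSibling, and returns the lower-cased names of
// the element nodes in document order.
function listElementNames(startNode) {
    var names = [];
    function visit(node) {
        if (node.nodeType === ELEMENT_NODE) {
            names.push(node.nodeName.toLowerCase());
        }
        // Step down to the first child, then across each sibling in turn.
        var child = node.firstChild;
        while (child) {
            visit(child);
            child = child.nextSibling;
        }
    }
    visit(startNode);
    return names;
}
```

In a browser, listElementNames(document.documentElement) would list every element of the page starting with html. Text and comment nodes are visited but skipped because their nodeType is not 1.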
Changing the DOM
The DOM can also be manipulated and altered through various means. The following is a list of common means to change the DOM:
-
createElement - DOM Level 2
The createElement method is available from the document object and takes a single string argument. The value of the string argument is the name of an element type node to create, so this method will throw an error if the supplied string violates syntax rules for element names.
document.createElement("div"); //returns a new element node not bound to the document
-
createTextNode - DOM Level 2
The createTextNode method is available from the document object and takes a single string argument. The value of the string argument is the text content of the node.
document.createTextNode("some text here"); //returns a new text node not bound to the document
-
appendChild - DOM Level 2
This method is available to every element type node. It takes a single argument of a reference to a node object, which is the node to add to the DOM, and adds this node after every other child node. If the supplied node already has a parent node it is first removed from that parent.
myElementNode.appendChild(newNode); //returns the appended node
-
insertBefore - DOM Level 2
This method is available to every element type node. It takes two arguments: a reference to a new node object followed by a reference node. The new node is added immediately before the reference node, which must be a child of the current node. If the new node already has a parent node it is first removed from that parent.
myElementNode.insertBefore(newNode, referenceNode); //returns the newNode
-
removeChild - DOM Level 2
This method is available to every element type node. It takes a single argument of a reference to a node object, which must be a child node of the current node. The supplied node is removed from its parent node and so it is effectively removed from the document. As long as the JavaScript reference to this node can still be accessed the node will remain alive in memory so that it can be changed and added back to the document later.
myElementNode.removeChild(anotherNode); //returns anotherNode
myElementNode.removeChild(myElementNode.lastChild); //returns myElementNode.lastChild
myElementNode.removeChild(myElementNode.childNodes[3]); //returns myElementNode.childNodes[3]
-
replaceChild - DOM Level 2
This method is available to every element type node. It takes two arguments. The first is a new DOM node and the second is a DOM node to replace. If the new DOM node already has a parent node it is first removed from that parent, and the old DOM node will have its parentNode unassigned, effectively removing it from the document. The old DOM node will continue to reside in memory so long as a variable reference to it remains available.
myElementNode.replaceChild(newNode, originalNode); //returns originalNode
-
removeAttribute - DOM Level 2
This method is available to every element type node. It takes a single argument representing an attribute name and if that attribute exists it is removed from its parent element.
myElementNode.removeAttribute("class"); //returns nothing
-
setAttribute - DOM Level 2
This method is available to every element type node. It takes two arguments, both of which are strings. The first argument is the name of the attribute and the second argument is the value to assign. If an attribute with that name already exists it will be overwritten; otherwise a new attribute will be created.
myElementNode.setAttribute("class", "myAttributeValue"); //returns nothing
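The methods above can be combined to build a node and place it in a document. The following is a minimal sketch under assumptions: prependListItem is a hypothetical name, the class value item is only a placeholder, doc is assumed to be the document object, and listNode is assumed to be an element node such as a ul.

```javascript
// Hypothetical helper: builds <li class="item">text</li> and places it as
// the first child of listNode. doc is assumed to be the document object.
function prependListItem(doc, listNode, text) {
    var item = doc.createElement("li");
    item.setAttribute("class", "item");
    item.appendChild(doc.createTextNode(text));
    // insertBefore takes the new node first and the reference node second.
    // When listNode has no children firstChild is null, and a null
    // reference node adds the new node as the last child.
    listNode.insertBefore(item, listNode.firstChild);
    return item;
}
```

In a browser this might be called as prependListItem(document, document.getElementById("idValue"), "some text here"). The new element is not part of the document until insertBefore runs.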
Examples of Method Chaining
It is frequently necessary to use multiple DOM methods together to target a specific node relative to some starting point. This process is commonly referred to as walking the DOM. This is the only means a web browser uses to access pieces of an HTML/XML document from JavaScript. All frameworks and querySelector methods sugar down to the DOM methods, and all modern web browsers provide the bulk of their optimizations to this API, so learning how these methods are used together will allow for writing faster executing web applications. I have provided some brief examples:
-
document.getElementById("someID").childNodes //get all child nodes from a targeted element
-
document.getElementById("someID").childNodes.length //how many children does the targeted node contain
-
document.getElementById("someID").getElementsByTagName("p") //get all p nodes in the tree of a targeted node
-
document.getElementById("someID").firstChild.nodeType //determine the node type of a targeted element's first child node
-
document.getElementById("someID").previousSibling.firstChild //get the first child of a targeted element's previous sibling
-
document.getElementById("someID").nextSibling.childNodes[2] //get the third child of a targeted element's next sibling
-
document.getElementsByTagName("h3")[4].parentNode.parentNode //find the grandparent element of the fifth h3 element
-
document.getElementsByTagName("h3")[4].getAttribute("id") //get the id attribute value of the fifth h3 element in the document
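Chains such as the ones above throw an error when any step returns null, and previousSibling or nextSibling may return a white space text node instead of an element. The following is a minimal sketch of a guarded chain. The names previousElement and firstChildOfPreviousElement are hypothetical, and doc is assumed to be the document object.

```javascript
// Hypothetical helper: steps backwards through previousSibling until an
// element node (nodeType 1) is found, or returns null if there is none.
function previousElement(node) {
    var sibling = node.previousSibling;
    while (sibling && sibling.nodeType !== 1) {
        sibling = sibling.previousSibling;
    }
    return sibling || null;
}

// Hypothetical helper: a guarded version of the chain
// document.getElementById(id).previousSibling.firstChild
function firstChildOfPreviousElement(doc, id) {
    var target = doc.getElementById(id);
    if (target === null) {
        return null; // no element with that id
    }
    var previous = previousElement(target);
    return previous === null ? null : previous.firstChild;
}
```

In a browser, firstChildOfPreviousElement(document, "someID") returns null instead of throwing when the element is missing or has no preceding element sibling.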