In Free Fall (In Caduta Libera), by Tullio Crali

Many aspects of web development require extracting text from a page while cleansing it of markup: populating an RSS feed, for example, or filling a JSON request with page data. There are also plenty of occasions when you’ll need to fill a newly created element with text: for example, creating a label for a <button>. In both cases, the safest and most efficient way to achieve these ends in JavaScript is usually via the textContent property.

Inserting Text Content

Let’s say that we’ve created a new button element, referenced in JavaScript as hitSwitch:

var hitSwitch = document.createElement("button");

We want to place text inside that element: i.e. between the opening <button> and closing </button> tags, before placing the element on the page. Traditionally, that would call for innerHTML, but there are two downsides to that approach:

  1. innerHTML can be used as a vector for cross-site scripting attacks (XSS)
  2. innerHTML’s execution speed is a little slow, as it must parse the text as HTML before adding it to the element.

In most cases, textContent is a better choice:

hitSwitch.textContent = "Hit me with your rhythm stick";

hitSwitch now appears as:

<button>Hit me with your rhythm stick</button>
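
To make the XSS point concrete, here is a minimal sketch of a helper that labels a new button through textContent. The name makeLabelledButton and the doc parameter are illustrative assumptions, not part of any API; passing the document in simply makes the helper easy to try outside a browser. Any markup in the label is inserted as literal characters rather than parsed.

```javascript
// Hypothetical helper: create a <button> and label it with plain text.
// `doc` is assumed to be a Document (normally the global `document`).
function makeLabelledButton(doc, label) {
  var button = doc.createElement("button");
  // textContent never parses its value as HTML, so any tags in `label`
  // end up on the page as visible characters, not as elements.
  button.textContent = String(label);
  return button;
}

// Usage in a browser:
// var hitSwitch = makeLabelledButton(document, "Hit me with your rhythm stick");
// document.body.appendChild(hitSwitch);
```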

textContent only adds text: if you need to add HTML markup at the same time, innerHTML or insertAdjacentHTML is the better choice. When textContent is set, it replaces everything already inside the referenced element, both text and child elements. For example, you could remove the entire content of a web page by using the following:

document.body.textContent = "";
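
Returning to the case where some markup is needed around text from an untrusted source: one alternative to innerHTML is to build the elements with createElement and fill each one through textContent. The sketch below assumes that approach; appendStrongText and its parameters are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: append <strong>text</strong> to `parent`
// without ever handing `text` to the HTML parser.
function appendStrongText(doc, parent, text) {
  var strong = doc.createElement("strong");
  strong.textContent = text;   // inserted as plain text
  parent.appendChild(strong);  // existing children of `parent` are kept
  return strong;
}

// Usage in a browser:
// appendStrongText(document, document.getElementById("futuro"), "Futurism");
```

Unlike assigning to the parent's textContent, appendChild leaves the parent's existing content in place.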

Extracting Text

textContent can also be used to extract content from a page. If we have the following:

<p id="futuro"><strong>Futurism</strong> (Italian: <em>Futurismo</em>):
an artistic and social movement that originated in Italy before 
<abbr title="World War">WW</abbr>I.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects&hellip; the car, the aeroplane, 
the train, and the modern city.</p>

Then we can pull just the text content of the paragraph by using the following:

var futurism = document.getElementById("futuro");
var textExtract = futurism.textContent;

Printed to the console, textExtract would appear as:

"Futurism (Italian: Futurismo):
an artistic and social movement that originated in Italy before 
WWI. It emphasized speed, technology, youth, and violence, 
together with new industrial objects… the car, the aeroplane, 
the train, and the modern city."

There are several things to note about this extraction technique:

  1. The original content on the page remains unchanged.
  2. All HTML markup is removed from the extracted text, including tags inside the referenced element. Text content between those tags is retained.
  3. HTML entities are automatically decoded into the characters they represent, so &hellip; becomes … in the extracted text.
  4. Images, being tags, will be removed entirely, and their alt values will not appear in the extraction.
  5. To shorten our code, we could merge the two lines of JavaScript into:
var textExtract = document.getElementById("futuro").textContent;
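
Because textContent keeps the line breaks and indentation of the source markup, extracted text often needs tidying before it goes into a feed or a JSON payload. The following is a sketch under that assumption: extractPlainText is a hypothetical helper, and collapsing every run of whitespace into a single space is a design choice that may not suit preformatted content.

```javascript
// Hypothetical helper: read an element's text and tidy its whitespace.
function extractPlainText(element) {
  return element.textContent
    .replace(/\s+/g, " ")  // collapse newlines, tabs and repeated spaces
    .trim();               // drop leading and trailing whitespace
}

// Usage in a browser, building a JSON string from the paragraph above:
// var futurism = document.getElementById("futuro");
// var payload = JSON.stringify({ summary: extractPlainText(futurism) });
```

Note that \s in a JavaScript regular expression also matches non-breaking spaces, so those are collapsed as well.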

The Danger of Open Tags

Previously I have pointed out that elements like <p> can be written without a closing tag. Leaving the closing tag out is always optional, and if you’re ever going to use textContent, the practice can be dangerous. If we remove the closing tag from the text sample, and add an inline <script> after it:

<p id="futuro"><strong>Futurism</strong> (Italian: <em>Futurismo</em>):
an artistic and social movement that originated in Italy before 
<abbr title="World War">WW</abbr>I.
It emphasized speed, technology, youth, and violence, 
together with new industrial objects&hellip; the car, the aeroplane, 
the train, and the modern city.
<script>
var dt = dy + dx;
</script>

Then repeat the same JavaScript:

var textExtract = document.getElementById("futuro").textContent;

The resulting value of textExtract is now:

Futurism (Italian: Futurismo):
an artistic and social movement that originated in Italy before 
WWI. It emphasized speed, technology, youth, and violence, 
together with new industrial objects… the car, the aeroplane, 
the train, and the modern city.

var dt = dy + dx;

Why does this happen? With no closing tag, the browser’s HTML parser keeps the paragraph open, and a <script> start tag is not one of the tags that implicitly closes a <p>, so the script ends up as a child of the paragraph. textContent drops the script tags themselves but returns the code between them as text. To avoid this, we just need to close the paragraph with a </p>, clarifying where the paragraph ends.
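
If the markup cannot be fixed at the source, a defensive option is to strip script and style elements from a copy of the node before reading its text. This is a sketch of that idea rather than a replacement for closing the paragraph; textWithoutScripts is a hypothetical name, and the code assumes a browser that supports NodeList.prototype.forEach and Element.remove().

```javascript
// Hypothetical helper: return an element's text, ignoring any
// <script> or <style> elements nested inside it.
function textWithoutScripts(element) {
  var copy = element.cloneNode(true); // deep copy; the page itself is untouched
  copy.querySelectorAll("script, style").forEach(function (node) {
    node.remove();
  });
  return copy.textContent;
}

// Usage in a browser:
// var textExtract = textWithoutScripts(document.getElementById("futuro"));
```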

Conclusion

textContent is a very useful property to have in your arsenal of JavaScript techniques for manipulating and extracting content from the DOM, with terrific support: most browsers have supported it from their earliest versions, and Internet Explorer has supported it since IE9. Older versions of IE offered only the proprietary .innerText property, which is similar but not identical: innerText takes rendering into account, so it leaves out hidden text and the contents of <script> and <style> elements.
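
For code that still has to run in Internet Explorer 8 or earlier, a feature test can fall back to innerText. This is a minimal sketch, assuming that any small difference in the fallback's output is acceptable; readText is a hypothetical name.

```javascript
// Hypothetical helper: prefer textContent, and fall back to innerText
// in old browsers (such as IE8) that lack textContent on elements.
function readText(element) {
  if (typeof element.textContent === "string") {
    return element.textContent;
  }
  return element.innerText;
}

// Usage in a browser:
// var textExtract = readText(document.getElementById("futuro"));
```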
