LuaWebGen

XML

v1.2 The XML module, available through the xml global, handles XML data parsing and contains XML/HTML related functionality.

Note: The API is very similar to Penlight's. Most functions have new names, but the Penlight names also work.


Introduction

Accessing XML data through the data object (or calling xml.parseXml()) returns an XML node (specifically, an element). A node can be one of two things: XML tags become elements (represented by tables), while all other data becomes text nodes (represented by strings).

Elements are sometimes also called documents in this documentation and other places, especially when referring to the root element in a node tree.

Elements

Elements always have a tag field and an attr field (for attributes). They are also arrays containing child nodes.

element = {
	tag  = tagName,
	attr = {
		[name1]=value1, [name2]=value2, ...
	},
	[1]=childNode1, [2]=childNode2, ...
}

A similar format is used in other libraries too. LuaExpat calls it LOM.
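
To make the layout above concrete, here is a minimal plain-Lua sketch that builds an element by hand and classifies its children by Lua type. It does not use the xml module; the describeNode helper is hypothetical and exists only for this illustration.

```lua
-- Hand-built element following the layout described above.
local element = {
	tag  = "book",
	attr = { lang = "en" },
	"Title: ",                               -- Text node (a string).
	{ tag = "em", attr = {}, "Lua Basics" }, -- Child element (a table).
}

-- Hypothetical helper: classify a node by its Lua type.
local function describeNode(node)
	if type(node) == "string" then
		return "text"
	elseif type(node) == "table" and type(node.tag) == "string" then
		return "element <" .. node.tag .. ">"
	end
	return "not a node"
end

local descriptions = {}
for i, child in ipairs(element) do
	descriptions[i] = describeNode(child)
end
print(table.concat(descriptions, ", ")) -- text, element <em>
```

Inside LuaWebGen you would normally use xml.isElement() and xml.isText() for these checks.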

Example

The following XML...

<animal type="dog" name="Puddles">
	<hobbies>Biting &amp; eating</hobbies>
	<!-- Comments are ignored. -->
	How did this <![CDATA[ get here? ]]>
</animal>

...results in this table:

document = {
	tag  = "animal",
	attr = {
		["name"] = "Puddles",
		["type"] = "dog",
	},
	[1] = "\n\t",
	[2] = {
		tag  = "hobbies",
		attr = {},
		[1]  = "Biting & eating",
	},
	[3] = "\n\t\n\tHow did this  get here? \n",
}

Notice how all whitespace is preserved, and that CDATA sections become text.
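
Because the result is an ordinary Lua table, it can be navigated with normal field and index access. The sketch below copies the table from the example as a literal (so it runs without the xml module) and reads a few values from it.

```lua
-- The table from the example above, written as a plain Lua literal.
local document = {
	tag  = "animal",
	attr = { name = "Puddles", type = "dog" },
	"\n\t",
	{ tag = "hobbies", attr = {}, "Biting & eating" },
	"\n\t\n\tHow did this  get here? \n",
}

print(document.attr.name) -- Puddles
print(document[2].tag)    -- hobbies
print(document[2][1])     -- Biting & eating
print(#document)          -- 3 (two text nodes and one element)
```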

API

Functions

Note: All functions can be called as methods on elements (i.e. xml.toXml(element) is the same as element:toXml()).

addChild

xml.addChild( element, childNode )

Add a child node to an element.

Penlight alias: Element:add_direct_child()

clone

nodeClone = xml.clone( node [, textSubstitutionCallback ] )

Clone a node and its children. If the textSubstitutionCallback argument is given, it should be a function with this signature:

text = textSubstitutionCallback( text, kind, parentElement )

This function is called for every text node, tag name and attribute in the node tree. It can modify the values of these things for the clone by returning a modified string. kind will be "*TEXT" for text nodes, "*TAG" for tag names, and the attribute name for attributes. parentElement will be nil if the initial node argument is a text node.

compare

nodesLookEqual = xml.compare( value1, value2 )

Returns true if the values are two nodes that look equal, false otherwise. Returns false if any value is not a node.

contentsToHtml

htmlString = xml.contentsToHtml( node )

v1.3 Convert the child nodes of a node into an HTML string.

Also see xml.toHtml().

contentsToXml

xmlString = xml.contentsToXml( node )

v1.3 Convert the child nodes of a node into an XML string.

Also see xml.toXml().

decodeEntities

string = xml.decodeEntities( encodedString [, strict=true ] )

Decode XML/HTML entities in a string. If strict is false then the parsing rules are more relaxed (i.e. & can appear without being part of a valid entity).

eachChild

for childNode in xml.eachChild( element )

Iterate over child nodes.

Penlight alias: Element:children()

eachChildElement

for childElement in xml.eachChildElement( element )

Iterate over child elements (skipping over text nodes).

Penlight alias: Element:childtags()

eachMatchingChildElement

for childElement in xml.eachMatchingChildElement( element, tag )

Iterate over child elements that have the given tag name.

element

element = xml.element( tag [, childNode ] )
element = xml.element( tag, attributesAndChildNodes )

Convenience function for creating a new element. The second argument, if given, can be either a node to put in the element as its first child, or a combination of an array of child elements and a table of attributes. Examples:

local person = xml.element("person")
local month  = xml.element("month", "April")
local planet = xml.element("planet", xml.element("moon"))

local chicken = xml.element("chicken", {
	age = "3",
	id  = "942-8483",

	xml.element("egg"),
	xml.element("egg"),
})

Also see xml.newElement().

Penlight alias: xml.elem()
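
The combined table form of xml.element() can be pictured as follows: string keys end up in attr and array items become child nodes. This is a plain-Lua sketch of that split under the element layout described in the introduction, not the library's actual implementation; buildElement is a hypothetical name.

```lua
-- Hypothetical helper: split a mixed table into attributes and children.
local function buildElement(tag, mixed)
	local element = { tag = tag, attr = {} }
	for key, value in pairs(mixed) do
		if type(key) == "string" then
			element.attr[key] = value -- String keys become attributes.
		end
	end
	for i, child in ipairs(mixed) do
		element[i] = child -- Array items become child nodes.
	end
	return element
end

local chicken = buildElement("chicken", {
	age = "3",
	id  = "942-8483",

	{ tag = "egg", attr = {} },
	{ tag = "egg", attr = {} },
})

print(chicken.attr.age) -- 3
print(#chicken)         -- 2
```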

encodeRequiredEntities

encodedString = xml.encodeRequiredEntities( string )

v1.3 Encode &, <, >, " and ' characters into XML/HTML entities (&amp; etc.). This is the same function as entities().

encodeMoreEntities

html = xml.encodeMoreEntities( string )

v1.3 Encode &, <, >, " and ' characters into HTML entities (&amp; etc.). Also encodes some additional spaces and invisible characters, like &nbsp; and &InvisibleTimes;.

filter

xml.filter( element [, textSubstitutionCallback ] )

Clone an element and its children. This is an alias for xml.clone().

findAllElementsByName

elements = xml.findAllElementsByName( element, tag [, doNotRecurse=false ] )

Get all child elements that have the given tag, optionally non-recursively.

Penlight alias: Element:get_elements_with_name()

getAttributes

attributes = xml.getAttributes( element )

Get the attributes table for an element (i.e. element.attr). Note that the actual table is returned - not a copy of it!

Note: You can use xml.setAttribute() or xml.updateAttributes() for updating attributes.

Penlight alias: Element:get_attribs()

getChildByName

childElement = xml.getChildByName( element, tag )

Get the first child element with a given tag name. Returns nil if none exist.

Penlight alias: Element:child_with_name()

getFirstElement

childElement = xml.getFirstElement( element )

Get the first child element. Returns nil if none exist.

Penlight alias: Element:first_childtag()

getHtmlText

text = xml.getHtmlText( element )

v1.3 Get the full text value of an element (i.e. the concatenation of all child text nodes, recursively). Unlike xml.getText(), this function is aware of HTML-specific properties, e.g. that the alt attribute of <img> tags can be used as a textual replacement for the image.

getText

text = xml.getText( element )

Get the full text value of an element (i.e. the concatenation of all child text nodes, recursively).

Also see xml.getHtmlText().

getTextOfDirectChildren

text = xml.getTextOfDirectChildren( element )

Get the full text value of an element's direct children (i.e. the concatenation of all child text nodes, non-recursively).

(In most cases you probably want to use xml.getText() or xml.getHtmlText() instead of this function.)

Penlight alias: Element:get_text()
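
The difference between recursive text (as described for xml.getText()) and direct-children text (as described for xml.getTextOfDirectChildren()) can be sketched in plain Lua. The collectText helper below is hypothetical and only mirrors the documented behavior on hand-built tables; it is not the library's code.

```lua
-- Hypothetical helper: concatenate text nodes, optionally descending
-- into child elements.
local function collectText(element, recursive)
	local parts = {}
	for _, child in ipairs(element) do
		if type(child) == "string" then
			parts[#parts+1] = child
		elseif recursive then
			parts[#parts+1] = collectText(child, true)
		end
	end
	return table.concat(parts)
end

local paragraph = {
	tag = "p", attr = {},
	"Hello ",
	{ tag = "b", attr = {}, "big" },
	" world",
}

local allText    = collectText(paragraph, true)
local directText = collectText(paragraph, false)
print(allText)    -- Hello big world
print(directText) -- Hello  world
```

Note the double space in the second result: the text inside <b> is skipped, but the text nodes around it are kept as they are.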

isElement

bool = xml.isElement( value )

Check if a value is an element.

Penlight alias: xml.is_tag()

isText

bool = xml.isText( value )

Check if a value is a text node. (Any string value will make the function return true.)

makeElementConstructors

constructor1, constructor2, ... = xml.makeElementConstructors( tags )
constructor1, constructor2, ... = xml.makeElementConstructors "tag1,tag2,..."

Given a list of tag names, return a number of element constructors. The argument can either be an array of tag names, or a string with comma-separated tags.

A constructor creates a new element with the respective tag name every time it's called. It's a function with this signature:

element = constructor( [ childNode ] )
element = constructor( attributesAndChildNodes )

The argument, if given, can be either a node to put in the element as its first child, or a combination of an array of child elements and a table of attributes (same as the argument for xml.element()).

Example:

local bowl,fruit = xml.makeElementConstructors "bowl,fruit"
local document   = bowl{ size="small", fruit"Apple", fruit"Orange" }
print(document) -- <bowl size="small"><fruit>Apple</fruit><fruit>Orange</fruit></bowl>

Penlight alias: xml.tags()

mapElements

element = xml.mapElements( element, callback )
replacementNode = callback( childElement )

Visit and call a function on all child elements of an element (non-recursively), possibly modifying the document. Returning a node from the callback replaces the current element, while returning nil removes it.

Penlight alias: Element:maptags()
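
The replace-or-remove rule can be illustrated with a plain-Lua sketch on a hand-built element. mapChildElements is a hypothetical stand-in that follows the description above (text nodes are left alone, which is an assumption); it is not the library's implementation.

```lua
-- Hypothetical helper: apply callback to each direct child element.
-- A returned node replaces the child; nil removes it.
local function mapChildElements(element, callback)
	local kept = {}
	for _, child in ipairs(element) do
		if type(child) == "table" then
			child = callback(child)
		end
		if child ~= nil then
			kept[#kept+1] = child
		end
	end
	-- Write the surviving nodes back and clear any leftover slots.
	for i = 1, math.max(#element, #kept) do
		element[i] = kept[i]
	end
	return element
end

local list = {
	tag = "ul", attr = {},
	{ tag = "li", attr = {}, "keep" },
	"\n",
	{ tag = "li", attr = { hidden = "hidden" }, "drop" },
	{ tag = "li", attr = {}, "rename" },
}

mapChildElements(list, function(child)
	if child.attr.hidden then
		return nil -- Remove hidden items.
	elseif child[1] == "rename" then
		return { tag = "item", attr = {}, "renamed" } -- Replace.
	end
	return child -- Keep unchanged.
end)

print(#list, list[3].tag) -- 3	item
```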

match

matches = xml.match( document, xmlStringPattern )
matches = xml.match( document, elementPattern )

Find things in a document by supplying a pattern. This is the opposite function of xml.substitute(). See the Penlight manual on the subject for more info (look for the sections describing templates). Returns nil and a message on error.

newElement

element = xml.newElement( tag [, attributes ] )

Create a new element, optionally initialized with a given attributes table. Examples:

local person  = xml.newElement("person")
local chicken = xml.newElement("chicken", {age="3", id="942-8483"})

Also see xml.element().

Penlight alias: xml.new()

parseHtml

element = xml.parseHtml( htmlString [, filePathForErrorMessages ] )

Parse a string containing HTML markup. Returns nil and a message on error. Example:

local document = xml.parseHtml("<!DOCTYPE html>\n<html><head><script> var result = 1 & 3; </script></head></html>")
print(document[1][1].tag) -- script

parseXml

element = xml.parseXml( xmlString [, filePathForErrorMessages ] )

Parse a string containing XML markup. Returns nil and a message on error. Example:

local document = xml.parseXml("<foo><bar/></foo>")
print(document[1].tag) -- bar

removeWhitespaceNodes

xml.removeWhitespaceNodes( document )

Recursively remove all text nodes that contain only whitespace from the document. Text nodes with other content are left unchanged, including their leading and trailing whitespace.

print(document:toXml())
--[[ Output:
<horses>
	<horse>
		<name> Glitter </name>
	</horse>
	<horse>
		<name>Rush  </name>
	</horse>
</horses>
]]

document:removeWhitespaceNodes()
print(document:toXml())
--[[ Output:
<horses><horse><name> Glitter </name></horse><horse><name>Rush  </name></horse></horses>
]]
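
For readers who want to see the rule spelled out, here is a plain-Lua sketch of the same idea on a hand-built table. stripWhitespaceNodes is a hypothetical helper written for this illustration, and it treats whitespace as whatever the Lua pattern class %s matches, which is an assumption.

```lua
-- Hypothetical helper: drop text nodes that consist only of whitespace.
local function stripWhitespaceNodes(element)
	-- Iterate backwards so table.remove() does not shift unvisited items.
	for i = #element, 1, -1 do
		local child = element[i]
		if type(child) == "string" then
			if not child:find("%S") then
				table.remove(element, i)
			end
		else
			stripWhitespaceNodes(child)
		end
	end
end

local horses = {
	tag = "horses", attr = {},
	"\n\t",
	{
		tag = "horse", attr = {},
		"\n\t\t",
		{ tag = "name", attr = {}, " Glitter " },
		"\n\t",
	},
	"\n",
}

stripWhitespaceNodes(horses)
print(#horses, #horses[1])                -- 1	1
print("[" .. horses[1][1][1] .. "]")      -- [ Glitter ]
```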

setAttribute

xml.setAttribute( element, attributeName, attributeValue )
xml.setAttribute( element, attributeName, nil )

Add a new attribute, or update the value of an existing one. Specify a nil value to remove the attribute.

Penlight alias: Element:set_attrib()

substitute

newDocument = xml.substitute( xmlString, data )
newDocument = xml.substitute( document, data )

Create a substituted copy of a document. This is the opposite function of xml.match(). See the Penlight manual on the subject for more info (look for the sections describing templates). Returns nil and a message on error.

Penlight alias: Element:subst()

toHtml

htmlString = xml.toHtml( node [, preface=false ] )

Convert a node into an HTML string.

preface, if given, can either be a boolean that says whether a standard <!DOCTYPE html> string should be prepended, or be a string containing the given preface that should be added. Example:

local document = xml.parseHtml('<html  x = "y"  ><body><input type=text disabled></body></html>')

print(document:toHtml())
--[[ Output:
<html x="y"><body><input type="text" disabled></body></html>
]]

toPrettyXml

xmlString = xml.toPrettyXml( node [, initIndent="", indent=noIndent, attrIndent=noIndent, preface=false ] )

Convert a node into an XML string with some "pretty" modifications.

(Generally, you probably want to use xml.toXml() instead of this function.)

initIndent will be prepended to each line. Specifying indent puts each tag on a new line. Specifying attrIndent puts each attribute on a new line. preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added. Examples:

local document = xml.parseXml('<foo x="y"><bar/></foo>')

print(document:toPrettyXml("", "  "))
--[[ Output:
<foo x="y">
  <bar/>
</foo>
]]

print(document:toPrettyXml("", "    ", "  ", '<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo
  x="y"
>
    <bar/>
</foo>
]]

This function is used when calling tostring(element). Also see xml.toXml().

Penlight alias: xml.tostring()

toXml

xmlString = xml.toXml( node [, preface=false ] )

Convert a node into an XML string.

preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added. Examples:

local document = xml.parseXml('<foo  x = "y"  ><bar /></foo>')

print(document:toXml())
--[[ Output:
<foo x="y"><bar/></foo>
]]

print(document:toXml('<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo x="y"><bar/></foo>
]]

Also see xml.toPrettyXml().

updateAttributes

xml.updateAttributes( element, attributes )

Add new attributes, or update the values of existing ones.

Penlight alias: Element:set_attribs()

walk

xml.walk( document, depthFirst, callback )
traversalAction = callback( tag, element )
traversalAction = "stop"|"ignorechildren"|nil

Have a function recursively be called on every element in a document (including itself and excluding text nodes). If depthFirst is true then child elements are visited before parent elements.

Return "stop" from the callback to stop the traversal completely, return "ignorechildren" to make the traversal skip all children (unless depthFirst is true in which case it does nothing), or return nil (or nothing) to continue the traversal.

Example:

document:walk(false, function(tag, el)
	if tag == "dog" then
		local dogName = (el.attr.name or "something")
		printf("Found doggo called %s!", dogName)
	end
end)
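
The visiting order and the effect of the return values can be seen in a plain-Lua sketch. walkSketch below is a hypothetical re-creation based only on the description above, run on a hand-built tree; the real xml.walk() may differ in details.

```lua
-- Hypothetical re-creation of the traversal rules described above.
local function walkSketch(element, depthFirst, callback)
	if not depthFirst then
		local action = callback(element.tag, element)
		if action == "stop" then return "stop" end
		if action == "ignorechildren" then return nil end
	end
	for _, child in ipairs(element) do
		if type(child) == "table" then -- Text nodes are skipped.
			if walkSketch(child, depthFirst, callback) == "stop" then
				return "stop"
			end
		end
	end
	if depthFirst and callback(element.tag, element) == "stop" then
		return "stop"
	end
	return nil
end

local tree = {
	tag = "a", attr = {},
	{ tag = "b", attr = {}, { tag = "c", attr = {} } },
	"some text",
	{ tag = "d", attr = {} },
}

-- Record the tags in the order they are visited.
local function visitOrder(depthFirst, skipTag)
	local visited = {}
	walkSketch(tree, depthFirst, function(tag)
		visited[#visited+1] = tag
		if tag == skipTag then return "ignorechildren" end
	end)
	return table.concat(visited, " ")
end

print(visitOrder(false))      -- a b c d
print(visitOrder(true))       -- c b d a
print(visitOrder(false, "b")) -- a b d
```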

Settings

htmlAllowNoAttributeValue

xml.htmlAllowNoAttributeValue = bool

Whether attributes in HTML should be allowed to have no value, or if the encoder should follow the same rule as for XML. Default: true.

local doc = xml.parseHtml[[
<input disabled>
]]
echoRaw(xml.toHtml(doc))
-- Output if htmlAllowNoAttributeValue is false: <input disabled="">
-- Output if htmlAllowNoAttributeValue is true:  <input disabled>

htmlScrambleEmailAddresses

xml.htmlScrambleEmailAddresses = bool

Whether or not to encode the href attribute and text child nodes of <a> elements using the mailto: protocol in a way that increases the chance of fooling address-harvesting bots. Default: true.

local doc = xml.parseHtml[[
<a href="mailto:hugh-jass@www.example">hugh-jass@www.example</a>
]]
echoRaw(xml.toHtml(doc))
--[[
	Output if htmlScrambleEmailAddresses is false:
	<a href="mailto:hugh-jass@www.example">hugh-jass@www.example</a>

	Output if htmlScrambleEmailAddresses is true:
	<a href="&#x6d;&#97;&#x69;l&#116;&#x6f;&#58;&#x68;u&#103;&#x68;&#45;
	&#x6a;a&#115;&#x73;&#64;&#x77;w&#119;&#x2e;&#101;&#x78;a&#109;&#x70;
	&#108;&#x65;">&#x68;&#117;&#x67;h&#45;&#x6a;&#97;&#x73;s&#64;&#x77;
	&#119;&#x77;.&#101;&#x78;&#97;&#x6d;p&#108;&#x65;</a>
]]
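
The scrambled output above consists of numeric character references, some hexadecimal and some decimal, mixed with a few literal characters. As a rough illustration of the idea only, the plain-Lua sketch below encodes every byte of an ASCII string, alternating between decimal and hexadecimal references. scrambleSketch is a hypothetical helper; the pattern LuaWebGen actually uses is not specified here and is clearly different from this fixed alternation.

```lua
-- Hypothetical helper: encode each byte as a numeric character reference,
-- alternating decimal (odd positions) and hexadecimal (even positions).
-- Assumes ASCII input; multi-byte UTF-8 characters are not handled.
local function scrambleSketch(text)
	local parts = {}
	for i = 1, #text do
		local byte = text:byte(i)
		if i % 2 == 0 then
			parts[i] = string.format("&#x%x;", byte)
		else
			parts[i] = string.format("&#%d;", byte)
		end
	end
	return table.concat(parts)
end

local scrambled = scrambleSketch("a@b")
print(scrambled) -- &#97;&#x40;&#98;
```

A browser decodes these references back to the original characters, so the link still works for visitors.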

Page updated: 2021-07-09