XML
v1.2
The XML module, available through the xml global, handles XML data parsing and contains XML/HTML-related functionality.
Note: The API is very similar to Penlight's. Most functions have new names, but the Penlight names also work.
Introduction
Accessing XML data through the data object (or calling xml.parseXml) returns an XML node (specifically, an element).
A node can be two things: XML tags become elements (represented by tables) while all other data become text nodes (represented by strings).
Elements are sometimes also called documents in this documentation and other places, especially when referring to the root element in a node tree.
Elements
Elements always have a tag field and an attr field (for attributes).
They are also arrays containing child nodes.
element = {
	tag = tagName,
	attr = {
		[name1]=value1, [name2]=value2, ...
	},
	[1]=childNode1, [2]=childNode2, ...
}
A similar format is used in other libraries too. LuaExpat calls it LOM.
Example
The following XML...
<animal type="dog" name="Puddles">
	<hobbies>Biting &amp; eating</hobbies>
	<!-- Comments are ignored. -->
	How did this <![CDATA[ get here? ]]>
</animal>
...results in this table:
document = {
	tag = "animal",
	attr = {
		["name"] = "Puddles",
		["type"] = "dog",
	},
	[1] = "\n\t",
	[2] = {
		tag = "hobbies",
		attr = {},
		[1] = "Biting & eating",
	},
	[3] = "\n\t\n\tHow did this  get here? \n",
}
Notice how all whitespace is preserved, how the comment is dropped, and how the CDATA section becomes text that merges with the surrounding text.
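Because elements are ordinary Lua tables, a parsed document can be inspected with plain indexing and ipairs. The sketch below writes the example result out by hand and defines two hypothetical helpers, isElementLike and isTextLike, that mimic the two node kinds described above; they are illustrations in plain Lua, not the module's own xml.isElement and xml.isText.

```lua
-- The example document, written out by hand in the same layout as the parsed result.
local document = {
	tag  = "animal",
	attr = { name = "Puddles", type = "dog" },
	"\n\t",
	{ tag = "hobbies", attr = {}, "Biting & eating" },
	"\n\t\n\tHow did this  get here? \n",
}

-- Hypothetical helpers: an element is a table with a string tag, a text node is a string.
local function isElementLike(value)
	return type(value) == "table" and type(value.tag) == "string"
end

local function isTextLike(value)
	return type(value) == "string"
end

-- Child nodes live in the array part, so # and ipairs work as usual.
for i, child in ipairs(document) do
	if isElementLike(child) then
		print(i, "element", child.tag)
	elseif isTextLike(child) then
		print(i, "text", #child .. " characters")
	end
end
```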
API
Functions
Note: All functions can be called as methods on elements (e.g. xml.toXml(element) is the same as element:toXml()).
- addChild
- clone
- compare
- contentsToHtml
- contentsToXml
- decodeEntities
- eachChild
- eachChildElement
- eachMatchingChildElement
- element
- encodeMoreEntities
- encodeRequiredEntities
- filter
- findAllElementsByName
- getAttributes
- getChildByName
- getFirstElement
- getHtmlText
- getText
- getTextOfDirectChildren
- isElement
- isText
- makeElementConstructors
- mapElements
- match
- newElement
- parseHtml
- parseXml
- removeWhitespaceNodes
- setAttribute
- substitute
- toHtml
- toPrettyXml
- toXml
- updateAttributes
- walk
addChild
xml.addChild( element, childNode )
Add a child node to an element.
Penlight alias:
Element:add_direct_child
clone
nodeClone = xml.clone( node [, textSubstitutionCallback ] )
Clone a node and its children.
If the textSubstitutionCallback argument is given, it should be a function with this signature:
text = textSubstitutionCallback( text, kind, parentElement )
This function is called for every text node, tag name and attribute in the node tree. By returning a modified string, it can change these values in the clone.
kind will be "*TEXT" for text nodes, "*TAG" for tag names, and the attribute name for attributes.
parentElement will be nil if the initial node argument is a text node.
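To make the callback contract concrete, here is a plain-Lua sketch of a clone with a substitution callback. It is not the module's implementation: cloneNode is a hypothetical name, and two behaviors are assumptions, namely that a nil return from the callback keeps the original string, and that for an element's own tag and attributes the element itself is passed as parentElement.

```lua
-- Hypothetical model of cloning with a text substitution callback (not the module's code).
local function cloneNode(node, callback, parentElement)
	-- Applies the callback to one string; assumption: a nil result means "unchanged".
	local function apply(text, kind, parent)
		if not callback then return text end
		local result = callback(text, kind, parent)
		if result == nil then return text end
		return result
	end

	if type(node) == "string" then
		-- A text node; parentElement is nil when the initial node is text.
		return apply(node, "*TEXT", parentElement)
	end

	-- Assumption: for an element's own tag and attributes, the element itself is passed.
	local copy = { tag = apply(node.tag, "*TAG", node), attr = {} }
	for name, value in pairs(node.attr) do
		copy.attr[name] = apply(value, name, node)
	end
	for i, child in ipairs(node) do
		copy[i] = cloneNode(child, callback, node)
	end
	return copy
end
```

Usage: cloneNode(doc, function(text, kind) if kind == "*TEXT" then return text:upper() end end) returns an upper-cased copy and leaves doc untouched.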
compare
nodesLookEqual = xml.compare( value1, value2 )
Returns true if the values are two nodes that look equal, false otherwise. Returns false if either value is not a node.
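One way to read "look equal" is as a recursive structural comparison. The plain-Lua sketch below assumes exactly that (same text, same tags, same attributes in both directions, same children in order); looksEqual is a hypothetical name, and the module's precise rules may differ.

```lua
-- Hypothetical structural comparison (not the module's code).
local function isElementLike(v)
	return type(v) == "table" and type(v.tag) == "string" and type(v.attr) == "table"
end

local function looksEqual(a, b)
	if type(a) == "string" and type(b) == "string" then
		return a == b -- two text nodes
	end
	if not (isElementLike(a) and isElementLike(b)) then
		return false -- at least one value is not a node (or the kinds differ)
	end
	if a.tag ~= b.tag or #a ~= #b then
		return false
	end
	-- Attributes must match in both directions.
	for name, value in pairs(a.attr) do
		if b.attr[name] ~= value then return false end
	end
	for name in pairs(b.attr) do
		if a.attr[name] == nil then return false end
	end
	-- Children must match pairwise, in order.
	for i = 1, #a do
		if not looksEqual(a[i], b[i]) then return false end
	end
	return true
end
```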
contentsToHtml
htmlString = xml.contentsToHtml( node )
v1.3 Convert the child nodes of a node into an HTML string.
contentsToXml
xmlString = xml.contentsToXml( node )
v1.3 Convert the child nodes of a node into an XML string.
decodeEntities
string = xml.decodeEntities( encodedString [, strict=true ] )
Decode XML/HTML entities in a string.
If strict is false then the parsing rules are more relaxed (e.g. & can appear without being part of a valid entity).
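As a rough illustration of what decoding involves, here is a simplified plain-Lua decoder that handles only the five XML entities plus decimal and hexadecimal character references. It is not the module's code: decodeBasicEntities is a hypothetical name, it knows no HTML named entities, and it behaves like the relaxed mode in that unrecognized & sequences are left untouched. It assumes Lua 5.3 or later for utf8.char.

```lua
-- Hypothetical, simplified entity decoder (not the module's code). Requires Lua 5.3+.
local NAMED = { amp = "&", lt = "<", gt = ">", quot = '"', apos = "'" }

local function decodeBasicEntities(s)
	return (s:gsub("&(#?)([xX]?)(%w+);", function(hash, x, body)
		if hash == "#" then
			-- Numeric character reference: decimal (&#65;) or hexadecimal (&#x41;).
			local code = tonumber(body, x ~= "" and 16 or 10)
			if code and code <= 0x10FFFF then
				return utf8.char(code)
			end
		elseif x == "" then
			return NAMED[body] -- nil for unknown names
		end
		-- Returning nil makes gsub keep the original text unchanged.
	end))
end
```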
eachChild
for childNode in xml.eachChild( element )
Iterate over child nodes.
Penlight alias:
Element:children
eachChildElement
for childElement in xml.eachChildElement( element )
Iterate over child elements (skipping over text nodes).
Penlight alias:
Element:childtags
eachMatchingChildElement
for childElement in xml.eachMatchingChildElement( element, tag )
Iterate over child elements that have the given tag name.
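The iterator functions above follow Lua's usual generic-for protocol. A plain-Lua sketch of a filtering iterator in that style, using the hypothetical name eachMatchingChild (not the module's code), could look like this:

```lua
-- Hypothetical iterator sketch (not the module's code): yields the child elements of
-- `element` whose tag equals `tag`, skipping text nodes and non-matching elements.
local function eachMatchingChild(element, tag)
	local i = 0
	return function()
		while true do
			i = i + 1
			local child = element[i]
			if child == nil then
				return nil -- end of the child array ends the loop
			end
			if type(child) == "table" and child.tag == tag then
				return child
			end
		end
	end
end
```

Usage: for li in eachMatchingChild(list, "li") do print(li[1]) end.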
element
element = xml.element( tag [, childNode|attributesAndChildNodes ] )
Convenience function for creating a new element. The second argument, if given, can be either a node to put in the element as its first child, or a single table that combines attributes (string keys) with an array of child nodes. Examples:
local person = xml.element("person")
local month = xml.element("month", "April")
local planet = xml.element("planet", xml.element("moon"))
local chicken = xml.element("chicken", {
	age = "3",
	id = "942-8483",
	xml.element("egg"),
	xml.element("egg"),
})
Penlight alias:
xml.elem
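The combined argument table in the chicken example works because a Lua table can hold string keys and an array part at the same time. The plain-Lua sketch below shows one way such a table could be split; makeElement is a hypothetical name, and treating any table with a tag field as a single child node is an assumption, not documented behavior.

```lua
-- Hypothetical sketch (not the module's code): string keys become attributes,
-- array items become children.
local function makeElement(tag, arg)
	local el = { tag = tag, attr = {} }
	if type(arg) == "string" then
		el[1] = arg -- a single text node
	elseif type(arg) == "table" and arg.tag ~= nil then
		el[1] = arg -- assumption: a table with a tag field is a single element node
	elseif type(arg) == "table" then
		for key, value in pairs(arg) do
			if type(key) == "string" then
				el.attr[key] = value
			end
		end
		for i, child in ipairs(arg) do
			el[i] = child
		end
	end
	return el
end
```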
encodeRequiredEntities
encodedString = xml.encodeRequiredEntities( string )
v1.3
Encode &, <, >, " and ' characters into XML/HTML entities (&amp; etc.).
This is the same function as entities.
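Encoding the required characters amounts to a single-pass character substitution. A minimal plain-Lua sketch, with the hypothetical name encodeRequired, follows; the spelling &#39; for the apostrophe is an assumption, and the module may use &apos; or another form.

```lua
-- Hypothetical sketch (not the module's code): replace the five required characters.
local REQUIRED = {
	["&"] = "&amp;",
	["<"] = "&lt;",
	[">"] = "&gt;",
	['"'] = "&quot;",
	["'"] = "&#39;", -- assumption: the module may spell this differently
}

local function encodeRequired(s)
	-- gsub makes one pass, so the "&" inside inserted entities is never re-encoded.
	return (s:gsub("[&<>\"']", REQUIRED))
end
```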
encodeMoreEntities
html = xml.encodeMoreEntities( string )
v1.3
Encode &, <, >, " and ' characters into HTML entities (&amp; etc.).
Also encodes some additional spaces and invisible characters, like the non-breaking space (U+00A0) and invisible times (U+2062).
filter
xml.filter( element [, textSubstitutionCallback ] )
Clone an element and its children.
This is an alias for xml.clone.
findAllElementsByName
elements = xml.findAllElementsByName( element, tag [, doNotRecurse=false ] )
Get all descendant elements that have the given tag name, or only the matching direct children if doNotRecurse is true.
Penlight alias:
Element:get_elements_with_name
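A recursive search like this is a depth-first walk that collects matches in document order. The plain-Lua sketch below uses the hypothetical name findAll and an extra accumulator parameter (results) that is not part of the documented signature; it also assumes that matching elements are themselves searched for nested matches.

```lua
-- Hypothetical sketch (not the module's code): collect elements with a given tag,
-- in document order. `results` is an internal accumulator, not part of the real API.
local function findAll(element, tag, doNotRecurse, results)
	results = results or {}
	for _, child in ipairs(element) do
		if type(child) == "table" then
			if child.tag == tag then
				results[#results + 1] = child
			end
			if not doNotRecurse then
				-- Assumption: matches nested inside matches are collected too.
				findAll(child, tag, false, results)
			end
		end
	end
	return results
end
```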
getAttributes
attributes = xml.getAttributes( element )
Get the attributes table for an element (i.e. element.attr).
Note that the actual table is returned, not a copy of it!
Note: You can use xml.setAttribute or xml.updateAttributes for updating attributes.
Penlight alias:
Element:get_attribs
getChildByName
childElement = xml.getChildByName( element, tag )
Get the first child element with a given tag name. Returns nil if none exist.
Penlight alias:
Element:child_with_name
getFirstElement
childElement = xml.getFirstElement( element )
Get the first child element. Returns nil if none exist.
Penlight alias:
Element:first_childtag
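Both lookups above are linear scans over the child array that skip text nodes. Plain-Lua sketches with the hypothetical names firstChildWithTag and firstChildElement (not the module's code):

```lua
-- Hypothetical sketch: first direct child element with the given tag, or nil.
local function firstChildWithTag(element, tag)
	for _, child in ipairs(element) do
		if type(child) == "table" and child.tag == tag then
			return child
		end
	end
	return nil -- no match
end

-- Hypothetical sketch: first direct child that is an element, or nil.
local function firstChildElement(element)
	for _, child in ipairs(element) do
		if type(child) == "table" then
			return child
		end
	end
	return nil -- only text nodes, or no children at all
end
```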
getHtmlText
text = xml.getHtmlText( element )
v1.3
Get the full text value of an element (i.e. the concatenation of all child text nodes, recursively).
Unlike xml.getText, this function is aware of HTML-specific properties, e.g. that the alt attribute of <img> tags can be used as a textual replacement for the image.
getText
text = xml.getText( element )
Get the full text value of an element (i.e. the concatenation of all child text nodes, recursively).
getTextOfDirectChildren
text = xml.getTextOfDirectChildren( element )
Get the full text value of an element's direct children (i.e. the concatenation of all child text nodes, non-recursively).
(In most cases you probably want to use xml.getText or xml.getHtmlText instead of this function.)
Penlight alias:
Element:get_text
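The difference between the recursive and the direct-children text getters is easiest to see side by side. The plain-Lua sketches below use the hypothetical names textOf and directTextOf; they model the documented concatenation only and ignore any HTML-specific rules such as the alt handling of xml.getHtmlText.

```lua
-- Hypothetical sketch (recursive): concatenate every text node in the subtree.
local function textOf(element)
	local parts = {}
	local function collect(node)
		for _, child in ipairs(node) do
			if type(child) == "string" then
				parts[#parts + 1] = child
			else
				collect(child) -- descend into child elements
			end
		end
	end
	collect(element)
	return table.concat(parts)
end

-- Hypothetical sketch (non-recursive): only text nodes that are direct children.
local function directTextOf(element)
	local parts = {}
	for _, child in ipairs(element) do
		if type(child) == "string" then
			parts[#parts + 1] = child
		end
	end
	return table.concat(parts)
end
```

For <p>Hello <b>big</b> world</p>, the recursive version yields "Hello big world" while the direct version yields "Hello  world", which is why the recursive getters are usually what you want.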
isElement
bool = xml.isElement( value )
Check if a value is an element.
Penlight alias:
xml.is_tag
isText
bool = xml.isText( value )
Check if a value is a text node. (Any string value will make the function return true.)
makeElementConstructors
constructor1, constructor2, ... = xml.makeElementConstructors( tags )
constructor1, constructor2, ... = xml.makeElementConstructors "tag1,tag2,..."
Given a list of tag names, return one element constructor per tag name. The argument can either be an array of tag names, or a string with comma-separated tags.
A constructor creates a new element with the respective tag name every time it's called. It's a function with this signature:
element = constructor( [ childNode|attributesAndChildNodes ] )
The argument, if given, can be either a node to put in the element as its first child, or a single table that combines attributes with an array of child nodes (same as the second argument for xml.element).
Example:
local bowl,fruit = xml.makeElementConstructors "bowl,fruit"
local document = bowl{ size="small", fruit"Apple", fruit"Orange" }
print(document) -- <bowl size="small"><fruit>Apple</fruit><fruit>Orange</fruit></bowl>
Penlight alias:
xml.tags
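Returning several constructors at once is a natural fit for Lua's multiple return values. Below is a plain-Lua sketch with the hypothetical name makeConstructors; it splits the comma-separated form with string.gmatch, returns the closures via table.unpack, and reuses the attribute/children splitting assumed for xml.element. It is not the module's implementation.

```lua
-- Hypothetical sketch (not the module's code).
local unpack = table.unpack or unpack -- Lua 5.2+ / 5.1 compatibility

local function makeConstructors(tags)
	if type(tags) == "string" then
		-- "bowl,fruit" -> { "bowl", "fruit" }
		local list = {}
		for name in tags:gmatch("[^,%s]+") do
			list[#list + 1] = name
		end
		tags = list
	end

	local constructors = {}
	for i, tag in ipairs(tags) do
		constructors[i] = function(arg)
			local el = { tag = tag, attr = {} }
			if type(arg) == "string" or (type(arg) == "table" and arg.tag ~= nil) then
				el[1] = arg -- a single child node
			elseif type(arg) == "table" then
				for k, v in pairs(arg) do
					if type(k) == "string" then el.attr[k] = v end
				end
				for j, child in ipairs(arg) do el[j] = child end
			end
			return el
		end
	end
	return unpack(constructors)
end
```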
mapElements
element = xml.mapElements( element, callback )
replacementNode = callback( childElement )
Call a function on each child element of an element (non-recursively), possibly modifying the element. Returning a node from the callback replaces the current child element, while returning nil removes it.
Penlight alias:
Element:maptags
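Replacing and removing children while iterating is easy to get wrong, since removals shift indices. The plain-Lua sketch below (hypothetical name mapChildElements, not the module's code) sidesteps that by building a new child list first and then rewriting the array part in place. Keeping text nodes untouched is an assumption based on the description above.

```lua
-- Hypothetical sketch (not the module's code).
local function mapChildElements(element, callback)
	local kept = {}
	for _, child in ipairs(element) do
		if type(child) == "table" then
			local replacement = callback(child)
			if replacement ~= nil then
				kept[#kept + 1] = replacement -- replace the element
			end -- nil: drop the element
		else
			kept[#kept + 1] = child -- text nodes are kept as-is
		end
	end
	-- Rewrite the array part in place so existing references to `element` stay valid.
	for i = #element, 1, -1 do element[i] = nil end
	for i, node in ipairs(kept) do element[i] = node end
	return element
end
```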
match
matches = xml.match( document, xmlStringPattern|elementPattern )
Find things in a document by supplying a pattern.
This is the opposite function of xml.substitute.
See the Penlight manual on the subject for more info (look for the sections describing templates).
Returns nil and a message on error.
newElement
element = xml.newElement( tag [, attributes ] )
Create a new element, optionally initialized with a given attributes table. Examples:
local person = xml.newElement("person")
local chicken = xml.newElement("chicken", {age="3", id="942-8483"})
Penlight alias:
xml.new
parseHtml
element = xml.parseHtml( htmlString [, filePathForErrorMessages ] )
Parse a string containing HTML markup. Returns nil and a message on error. Example:
local document = xml.parseHtml("<!DOCTYPE html>\n<html><head><script> var result = 1 & 3; </script></head></html>")
print(document[1][1].tag) -- script
parseXml
element = xml.parseXml( xmlString [, filePathForErrorMessages ] )
Parse a string containing XML markup. Returns nil and a message on error. Example:
local document = xml.parseXml("<foo><bar/></foo>")
print(document[1].tag) -- bar
removeWhitespaceNodes
xml.removeWhitespaceNodes( document )
Recursively remove all text nodes that don't contain any non-whitespace characters from the document.
local document = xml.parseXml[[
<horses>
	<horse>
		<name> Glitter </name>
	</horse>
	<horse>
		<name>Rush </name>
	</horse>
</horses>
]]
print(document:toXml())
--[[ Output:
<horses>
	<horse>
		<name> Glitter </name>
	</horse>
	<horse>
		<name>Rush </name>
	</horse>
</horses>
]]
document:removeWhitespaceNodes()
print(document:toXml())
--[[ Output:
<horses><horse><name> Glitter </name></horse><horse><name>Rush </name></horse></horses>
]]
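The removal rule can be sketched in plain Lua as a recursive pass that deletes every string child with no non-whitespace character. The name stripWhitespaceNodes is hypothetical and this is not the module's code; iterating backwards keeps indices valid while table.remove shifts later children down.

```lua
-- Hypothetical sketch (not the module's code).
local function stripWhitespaceNodes(element)
	for i = #element, 1, -1 do
		local child = element[i]
		if type(child) == "string" then
			if not child:find("%S") then -- no non-whitespace character
				table.remove(element, i)
			end
		else
			stripWhitespaceNodes(child) -- recurse into child elements
		end
	end
end
```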
setAttribute
xml.setAttribute( element, attributeName, attributeValue|nil )
Add a new attribute, or update the value of an existing one. Specify a nil value to remove the attribute.
Penlight alias:
Element:set_attrib
substitute
newDocument = xml.substitute( xmlString|document, data )
Create a substituted copy of a document.
This is the opposite function of xml.match.
See the Penlight manual on the subject for more info (look for the sections describing templates).
Returns nil and a message on error.
Penlight alias:
Element:subst
toHtml
htmlString = xml.toHtml( node [, preface=false ] )
Convert a node into an HTML string.
preface, if given, can either be a boolean that says whether a standard <!DOCTYPE html> string should be prepended, or be a string containing the given preface that should be added.
Example:
local document = xml.parseHtml('<html x = "y" ><body><input type=text disabled></body></html>')
print(document:toHtml())
--[[ Output:
<html x="y"><body><input type="text" disabled></body></html>
]]
toPrettyXml
xmlString = xml.toPrettyXml( node [, initIndent="", indent=noIndent, attrIndent=noIndent, preface=false ] )
Convert a node into an XML string with some "pretty" modifications.
(Generally, you probably want to use xml.toXml instead of this function.)
initIndent will be prepended to each line.
Specifying indent puts each tag on a new line.
Specifying attrIndent puts each attribute on a new line.
preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added.
Examples:
local document = xml.parseXml('<foo x="y"><bar/></foo>')
print(document:toPrettyXml("", "  "))
--[[ Output:
<foo x="y">
  <bar/>
</foo>
]]
print(document:toPrettyXml("", "  ", "  ", '<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo
  x="y"
>
  <bar/>
</foo>
]]
This function is used when calling tostring(element).
Penlight alias:
xml.tostring
toXml
xmlString = xml.toXml( node [, preface=false ] )
Convert a node into an XML string.
preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added.
Examples:
local document = xml.parseXml('<foo x = "y" ><bar /></foo>')
print(document:toXml())
--[[ Output:
<foo x="y"><bar/></foo>
]]
print(document:toXml('<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo x="y"><bar/></foo>
]]
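Serialization is a recursive walk that escapes text and attribute values. The minimal plain-Lua sketch below, with the hypothetical name toXmlSketch, reproduces the first example's output. It is not the module's code; sorting attribute names (for deterministic output), self-closing empty elements, and the &#39; apostrophe spelling are all assumptions.

```lua
-- Hypothetical minimal serializer (not the module's code).
local ESCAPE = { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;", ["'"] = "&#39;" }

local function escape(s)
	return (s:gsub("[&<>\"']", ESCAPE))
end

local function toXmlSketch(node)
	if type(node) == "string" then
		return escape(node) -- text node
	end

	-- Assumption: attributes are written in sorted order.
	local names = {}
	for name in pairs(node.attr) do names[#names + 1] = name end
	table.sort(names)

	local out = { "<", node.tag }
	for _, name in ipairs(names) do
		out[#out + 1] = string.format(' %s="%s"', name, escape(node.attr[name]))
	end
	if #node == 0 then
		out[#out + 1] = "/>" -- assumption: empty elements self-close
	else
		out[#out + 1] = ">"
		for _, child in ipairs(node) do out[#out + 1] = toXmlSketch(child) end
		out[#out + 1] = "</" .. node.tag .. ">"
	end
	return table.concat(out)
end
```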
updateAttributes
xml.updateAttributes( element, attributes )
Add new attributes, or update the values of existing ones.
Penlight alias:
Element:set_attribs
walk
xml.walk( document, depthFirst, callback )
traversalAction = callback( tag, element )
traversalAction = "stop" | "ignorechildren" | nil
Recursively call a function on every element in a document (including the document element itself, but not text nodes).
If depthFirst is true then child elements are visited before parent elements.
Return "stop" from the callback to stop the traversal completely, return "ignorechildren" to make the traversal skip all children (unless depthFirst is true, in which case it does nothing), or return nil (or nothing) to continue the traversal.
Example:
document:walk(false, function(tag, el)
	if tag == "dog" then
		local dogName = (el.attr.name or "something")
		printf("Found doggo called %s!", dogName)
	end
end)
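The traversal rules above (pre-order vs. post-order, "stop", "ignorechildren") can be modeled in a few lines of plain Lua. walkSketch is a hypothetical name and this is not the module's implementation; it returns "stop" internally so that a stop request unwinds through all recursive calls.

```lua
-- Hypothetical sketch (not the module's code) of the documented traversal rules.
local function walkSketch(element, depthFirst, callback)
	if not depthFirst then
		-- Pre-order: visit the parent before its children.
		local action = callback(element.tag, element)
		if action == "stop" then return "stop" end
		if action == "ignorechildren" then return nil end
	end
	for _, child in ipairs(element) do
		if type(child) == "table" then -- text nodes are skipped
			if walkSketch(child, depthFirst, callback) == "stop" then
				return "stop"
			end
		end
	end
	if depthFirst then
		-- Post-order: children are already visited, so "ignorechildren" has no effect.
		if callback(element.tag, element) == "stop" then return "stop" end
	end
	return nil
end
```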
Settings
htmlAllowNoAttributeValue
xml.htmlAllowNoAttributeValue = bool
Whether attributes in HTML should be allowed to have no value, or whether the encoder should follow the same rule as for XML. Default: true.
local doc = xml.parseHtml[[
<input disabled>
]]
echoRaw(xml.toHtml(doc))
-- Output if htmlAllowNoAttributeValue is false: <input disabled="">
-- Output if htmlAllowNoAttributeValue is true: <input disabled>
htmlScrambleEmailAddresses
xml.htmlScrambleEmailAddresses = bool
Whether or not to encode the href attribute and text child nodes of <a> elements using the mailto: protocol in a way that increases the chance of fooling address-harvesting bots.
Default: true.
local doc = xml.parseHtml[[
<a href="mailto:hugh-jass@www.example">hugh-jass@www.example</a>
]]
echoRaw(xml.toHtml(doc))
--[[
Output if htmlScrambleEmailAddresses is false:
<a href="mailto:hugh-jass@www.example">hugh-jass@www.example</a>
Output if htmlScrambleEmailAddresses is true:
The same link, but with the address in the href attribute and in the link text
scrambled so that it does not appear verbatim in the markup.
]]
Page updated: 2022-04-13