
XPath Introduction - (XPath Tutorial)

XPath (XML Path Language) is a query language for navigating an XML document and selecting nodes in it: elements, attributes, text, and more. It is used extensively in web scraping and data extraction.

Syntax

XPath syntax is built on path expressions, which select and navigate nodes in an XML document. A path expression is made up of one or more location steps separated by a forward slash (/). Each step selects nodes relative to the nodes selected by the previous step.

Absolute path

An absolute path begins with a forward slash (/), so it is evaluated from the root node of the document no matter which node is current.

/element/child

Relative path

A relative path has no leading slash and is evaluated from the current (context) node, so the same expression can select different nodes depending on where evaluation starts.

element/child
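As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which implements only a subset of XPath), the difference between the two path types can be seen below. ElementTree always evaluates a path relative to an element and has no document-level "/" step, so the sketch approximates the absolute path /bookstore/book/title by checking the name of the parsed root element and continuing from it. The sample XML is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative document (same shape as the bookstore example in this tutorial).
XML = """<bookstore>
  <book category="children"><title lang="en">Harry Potter</title></book>
  <book category="web"><title lang="en">Learning XML</title></book>
</bookstore>"""

root = ET.fromstring(XML)  # the <bookstore> document element

# Approximation of the absolute path /bookstore/book/title: match the
# document element by name, then walk its children.
absolute = [t.text for t in root.findall("book/title")] if root.tag == "bookstore" else []
print(absolute)  # ['Harry Potter', 'Learning XML']

# Relative path "title": the result depends on the context node.
first_book = root.find("book")
print([t.text for t in first_book.findall("title")])  # ['Harry Potter']
print(root.findall("title"))  # [] because <bookstore> has no <title> child
```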

Selecting attributes

To select an attribute, use the @ symbol followed by the attribute name.

element/@attribute
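In ElementTree's XPath subset a path cannot end in an @attribute step, so this sketch (assuming the standard xml.etree.ElementTree module and an illustrative document) selects the elements with a path and then reads the attribute with Element.get(). ElementTree does accept attributes inside predicates, such as [@name] and [@name='value'].

```python
import xml.etree.ElementTree as ET

# Illustrative document.
XML = """<bookstore>
  <book category="children"><title lang="en">Harry Potter</title></book>
  <book category="web"><title lang="en">Learning XML</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# Equivalent of the XPath book/@category: select <book>, then read the attribute.
categories = [book.get("category") for book in root.findall("book")]
print(categories)  # ['children', 'web']

# Equivalent of book/title/@lang.
langs = [title.get("lang") for title in root.findall("book/title")]
print(langs)  # ['en', 'en']

# An attribute inside a predicate: keep <book> elements that have a category.
print(len(root.findall("book[@category]")))  # 2
```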

Selecting text nodes

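To select the text inside an element, use the text() node test. It selects the text nodes that are direct children of the element, not text nested inside descendant elements.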

element/text()
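The "direct children only" behaviour of text() matters for mixed content. ElementTree has no text() step; it stores the text before an element's first child in .text and the text after each child in that child's .tail. The sketch below, using an illustrative <p> element that is not part of this tutorial's example, collects both to mirror what text() returns with <p> as the context node, and uses itertext() for all descendant text.

```python
import xml.etree.ElementTree as ET

# Illustrative mixed-content element.
p = ET.fromstring("<p>Hello <b>big</b> world</p>")

# With <p> as the context node, text() selects two text nodes: "Hello " and " world".
# ElementTree keeps the first in p.text and the second in the <b> child's .tail.
direct_text = [t for t in [p.text] + [child.tail for child in p] if t]
print(direct_text)  # ['Hello ', ' world']

# All descendant text in document order, including the nested <b>.
print("".join(p.itertext()))  # Hello big world
```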

Example

Consider the following XML document:

<bookstore>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

To select the title of the book in the children category:

/bookstore/book[@category='children']/title/text()

The output will be:

Harry Potter
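One way to run this example with only the Python standard library is sketched below, assuming xml.etree.ElementTree. Its subset supports the [@category='children'] predicate, so the middle location steps carry over directly; the parsed root element stands in for the /bookstore step, and the trailing text() step becomes the .text attribute.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # already positioned on <bookstore> (the /bookstore step)

# book[@category='children']/title is within ElementTree's XPath subset.
title = root.find("book[@category='children']/title")
print(title.text)  # Harry Potter
```

If the third-party lxml package is installed, its xpath() method accepts the full expression unchanged; for example, etree.fromstring(XML).xpath("/bookstore/book[@category='children']/title/text()") should return ['Harry Potter'].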

Explanation

The path expression /bookstore/book[@category='children']/title/text() starts at the root node (/) and selects the bookstore element. The next step selects its book children, and the predicate [@category='children'] keeps only the book whose category attribute equals "children". The title step then selects the title element inside that book, and text() finally selects the text node inside that title element.
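To make the step-by-step evaluation concrete, the sketch below evaluates each location step by hand in plain Python over an ElementTree, keeping the node set produced by every step. It only illustrates how each step narrows or extends the current node set; it is not how an XPath engine is actually implemented. The XML is a trimmed, illustrative copy of the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the bookstore example.
XML = """<bookstore>
  <book category="children"><title lang="en">Harry Potter</title></book>
  <book category="web"><title lang="en">Learning XML</title></book>
</bookstore>"""
root = ET.fromstring(XML)

# /bookstore : the document element, if its name matches
step1 = [root] if root.tag == "bookstore" else []
# /book : every <book> child of the nodes from step 1
step2 = [child for node in step1 for child in node if child.tag == "book"]
# [@category='children'] : keep only books whose attribute matches
step3 = [node for node in step2 if node.get("category") == "children"]
# /title : every <title> child of the surviving books
step4 = [child for node in step3 for child in node if child.tag == "title"]
# /text() : these <title> elements have no child elements, so .text is their only text node
step5 = [node.text for node in step4 if node.text]

print(len(step1), len(step2), len(step3), len(step4))  # 1 2 1 1
print(step5)  # ['Harry Potter']
```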

Use

XPath is commonly used in web scraping and data extraction to select and extract specific data from an XML or HTML document.

Important Points

  • XPath is a query language used to navigate and select nodes (elements, attributes, and text) in an XML document.
  • A path expression is made up of one or more location steps separated by a forward slash (/).
  • Use the @ symbol to select attributes and the text() node test to select text nodes.
  • XPath is commonly used in web scraping and data extraction.

Summary

XPath is a powerful tool for navigating an XML document and selecting its nodes. Its syntax allows precise selection of elements, attributes, and text, which makes it valuable for web scraping and data extraction.
