Extensible Markup Language (XML) is a simple, flexible, plain-text data format that can represent many different structures of data. An XML document, at its simplest, looks a little like HTML:

<people>
  <person>
    <name>Peter Cooper</name>
    <gender>Male</gender>
  </person>
  <person>
    <name>Fred Bloggs</name>
    <gender>Male</gender>
  </person>
</people>

This extremely simplistic XML document defines a set of people containing two individual persons, each of whom has a name and gender. In previous chapters we’ve used YAML in a similar way to how XML is used here, but although YAML is simpler and easier to use with Ruby, XML is more popular outside the Ruby world.
XML is prevalent when it comes to sharing data on the Internet in a form that’s easy for machines to parse, and is especially popular when using APIs and machine-accessible services provided online, such as Yahoo!’s search APIs and other programming interfaces to online services. Due to XML’s popularity, it’s worthwhile to see how to parse it with Ruby.
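Ruby’s primary XML library is called REXML. It long shipped as part of the standard library, and since Ruby 3.0 it has been a bundled gem, so it is still installed with Ruby by default (projects managed with Bundler need to list rexml in their Gemfile).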
REXML supports two different ways of processing XML files: tree parsing and stream parsing. Tree parsing is where a file is turned into a single data structure that can then be searched, traversed, and otherwise manipulated. Stream parsing is when a file is processed and parsed on the fly by calling special callback methods whenever something in the file is found. Stream parsing is less powerful in most cases than tree parsing, although it’s slightly faster and does not need to hold the whole document in memory. In this section we’ll focus on tree parsing, as it makes more sense for most situations.
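For contrast, before moving on to tree parsing, here is a minimal sketch of what the stream approach can look like. The NameListener class and its names accessor are hypothetical names invented for this illustration; the sketch assumes REXML's StreamListener module and REXML::Document.parse_stream, and it only tracks name elements:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener (not from the original text) that collects
# the text found inside every <name> element as the parser streams by.
class NameListener
  include REXML::StreamListener

  attr_reader :names

  def initialize
    @names = []
    @in_name = false
  end

  # Callback: invoked each time an opening tag is found.
  def tag_start(name, attributes)
    @in_name = true if name == "name"
  end

  # Callback: invoked for each run of text between tags.
  def text(content)
    @names << content if @in_name
  end

  # Callback: invoked each time a closing tag is found.
  def tag_end(name)
    @in_name = false if name == "name"
  end
end

xml = "<people><person><name>Peter Cooper</name></person>" \
      "<person><name>Fred Bloggs</name></person></people>"

listener = NameListener.new
REXML::Document.parse_stream(xml, listener)
puts listener.names
```

Notice that no document tree is ever built: the listener has to remember for itself where it is in the file, which is why this style tends to be more work for anything beyond simple extraction.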
Here’s a basic demonstration of parsing an XML file looking for certain elements:

require 'rexml/document'
xml = <<END_XML
<people>
  <person>
    <name>Peter Cooper</name>
    <gender>Male</gender>
  </person>
  <person>
    <name>Fred Bloggs</name>
    <gender>Male</gender>
  </person>
</people>
END_XML
tree = REXML::Document.new(xml)
tree.elements.each("people/person") do |person|
  puts person.get_elements("name").first
end
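The example above prints each matching name element in full, tags included. If only the text inside the elements is wanted, one possible variation is sketched below. It assumes the same sample document; the variable names names and first_gender are made up for this illustration, and REXML::XPath is used here as an alternative way to query the tree:

```ruby
require 'rexml/document'

xml = <<END_XML
<people>
  <person>
    <name>Peter Cooper</name>
    <gender>Male</gender>
  </person>
  <person>
    <name>Fred Bloggs</name>
    <gender>Male</gender>
  </person>
</people>
END_XML

tree = REXML::Document.new(xml)

# XPath.match returns an array of every node matching the expression;
# Element#text returns the text inside an element rather than the element itself.
names = REXML::XPath.match(tree, "//person/name").map { |name| name.text }
puts names

# XPath.first returns only the first matching node.
first_gender = REXML::XPath.first(tree, "//person/gender")
puts first_gender.text
```

Because the whole document is held in memory as a tree, it can be queried as many times as needed without parsing it again.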