John F. Douthat edited this page Sep 11, 2019 · 7 revisions

DOM Navigation

Problem: Finding the previous, nearest Element of a certain type.

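Solution: Walk backward through the node's previous siblings and return the first element with the given name. Only siblings of the starting node are checked; elements nested inside a previous sibling, and siblings of the parent, are not searched.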

require 'nokogiri'

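# Returns the nearest preceding sibling element called +name+, or nil.
def search_for_previous_element(node, name)
  result = node

  # Step backward one sibling at a time; text and comment nodes are skipped.
  while (result = result.previous_sibling)
    return result if result.element? && result.name == name
  end

  nil
end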

parent = Nokogiri::HTML.fragment(DATA.read)
start_here = parent.at('div.block#foo')
# A Nokogiri::XML::Element of the nearest, previous h1.
previous_element_h1 = search_for_previous_element(start_here, 'h1')

puts previous_element_h1 #=> <h1>this is what I want</h1>

__END__
<lorem>
  <h1>wrong one!</h1>
  <ipsum>
    <h1>wrong one!</h1>
    <dolor></dolor>
    <h1>this is what I want</h1>
    <sit></sit>
    <div class="block" id="foo">
      this is where I start
    </div>
    <amet></amet>
    <h1>wrong one!</h1>
  </ipsum>
  <h1>wrong one!</h1>
</lorem>

Automatic HTML Document Hierarchy

Problem: Given an HTML document like this...

  <p>Not sure how to start your day? Let us help!</p>

  <h1>1.0 Getting Started</h1>
  <p>Welcome!</p>

  <h2>1.1 First Things First</h2>
  <p>Get out of bed.</p>

  <h2>1.2 Get Dressed</h2>
  <p>Put on your clothes.</p>

  <h3>1.2.1 First, the undergarments</h3>
  <p>...and then the rest</p>

  <h1>2.0 Eating Breakfast</h1>
  <p>And so on, and so on...</p>

...wrap the content of each 'section' in <div class='section'>...</div> for hierarchical styling (e.g. with CSS such as div.section { margin-left:1em}). The end result looks like this:

  <p>Not sure how to start your day? Let us help!</p>

  <h1>1.0 Getting Started</h1>
  <div class='section'>
     <p>Welcome!</p>

     <h2>1.1 First Things First</h2>
     <div class='section'>
        <p>Get out of bed.</p>
     </div>

     <h2>1.2 Get Dressed</h2>
     <div class='section'>
        <p>Put on your clothes.</p>

        <h3>1.2.1 First, the undergarments</h3>
        <div class='section'>
          <p>...and then the rest</p>
        </div>
     </div>
  </div>

  <h1>2.0 Eating Breakfast</h1>
  <div class='section'>
    <p>And so on, and so on...</p>
  </div>

Solution: Use a stack while walking through the top level of the document, creating and inserting nodes as appropriate.

  # Assuming doc is a Nokogiri::HTML::Document
  if (body = doc.at_css('body'))
    stack = []
    body.children.each do |node|
      # Headings h1-h6 get levels 1-6; every other node gets 0...
      level = node.name[/\Ah([1-6])\z/i, 1].to_i
      # ...which is remapped to 99 so it nests under any open heading
      level = 99 if level == 0

      # Close sections at the same or a deeper level than this node
      stack.pop while (top = stack.last) && top[:level] >= level
      # Move the node into the innermost open section, if any
      stack.last[:div].add_child(node) if stack.last
      if level < 99
        # Open a new section right after the heading
        div = Nokogiri::XML::Node.new('div', doc)
        div.set_attribute('class', 'section')
        node.add_next_sibling(div)
        stack << { :div => div, :level => level }
      end
    end
  end

Other Examples

Questions tagged Nokogiri on stackoverflow.com are another good source of Nokogiri examples.