<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jetpack Flight Log &#187; xmlpipe2</title>
	<atom:link href="http://jetpackweb.com/blog/tags/xmlpipe2/feed/" rel="self" type="application/rss+xml" />
	<link>http://jetpackweb.com/blog</link>
	<description>Rock{et}ing the interweb</description>
	<lastBuildDate>Wed, 19 May 2010 22:21:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Sphinx xmlpipe2 in PHP: Part II</title>
		<link>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-ii/</link>
		<comments>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-ii/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 04:41:28 +0000</pubDate>
		<dc:creator>Brian Racer</dc:creator>
				<category><![CDATA[php]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[xmlpipe2]]></category>

		<guid isPermaLink="false">http://jetpackweb.com/blog/?p=251</guid>
		<description><![CDATA[In the last article we successfully created a PHP class that outputs XML as input for Sphinx&#8217;s indexer. However it was incredibly inefficient as we had to hold everything in memory. Here is an updated class that extends XMLWriter, which is a built in PHP class that is essentially undocumented and works great for creating [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-i/" target="_blank">last article</a> we created a PHP class that outputs XML as input for Sphinx&#8217;s indexer. However, it was very inefficient because it had to hold every document in memory until the whole feed was rendered. Here is an updated class that extends <a href="http://www.php.net/manual/en/book.xmlwriter.php" target="_blank">XMLWriter</a>, a built-in PHP class that is sparsely documented but works great for creating memory-efficient streams of XML data. Rather than keeping each document in memory, XMLWriter lets us flush each document&#8217;s XML elements to standard output as soon as they are written.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;"><span class="kw2">&lt;?php</span>
<span class="coMULTI">/*
  *  SphinxXMLFeed - efficiently generate XML for Sphinx's xmlpipe2 data adapter
  *  (c) 2009 Jetpack LLC http://jetpackweb.com
  */</span>
<span class="kw2">class</span> SphinxXMLFeed <span class="kw2">extends</span> XMLWriter
<span class="br0">&#123;</span>
  <span class="kw2">private</span> <span class="re0">$fields</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="kw2">private</span> <span class="re0">$attributes</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> __construct<span class="br0">&#40;</span><span class="re0">$options</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>
  <span class="br0">&#123;</span>
    <span class="re0">$defaults</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span>
      <span class="st_h">'indent'</span> <span class="sy0">=&gt;</span> <span class="kw4">false</span><span class="sy0">,</span>
    <span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$options</span> <span class="sy0">=</span> <span class="kw3">array_merge</span><span class="br0">&#40;</span><span class="re0">$defaults</span><span class="sy0">,</span> <span class="re0">$options</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// Buffer XML in memory; the buffer is flushed after each document</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">openMemory</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="kw1">if</span><span class="br0">&#40;</span><span class="re0">$options</span><span class="br0">&#91;</span><span class="st_h">'indent'</span><span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">setIndent</span><span class="br0">&#40;</span><span class="kw4">true</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> setFields<span class="br0">&#40;</span><span class="re0">$fields</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">fields</span> <span class="sy0">=</span> <span class="re0">$fields</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> setAttributes<span class="br0">&#40;</span><span class="re0">$attributes</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">attributes</span> <span class="sy0">=</span> <span class="re0">$attributes</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> addDocument<span class="br0">&#40;</span><span class="re0">$doc</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:document'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">writeAttribute</span><span class="br0">&#40;</span><span class="st_h">'id'</span><span class="sy0">,</span> <span class="re0">$doc</span><span class="br0">&#91;</span><span class="st_h">'id'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$doc</span> <span class="kw1">as</span> <span class="re0">$key</span> <span class="sy0">=&gt;</span> <span class="re0">$value</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="co1">// Skip the id key since that is an element attribute</span>
      <span class="kw1">if</span><span class="br0">&#40;</span><span class="re0">$key</span> <span class="sy0">==</span> <span class="st_h">'id'</span><span class="br0">&#41;</span> <span class="kw1">continue</span><span class="sy0">;</span>
&nbsp;
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="re0">$key</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">text</span><span class="br0">&#40;</span><span class="re0">$value</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="kw1">print</span> <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">outputMemory</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> beginOutput<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
&nbsp;
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startDocument</span><span class="br0">&#40;</span><span class="st_h">'1.0'</span><span class="sy0">,</span> <span class="st_h">'UTF-8'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:docset'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:schema'</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// add fields to the schema</span>
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">fields</span> <span class="kw1">as</span> <span class="re0">$field</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:field'</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">writeAttribute</span><span class="br0">&#40;</span><span class="st_h">'name'</span><span class="sy0">,</span> <span class="re0">$field</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="co1">// add attributes to the schema</span>
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">attributes</span> <span class="kw1">as</span> <span class="re0">$attributes</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">startElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:attr'</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$attributes</span> <span class="kw1">as</span> <span class="re0">$key</span> <span class="sy0">=&gt;</span> <span class="re0">$value</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">writeAttribute</span><span class="br0">&#40;</span><span class="re0">$key</span><span class="sy0">,</span> <span class="re0">$value</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="br0">&#125;</span>
      <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="co1">// end sphinx:schema</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="kw1">print</span> <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">outputMemory</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> endOutput<span class="br0">&#40;</span><span class="br0">&#41;</span>
  <span class="br0">&#123;</span>
    <span class="co1">// end sphinx:docset</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">endElement</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="kw1">print</span> <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">outputMemory</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></div></div>

<p>We can use it as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;"><span class="re0">$doc</span> <span class="sy0">=</span> <span class="kw2">new</span> SphinxXMLFeed<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">setFields</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
  <span class="st_h">'title'</span><span class="sy0">,</span>
  <span class="st_h">'teaser'</span><span class="sy0">,</span>
  <span class="st_h">'content'</span><span class="sy0">,</span>
<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">setAttributes</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
  <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'name'</span> <span class="sy0">=&gt;</span> <span class="st_h">'blog_id'</span><span class="sy0">,</span> <span class="st_h">'type'</span> <span class="sy0">=&gt;</span> <span class="st_h">'int'</span><span class="sy0">,</span> <span class="st_h">'bits'</span> <span class="sy0">=&gt;</span> <span class="st_h">'16'</span><span class="sy0">,</span> <span class="st_h">'default'</span> <span class="sy0">=&gt;</span> <span class="st_h">'0'</span><span class="br0">&#41;</span><span class="sy0">,</span>
<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">beginOutput</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="kw1">foreach</span><span class="br0">&#40;</span><span class="kw3">range</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">1000</span><span class="br0">&#41;</span> <span class="kw1">as</span> <span class="re0">$id</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
  <span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">addDocument</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
    <span class="st_h">'id'</span> <span class="sy0">=&gt;</span> <span class="re0">$id</span><span class="sy0">,</span>
    <span class="st_h">'blog_id'</span> <span class="sy0">=&gt;</span> <span class="kw3">rand</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">10</span><span class="br0">&#41;</span><span class="sy0">,</span>
    <span class="st_h">'title'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article Part <span class="es4">{$id}</span>&quot;</span><span class="sy0">,</span>
    <span class="st_h">'teaser'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article <span class="es4">{$id}</span> teaser&quot;</span><span class="sy0">,</span>
    <span class="st_h">'content'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article <span class="es4">{$id}</span> content&quot;</span><span class="sy0">,</span>
  <span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="br0">&#125;</span>
&nbsp;
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">endOutput</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div></div>

<p>As you can see, the first thing we need to do is populate the <b>fields</b> and <b>attributes</b>. Once that is done, we call <i>beginOutput</i>, which writes the head of the XML document (the opening <strong>sphinx:docset</strong> element and the schema). Each time a document is added, its XML markup is output immediately and the memory buffer is cleared.</p>
<p>Finally we call <i>endOutput</i>, which will close the <strong>sphinx:docset</strong> element.</p>
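<p>To see why this keeps memory use flat, here is a minimal sketch, separate from the class above, of how XMLWriter&#8217;s memory buffer behaves. It assumes the xmlwriter extension is loaded; the element names and variable names are made up for illustration. Each call to <i>outputMemory</i> returns only what was written since the previous call and, by default, empties the buffer.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;">&lt;?php
// Illustrative only: outputMemory() hands back the buffered XML and
// resets the buffer, so only one chunk is ever held in memory.
$writer = new XMLWriter();
$writer-&gt;openMemory();

$writer-&gt;startElement('docset');      // hypothetical root element
$first = $writer-&gt;outputMemory();     // flush what has been written so far

$writer-&gt;startElement('document');
$writer-&gt;writeAttribute('id', '1');
$writer-&gt;text('Tom &amp; Jerry');         // text() escapes special characters
$writer-&gt;endElement();                // closes document
$second = $writer-&gt;outputMemory();    // flush again; $first is not repeated

$writer-&gt;endElement();                // closes docset
$third = $writer-&gt;outputMemory();

// The chunks only form a complete document once concatenated in order.
echo $first, $second, $third, "\n";
</pre></div></div>

<p>Note that an individual chunk is usually just a fragment and not well-formed XML on its own. That is fine here, because the indexer reads the whole stream in order.</p>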
<p>I have used this class in production to index millions of records that take up dozens of gigabytes. Keep in mind that if you are working with that much data, you will probably need to batch your queries so you are not loading all the records at once!</p>
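<p>One possible way to batch the queries is sketched below. This is not the code I run in production, just an illustration under a few assumptions: PHP 5.5 or newer (for generators), the PDO extension with the SQLite driver for the demo, and a hypothetical <i>articles</i> table with an integer <i>id</i> primary key. The <i>fetchInBatches</i> helper is made up for this example. It pages through the table by id, so only one batch of rows is in memory at a time.</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;">&lt;?php
// Hypothetical helper: yields rows one at a time while fetching them
// from the database in batches of $batchSize, ordered by id.
function fetchInBatches(PDO $pdo, $batchSize = 1000)
{
    $lastId = 0;
    $stmt = $pdo-&gt;prepare(sprintf(
        'SELECT id, title, content FROM articles WHERE id &gt; :last ORDER BY id LIMIT %d',
        $batchSize
    ));

    do {
        $stmt-&gt;bindValue(':last', $lastId, PDO::PARAM_INT);
        $stmt-&gt;execute();
        $rows = $stmt-&gt;fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            $lastId = (int) $row['id'];   // remember where the next batch starts
            yield $row;
        }
    } while (count($rows) === $batchSize); // a short batch means we reached the end
}

// Demo data in an in-memory SQLite database (assumes pdo_sqlite is available).
$pdo = new PDO('sqlite::memory:');
$pdo-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo-&gt;exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, content TEXT)');
$insert = $pdo-&gt;prepare('INSERT INTO articles (title, content) VALUES (?, ?)');
for ($i = 1; $i &lt;= 25; $i++) {
    $insert-&gt;execute(array("Article $i", "Content $i"));
}

$count = 0;
foreach (fetchInBatches($pdo, 10) as $row) {
    // In a real feed script this is where $doc-&gt;addDocument($row) would go.
    $count++;
}
echo $count, "\n";
</pre></div></div>

<p>Paging by <i>id &gt; last seen id</i> instead of using an ever-growing OFFSET keeps each query cheap no matter how far into the table you are.</p>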
]]></content:encoded>
			<wfw:commentRss>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sphinx xmlpipe2 in PHP: Part I</title>
		<link>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-i/</link>
		<comments>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-i/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 04:31:30 +0000</pubDate>
		<dc:creator>Brian Racer</dc:creator>
				<category><![CDATA[php]]></category>
		<category><![CDATA[sphinx]]></category>
		<category><![CDATA[xmlpipe2]]></category>

		<guid isPermaLink="false">http://jetpackweb.com/blog/?p=246</guid>
		<description><![CDATA[Sphinx is a great open source package for implementing a full text search. Before we can use it to search, we first must inject all of our data into it. There are two primary ways of loading that data in &#8211; directly accessing the data via a sql query, or using the xmlpipe2 format. Although [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sphinxsearch.com/" target="_blank">Sphinx</a> is a great open source package for implementing full-text search. Before we can use it to search, we must first load all of our data into it. There are two primary ways of loading that data: having Sphinx access it directly via a SQL query, or feeding it in the <strong>xmlpipe2</strong> format. Although using the database as a direct data source is very fast, it can sometimes be difficult to craft a query that returns normalized data for all the fields you require in an index. The XML option gives us much more flexibility at the cost of some speed (although it is still very fast). This article will show you how to generate that XML. It assumes you have a basic understanding of how Sphinx works; if not, <a href="http://www.sphinxsearch.com/docs/current.html">browse the docs</a> first.</p>
<p>An example xmlpipe2 format looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family: Monaco, monospace;"><span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span>
<span class="sc3"><span class="re1">&lt;sphinx:docset<span class="re2">&gt;</span></span></span>
&nbsp;
<span class="sc3"><span class="re1">&lt;sphinx:schema<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;sphinx:field</span> <span class="re0">name</span>=<span class="st0">&quot;subject&quot;</span><span class="re2">/&gt;</span></span> 
  <span class="sc3"><span class="re1">&lt;sphinx:field</span> <span class="re0">name</span>=<span class="st0">&quot;content&quot;</span><span class="re2">/&gt;</span></span>
  <span class="sc3"><span class="re1">&lt;sphinx:attr</span> <span class="re0">name</span>=<span class="st0">&quot;published&quot;</span> <span class="re0">type</span>=<span class="st0">&quot;timestamp&quot;</span><span class="re2">/&gt;</span></span>
  <span class="sc3"><span class="re1">&lt;sphinx:attr</span> <span class="re0">name</span>=<span class="st0">&quot;author_id&quot;</span> <span class="re0">type</span>=<span class="st0">&quot;int&quot;</span> <span class="re0">bits</span>=<span class="st0">&quot;16&quot;</span> <span class="re0">default</span>=<span class="st0">&quot;1&quot;</span><span class="re2">/&gt;</span></span>
<span class="sc3"><span class="re1">&lt;/sphinx:schema<span class="re2">&gt;</span></span></span>
&nbsp;
<span class="sc3"><span class="re1">&lt;sphinx:document</span> <span class="re0">id</span>=<span class="st0">&quot;1234&quot;</span><span class="re2">&gt;</span></span>
  <span class="sc3"><span class="re1">&lt;content<span class="re2">&gt;</span></span></span>this is the main content <span class="sc2">&lt;![CDATA[[and this &lt;cdata&gt; entry must be handled properly by xml parser lib]]&gt;</span><span class="sc3"><span class="re1">&lt;/content<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;published<span class="re2">&gt;</span></span></span>1012325463<span class="sc3"><span class="re1">&lt;/published<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;subject<span class="re2">&gt;</span></span></span>note how field/attr tags can be in <span class="sc3"><span class="re1">&lt;b</span> <span class="re0">class</span>=<span class="st0">&quot;red&quot;</span><span class="re2">&gt;</span></span>randomized<span class="sc3"><span class="re1">&lt;/b<span class="re2">&gt;</span></span></span> order<span class="sc3"><span class="re1">&lt;/subject<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;misc<span class="re2">&gt;</span></span></span>some undeclared element<span class="sc3"><span class="re1">&lt;/misc<span class="re2">&gt;</span></span></span>
<span class="sc3"><span class="re1">&lt;/sphinx:document<span class="re2">&gt;</span></span></span>
<span class="sc-1">&lt;!-- ... more documents here ... --&gt;</span>
<span class="sc3"><span class="re1">&lt;/sphinx:docset<span class="re2">&gt;</span></span></span></pre></div></div>

<p>First we define the schema, which contains fields and attributes. <b>Fields</b> will be processed for fulltext searches, and <b>attributes</b> will be used to help filter those search results. More information about attributes and their options can be <a href="http://www.sphinxsearch.com/docs/current.html#attributes" target="_blank">found in the docs</a>. Once the schema is defined, we start adding our document data. A document contains elements that will map to the previously defined fields and attributes.</p>
<p>Let&#8217;s try to encapsulate some of that logic in a PHP class:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;"><span class="kw2">&lt;?php</span>
&nbsp;
<span class="kw2">class</span> SphinxXMLFeed
<span class="br0">&#123;</span>
  <span class="kw2">private</span> <span class="re0">$fields</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="kw2">private</span> <span class="re0">$attributes</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="kw2">private</span> <span class="re0">$documents</span> <span class="sy0">=</span> <span class="kw3">array</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> setFields<span class="br0">&#40;</span><span class="re0">$fields</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">fields</span> <span class="sy0">=</span> <span class="re0">$fields</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> setAttributes<span class="br0">&#40;</span><span class="re0">$attributes</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">attributes</span> <span class="sy0">=</span> <span class="re0">$attributes</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> addDocument<span class="br0">&#40;</span><span class="re0">$doc</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">documents</span><span class="br0">&#91;</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="re0">$doc</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span class="kw2">public</span> <span class="kw2">function</span> render<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
&nbsp;
    <span class="co1">// create a new XML document</span>
    <span class="re0">$dom</span> <span class="sy0">=</span> <span class="kw2">new</span> DomDocument<span class="br0">&#40;</span><span class="st_h">'1.0'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">encoding</span> <span class="sy0">=</span> <span class="st0">&quot;utf-8&quot;</span><span class="sy0">;</span>
    <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">formatOutput</span> <span class="sy0">=</span> <span class="kw4">true</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// create root node</span>
    <span class="re0">$root</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:docset'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$root</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$root</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// create the schema</span>
    <span class="re0">$schema</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:schema'</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// common fields we will be cloning</span>
    <span class="re0">$tmp_field</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:field'</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="re0">$tmp_attr</span>  <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:attr'</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// add fields to the schema</span>
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">fields</span> <span class="kw1">as</span> <span class="re0">$field</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$new_field</span> <span class="sy0">=</span> clone<span class="br0">&#40;</span><span class="re0">$tmp_field</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$new_field</span><span class="sy0">-&gt;</span><span class="me1">setAttribute</span><span class="br0">&#40;</span><span class="st_h">'name'</span><span class="sy0">,</span> <span class="re0">$field</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$schema</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$new_field</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="co1">// add attributes to the schema</span>
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">attributes</span> <span class="kw1">as</span> <span class="re0">$attributes</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$new_attr</span> <span class="sy0">=</span> clone<span class="br0">&#40;</span><span class="re0">$tmp_attr</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$attributes</span> <span class="kw1">as</span> <span class="re0">$key</span> <span class="sy0">=&gt;</span> <span class="re0">$value</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span class="re0">$new_attr</span><span class="sy0">-&gt;</span><span class="me1">setAttribute</span><span class="br0">&#40;</span><span class="re0">$key</span><span class="sy0">,</span> <span class="re0">$value</span><span class="br0">&#41;</span><span class="sy0">;</span>
        <span class="re0">$schema</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$new_attr</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="co1">// add the schema to the document</span>
    <span class="re0">$root</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$schema</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
    <span class="co1">// go through each document</span>
    <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$this</span><span class="sy0">-&gt;</span><span class="me1">documents</span> <span class="kw1">as</span> <span class="re0">$doc</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span class="re0">$node</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="st_h">'sphinx:document'</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="re0">$node</span><span class="sy0">-&gt;</span><span class="me1">setAttribute</span><span class="br0">&#40;</span><span class="st_h">'id'</span><span class="sy0">,</span> <span class="re0">$doc</span><span class="br0">&#91;</span><span class="st_h">'id'</span><span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
      <span class="kw1">foreach</span><span class="br0">&#40;</span><span class="re0">$doc</span> <span class="kw1">as</span> <span class="re0">$key</span> <span class="sy0">=&gt;</span> <span class="re0">$value</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span class="kw1">if</span><span class="br0">&#40;</span><span class="re0">$key</span> <span class="sy0">===</span> <span class="st_h">'id'</span><span class="br0">&#41;</span> <span class="kw1">continue</span><span class="sy0">;</span>
        <span class="re0">$tmp</span> <span class="sy0">=</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createElement</span><span class="br0">&#40;</span><span class="re0">$key</span><span class="br0">&#41;</span><span class="sy0">;</span>
        <span class="re0">$tmp</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">createTextNode</span><span class="br0">&#40;</span><span class="re0">$value</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
        <span class="re0">$node</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$tmp</span><span class="br0">&#41;</span><span class="sy0">;</span>
      <span class="br0">&#125;</span>
&nbsp;
      <span class="co1">// add the document to the dom</span>
      <span class="re0">$root</span><span class="sy0">-&gt;</span><span class="me1">appendChild</span><span class="br0">&#40;</span><span class="re0">$node</span><span class="br0">&#41;</span><span class="sy0">;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span class="co1">// return xml text</span>
    <span class="kw1">return</span> <span class="re0">$dom</span><span class="sy0">-&gt;</span><span class="me1">saveXML</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></div></div>

<p>The previous code uses PHP&#8217;s DOMDocument interface because it is less error prone than manually echoing XML tags. One downside of DOMDocument is that the entire XML tree must be built before any of it can be output. Every document therefore stays in memory until rendering, so if you are indexing a large amount of data you will probably hit PHP&#8217;s memory limit. We will fix this in the <a href="#" target="_blank">next article</a>. For now, you can use this class as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;"><span class="co1">// instantiate the class</span>
<span class="re0">$doc</span> <span class="sy0">=</span> <span class="kw2">new</span> SphinxXMLFeed<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="co1">// set the fields we will be indexing</span>
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">setFields</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
  <span class="st_h">'title'</span><span class="sy0">,</span>
  <span class="st_h">'teaser'</span><span class="sy0">,</span>
  <span class="st_h">'content'</span><span class="sy0">,</span>
<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="co1">// set any attributes</span>
<span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">setAttributes</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
  <span class="kw3">array</span><span class="br0">&#40;</span><span class="st_h">'name'</span> <span class="sy0">=&gt;</span> <span class="st_h">'blog_id'</span><span class="sy0">,</span> <span class="st_h">'type'</span> <span class="sy0">=&gt;</span> <span class="st_h">'int'</span><span class="sy0">,</span> <span class="st_h">'bits'</span> <span class="sy0">=&gt;</span> <span class="st_h">'16'</span><span class="sy0">,</span> <span class="st_h">'default'</span> <span class="sy0">=&gt;</span> <span class="st_h">'0'</span><span class="br0">&#41;</span><span class="sy0">,</span>
<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
&nbsp;
<span class="co1">// generate some sample documents. These would usually be pulled from a database</span>
<span class="co1">// or other data source</span>
<span class="kw1">foreach</span><span class="br0">&#40;</span><span class="kw3">range</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">3</span><span class="br0">&#41;</span> <span class="kw1">as</span> <span class="re0">$id</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
  <span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">addDocument</span><span class="br0">&#40;</span><span class="kw3">array</span><span class="br0">&#40;</span>
    <span class="st_h">'id'</span> <span class="sy0">=&gt;</span> <span class="re0">$id</span><span class="sy0">,</span>
    <span class="st_h">'blog_id'</span> <span class="sy0">=&gt;</span> <span class="kw3">rand</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">10</span><span class="br0">&#41;</span><span class="sy0">,</span>
    <span class="st_h">'title'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article Part <span class="es4">{$id}</span>&quot;</span><span class="sy0">,</span>
    <span class="st_h">'teaser'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article <span class="es4">{$id}</span> teaser&quot;</span><span class="sy0">,</span>
    <span class="st_h">'content'</span> <span class="sy0">=&gt;</span> <span class="st0">&quot;Article <span class="es4">{$id}</span> content&quot;</span><span class="sy0">,</span>
  <span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy0">;</span>
<span class="br0">&#125;</span>
&nbsp;
<span class="co1">// Render the XML and print it to standard output (render() returns a string)</span>
<span class="kw1">echo</span> <span class="re0">$doc</span><span class="sy0">-&gt;</span><span class="me1">render</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">;</span></pre></div></div>

<p>That code will generate XML like the following (the blog_id values are random):</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family: Monaco, monospace;"><span class="sc3"><span class="re1">&lt;?xml</span> <span class="re0">version</span>=<span class="st0">&quot;1.0&quot;</span> <span class="re0">encoding</span>=<span class="st0">&quot;utf-8&quot;</span><span class="re2">?&gt;</span></span>
<span class="sc3"><span class="re1">&lt;sphinx:docset<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;sphinx:schema<span class="re2">&gt;</span></span></span>
    <span class="sc3"><span class="re1">&lt;sphinx:field</span> <span class="re0">name</span>=<span class="st0">&quot;title&quot;</span><span class="re2">/&gt;</span></span>
    <span class="sc3"><span class="re1">&lt;sphinx:field</span> <span class="re0">name</span>=<span class="st0">&quot;teaser&quot;</span><span class="re2">/&gt;</span></span>
    <span class="sc3"><span class="re1">&lt;sphinx:field</span> <span class="re0">name</span>=<span class="st0">&quot;content&quot;</span><span class="re2">/&gt;</span></span>
    <span class="sc3"><span class="re1">&lt;sphinx:attr</span> <span class="re0">name</span>=<span class="st0">&quot;blog_id&quot;</span> <span class="re0">type</span>=<span class="st0">&quot;int&quot;</span> <span class="re0">bits</span>=<span class="st0">&quot;16&quot;</span> <span class="re0">default</span>=<span class="st0">&quot;0&quot;</span><span class="re2">/&gt;</span></span>
  <span class="sc3"><span class="re1">&lt;/sphinx:schema<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;sphinx:document</span> <span class="re0">id</span>=<span class="st0">&quot;1&quot;</span><span class="re2">&gt;</span></span>
    <span class="sc3"><span class="re1">&lt;blog_id<span class="re2">&gt;</span></span></span>6<span class="sc3"><span class="re1">&lt;/blog_id<span class="re2">&gt;</span></span></span>
    <span class="sc3"><span class="re1">&lt;title<span class="re2">&gt;</span></span></span>Article Part 1<span class="sc3"><span class="re1">&lt;/title<span class="re2">&gt;</span></span></span>
    <span class="sc3"><span class="re1">&lt;teaser<span class="re2">&gt;</span></span></span>Article 1 teaser<span class="sc3"><span class="re1">&lt;/teaser<span class="re2">&gt;</span></span></span>
    <span class="sc3"><span class="re1">&lt;content<span class="re2">&gt;</span></span></span>Article 1 content<span class="sc3"><span class="re1">&lt;/content<span class="re2">&gt;</span></span></span>
  <span class="sc3"><span class="re1">&lt;/sphinx:document<span class="re2">&gt;</span></span></span>
  ...
<span class="sc3"><span class="re1">&lt;/sphinx:docset<span class="re2">&gt;</span></span></span></pre></div></div>
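
<p>As an aside on why DOMDocument is less error prone than echoing tags by hand: createTextNode() escapes XML special characters for you. The following is a minimal, standalone sketch (not part of the class above; the sample title string is made up for illustration) using a value that would break hand-built XML:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family: Monaco, monospace;">&lt;?php
// Illustrative only: a made-up title containing characters that are special in XML
$value = 'Tips &amp; tricks for &lt;b&gt;Sphinx&lt;/b&gt;';
&nbsp;
$dom = new DOMDocument('1.0', 'utf-8');
$title = $dom-&gt;createElement('title');
&nbsp;
// createTextNode() stores the raw string; escaping happens on serialization
$title-&gt;appendChild($dom-&gt;createTextNode($value));
$dom-&gt;appendChild($title);
&nbsp;
// Serialize just the title element (no XML declaration)
$xml = $dom-&gt;saveXML($title);
echo $xml, &quot;\n&quot;;
// prints: &lt;title&gt;Tips &amp;amp; tricks for &amp;lt;b&amp;gt;Sphinx&amp;lt;/b&amp;gt;&lt;/title&gt;</pre></div></div>

<p>Echoing the same value between hand-written title tags would emit a bare ampersand and unescaped angle brackets, so the result would not be well-formed XML.</p>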

<p>You would set up your data source in sphinx.conf something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family: Monaco, monospace;"><span class="kw3">source</span> xml_blog_posts
<span class="br0">&#123;</span>
    <span class="kw3">type</span> = xmlpipe2
    xmlpipe_command = <span class="sy0">/</span>usr<span class="sy0">/</span>bin<span class="sy0">/</span>php <span class="sy0">/</span>home<span class="sy0">/</span>example.com<span class="sy0">/</span>lib<span class="sy0">/</span>tasks<span class="sy0">/</span>sphinx_blogs.php
<span class="br0">&#125;</span></pre></div></div>

<p>Don&#8217;t forget to check out the next article, where we optimize this class to handle millions of records!</p>
<p>Continue to next article: <a href="#">Sphinx xmlpipe2 in PHP: Part II</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jetpackweb.com/blog/2009/08/16/sphinx-xmlpipe2-in-php-part-i/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
