<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bold dream &#187; Python</title>
	<atom:link href="http://bolddream.com/category/programming/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://bolddream.com</link>
	<description>Imagination is limitless. So is stupidity.</description>
	<lastBuildDate>Tue, 02 Mar 2010 22:42:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Getting a random row from a relational database</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/</link>
		<comments>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 16:22:55 +0000</pubDate>
		<dc:creator>Emil Vladev</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://bolddream.com/?p=258</guid>
		<description><![CDATA[Problem From time to time one needs to fetch a random row from a table in a db. Solution Take one The &#8220;obvious&#8221; solution is to order by RAND() and get the first: SELECT * FROM users ORDER BY RAND&#40;&#41; LIMIT 1; /* Don't do that */ It does the job, problem solved! Well, no, [...]]]></description>
			<content:encoded><![CDATA[<h3>Problem</h3>
<p>From time to time one needs to fetch a random row from a table in a db. </p>
<h3>Solution</h3>
<h4>Take one</h4>
<p>The &#8220;obvious&#8221; solution is to order by <code>RAND()</code> and get the first:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> users <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> RAND<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span>; <span style="color: #808080; font-style: italic;">/* Don't do that */</span></pre></div></div>

<p>It does the job, problem solved!</p>
<p>Well, no, not exactly! While it works, the performance will be awful for any table of significant size.</p>
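<p>To see where the time goes, you can ask the database for its query plan. The snippet below is only a sketch: it assumes Python 3 and the standard library <code>sqlite3</code> module with a small in-memory table, and it uses SQLite's <code>RANDOM()</code> in place of MySQL's <code>RAND()</code>. The helper name <code>order_by_plan</code> is made up for illustration. On the SQLite versions I know of, the plan reports a scan of the whole table, usually together with a temporary B-tree for the sort, but the exact wording differs between versions.</p>

```python
import sqlite3


def order_by_plan(conn):
    """Return the plan description lines for the ORDER BY RANDOM() query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users ORDER BY RANDOM() LIMIT 1"
    ).fetchall()
    # The human-readable description is the last column of every plan row.
    return [row[-1] for row in rows]


# Small throwaway table, just so there is something to plan against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [("user%d" % i,) for i in range(1000)],
)

for line in order_by_plan(conn):
    print(line)
```

<p>Every row has to be read and given a random sort key before the first one can be returned, which is why the cost grows with the table.</p>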
<h4>Take two</h4>
<p>There is another simple but efficient method to get a random row:</p>
<ol>
<li>Fetch the number of rows using <code>COUNT()</code>.

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> COUNT<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">*</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> users;</pre></div></div>

</li>
<li>Get a random non-negative integer that is strictly less than that count (zero is a valid value).</li>
<li>Select the row using <code>OFFSET</code> (on <a href="http://www.mysql.com/">MySQL</a> &#8211; there are ways to do the same on all major RDBMS)

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> users <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">1</span> OFFSET :rand</pre></div></div>

<p>where <code>:rand</code> is the number you computed.</p>
</li>
</ol>
<p>This takes two queries, but neither of them has to sort the table, so both are cheap compared to <code>ORDER BY RAND()</code>.</p>
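<p>Outside of any framework, the three steps can be sketched roughly like this. It is a minimal example, assuming Python 3, the standard library <code>sqlite3</code> module and a toy <code>users</code> table; the function name <code>random_row</code> is hypothetical, and the empty-table check is an addition that the steps above do not mention.</p>

```python
import sqlite3
from random import randrange


def random_row(conn):
    """Fetch one random row using COUNT(*) followed by LIMIT 1 OFFSET."""
    # Step 1: how many rows are there?
    (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    if count == 0:
        return None  # empty table, nothing to pick
    # Step 2: random integer from 0 up to count - 1.
    offset = randrange(count)
    # Step 3: skip that many rows and take the next one.
    return conn.execute(
        "SELECT * FROM users LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [("user%d" % i,) for i in range(100)],
)

print(random_row(conn))
```

<p><code>randrange(count)</code> never returns <code>count</code> itself, so the offset always points at an existing row.</p>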
<p>Here is how to do that in <a href="http://www.djangoproject.com/">Django</a> (using a <a href="http://docs.djangoproject.com/en/dev/topics/db/managers/#id2">custom manager</a>):</p>

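<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">random</span> <span style="color: #ff7700;font-weight:bold;">import</span> randint
<span style="color: #ff7700;font-weight:bold;">from</span> django.<span style="color: black;">db</span> <span style="color: #ff7700;font-weight:bold;">import</span> models
<span style="color: #ff7700;font-weight:bold;">from</span> django.<span style="color: black;">db</span>.<span style="color: black;">models</span> <span style="color: #ff7700;font-weight:bold;">import</span> Count
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> UsersManager<span style="color: black;">&#40;</span>models.<span style="color: black;">Manager</span><span style="color: black;">&#41;</span>: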
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #dc143c;">random</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        count = <span style="color: #008000;">self</span>.<span style="color: black;">aggregate</span><span style="color: black;">&#40;</span>count=Count<span style="color: black;">&#40;</span><span style="color: #483d8b;">'id'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'count'</span><span style="color: black;">&#93;</span>
        random_index = randint<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, count - <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: #008000;">all</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>random_index<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> User<span style="color: black;">&#40;</span>models.<span style="color: black;">Model</span><span style="color: black;">&#41;</span>:
    objects = UsersManager<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># ... [fields] ...</span></pre></div></div>

<p>Usage is as simple as</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">User.<span style="color: black;">objects</span>.<span style="color: #dc143c;">random</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Just for the record &#8211; the Django way to <strike>shoot yourself in the foot</strike> do <code>ORDER BY RAND()</code> is</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">User.<span style="color: black;">objects</span>.<span style="color: black;">order_by</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'?'</span><span style="color: black;">&#41;</span>  <span style="color: #808080; font-style: italic;"># If you do that in production - you may as well forget about sleep!</span></pre></div></div>

<h3>Proof</h3>
<p>Here is the evidence that doing <code>ORDER BY RAND()</code> is very slow.</p>
<p>For the test we have a simple Django model and a few scripts &#8211; one that populates the database with bulk data, one that fetches a user using <code>ORDER BY RAND()</code> and one that fetches a user using the two-query method. We populate the db with one million records and then fetch a random one.</p>
<p>Here are the results (done using Python 2.6.4 and MySQL 5.1/SQLite 3.6.16 on Ubuntu 9.10):</p>
<p>Using the naive method by doing <code>ORDER BY RAND()</code>:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #000000; font-weight: bold;">time</span> python naive.py
<span style="color: #666666; font-style: italic;">#       MySQL        SQLite</span>
real	0m4.519s     0m2.061s <span style="color: #666666; font-style: italic;"># seconds?!?! </span>
user	0m0.152s     0m1.952s
sys	0m0.028s     0m0.056s</pre></div></div>

<p>(executing the query directly results in times close to those).</p>
<p>Using the smart method with two queries:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #000000; font-weight: bold;">time</span> python smart.py 
<span style="color: #666666; font-style: italic;">#       MySQL        SQLite</span>
real	0m0.367s     0m0.339s <span style="color: #666666; font-style: italic;"># much better</span>
user	0m0.156s     0m0.284s
sys	0m0.032s     0m0.052s</pre></div></div>

<p>The more records you add to the table, the slower the naive method will get, while the smart one will run at about the same speed.</p>
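<p>If you want to try this without setting up Django, something along these lines should do. It is a rough sketch and not the benchmark used above: it assumes Python 3 and an in-memory SQLite database, uses <code>RANDOM()</code> instead of <code>RAND()</code>, and all the function names (<code>build</code>, <code>naive</code>, <code>smart</code>, <code>timed</code>) are made up here. The numbers it prints depend on your machine, so none are shown.</p>

```python
import sqlite3
import time
from random import randrange


def build(n):
    """Create an in-memory users table holding n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)",
        (("user%d" % i,) for i in range(n)),
    )
    return conn


def naive(conn):
    # Sorts the whole table by a random key, then keeps one row.
    return conn.execute(
        "SELECT * FROM users ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


def smart(conn):
    # Counts the rows, then jumps to a random offset.
    (count,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return conn.execute(
        "SELECT * FROM users LIMIT 1 OFFSET ?", (randrange(count),)
    ).fetchone()


def timed(fn, conn):
    """Return the wall-clock seconds a single call to fn(conn) takes."""
    start = time.perf_counter()
    fn(conn)
    return time.perf_counter() - start


for n in (10000, 100000):
    conn = build(n)
    print(n, "naive %.4fs" % timed(naive, conn), "smart %.4fs" % timed(smart, conn))
```

<p>A single call per method is a very crude measurement; repeat the calls and average them if you want numbers you can trust.</p>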
<p>If you need to get a random record out of a filtered set (using <code>WHERE</code>), it&#8217;s basically the same. You just need to add the same <code>WHERE</code> clause to both queries.</p>
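<p>A filtered variant might look like the sketch below. Again this assumes Python 3 and the standard library <code>sqlite3</code> module; the <code>active</code> column and the <code>random_row_where</code> helper are invented for the example. The condition is passed as SQL text with placeholders, so only pass in conditions you wrote yourself and keep user-supplied values in <code>params</code>.</p>

```python
import sqlite3
from random import randrange


def random_row_where(conn, where_sql, params=()):
    """Pick a random row among those matching where_sql.

    where_sql is trusted SQL text such as "active = ?"; values go in params.
    """
    # The same condition is applied to the count ...
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE " + where_sql, params
    ).fetchone()
    if count == 0:
        return None  # nothing matches the filter
    offset = randrange(count)
    # ... and to the query that fetches the row.
    return conn.execute(
        "SELECT * FROM users WHERE " + where_sql + " LIMIT 1 OFFSET ?",
        tuple(params) + (offset,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (username, active) VALUES (?, ?)",
    [("user%d" % i, i % 2) for i in range(100)],
)

print(random_row_where(conn, "active = ?", (1,)))
```

<p>If the two queries used different conditions, the offset could point past the end of the filtered set and the second query would return nothing.</p>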
<h4>Details</h4>
<p>Below is the code used for the benchmark.</p>
<p>This is the <code>models.py</code> file of a simple Django app.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> <span style="color: #dc143c;">random</span> <span style="color: #ff7700;font-weight:bold;">import</span> randint
<span style="color: #ff7700;font-weight:bold;">from</span> django.<span style="color: black;">db</span> <span style="color: #ff7700;font-weight:bold;">import</span> models
<span style="color: #ff7700;font-weight:bold;">from</span> django.<span style="color: black;">db</span>.<span style="color: black;">models</span> <span style="color: #ff7700;font-weight:bold;">import</span> Count
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> UserManager<span style="color: black;">&#40;</span>models.<span style="color: black;">Manager</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #dc143c;">random</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        count = <span style="color: #008000;">self</span>.<span style="color: black;">aggregate</span><span style="color: black;">&#40;</span>ids=Count<span style="color: black;">&#40;</span><span style="color: #483d8b;">'id'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'ids'</span><span style="color: black;">&#93;</span>
        random_index = randint<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, count - <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: #008000;">all</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span>random_index<span style="color: black;">&#93;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> random_naive<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: #008000;">all</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">order_by</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'?'</span><span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> User<span style="color: black;">&#40;</span>models.<span style="color: black;">Model</span><span style="color: black;">&#41;</span>:
    username = models.<span style="color: black;">CharField</span><span style="color: black;">&#40;</span>max_length=<span style="color: #ff4500;">128</span><span style="color: black;">&#41;</span>
    password = models.<span style="color: black;">CharField</span><span style="color: black;">&#40;</span>max_length=<span style="color: #ff4500;">128</span><span style="color: black;">&#41;</span>
&nbsp;
    objects = UserManager<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>The <code>bulk.py</code> script fills the db with junk.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> settings
&nbsp;
<span style="color: #808080; font-style: italic;"># DJANGO_SETTINGS_MODULE takes a dotted module path, not a filesystem path.</span>
<span style="color: #dc143c;">os</span>.<span style="color: black;">environ</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'DJANGO_SETTINGS_MODULE'</span><span style="color: black;">&#93;</span> = <span style="color: #483d8b;">'settings'</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> users.<span style="color: black;">models</span> <span style="color: #ff7700;font-weight:bold;">import</span> User
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #ff4500;">1000000</span><span style="color: black;">&#41;</span>:
        u = User<span style="color: black;">&#40;</span>username=<span style="color: #483d8b;">&quot;user{0}&quot;</span>.<span style="color: black;">format</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>, password=<span style="color: #483d8b;">&quot;pass{0}&quot;</span>.<span style="color: black;">format</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
        u.<span style="color: black;">save</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This is <code>naive.py</code> &#8211; the script that uses the slow method.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> settings
&nbsp;
<span style="color: #808080; font-style: italic;"># DJANGO_SETTINGS_MODULE takes a dotted module path, not a filesystem path.</span>
<span style="color: #dc143c;">os</span>.<span style="color: black;">environ</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'DJANGO_SETTINGS_MODULE'</span><span style="color: black;">&#93;</span> = <span style="color: #483d8b;">'settings'</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> users.<span style="color: black;">models</span> <span style="color: #ff7700;font-weight:bold;">import</span> User
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    u = User.<span style="color: black;">objects</span>.<span style="color: black;">random_naive</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>u.<span style="color: black;">username</span><span style="color: black;">&#41;</span></pre></div></div>

<p>And <code>smart.py</code> that uses an additional query.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>
<span style="color: #ff7700;font-weight:bold;">import</span> settings
&nbsp;
<span style="color: #808080; font-style: italic;"># DJANGO_SETTINGS_MODULE takes a dotted module path, not a filesystem path.</span>
<span style="color: #dc143c;">os</span>.<span style="color: black;">environ</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'DJANGO_SETTINGS_MODULE'</span><span style="color: black;">&#93;</span> = <span style="color: #483d8b;">'settings'</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">from</span> users.<span style="color: black;">models</span> <span style="color: #ff7700;font-weight:bold;">import</span> User
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    u = User.<span style="color: black;">objects</span>.<span style="color: #dc143c;">random</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>u.<span style="color: black;">username</span><span style="color: black;">&#41;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
