<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Getting a random row from a relational database</title>
	<atom:link href="http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/feed/" rel="self" type="application/rss+xml" />
	<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/</link>
	<description>Imagination is limitless. So is stupidity.</description>
	<lastBuildDate>Sun, 04 Apr 2010 19:47:47 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Sym Roe</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-233</link>
		<dc:creator>Sym Roe</dc:creator>
		<pubDate>Sun, 04 Apr 2010 19:47:47 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-233</guid>
		<description>Yes, I understand it&#039;s not &quot;random&quot;; however, your method will exclude some rows entirely! :)

The best solution I&#039;ve seen to this is to store all IDs (and maybe some other data) in Redis and use its fast random ordering. That adds more complexity, but if true random ordering is important then it seems to be about the only way.</description>
		<content:encoded><![CDATA[<p>Yes, I understand it&#8217;s not &#8220;random&#8221;; however, your method will exclude some rows entirely! :)</p>
<p>The best solution I&#8217;ve seen to this is to store all IDs (and maybe some other data) in Redis and use its fast random ordering. That adds more complexity, but if true random ordering is important then it seems to be about the only way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Vladev</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-232</link>
		<dc:creator>Emil Vladev</dc:creator>
		<pubDate>Sun, 04 Apr 2010 19:38:57 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-232</guid>
		<description>@Sym: Correct, but if you have &quot;holes&quot; in the PKs, some of the keys have a higher chance of being picked, so it&#039;s not completely random. Nevertheless, it&#039;s a good compromise.</description>
		<content:encoded><![CDATA[<p>@Sym: Correct, but if you have &#8220;holes&#8221; in the PKs, some of the keys have a higher chance of being picked, so it&#8217;s not completely random. Nevertheless, it&#8217;s a good compromise.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sym Roe</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-231</link>
		<dc:creator>Sym Roe</dc:creator>
		<pubDate>Sun, 04 Apr 2010 19:10:29 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-231</guid>
		<description>Using COUNT won&#039;t work if you have a lot of deleted rows in the middle of the table.

I would grab the highest PK (SELECT pk/ORDER BY pk DESC/LIMIT 1), then get a random number in that range, then get the offset row, like above.

That way you&#039;ll always get a row in the range between 0 and the highest PK.
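
For what it is worth, here is a rough sketch of that idea in Python with sqlite3. The table name items, the column id and the helper random_row_by_pk are made up for illustration, and it assumes positive integer keys. Instead of an OFFSET it takes the first row whose key is at or above the random number, which is only one possible reading of the step above.

```python
import random
import sqlite3


def random_row_by_pk(conn, table="items", pk="id"):
    """Return one row chosen via a random key, or None for an empty table.

    The table and column names are hypothetical and must come from trusted
    code, because they are interpolated into the SQL text.
    """
    cur = conn.cursor()
    # Highest primary key: ORDER BY ... DESC LIMIT 1 can use the PK index.
    cur.execute(f"SELECT {pk} FROM {table} ORDER BY {pk} DESC LIMIT 1")
    top = cur.fetchone()
    if top is None:
        return None  # empty table
    # Assumes keys are positive integers, so 1..top is a valid range.
    target = random.randint(1, top[0])
    # First row whose key is >= target. Gaps left by deleted rows are
    # skipped, so a row that follows a long gap is picked more often.
    cur.execute(
        f"SELECT * FROM {table} WHERE {pk} >= ? ORDER BY {pk} LIMIT 1",
        (target,),
    )
    return cur.fetchone()
```

Calling random_row_by_pk(conn) on an open sqlite3 connection returns one row as a tuple, or None if the table is empty.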

Also, it would be faster in postgres :)</description>
		<content:encoded><![CDATA[<p>Using COUNT won&#8217;t work if you have a lot of deleted rows in the middle of the table.</p>
<p>I would grab the highest PK (SELECT pk/ORDER BY pk DESC/LIMIT 1), then get a random number in that range, then get the offset row, like above.</p>
<p>That way you&#8217;ll always get a row in the range between 0 and the highest PK.</p>
<p>Also, it would be faster in postgres :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saiyine</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-168</link>
		<dc:creator>Saiyine</dc:creator>
		<pubDate>Sun, 24 Jan 2010 23:50:21 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-168</guid>
		<description>I&#039;ve done the tests myself, and I can confirm your results.

Statistics on getting a random row from a table:
http://www.saiyine.com/post.901.php</description>
		<content:encoded><![CDATA[<p>I&#8217;ve done the tests myself, and I can confirm your results.</p>
<p>Statistics on getting a random row from a table:<br />
<a href="http://www.saiyine.com/post.901.php" rel="nofollow">http://www.saiyine.com/post.901.php</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Vladev</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-164</link>
		<dc:creator>Emil Vladev</dc:creator>
		<pubDate>Sun, 24 Jan 2010 10:24:58 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-164</guid>
		<description>@TheQuux: Thanks for the clarification. About ORDER BY RAND() in PostgreSQL being the same - this is not right. ORDER BY RAND() will have to generate a second copy of the table, sort it and then return a record, while SELECT COUNT(*) will need to check the rows but that&#039;s about it. If you have a better method I would love to hear it - this one is far from perfect.</description>
		<content:encoded><![CDATA[<p>@TheQuux: Thanks for the clarification. About ORDER BY RAND() in PostgreSQL being the same &#8211; this is not right. ORDER BY RAND() will have to generate a second copy of the table, sort it and then return a record, while SELECT COUNT(*) will need to check the rows but that&#8217;s about it. If you have a better method I would love to hear it &#8211; this one is far from perfect.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TheQuux</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-162</link>
		<dc:creator>TheQuux</dc:creator>
		<pubDate>Sun, 24 Jan 2010 06:05:42 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-162</guid>
		<description>@emil: Count(*) and Count(column) are actually different.

    SELECT COUNT(*) FROM table;

gives you the number of rows in table, whereas

    SELECT COUNT(column) FROM table;

is equivalent to:

     SELECT COUNT(*) FROM table WHERE column IS NOT NULL;

So, the first would be fast if your database keeps track of the number of rows in the table, whereas the second would need to consult an index or actually run a sequential scan over the table. Incidentally, figuring out how many rows are in the table with PostgreSQL takes right about as long as the initial &quot;select * from table order by rand() limit 1&quot;, so it doesn&#039;t leave you any better off.</description>
		<content:encoded><![CDATA[<p>@emil: Count(*) and Count(column) are actually different.</p>
<p>    SELECT COUNT(*) FROM table;</p>
<p>gives you the number of rows in table, whereas</p>
<p>    SELECT COUNT(column) FROM table;</p>
<p>is equivalent to:</p>
<p>     SELECT COUNT(*) FROM table WHERE column IS NOT NULL;</p>
<p>So, the first would be fast if your database keeps track of the number of rows in the table, whereas the second would need to consult an index or actually run a sequential scan over the table. Incidentally, figuring out how many rows are in the table with PostgreSQL takes right about as long as the initial &#8220;select * from table order by rand() limit 1&#8221;, so it doesn&#8217;t leave you any better off.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Emil Vladev</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-157</link>
		<dc:creator>Emil Vladev</dc:creator>
		<pubDate>Sat, 23 Jan 2010 20:23:22 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-157</guid>
		<description>@Travis: In my experience there is no real difference between the two - databases usually optimize COUNT(*) (I think that in MyISAM the row count is stored with the table and is not computed).

@Saiyine: Even with one query the difference is evident.</description>
		<content:encoded><![CDATA[<p>@Travis: In my experience there is no real difference between the two &#8211; databases usually optimize COUNT(*) (I think that in MyISAM the row count is stored with the table and is not computed).</p>
<p>@Saiyine: Even with one query the difference is evident.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saiyine</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-156</link>
		<dc:creator>Saiyine</dc:creator>
		<pubDate>Sat, 23 Jan 2010 19:54:36 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-156</guid>
		<description>Don&#039;t you think a benchmark with just one query is a bit naive? What about doing it with a thousand queries?</description>
		<content:encoded><![CDATA[<p>Don&#8217;t you think a benchmark with just one query is a bit naive? What about doing it with a thousand queries?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Travis L</title>
		<link>http://bolddream.com/2010/01/22/getting-a-random-row-from-a-relational-database/comment-page-1/#comment-155</link>
		<dc:creator>Travis L</dc:creator>
		<pubDate>Sat, 23 Jan 2010 19:32:12 +0000</pubDate>
		<guid isPermaLink="false">http://bolddream.com/?p=258#comment-155</guid>
		<description>In your solution, part 1, you do a SELECT COUNT(*) rather than SELECT COUNT( col_name )?  I&#039;ve always believed that COUNT(*) is slower than counting 1 column.</description>
		<content:encoded><![CDATA[<p>In your solution, part 1, you do a SELECT COUNT(*) rather than SELECT COUNT( col_name )?  I&#8217;ve always believed that COUNT(*) is slower than counting 1 column.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
