Monday, September 08, 2008

array_intersect_key is terrible

I always had a hunch that array_intersect_key was slow in PHP, but I didn't quite realize how slow until tonight when I decided to benchmark it. Here's a quick test script:


$big = array();
for ($i = 0; $i < 10000; $i++) {
$big[$i] = 234;
}

$small = array();
for ($i = 0; $i < 10; $i++) {
$big[$i] = 2435;
}

////////////////////////////////////////////////////////////

print "Testing big first\n";

$st = microtime(true);
for ($i = 0; $i < 100; $i++) {
$junk = array_intersect_key($big, $small);
}
$bigfirst = microtime(true) - $st;

////////////////////////////////////////////////////////////

print "Testing small first\n";

$st = microtime(true);
for ($i = 0; $i < 100; $i++) {
$junk = array_intersect_key($small, $big);
}
$smallfirst = microtime(true) - $st;


print "array_intersect_key\n";
print "========================================\n";
print "Big first: $bigfirst\n";
print "Small first: $smallfirst\n";

////////////////////////////////////////////////////////////

function common_keys($a, $b) {
$res = array();
foreach ($a as $key => $val) {
if (isset($b[$key]))
$res[] = $key;
}
return $res;
}

print "Testing big first\n";

$st = microtime(true);
for ($i = 0; $i < 100; $i++) {
$junk = common_keys($big, $small);
}
$bigfirst = microtime(true) - $st;

////////////////////////////////////////////////////////////

print "Testing small first\n";

$st = microtime(true);
for ($i = 0; $i < 100; $i++) {
$junk = common_keys($small, $big);
}
$smallfirst = microtime(true) - $st;

////////////////////////////////////////////////////////////

print "common_keys\n";
print "========================================\n";
print "Big first: $bigfirst\n";
print "Small first: $smallfirst\n";
?>


And the results:


Testing big first
Testing small first
array_intersect_key
========================================
Big first: 13.1110720634
Small first: 12.5442099571
Testing big first
Testing small first
common_keys
========================================
Big first: 0.363513946533
Small first: 0.000243902206421


By implementing this in pure PHP you get a 50000x speedup over using the built-in in some cases. Wow...

Update: this appears to be fixed in PHP 5.2.5. See the Release Notes.

1 comment:

Joel Bradshaw said...

This has been fixed in 5.2.5:
http://www.php.net/releases/5_2_5.php
As verified by my testing (my localhost, at 5.2.4, is still slow):
http://quotebook.llynix.com/testak.php
But the fact that it was terrible in the first place, and was only recently fixed, means that your point still stands :P