Archive for the 'coding' Category


Detect MIME type of most popular Internet files

This code can detect, among others, PDF, Microsoft Word, Excel, PowerPoint and Visio files without knowing the file extension.


sub detectMime(\$) {
  my ($s)=@_;  # reference to the scalar holding the file contents
  # Text formats: look for markup at the start, allowing UTF-8 BOMs and whitespace
  return 'text/html' if $$s =~ /^(\xEF\xBB\xBF)*\s*<(\!--|\!?doctype|html|head|body|title|h1)/is;
  return 'text/xml' if $$s =~ /^(\xEF\xBB\xBF)*\s*<\?xml/s;
  return 'application/rtf' if $$s =~ /^\{\\rtf/s;
  # Binary formats: compare the leading bytes against known signatures
  my @b=unpack('C520',$$s);
  return 'image/jpeg' if $b[0]==0xFF && $b[1]==0xD8;
  return 'image/gif' if $b[0]==0x47 && $b[1]==0x49 && $b[2]==0x46;
  return 'image/png' if $b[0]==0x89 && $b[1]==0x50 && $b[2]==0x4E && $b[3]==0x47;
  return 'image/bmp' if $b[0]==0x42 && $b[1]==0x4D;
  return 'image/tiff' if $b[0]==0x49 && $b[1]==0x49 && $b[2]==0x2A;
  return 'application/pdf' if $b[0]==0x25 && $b[1]==0x50 && $b[2]==0x44 && $b[3]==0x46;
  return 'image/x-icon' if $b[0]==0 && $b[1]==0 && $b[2]==1 && $b[3]==0 && $b[4]==1;
  # OLE2 compound files (older MS Office): the byte at offset 0x200 tells the applications apart
  if ($b[0]==0xD0 && $b[1]==0xCF && $b[2]==0x11 && $b[3]==0xE0 && @b>0x200) {
    return 'application/msword' if $b[0x200]==0xEC;
    return 'application/' if $b[0x200]==0x09;
    return 'application/' if $b[0x200]==0x40;
    return 'application/vnd.visio' if $b[0x200]==0xFD;
  }
  # Any control character other than TAB, LF or CR marks the data as binary
  for (my $i=@b-1;$i>=0;$i--) { return 'application/octet-stream' if $b[$i]<32 && $b[$i]!=9 && $b[$i]!=10 && $b[$i]!=13 }
  return 'text/plain';
}
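
The same leading-byte comparisons can also be written as a lookup table, which makes new signatures easier to add. The following is only a rough Python sketch of that idea, not a port of the routine above: it covers just the simple signatures listed in the Perl code and leaves out the HTML/XML/RTF and MS Office checks. The names `SIGNATURES` and `detect_by_signature` are made up for illustration.

```python
# Leading-byte signatures, copied from the Perl routine above.
SIGNATURES = [
    (b"\xFF\xD8", "image/jpeg"),
    (b"GIF", "image/gif"),
    (b"\x89PNG", "image/png"),
    (b"BM", "image/bmp"),
    (b"II*", "image/tiff"),
    (b"%PDF", "application/pdf"),
    (b"\x00\x00\x01\x00\x01", "image/x-icon"),
]


def detect_by_signature(data: bytes) -> str:
    """Return the MIME type of the first matching signature, or a fallback."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # Same fallback idea as the Perl code: a control byte other than
    # TAB, LF or CR within the first 520 bytes suggests binary data.
    if any(b < 32 and b not in (9, 10, 13) for b in data[:520]):
        return "application/octet-stream"
    return "text/plain"
```

Called with the first few hundred bytes of a file, `detect_by_signature(b"%PDF-1.4")` returns `application/pdf`, and plain ASCII text falls through to `text/plain`.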

SDBM hash implementation in PHP

The SDBM hashing function is a simple and fast function that provides surprisingly uniform distributions of the hash value even when applied to a series of relatively short strings (3-7 characters). This makes it an excellent algorithm for organizing multiple files in sub-directories, for example, but the possible applications are, of course, endless.

Implementing the SDBM hashing in PHP is not an easy task, however. The SDBM hash function relies on 32-bit integer overflow, and PHP does not provide it: an integer that overflows is silently converted to a float, and the native integer width differs between platforms (32 or 64 bits).

After much trial and error, the following cute piece of code has been found to perform correctly in PHP 5.2/5.3 on 32-bit as well as 64-bit systems.

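function sdbmHash($str) {
	$hash = 0; $n=strlen($str);
	for ($i=0; $i<$n; $i++) {
		// SDBM step: hash = char + (hash << 6) + (hash << 16) - hash, modulo 2^32
		$h1 = $hash << 6;
		if ($h1<0) $h1+=0x100000000;
		$h2 = $hash << 16;
		if ($h2<0) $h2+=0x100000000;
		$hash = (int)((int)ord($str[$i]) + $h1 + $h2 - $hash);
		// 64-bit integers do not wrap at 32 bits on their own, so mask explicitly
		if (PHP_INT_SIZE > 4) $hash &= 0xFFFFFFFF;
		if($hash<0) $hash=$hash+0x100000000;
	}
	return $hash;
}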
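
A handy way to check a PHP port on a particular platform is to compare it against a reference written in a language whose integers never overflow. The following is a minimal Python sketch of the standard SDBM recurrence, reduced modulo 2^32, together with a hypothetical `bucket_dir` helper for the sub-directory use case mentioned above. Both function names and the default of 256 buckets are assumptions made for illustration.

```python
def sdbm_hash(data: bytes) -> int:
    """Unsigned 32-bit SDBM hash of a byte string."""
    h = 0
    for c in data:
        # hash = c + (hash << 6) + (hash << 16) - hash, i.e. hash * 65599 + c,
        # reduced modulo 2**32 by the mask.
        h = (c + (h << 6) + (h << 16) - h) & 0xFFFFFFFF
    return h


def bucket_dir(name: str, buckets: int = 256) -> str:
    """Hypothetical helper: map a file name to one of `buckets` sub-directories."""
    return format(sdbm_hash(name.encode("utf-8")) % buckets, "02x")
```

Two values that are easy to verify by hand: `sdbm_hash(b"a")` is 97, and `sdbm_hash(b"ab")` is 6363201 (97 * 65599 + 98). The PHP function should return the same numbers for the same input strings.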

“MySQL server has gone away”

When you try to insert a BLOB that exceeds your server’s maximum packet size, the client reports “MySQL server has gone away”, even against a local server, and the server log shows “Error 1153 Got a packet bigger than ‘max_allowed_packet’ bytes”. To fix this, decide on the size of the largest BLOB you’ll ever insert, set max_allowed_packet in the [mysqld] section of my.ini accordingly, and restart the server. For example:

 max_allowed_packet = 200M



PHP vs. Ruby/JRuby on Rails vs. Grails vs. Java performance comparison

In this write-up I captured my findings about the performance of various frameworks that I was considering for my next project…

Test setup:

  • One MySQL 5.1 table consisting of an ID and 2 string columns, 1000 rows (+5 for warm-up)
  • a simple web application that:
    • reads one record from the table
    • displays record data on a web page in table form
The page is accessed sequentially 5 times to warm up the caches, then 1000 times (timed). The time is captured below.
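
The measurement procedure can be sketched roughly as follows. This is not the harness used for the numbers below, only a Python illustration that assumes the reported figure is the total elapsed time divided by the number of timed requests. The function name `time_per_request` and its parameters are made up for this example.

```python
import time


def time_per_request(fetch, warmup=5, runs=1000):
    """Call fetch() `warmup` times untimed, then `runs` times timed.

    Returns the mean time per call in milliseconds.
    """
    for _ in range(warmup):
        fetch()  # warm up caches; not included in the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fetch()  # sequential requests, one after another
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

Here `fetch` is any zero-argument callable that loads the page once, for instance `lambda: urllib.request.urlopen(url).read()` with `url` pointing at the application under test.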

Test subjects:

  1. PHP 5.3.3 on Apache 2.2.21
  2. Ruby 1.8.7 + Rails 3.1.3 on mongrel 1.1.5
  3. JRuby 1.6.5 (emulating Ruby 1.8.7) + Rails 3.1.3 on mongrel 1.1.5
  4. Grails 2.0.0 (Groovy 1.8.4) on Tomcat 7.0.16
  5. Java 1.7 + Spring 3 on Tomcat 7.0.23

Everything was set to ‘production mode’. Test platform: win32.

Test results:

Framework        Time per request
PHP              10.6 ms
Ruby/Rails       14.1 ms
JRuby/Rails      16.0 ms
Grails/Groovy     7.4 ms
Java              6.4 ms

Java seems to be a clear winner here… too bad it’s by far the slowest of the 5 to develop in! :-[]

P.S. I know my JRuby setup is awkward… I just couldn’t get any sane performance out of it on Tomcat. The numbers in this post are the best I could achieve.