Large Binary Data and Blob’s

In: MySQL|PHP

19 Oct 2006

recently read “Binaries Belong in the Database Too” on sitepoint.com, and thought I would shed some light with regard to my experience of storing files in databases. I’m sure many of you have known this to be a taboo practice, and I would certainly agree depending on the database. A project I worked on for MTV Networks Europe/International required a completely shared nothing architecture. This meant that MTV’s hosting & operations imposed that I stored files in the database, and expressed my hesitation.

The platform:

* Linux
* Apache
* MySQL
* PHP5

The problems

You typically get the common file upload problems with upload_max_filesize, max_input_time, execution time however you also have issues with mysql connections and max packet sizes, mysql chunked streams. Uploads via JUpload allows for large file uploads, however you still encounter TCP/IP connection interuptions and errors. Some of the more major issues I encountered were with the actual management of the data. Currently MySQL has no real support for handling Binary Large Objects, for example if you try to load data in from a file you generally encounter max packet size errors. Although the most fundamental issue is that the MySQL protocol does not send chunked streams for blobs and the client has to load the entire blob into memory. Admittedly memory limits on the server were not too much of an issue, as I was using a 8 CPU 16GB of ram server, however you may not have some of the infrastructure that I had available.

Whilst there were a number of limitations I had to resolve as described above, some of the expectations that I had not anticipated for were some user errors, such as trying to upload 4mb BMP files to be streamed as images for a website. Other factors were that hosting & operations had not expected their adsales department to attempt to upload 120+ mb video files.

DataTypes

Firstly lets look at some of the limitations on the BLOB datatype in MySQL, as you can see there are length limitations on blobs.

TINYBLOB
A BLOB column with a maximum length of 255 (2^8 - 1) bytes.

BLOB[(M)]
A BLOB column with a maximum length of 65,535 (2^16 – 1) bytes.
Beginning with MySQL 4.1, an optional length M can be given. MySQL will create the column as the smallest BLOB type largest enough to hold values M bytes long.

MEDIUMBLOB
A BLOB column with a maximum length of 16,777,215 (2^24 – 1) bytes.

LONGBLOB
A BLOB column with a maximum length of 4,294,967,295 or 4GB (2^32 – 1) bytes. Up to MySQL 3.23, the client/server protocol and MyISAM tables had a limit of 16MB per communication packet / table row. From MySQL 4.0, the maximum allowed length of LONGBLOB columns depends on the configured maximum packet size in the client/server protocol and available memory.

Alternatives for storing >4Gb BLOBs are:
* Compressing the BLOB so that it fits in 4Gb
* Splitting up the BLOB into 4Gb chunks as separate rows columns.

Tips:

Get Blob length
To find the length in bytes of a stored BLOB. Simply use: SELECT LENGTH(blobcolumn) FROM table.

Get Blob fragment
To retrieve large BLOBs by using repeatedly retrieving only fragments of a BLOB


using substring, ie:
SELECT SUBSTRING(document, 1, 10240) FROM documents WHERE did=3;
and then
SELECT SUBSTRING(document, 10241, 10240) FROM documents WHERE did=3;
etc.

Inserting Blobs
Inserting data into BLOBs. It has to be inserted in hex ie: ‘A’ = 0×41 and ‘AB’ = 0×4142 and so on. The prefix is a zero not a cap o.

If you want to insert binary data into a string column (such as a BLOB), the following characters must be represented by escape sequences:

NUL 	NUL byte (ASCII 0). Represent this character by '�' (a backslash followed by an ASCII '0' character).
 	Backslash (ASCII 92). Represent this character by '\'.
' 	Single quote (ASCII 39). Represent this character by '''.
" 	Double quote (ASCII 34). Represent this character by '"'.

When writing applications, any string that might contain any of these special characters must be properly escaped before the string is used as a data value in an SQL statement that is sent to the MySQL server, base64 encoding is a good option.

Indexing Blobs


Blobs can sometimes can be indexed, depending on the storage engine you’re using:
MyISAM, InnoDB, and BDB tables support BLOB and TEXT indexing. However, you must specify a prefix size to be used for the index.This avoids creating index entries that might be huge and thereby defeat any benefits to be gained by that index.The exception is that prefixes are not used for FULLTEXT indexes on TEXT columns. FULLTEXT searches are based on the entire content of the indexed columns, so any prefix you specify is ignored.
MEMORY tables do not support BLOB and TEXT indexes.This is because the MEMORY engine does not support BLOB or TEXT columns at all.

BLOB or TEXT columns may require special care:
Due to the typical large variation in the size of BLOB and TEXT values, tables containing them are subject to high rates of fragmentation if many deletes and updates are done. If you’re using a MyISAM table to store BLOB or TEXT values, you can run OPTIMIZE TABLE periodically to reduce fragmentation and maintain good performance.

The max_sort_length system variable influences BLOB and TEXT comparison and sorting operations. Only the first max_sort_length bytes of each value are used. (For TEXT columns that use a multi-byte character set, this means that comparisons might involve fewer than max_sort_length characters.) If this causes a problem with the default max_sort_length value of 1024, you might want to increase the value before performing comparisons. If you’re using very large values, you might need to configure the server to increase the value of the max_allowed_packet parameter. See Chapter 11,“General MySQL Administration,” for more information.You will also need to increase the packet size for any client that wants to use very large values.The mysql and mysqldump clients support setting this value directly using a startup option.

Solution

The solution ended up utilizing 2 memcached servers that cached blobs and objects between the MySQL server, this saved streaming the content directly from MySQL upon each request. Then selecting chunks of data from a binary large object and concatenating the results alleviates maximum packet errors from MySQL. The only other aspects to alleviate are the initial upload, this is entirely upto you, how this is implemented whether it is via JUpload, SCP, FTP, or some other means. Finally increase the above settings. To import / export binary files I wrote a script that queried the database writing out the files, by chunking the data, this script did take a while to execute.

I have heard that Oracle has very good support for handling Binary Large Objects… Maybe thats something to look into..

Pointers.

http://jeremy.zawodny.com/blog/archives/000078.html
http://jeremy.zawodny.com/blog/archives/000840.html
http://www.lentus.se/warehouse/SlidesDW.ppt
http://sunsite.mff.cuni.cz/MIRRORS/ftp.mysql.com/doc/en/BLOB.html

Comment Form

About this blog

I have been a developer for roughly 10 years and have worked with an extensive range of technologies. Whilst working for relatively small companies, I have worked with all aspects of the development life cycle, which has given me a broad and in-depth experience.