Development, Analysis And Research


Soap, XmlRpc and Rest with the Zend Framework

Posted in Db, General, PHP, XML by Andrew Johnstone on the February 20th, 2008

The Project

I was recently working on a project to expose our trading systems via XmlRpc, Rest and SOAP. It was quite an interesting project, which took two of us three weeks to develop (Amongst other things).

This involved creating a testbed, that would automatically generate the payload and response for each protocol. The parameters are introspected for each class method capturing each parameters data type, allowing for user input via standard html forms. This is probably best described with a picture or two.

Most of the documentation was generated via reflection and comments within the docblocks, parameters, notes were also generated making it quick and simple to update. In addition to parsing the start and end line of each method for any applicable error codes/faults that may be returned.

Rest API interface

XmlRpc API Interface - executed API method

Zend Framework

Using the Zend Framework for the first time in a commercial product was not exactly hassle free, and still has quite a few issues with its webservices implementation. Currently there seems to be quite a bit of confusion regarding its Rest implementation and whether it is to be merged, would be great if someone clarify this.

The main issue I found with the Zend Frameworks implementation of XmlRpc and Rest is that it assumes that the payload it receives is valid. During my development, I tended to mix the payloads from SOAP, XmlRpc and Rest, yet it would assume that simple_xml can parse the input.

For example $this->_sxml is assumed to be a valid object, if not you will either get invalid method call or an undefined index, which doesn’t render well for an xmlrpc server.

    /**
     * Constructor
     *
     * @param string $data XML Result
     * @return void
     */
    public function __construct($data)
    {
        $this->_sxml = simplexml_load_string($data);
    }

    /**
     * toString overload
     *
     * Be sure to only call this when the result is a single value!
     *
     * @return string
     */
    public function __toString()
    {
        if (!$this->getStatus()) {
            $message = $this->_sxml->xpath('//message');
            return (string) $message[0];
        } else {
            $result = $this->_sxml->xpath('//response');
            if (sizeof($result) > 1) {
                return (string) "An error occured.";
            } else {
                return (string) $result[0];
            }
        }
    }

One of the main issues with Rest was that it needed ksort when using the Rest client as the arguments were not necessarily passed in order. This can be “rest.php?method=x&arg1=1&arg0=0″ and it would interpret each arg in the order it received them. This should be sorted in the next release of the ZF.

As the webservices we are exposing needs to have quite good performance with the number of transactions it will be handling and the amount of reflection that Zend Server Reflection (Only noticed after I started profiling) performs and I wanted to optimize any overhead, which got me looking at Zend_XmlRpc_Server_Cache. First thing I did was profile Zend_XmlRpc_Server_Cache, which added a considerable amount of overhead. Looking at its implementation, it uses serialize, which is a relatively slow process and should be avoided, unless there is a large overhead in initializing objects. So most likely Zend_XmlRpc_Server_Cache will not add any benefit. And var_dump’ing out the reflection in XmlRpc spews out a shocking amount of information on some fairly large classes.

  if (!Zend_XmlRpc_Server_Cache::get($cacheFile, $server)) {

  }

Generating WSDL

I tried a number of WSDL generators including the implementation in incubator for ZF, which I found to be the best, yet I still had to write a large chunk of the WSDL by hand and adapt it.

The best way to debug is to run the soap client with verbose mode on, and it will typically tell you the issue straight away.

  • Zend_Soap_AutoDiscover: Duplicates an operation in WSDL for methods with parameters that are optional. (ZF-2642)
  • Zend_Soap_AutoDiscover: If missing the @return in your docblock the message response in the WSDL is not generated. (ZF-2643)
  • AutoDiscover duplicates response if using set class multiple times. (ZF-2641 )
  • One of my colleagues typically writes their docblocks with “@return int, comment.”, which the comma caused return types to be dropped with AutoDiscover, more of an issue with Zend Server Reflection

Other odd issues

Raw input bug

Some other obscurities I found was capturing the raw request data. In our local development environment reading the raw request input, and then once again within the Zend Frameworks appears to work fine. However in our pre-production environment it fails to read the second request to read the raw request. (PHP 5.2.2)


if (!isset($HTTP_RAW_POST_DATA)){
$HTTP_RAW_POST_DATA = file_get_contents('php://input');
}

It does seem a little odd that the XmlRpc does not check whether $HTTP_RAW_POST_DATA isset before attempting to re-read raw input.

Internal error: Wrong return type

Whilst running PHPUnit I noticed a very weird quirk in our local dev environment, which essentially did the following… You would expect this to output the contents of an array right? Well between the method call to x and return the result back to method y returns NULL. This is very obscure and i’ve never seen anything like it especially considering it is explicitly set. I had a number of colleagues check this, which had us all scratching our heads. Has anyone else seen anything similar to this?

class test {

  public function x() {
    $ret = array();
    for(...) {
      $ret[] = $row;
    }
    return $ret;
  }

  public function y() {
    $response = $this->x();
    var_dump($response);
  }
}

$t = new test();
$t->y();

Conclusion

Overall the project went pretty well, I’m confident it is now stable especially with the number of tests we ran against it. It is adaptable to other projects that we may need to expose via an API, in total there is about 6000 lines of code alone just testing the 3 different protocols it supports. I would have rather avoided the Rest implementation with ZF as it still needs a lot of work, however XmlRpc is a lot more stable and I would quite happily use again. As there is a lot of overhead with reflection it is not the fastest implementation and was contrasted to some of the heavier web pages we have for some simple functionality. It would be ideal to replace the reflection with something lighter such as an array with the corresponding methods, parameters and types, however I would have to look into that if performance did become a major issue.

PS. Just to note I used PHP’s in built soap server.

Technorati

Popularity: 31% [?]

MySQL & PHP Performance Optimization Tips

Posted in Db, General by Andrew Johnstone on the July 25th, 2007
, , , ,

In high performance web applications you will always have bottlenecks within your application. Identifying these bottlenecks and optimizing is a tedious task and typically show themselves underload. A single bad/unindexed query can bring a server to its knees. A large number of rows will also help to highlight any poor queries, and on very large datasets you may come to the point where you may have to make decisions whether to denormilize database schema.

Explain each page

Whilst I develope sites, I typically print out all queries, EXPLAIN each select statement at the bottom of each page, and highlight it red if its doing a full table scan, temp tables or a filesort. As well as displaying SHOWS INDEXES FROM TABLE…

Not only will it help you to optimize sites, you can also see bad logic and areas to optimize such as a query for each loop when looking through a users table for example.

MySQL indexing optimization

How do you identify where bottlenecks occur?

One of my favourite linux commands lately is the watch command. For Mac users you can get this from macports via “sudo port install watch”. Also a few other handy applications are mysqlreport, mytop.

# Appends file with processlist
watch -n1 “mysqladmin -uroot processlist >>watch.processlist.txt”

# Count the number of locked processes
watch -n1 “mysqladmin -uroot processlist | grep -i ‘lock’ | wc -l “;

# Count the number of processes sleep
watch -n1 “mysqladmin -uroot processlist | grep -i ’sleep’ | wc -l “;

# Run a specific query every second
watch -n1 “mysql -uadmin -p`cat /etc/psa/.psa.shadow` trade_engine –execute “SELECT NOW(),date_quote FROM sampleData WHERE 1=1 AND permission = ‘755′ AND  symbol=’IBZL’ GROUP BY date_quote;” ”

# Emails mysqlreport every 60 seconds
watch -n60 mysqlreport –all –email andrew@email.com

# Displays process list as well as appending the contents to a file
watch -n1 “mysqladmin -uadmin -p`cat /etc/psa/.psa.shadow` processlist | tee -a process.list.txt”

Watching the processlist is very handy in identifying locked, sleeping or sorting process states. If you have a large number of locked processes you typically should change the table type to INNODB, which supports row level locking. if you have a large number of sleeping connections, and you have persistent connections enabled, most likely indicates that connections are not being reused.

Running a specific query every second is exceptionally handy, the example I gave indicates whether one of our crons is correctly functioning and as each row is inserted you can watch something being either inserted or updated. mysqlreport gives numerous peices of information, extremely helpful in identifying issues, you can see more indepth at hackmysql.com/mysqlreportguide.

Look at the mysql slow query log and optimize each query starting with the most common, think whether you have to execute that query at all and use a cache such as memcached.

I also typically tend to look at the following:

  • vmstat -S M
  • ps axl | grep -i ‘mysql’
  • pstree –G
  • free –m

Reference:
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html

Technorati , , , ,

Popularity: 83% [?]

Large Binary Data and Blob’s

Posted in Db, General, PHP by Andrew Johnstone on the October 19th, 2006
, ,

I recently read “Binaries Belong in the Database Too” on sitepoint.com, and thought I would shed some light with regard to my experience of storing files in databases. I’m sure many of you have known this to be a taboo practice, and I would certainly agree depending on the database. A project I worked on for MTV Networks Europe/International required a completely shared nothing architecture. This meant that MTV’s hosting & operations imposed that I stored files in the database, and expressed my hesitation.

The platform:

* Linux
* Apache
* MySQL
* PHP5

The problems

You typically get the common file upload problems with upload_max_filesize, max_input_time, execution time however you also have issues with mysql connections and max packet sizes, mysql chunked streams. Uploads via JUpload allows for large file uploads, however you still encounter TCP/IP connection interuptions and errors. Some of the more major issues I encountered were with the actual management of the data. Currently MySQL has no real support for handling Binary Large Objects, for example if you try to load data in from a file you generally encounter max packet size errors. Although the most fundamental issue is that the MySQL protocol does not send chunked streams for blobs and the client has to load the entire blob into memory. Admittedly memory limits on the server were not too much of an issue, as I was using a 8 CPU 16GB of ram server, however you may not have some of the infrastructure that I had available.

Whilst there were a number of limitations I had to resolve as described above, some of the expectations that I had not anticipated for were some user errors, such as trying to upload 4mb BMP files to be streamed as images for a website. Other factors were that hosting & operations had not expected their adsales department to attempt to upload 120+ mb video files.

DataTypes

Firstly lets look at some of the limitations on the BLOB datatype in MySQL, as you can see there are length limitations on blobs.

TINYBLOB
A BLOB column with a maximum length of 255 (2^8 - 1) bytes.

BLOB[(M)]
A BLOB column with a maximum length of 65,535 (2^16 - 1) bytes.
Beginning with MySQL 4.1, an optional length M can be given. MySQL will create the column as the smallest BLOB type largest enough to hold values M bytes long.

MEDIUMBLOB
A BLOB column with a maximum length of 16,777,215 (2^24 - 1) bytes.

LONGBLOB
A BLOB column with a maximum length of 4,294,967,295 or 4GB (2^32 - 1) bytes. Up to MySQL 3.23, the client/server protocol and MyISAM tables had a limit of 16MB per communication packet / table row. From MySQL 4.0, the maximum allowed length of LONGBLOB columns depends on the configured maximum packet size in the client/server protocol and available memory.

Alternatives for storing >4Gb BLOBs are:
* Compressing the BLOB so that it fits in 4Gb
* Splitting up the BLOB into 4Gb chunks as separate rows columns.

Tips:

Get Blob length
To find the length in bytes of a stored BLOB. Simply use: SELECT LENGTH(blobcolumn) FROM table.

Get Blob fragment
To retrieve large BLOBs by using repeatedly retrieving only fragments of a BLOB


using substring, ie:
SELECT SUBSTRING(document, 1, 10240) FROM documents WHERE did=3;
and then
SELECT SUBSTRING(document, 10241, 10240) FROM documents WHERE did=3;
etc.

Inserting Blobs
Inserting data into BLOBs. It has to be inserted in hex ie: ‘A’ = 0×41 and ‘AB’ = 0×4142 and so on. The prefix is a zero not a cap o.

If you want to insert binary data into a string column (such as a BLOB), the following characters must be represented by escape sequences:

NUL 	NUL byte (ASCII 0). Represent this character by '' (a backslash followed by an ASCII '0' character).
 	Backslash (ASCII 92). Represent this character by '\'.
' 	Single quote (ASCII 39). Represent this character by '''.
" 	Double quote (ASCII 34). Represent this character by '"'.


When writing applications, any string that might contain any of these special characters must be properly escaped before the string is used as a data value in an SQL statement that is sent to the MySQL server, base64 encoding is a good option.

Indexing Blobs


Blobs can sometimes can be indexed, depending on the storage engine you’re using:
MyISAM, InnoDB, and BDB tables support BLOB and TEXT indexing. However, you must specify a prefix size to be used for the index.This avoids creating index entries that might be huge and thereby defeat any benefits to be gained by that index.The exception is that prefixes are not used for FULLTEXT indexes on TEXT columns. FULLTEXT searches are based on the entire content of the indexed columns, so any prefix you specify is ignored.
MEMORY tables do not support BLOB and TEXT indexes.This is because the MEMORY engine does not support BLOB or TEXT columns at all.

BLOB or TEXT columns may require special care:
Due to the typical large variation in the size of BLOB and TEXT values, tables containing them are subject to high rates of fragmentation if many deletes and updates are done. If you’re using a MyISAM table to store BLOB or TEXT values, you can run OPTIMIZE TABLE periodically to reduce fragmentation and maintain good performance.

The max_sort_length system variable influences BLOB and TEXT comparison and sorting operations. Only the first max_sort_length bytes of each value are used. (For TEXT columns that use a multi-byte character set, this means that comparisons might involve fewer than max_sort_length characters.) If this causes a problem with the default max_sort_length value of 1024, you might want to increase the value before performing comparisons. If you’re using very large values, you might need to configure the server to increase the value of the max_allowed_packet parameter. See Chapter 11,“General MySQL Administration,” for more information.You will also need to increase the packet size for any client that wants to use very large values.The mysql and mysqldump clients support setting this value directly using a startup option.

Solution

The solution ended up utilizing 2 memcached servers that cached blobs and objects between the MySQL server, this saved streaming the content directly from MySQL upon each request. Then selecting chunks of data from a binary large object and concatenating the results alleviates maximum packet errors from MySQL. The only other aspects to alleviate are the initial upload, this is entirely upto you, how this is implemented whether it is via JUpload, SCP, FTP, or some other means. Finally increase the above settings. To import / export binary files I wrote a script that queried the database writing out the files, by chunking the data, this script did take a while to execute.

I have heard that Oracle has very good support for handling Binary Large Objects… Maybe thats something to look into..

Pointers.

http://jeremy.zawodny.com/blog/archives/000078.html
http://jeremy.zawodny.com/blog/archives/000840.html
http://www.lentus.se/warehouse/SlidesDW.ppt
http://sunsite.mff.cuni.cz/MIRRORS/ftp.mysql.com/doc/en/BLOB.html

Technorati , ,

Popularity: 73% [?]

Exceptions, Exceptions, Exceptions

Posted in Db, PHP by Andrew Johnstone on the January 22nd, 2006
, , ,
Advanced PHP Programming, George SchlossnagleLately, i’ve been reading Advanced PHP Programming, by George Schlossnagle, which I must say is an excellent book. Below is an excerpt, which I find particularly interesting on the topic of error handling.

“Production Display of Errors

How to notify users of errors is often a political issue. All the large clients I have worked for have had strict rules regarding what to do when a user incurs an error. Business rules have ranged from display of a customized or themed error page to complex logic regarding display of some sort of cached version of the content they were looking for. From a business perspective, this makes complete sense: Your Web presence is your link to your customers, and any bugs in it can color their perceptions of your whole business.

Regardless of the exact content that needs to be returned to a user in case of an unexpected error, the last thing I usually want to show them is a mess of debugging information. Depending on the amount of information in your error messages, that could be a considerable disclosure of information. “, PHP Error Handling

Along with some brief discussions with Håvard Eide on some SQL Injection Attacks, I found in a large public site, the end result was simply terminating the execution, and displaying the error. E.g.

Note: The database is using MySQL 3. MySQL supports multiple result sets, from multiple statements, although this is a flag in MySQL 4.1+. So, when upgrading we have numerous secruity exploits, that has the potential to delete or drop the entire database. Also, with more rigorous permissions on the credentials used would also help alleviate such problems.

“If $userid is passed in, unvalidated, from the end user, a malicious user could pass in this:

$userid = “10; DELETE FROM users;”;

MySQL (like many other RDBMS systems) supports multiple queries inline, if this value is passed in unchecked, you will have lost your user’s table. This is just one of a number of variations on this sort of attack. The moral of the story is that you should always validate any data in queries.”, George Schlossnagle


mysql_query('') or die('MySQL Error: '.mysql_error());

Firstly, if there was a fatal error such as that, someone with a malicious intent can use the debugging information printed to acquire further knowledge and deduce other methods of attack. Also, from a usability point of view, why is the execution terminated? At least give something more meaningful or redirect to an errors page. Ultimately, a message such as that should never, ever, be shown on a live site.

More to the point of this post, displaying a cached version of the page is a great method, it gives the end user the content that they were looking for, and illudes the fundamental issue of deteriorating your users perception. Now, this entails a number of further questions to think about:

  • What if it is a dynamic service, such as a search?
  • Caching Policy and retention.

Whilst, dynamic services would be very difficult to cater for, and each case would be unique, in terms of search. Some possibilities, are to cache paged results, and apply some business logic on the retention of that cache (We don’t want to serve out-dated content), also state that the user is searching cached content and for any new terms, that haven’t been cached display an error page. This could become very convoluted, and in essence results in how critical the information is or the service!

You also don’t want to display partial pages where execution has ultimately terminated, this should be rather trivial with output buffering. However, it can be a rather gray area to associate, every possibility and as George Schlossnagle mentioned each company has different requirements.

It would be interesting to expand out and detail case examples.

I would be interested to understand better about one of the statements in the book…. “Exceptions expose the possibility of leaking memory.”, if anyone has more information on this, I would be very interested to hear further…

Further Information:

The addslashes() Versus mysql_real_escape_string() Debate
mysql_real_escape_string() versus Prepared Statements
Security Corner: SQL Injection

Technorati , , ,

Popularity: 5% [?]

MySQL 3

Posted in Db by Andrew Johnstone on the December 17th, 2005
,

Lately i’ve been developing with MySQL3, and it’s almost like hitting your head against a brick wall with its limitations and specific documentation is difficult to find.

Some of the limitations:

  • Subqueries. ’select id from t where grp in (select grp from g where u > 100)’
  • Add DISTINCT to Aggregates
  • JOINs with ambiguous columns causes a MySQL error.
  • Foreign Key Definition Ignored on MyISAM.
    • In MySQL Server 3.23.44 and up, the InnoDB storage engine supports checking of foreign key constraints, including CASCADE, ON DELETE, and ON UPDATE. See Section 14.2.6.4, “FOREIGN KEY Constraints”.

      For storage engines other than InnoDB, MySQL Server parses the FOREIGN KEY syntax in CREATE TABLE statements, but does not use or store it. In the future, the implementation will be extended to store this information in the table specification file so that it may be retrieved by mysqldump and ODBC. At a later stage, foreign key constraints will be implemented for MyISAM tables as well.

  • Function CASE
  • Add a handler interface for table information so one can use it as a system table. This would be a bit slow if one requested information about all tables, but very flexible. One should also implement ’show info from table’ for basic table information.
  • Reordering columns. E.g. ALTER TABLE FormObjects MODIFY COLUMN FormFieldRuleID INT(11) AFTER FOLID;
  • JOINS on UPDATE’s.

Other limitations

  • Don’t add automatic DEFAULT values to columns. Give an error when using an INSERT that doesn’t contain a column that doesn’t have an DEFAULT.
  • Multiple SQL in a query commands
  • LOCK DATABASES.
  • NATURAL JOIN
Technorati ,

Popularity: 3% [?]

MS Access & a Handy Little App

Posted in Db, General by Andrew Johnstone on the July 11th, 2005
, , ,

Theres always something to learn, and a couple things i’ve learnt recently.

MS Access does not support multiple querys as in the following…

ID=Request.Form(TableKeyIDName)

SQL = ""
SQL = SQL & "INSERT INTO MetaData            " & VBCRLF
SQL = SQL & " (Fid, IndexKey, IndexValue)    " & VBCRLF
SQL = SQL & "VALUES                          " & VBCRLF

For i=1 To Request.Form("AssociateDataKey").Count
   If (Request.Form("AssociateDataKey") <> EMPTY) Then
    SQL = SQL & " (                       " & VBCRLF
    SQL = SQL & "    " & CInt(ID) & ",    " & VBCRLF
    SQL = SQL & "    '" & Request.Form("AssociateDataKey")(i) & "',     " & VBCRLF
    SQL = SQL & "    '" & Request.Form("AssociateDataValue")(i) & "'    " & VBCRLF
    SQL = SQL & " )                       "
    If (Request.Form("AssociateDataKey").Count <> i) Then
       SQL = SQL & ", " & VBCRLF
    End If
   End If
Next

The Query:

INSERT INTO MetaData
    (Fid, IndexKey, IndexValue)
VALUES
    ( 18, 'a', 'a' ),
    ( 18, 'b', 'b' ),
    ( 18, 'c', 'c' ),
    ( 18, 'd', 'd' ),
    ( 18, 'e', 'e' );

The Error:

Error Type:
Microsoft JET Database Engine (0×80040E14)
Missing semicolon ( at end of SQL statement.
/admin/example/Submit.asp, line 84


The wildcard for MS Access is ‘*’ not ‘%’.

where (
        A.TITLE like    '*criteria1*'
     or A.SUBTITLE like '*criteria1*'
     or A.ABSTRACT like '*criteria1*'
     or A.CONTENT like  '*criteria1*'
        )
    and (
        A.category = 937
     or A.date_added >= '2005-01-01'
        )

and a Handy App…

Also, i’d just like to thank Eric for pointing out a nice app to mount ISO’s; a very handy little application.

Technorati , , ,

Popularity: 2% [?]