PHP/MySQL & the Real Time dilemma
Currently I’m working with stock market data, and it’s quite an interesting topic once you get to real-time data, as it brings a number of new concepts into the mix. The first challenge is to import information from the feeds into our MySQL databases. Whilst this should be a relatively straightforward task, I’m sure we are going to hit issues with writes to the database (INSERTs/UPDATEs). The information from these feeds will be used for various tasks that will require a lot of processing. The information will be displayed to the user via the web, so we have to keep the stock market information on the page updated dynamically using AJAX.
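One way to keep the write load manageable is to batch feed ticks and insert them in one round trip rather than one INSERT per tick. A minimal sketch, using Python with an in-memory SQLite database standing in for MySQL (the table name and feed data are illustrative; with MySQL the same executemany() pattern applies through a MySQL driver):

```python
import sqlite3

# In-memory SQLite stands in for MySQL here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL, ts INTEGER)")

# Hypothetical feed snapshot: (symbol, price, unix timestamp).
feed_batch = [
    ("ABC", 10.25, 1000),
    ("XYZ", 55.10, 1000),
    ("ABC", 10.30, 1001),
]

# One batched INSERT per feed snapshot instead of a round trip per tick
# reduces the pressure on the database from constant writes.
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", feed_batch)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM ticks").fetchone()[0])  # 3
```

The trade-off is latency: the batch interval adds delay, which is fine if a one to three second delay is acceptable to users anyway.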
Real-time computing ideally means latencies under 1 millisecond; however, I have previously worked for companies whose definition of real time meant a 15 minute delay. Whilst delays over the web are inevitable, I believe a one to three second delay would be acceptable for users viewing current information via AJAX.
As we will be using the stock market data for multiple applications, we will need to replicate the data from MySQL, which adds a further bottleneck to the application. Most notably, performance under replication becomes an issue because every slave still needs to execute the same write queries as the master. Since the majority of queries will be writes rather than reads, this becomes a fundamental problem in itself, making plain replication questionable. So we will have to look at a multi-master MySQL setup, or MySQL Cluster, which holds databases in memory. The fundamental problem with replication is ensuring the consistency of the data between writes once replicated. Ideally, if a slave falls behind we want to skip UPDATEs that have already been superseded and apply only the current values, to ensure we do not serve stale data.
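The "latest value wins" idea can be expressed as a timestamp-guarded upsert: an update is applied only if the incoming timestamp is newer than the stored one, so a replayed or delayed update from a lagging slave cannot overwrite fresher data. A sketch using SQLite's upsert syntax as a stand-in (the MySQL equivalent would be INSERT ... ON DUPLICATE KEY UPDATE with the same timestamp guard; table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL, ts INTEGER)")

def apply_quote(symbol, price, ts):
    # The WHERE clause on the upsert means stale updates are silently
    # dropped: only a strictly newer timestamp overwrites the row.
    conn.execute(
        """INSERT INTO quotes (symbol, price, ts) VALUES (?, ?, ?)
           ON CONFLICT(symbol) DO UPDATE
           SET price = excluded.price, ts = excluded.ts
           WHERE excluded.ts > quotes.ts""",
        (symbol, price, ts),
    )

apply_quote("ABC", 10.25, 1000)
apply_quote("ABC", 10.10, 999)   # stale update replayed late: ignored
apply_quote("ABC", 10.40, 1001)  # newer update: applied
print(conn.execute("SELECT price FROM quotes WHERE symbol = 'ABC'").fetchone()[0])  # 10.4
```

This gives each row a well-defined "current value" regardless of the order in which updates arrive.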
We will ideally have to create a heartbeat monitor and validate the latency of data between nodes. As mentioned previously, we want to ensure that no slave falls behind; for any slave that does, we want only the latest update per stock applied and the rest of the binary log ignored. Additionally, we would need to separate inserts of historical data based on a sample time (‘1 Min’, ‘15 Min’, ‘Hour’, ‘Midday’, ‘End of Day’, ‘End of Week’, ‘End of Month’); ideally this historical data would be horizontally scaled.
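The heartbeat idea reduces to something simple: the master periodically records a timestamp, each slave reports the last heartbeat it has replicated, and the difference is the replication lag. A minimal sketch (function names and the three-second threshold are my assumptions, not a fixed design):

```python
import time

# Assumed acceptable lag, matching the one-to-three-second delay
# considered tolerable for users earlier in the article.
MAX_LAG_SECONDS = 3

def replication_lag(master_heartbeat_ts, slave_heartbeat_ts):
    """Seconds the slave is behind the master's last heartbeat."""
    return master_heartbeat_ts - slave_heartbeat_ts

def slave_is_current(master_ts, slave_ts, max_lag=MAX_LAG_SECONDS):
    # A slave over the threshold would be flagged so that only the
    # latest per-stock values are applied rather than the full backlog.
    return replication_lag(master_ts, slave_ts) <= max_lag

now = time.time()
print(slave_is_current(now, now - 1))   # True: 1 second behind
print(slave_is_current(now, now - 10))  # False: 10 seconds behind, flag it
```

In practice the master would write the heartbeat row through normal replication, so the check piggybacks on the same channel it is measuring.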
The website itself will have to use AJAX to dynamically update all stock prices and market activity applicable to that page. The fundamental issue is that prices update in real time: how often can we make an HTTP request while staying within reason on server resources? Looking at this further, we will have the bottleneck of TCP/IP connections and the client’s bandwidth; ideally we would test the user’s bandwidth, and check whether the client accepts gzip or compressed content to reduce bandwidth costs.
An AJAX request every second; a server typically handles around 200 requests per second:
say 25 users online: 25 × 60 = 1,500 requests per minute, or 2,160,000 per day
say 100 users online: 100 × 60 = 6,000 requests per minute, or 8,640,000 per day
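The arithmetic above generalises to a small capacity function, which also makes it easy to see how a longer polling interval trades freshness for headroom (the 200 requests/second figure is the rough server capacity assumed above):

```python
# Back-of-the-envelope load for AJAX polling.
SERVER_CAPACITY_RPS = 200  # rough single-server capacity assumed above

def polling_load(users, interval_seconds=1):
    """Requests per second/minute/day for `users` each polling at the interval."""
    per_second = users / interval_seconds
    per_minute = per_second * 60
    per_day = per_minute * 60 * 24
    return per_second, per_minute, per_day

for users in (25, 100):
    rps, rpm, rpd = polling_load(users)
    print(f"{users} users: {rpm:,.0f} req/min, {rpd:,.0f} req/day, "
          f"{rps / SERVER_CAPACITY_RPS:.1%} of capacity")
```

Even 100 users polling every second sit at only half the assumed capacity, but the daily totals show why the interval matters once user counts grow.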
We could optionally increase the client’s connection limit in Internet Explorer with a registry key. The standard limit of two persistent connections per server for HTTP/1.1 agents comes from RFC 2616. The IE7 release does not increase this limit by default; it is most noticeable when a user downloads two files and IE waits for a connection to be released before starting a third download.
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings]
"MaxConnectionsPerServer"=dword:00000010
"MaxConnectionsPer1_0Server"=dword:00000010