bug found while using caching and its log
Archived from the Xataface Users forum.
kevinwen — Wed Jan 06, 2010 6:30 pm
I found some bugs when turning on cache_queries and cache_queries_log:
When cache_queries=1, at line 66 in xataface/Dataface/DB.php:
$this->_fcache_base_path = DATAFACE_PATH.’/templates_c/query_results’;
It should be:
$this->_fcache_base_path = DATAFACE_PATH.’/Dataface/templates_c/query_results’;
since we never had a directory templates_c under xataface. For line 64, should we have Dataface directory under our application so it looks the same as xataface? Now it is:
$this->_fcache_base_path = DATAFACE_SITE_PATH.’/templates_c/query_results’;
Should we change to:
$this->_fcache_base_path = DATAFACE_SITE_PATH.’/Dataface/templates_c/query_results’;
??
When cache_queries_log=1, at line 316, 324, 337, 655 in DB.php and line 404 in OutputCache.php, the path to querylog.log and dump.sql have been hard-coded, which make it not work on Windows. Should we use a path under the DATAFACE_PATH so it’s compatible for all platforms?
shannah — Thu Jan 07, 2010 11:04 am
- Code: Select all
$this->_fcache_base_path = DATAFACE_SITE_PATH.'/Dataface/templates_c/query_results';
No. Your first change was correct as in Xataface the templates_c directory is inside the Dataface directory. But in the context of the site path, the templates_c directory is meant to be under the main app directory (not a subdirectory). Hence this line should stay unchanged.
When cache_queries_log=1, at line 316, 324, 337, 655 in DB.php and line 404 in OutputCache.php, the path to querylog.log and dump.sql have been hard-coded, which make it not work on Windows. Should we use a path under the DATAFACE_PATH so it’s compatible for all platforms?
Good thought. I think it is probably best to use the sys_get_temp_dir() function, but this isn’t available in PHP less than 5.2.1, so it would be necessary to implement it for older installs.
kevinwen — Thu Jan 07, 2010 1:24 pm
Good thought. I think it is probably best to use the sys_get_temp_dir() function, but this isn’t available in PHP less than 5.2.1, so it would be necessary to implement it for older installs.
Well, the log file is specific to the xataface and all of the log files may be grouped together for better organized. Not everyone likes to put log files into a temp directory. So saving it under DATAFACE_PATH is better. On linux (even in Mac), the log file may be a symbolic link to the actually log file under a centralized log directory.
shannah — Thu Jan 07, 2010 1:32 pm
good point.
Perhaps a subdirectory of templates_c e.g. templates_c/logs would be the best solution.
I haven’t really made it a point to tell people to cut off access to the templates_c directory in their Xataface installation. If they are storing log files in there, it would become especially important to do this though. (e.g. an .htaccess rule)
kevinwen — Thu Jan 07, 2010 4:07 pm
Could we create a directory just under DATAFACE_PATH like DATAFACE_PATH . ‘/logs’? templates_c is a place for the cached content and can be deleted without preventing the app from running. If we put logs under templates_c, it may cause problem when this directory is emptied.
shannah — Thu Jan 07, 2010 5:35 pm
I suppose this is the better solution. The only drawback is the increased amount of setup required, as the logs directory would need to be writable by the webserver. Currently templates_c is the only directory that needs to be writable. We would just need to make sure that Xataface just ignores it if the directory isn’t there or isn’t writable.
kevinwen — Fri Jan 08, 2010 12:09 am
Can we add a variable at the beginning of conf.ini, like cache_log=”/usr/logs/xataface”, so the developer is able to decide where to store the cache log?
shannah — Fri Jan 08, 2010 12:21 am
Along the same lines, a more general purpose setting for a logs directory to contain all types of logs that could occur.
Default location:
DATAFACE_SITE_PATH/logs
If the “logs” directive is set in the conf.ini file, the location could be overridden with another location.
kevinwen — Fri Jan 08, 2010 3:43 pm
I made some changes in DB.php and make the caching work. However, when I added a new record, the cached results come out, instead of the newer version of results as mentioned in this article:
Table Modification Times and Dependencies
In order to determine whether a result cache is current, Xataface checks the table modification times of the tables that are used in a query and compares them to the cached result modification time. If the cache is newer, then it uses the cache.
http://xataface.blogspot.com/2009/06/using-query-caching-in-xataface.html
Can somebody tell me what would go wrong? Thanks.
shannah — Fri Jan 08, 2010 3:56 pm
INNODB tables and Views don’t store their table modification times, so caching has not worked well in those situations. in SVN (and in the latest 1.2.3b2 release) I have added support for storing table modification times by adding a table to the database that stores these - but this solution isn’t perfect as the table only gets updated if records are inserted via xataface.
kevinwen — Fri Jan 08, 2010 6:39 pm
I see. I have an idea but not sure if it would work:
Create a db table(or in memcache) store the freshness of the cache, which will have 4 fields: Primary key, cache_id, involved_tables.
The SQL Parser should do the following for Insert/delte/update: search for this table and grep all cache_ids that have the corresponding tables updated, then delete those caches before caching the new results. This way, all the caches should be the latest. Would that work?
shannah — Fri Jan 08, 2010 7:04 pm
This is one possible solution… and it should be considered. If you are planning on hacking the code/making changes to Xataface, I suggest that you work with the latest in SVN. Are you familiar with subversion?
The solution that I have implemented (and should be available in SVN) is just to store the update time in a special table, then it compares the update time with the cache at the time that it serves the cache. It should achieve the same results - it just does the checking at different times. The table name is dataface__mtimes. You may want to check and see if this table has been created. This will indicate whether your version includes this fix. If the table does exist, and the cache still isn’t updating, then it is possible there is a bug and we’ll need to look at it more closely.
kevinwen — Sat Jan 09, 2010 12:22 am
Here’s what I got from MySQL site: http://dev.mysql.com/doc/refman/5.1/en/show-table-status.html
Update_time
When the data file was last updated. For some storage engines, this value is NULL. For example, InnoDB stores multiple tables in its tablespace and the data file timestamp does not apply. For MyISAM, the data file timestamp is used; however, on Windows the timestamp is not updated by updates so the value is inaccurate.
My app is running on Windows. It may be the cause. Is that possible to consider the solution I came up with? There is no time checking at all, but just flush all caches that are related to the tables being involved in the current insert/update/delete statement.
shannah — Sat Jan 09, 2010 12:28 am
Can you confirm whether your database has a dataface__mtimes table?
kevinwen — Mon Jan 11, 2010 10:43 am
Yes, dataface__mtimes tables exists, and it look like working on Windows, but I did some modification to specify the DATAFACE_QUERY_LOG in config.inc.php and use DATAFACE_QUERY_LOG in DB.php, so I can turn on the cache_query_log. Did you make that change as well? I need to grep the latest one. Thanks.
kevinwen — Fri Jan 08, 2010 6:39 pm
I see. I have an idea but not sure if it would work:
Create a db table(or in memcache) store the freshness of the cache, which will have 4 fields: Primary key, cache_id, involved_tables.
The SQL Parser should do the following for Insert/delte/update: search for this table and grep all cache_ids that have the corresponding tables updated, then delete those caches before caching the new results. This way, all the caches should be the latest. Would that work?
shannah — Fri Jan 08, 2010 7:04 pm
This is one possible solution… and it should be considered. If you are planning on hacking the code/making changes to Xataface, I suggest that you work with the latest in SVN. Are you familiar with subversion?
The solution that I have implemented (and should be available in SVN) is just to store the update time in a special table, then it compares the update time with the cache at the time that it serves the cache. It should achieve the same results - it just does the checking at different times. The table name is dataface__mtimes. You may want to check and see if this table has been created. This will indicate whether your version includes this fix. If the table does exist, and the cache still isn’t updating, then it is possible there is a bug and we’ll need to look at it more closely.
kevinwen — Sat Jan 09, 2010 12:22 am
Here’s what I got from MySQL site: http://dev.mysql.com/doc/refman/5.1/en/show-table-status.html
Update_time
When the data file was last updated. For some storage engines, this value is NULL. For example, InnoDB stores multiple tables in its tablespace and the data file timestamp does not apply. For MyISAM, the data file timestamp is used; however, on Windows the timestamp is not updated by updates so the value is inaccurate.
My app is running on Windows. It may be the cause. Is that possible to consider the solution I came up with? There is no time checking at all, but just flush all caches that are related to the tables being involved in the current insert/update/delete statement.
shannah — Sat Jan 09, 2010 12:28 am
Can you confirm whether your database has a dataface__mtimes table?
kevinwen — Mon Jan 11, 2010 10:43 am
Yes, dataface__mtimes tables exists, and it look like working on Windows, but I did some modification to specify the DATAFACE_QUERY_LOG in config.inc.php and use DATAFACE_QUERY_LOG in DB.php, so I can turn on the cache_query_log. Did you make that change as well? I need to grep the latest one. Thanks.
shannah — Mon Jan 11, 2010 11:59 am
Do you mean is doesn’t look like it is working on windows? Are there entries in the dataface__mtimes table or is it empty.. If non-empty, what do the entries look like. It should contain table names with unix timestamps.
kevinwen — Tue Jan 12, 2010 11:48 am
Steve,
I found a bug when inserting a new record with cache enabled. Here’s how I produce the bug:
- Delete all caches, if any.
- Turn on cache.
- Remember the total number in a table, e.g. ‘Found 1190 records’ in attachment.
- Use ‘new record’ in the attachment to create a new record.
- The ‘Found 1190 records’ remains same, but the record just created shows 1191 (process_id) as the primary key.
I think the problem is: the cache is updated only when a select statement on the table is examined, instead of updated on new record inserted. The solution I mentioned above will solve this problem.
Another big problem in your current solution is the cost. You can imagine that every select statement has to introduce another sql statement – show table status – as the price of using cache. Each query to database has to be authenticated and checked against privileges, which would take some time for huge volume of select statement. My solution will solve this problem as well. The workaround is to execute a simple query on each table (select 1 from table1, select 1 from table2, etc.) to force the cache to be updated.
There are 2 issuees I’m really concerned about against performance:
-
The use of views in xataface. I saw that there are a lot of views like that: dataface__view_processes_934276e660154d9f61793761035c273a. It would slow down the performance as well, since a result set for the view has to be generated first in order to get another result set from this result set. If cache is disabled or in some situation is not applicable using cache, querying on the view takes more time. Could we create a Materialized View (a term from Oracle and is actually a table storing data) to store result set for those views and then create a database TRIGGER to update these tables?
-
In the drop down list in the attachment, each number contributes a query. Could we use a aggregation table (like Materialized View) to store those number? In my applicaiton, there are 29 queries executed with cache disabled only for the ‘list all’ of the processes table (This table has 10 filters).
Let me know your thought.
Kevin
shannah — Tue Jan 12, 2010 12:28 pm
I think the problem is: the cache is updated only when a select statement on the table is examined, instead of updated on new record inserted. The solution I mentioned above will solve this problem.
I’ll look into this bug. Since the row count is produced by a select query, the cache should be refreshed also. What happens if you hit browser “refresh” on this page. Does the count correct itself? (i wonder if there could be a difference between the mysql time and the server time?)
Another big problem in your current solution is the cost. You can imagine that every select statement has to introduce another sql statement – show table status – as the price of using cache. Each query to database has to be authenticated and checked against privileges, which would take some time for huge volume of select statement. My solution will solve this problem as well. The workaround is to execute a simple query on each table (select 1 from table1, select 1 from table2, etc.) to force the cache to be updated.
The cost is an issue either way. The difference is that refreshing the cache at query time allows us to amortize the cost over all of the queries, and the maximum cost is O(1) per query (either it refreshes the cache for that single query or it doesn’t).
If we decide to clear the cache on update, we face a large buildup of queries whose cache needs to be cleared. On a busy site it is not out of the question that we would need to delete thousands or hundreds of thousands of cached queries all in one shot.
- The use of views in xataface. I saw that there are a lot of views like that: dataface__view_processes_934276e660154d9f61793761035c273a. It would slow down the performance as well, since a result set for the view has to be generated first in order to get another result set from this result set. If cache is disabled or in some situation is not applicable using cache, querying on the view takes more time. Could we create a Materialized View (a term from Oracle and is actually a table storing data) to store result set for those views and then create a database TRIGGER to update these tables?
Dataface uses views for something different than the query cache. These particular views are used to cache result sets that you specify in the __sql__ directive of a fields.ini file (i.e. you are overriding the select query for a table). Prior to using a view for this, I used subqueries. The performance between the two strategies is not even close - using a view is much, much faster. (it allows MySQL to use its own internal caching and optimization).
Using a table instead of a view to store these results would create a relatively complex system that would be difficult to debug, as it would create more places where we need to worry about clearing a cache. Using a view, we never have to clear the cache when the database is changed - only when the __sql__ directive’s query is changed. We let mysql take care of caching and optimization for this query.
- In the drop down list in the attachment, each number contributes a query. Could we use a aggregation table (like Materialized View) to store those number? In my applicaiton, there are 29 queries executed with cache disabled only for the ‘list all’ of the processes table (This table has 10 filters).
Actually each number doesn’t contribute a query. There is only one query per filter. The query uses a group by clause to generate these results. Of course as we get down to the nitty-gritty, we always want to ask “is there a way I can improve performance”. I fear that the added maintenance of a materialized view would outweigh the performance gain. These filters already benefit from query caching (when enabled) anyways…. and MySQl is uncanny for its ability to use its own internal caching and clever algorithms to churn out these types of queries quickly.
Thank you for taking the time to dissect the innards of Xataface. It can definitely benefit from more eyes.
shannah — Tue Jan 12, 2010 12:31 pm
On a side note, my personal experience with the query caching has been mixed. On databases with very complex __sql__ directives in the fields.ini file, or very complex queries with many joins and subqueries - the query caching offers a noticable performance gain.
Otherwise I have found that the performance gains are not tremendously noticeable.
For sites where there are few updates, but lots of traffic, the output cache is a must to use. It makes a huge difference to server load and site performance. (of course query caching probably helps out server load for busy sites also).
shannah — Tue Jan 12, 2010 12:56 pm
You can imagine that every select statement has to introduce another sql statement – show table status – as the price of using cache. ….
Each query to database has to be authenticated and checked against privileges, which would take some time for huge volume of select statement.
Missed this in my first pass. Actually, show table status is called once per request - so even if there are hundreds of SQL queries in a given request, show status is only called once. (Although it is marked to be recalled if any update or insert statements are executed).
Also, I’m not sure what you mean by “Each query to database has to be authenticated and checked against privileges”?
kevinwen — Tue Jan 12, 2010 1:21 pm
Thank you so much for your response, and I benefit from your comments a lot.
“Each query to database has to be authenticated and checked against privileges”
For each of query, MySQL will examine if the authenticated user has the show privilege to execute this statement. There are some other privileges as well, like SELECT, UPDATE, EXECUTE, GRANT OPTION, etc. If the user doesn’t have the privilege to execute whatever statement it is. reference: http://dev.mysql.com/doc/refman/5.1/en/request-access.html
5.4.5. Access Control, Stage 2: Request Verification
After you establish a connection, the server enters Stage 2 of access control. For each request that you issue via that connection, the server determines what operation you want to perform, then checks whether you have sufficient privileges to do so. This is where the privilege columns in the grant tables come into play. These privileges can come from any of the user, db, host, tables_priv, columns_priv, or procs_priv tables. (You may find it helpful to refer to Section 5.4.2, “Privilege System Grant Tables”, which lists the columns present in each of the grant tables.)
The authentication were done and is not an issue when the connection is established. I take it back.
kevinwen — Tue Jan 12, 2010 1:29 pm
Your comments:
I’ll look into this bug. Since the row count is produced by a select query, the cache should be refreshed also. What happens if you hit browser “refresh” on this page. Does the count correct itself? (i wonder if there could be a difference between the mysql time and the server time?)
Refreshing in browser doesn’t help.
kevinwen — Tue Jan 12, 2010 1:21 pm
Thank you so much for your response, and I benefit from your comments a lot.
“Each query to database has to be authenticated and checked against privileges”
For each of query, MySQL will examine if the authenticated user has the show privilege to execute this statement. There are some other privileges as well, like SELECT, UPDATE, EXECUTE, GRANT OPTION, etc. If the user doesn’t have the privilege to execute whatever statement it is. reference: http://dev.mysql.com/doc/refman/5.1/en/request-access.html
5.4.5. Access Control, Stage 2: Request Verification
After you establish a connection, the server enters Stage 2 of access control. For each request that you issue via that connection, the server determines what operation you want to perform, then checks whether you have sufficient privileges to do so. This is where the privilege columns in the grant tables come into play. These privileges can come from any of the user, db, host, tables_priv, columns_priv, or procs_priv tables. (You may find it helpful to refer to Section 5.4.2, “Privilege System Grant Tables”, which lists the columns present in each of the grant tables.)
The authentication were done and is not an issue when the connection is established. I take it back.
kevinwen — Tue Jan 12, 2010 1:29 pm
Your comments:
I’ll look into this bug. Since the row count is produced by a select query, the cache should be refreshed also. What happens if you hit browser “refresh” on this page. Does the count correct itself? (i wonder if there could be a difference between the mysql time and the server time?)
Refreshing in browser doesn’t help.