
Request #4666 Different database charsets
Submitted: 2005-06-23 08:31 UTC
From: dan at yes dot lt Assigned: quipo
Status: Closed Package: MDB2
PHP Version: Irrelevant OS:
Roadmaps: (Not assigned)    


 [2005-06-23 08:31 UTC] dan at yes dot lt
Description:
------------
At the moment, when we work with databases, all queries and results must use the same charset as the current PHP internal charset. If the database charset is different from the internal one, we must convert all queries and results manually.

So, how about implementing an additional URI parameter "charset" (or "encoding") and doing the charset conversion automatically? For example, we would be able to work with a database using ISO-8859-13 (Baltic) while our internal charset is UTF-8, just by adding &charset=ISO-8859-13 to our database URI.

Below is my working implementation of this functionality:

function _convertQuery($query)
{
    if (! empty($this->dsn['charset'])) {
        $database = $this->dsn['charset'];
        if (extension_loaded('mbstring')) {
            // Working with "MultiByte String" extension
            $internal = mb_internal_encoding();
            // Checking if database charset is different from internal charset
            if (strcasecmp($database, $internal)) {
                // Converting query
                return mb_convert_encoding($query, $database, $internal);
            }
        } elseif (extension_loaded('iconv')) {
            // Working with "Iconv" extension
            $internal = iconv_get_encoding('internal_encoding');
            // Checking if database charset is different from internal charset
            if (strcasecmp($database, $internal)) {
                // Converting query
                return iconv($internal, $database, $query);
            }
        } else {
            $this->raiseError('Charset conversion not supported', 0, PEAR_ERROR_DIE);
        }
    }
    return $query;
}

function simpleQuery($query)
{
    // prepare query
    ...
    $query = $this->_convertQuery($query);
    ...
    // execute query
}

function _convertResult(&$arr)
{
    if (! empty($this->dsn['charset'])) {
        $database = $this->dsn['charset'];
        if (extension_loaded('mbstring')) {
            // Working with "MultiByte String" extension
            $internal = mb_internal_encoding();
            // Checking if database charset is different from internal charset
            if (strcasecmp($database, $internal)) {
                // Converting result
                foreach ($arr as $key => $val) {
                    $arr[$key] = mb_convert_encoding($val, $internal, $database);
                }
            }
        } elseif (extension_loaded('iconv')) {
            // Working with "Iconv" extension
            $internal = iconv_get_encoding('internal_encoding');
            // Checking if database charset is different from internal charset
            if (strcasecmp($database, $internal)) {
                // Converting result
                foreach ($arr as $key => $val) {
                    $arr[$key] = iconv($database, $internal, $val);
                }
            }
        } else {
            $this->raiseError('Charset conversion not supported', 0, PEAR_ERROR_DIE);
        }
    }
}

function fetchInto($result, &$arr, $fetchmode, $rownum = null)
{
    // fetch $arr
    ...
    $this->_convertResult($arr);
    ...
    // apply portability
}

Comments

 [2005-08-11 20:47 UTC] pear dot php dot net at chsc dot dk
I believe mysqli_set_character_name() does all this for you, if you are using mysqli. But your code is necessary for drivers like mysql and others that do not support this natively. BTW, I don't think it should be part of the DSN but rather an option, because the character set is only relevant within the application, as opposed to the DSN, which is supplied from outside the application.
 [2005-08-11 20:55 UTC] pear dot php dot net at chsc dot dk
Sorry, I meant mysqli_set_charset(). The mysql case can probably be solved using $DB->query("SET NAMES utf8") or similar - see http://dev.mysql.com/doc/mysql/en/charset-connection.html But of course it would be nice if all this were standardized in the DB library.
 [2005-08-11 21:11 UTC] dan at yes dot lt
Yes, it would be nice if all this were standardized in the DB library. Yes, the charset may be set as an option. There could also be default methods for databases with no charset support, and overridden methods for specific databases (e.g. mysql, mysqli). PS: thanks for the mysql charset queries :) but our software must be DB independent (today it works on mysql, mssql, pgsql and oracle), so we need that DB library feature :)
 [2005-10-18 10:15 UTC] cryptographite at comcast dot net
Adding a DSN option for 'set names' would be wonderful. Sometimes doing a query() just isn't possible (such as when DB is being loaded by another PEAR module and you don't have direct query access to the connection).
 [2005-10-18 10:18 UTC] lsmith
I suggest you guys sit down and work out how this is handled in different RDBMS .. maybe collect a few resources etc. If you want, you can do this inside this bug report or instead use the MDB2 wiki: http://oss.backendmedia.com/MDB2/
 [2005-11-14 23:18 UTC] art at siit dot net
$DB->query("SET CHARACTER SET 'utf8'"); is required in order to convert the results.

I have added an entry about supporting input/result charset to MDB2's todo list at http://oss.backendmedia.com/MDB2/ToDo

So far I have found info about converting charsets in two DBMSs:
- MySQL (SET NAMES charset, SET CHARACTER SET 'charset')
- SQLite (PRAGMA encoding="charset"; not really a 'conversion', it only has an effect on new table creation)

PostgreSQL also supports UTF-8, but I don't know how to set it, and the same goes for other DBMSs.
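
Editorial sketch: the per-backend statements collected in the comment above could be gathered in one place roughly as follows. This is a minimal illustration, not MDB2 code. The function name charsetStatement() is hypothetical, and the PostgreSQL statement (SET client_encoding) is an addition that the thread itself does not mention. Charset names are backend-specific (MySQL spells it utf8, SQLite spells it UTF-8), so the sketch passes them through unchanged.

```php
<?php
// Hypothetical helper, not part of MDB2: returns the SQL statement that
// sets the client charset for a given driver, or null if the driver is
// not known to this sketch.
function charsetStatement(string $driver, string $charset): ?string
{
    // The charset name is interpolated into SQL, so allow only safe characters.
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $charset)) {
        throw new InvalidArgumentException("Invalid charset name: $charset");
    }

    switch ($driver) {
        case 'mysql':
        case 'mysqli':
            return "SET NAMES '$charset'";
        case 'pgsql':
            // Assumption: not taken from the thread.
            return "SET client_encoding TO '$charset'";
        case 'sqlite':
            // Only affects databases created afterwards, as noted above.
            return "PRAGMA encoding = \"$charset\"";
        default:
            return null;
    }
}
```

A caller would pass the result to its own query function and treat null as "no native support", which is where a conversion fallback such as the one in the original description would come in.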
 [2005-11-14 23:30 UTC] art at siit dot net
Note that $DB->query("SET CHARACTER SET 'utf8'"); is only for MySQL (just as an example). For the MDB2 API, maybe we could have something like: $DB->setCharset("utf-8"); .. or should we set the charset at the result / datatype level?
 [2005-11-15 11:22 UTC] lsmith
The information has been moved to its own page: http://oss.backendmedia.com/MDB2/CharacterSet Let's collect more information there, so that we can come up with a truly portable API.
 [2005-11-16 11:15 UTC] art at siit dot net
Added more info to http://oss.backendmedia.com/MDB2/CharacterSet It seems like we have at least 5 different types of charset to deal with: client, connection, database, table, and results (these may not all be settable in every DBMS). Please check it out + comment :)
 [2006-03-07 17:45 UTC] lsmith
I have relabeled this as an MDB2 bug, as it makes no sense to add this feature to DB at this stage of development.
 [2006-03-12 09:59 UTC] lsmith
We currently have a "charset" setting in the DSN to set the connection charset. I propose adding a "client_charset" to set the client charset. What should happen if the given setting is not settable in the chosen backend? I guess it should simply error out?
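
Editorial sketch: the "charset" and proposed "client_charset" DSN settings, together with the "simply error out" policy floated above, might look roughly like this. It is a minimal illustration only and does not use MDB2's own DSN parser. The function names dsnCharsetOptions() and requireCharsetSupport() are hypothetical, the list of supporting drivers is supplied by the caller, and only URL-style DSNs that PHP's parse_url() understands are handled.

```php
<?php
// Hypothetical helper: extracts the driver and charset settings from a
// URL-style DSN such as mysql://user:secret@localhost/app?charset=utf8
function dsnCharsetOptions(string $dsn): array
{
    $parts = parse_url($dsn);
    if ($parts === false || !isset($parts['scheme'])) {
        throw new InvalidArgumentException("Cannot parse DSN: $dsn");
    }

    // Query-string options follow the "?" in the DSN.
    $options = [];
    parse_str($parts['query'] ?? '', $options);

    return [
        'driver'         => $parts['scheme'],
        'charset'        => $options['charset'] ?? null,
        'client_charset' => $options['client_charset'] ?? null,
    ];
}

// Hypothetical policy check: error out when a charset setting was
// requested but the chosen backend is not in the supported list.
function requireCharsetSupport(array $opts, array $supportedDrivers): void
{
    $requested = $opts['charset'] !== null || $opts['client_charset'] !== null;
    if ($requested && !in_array($opts['driver'], $supportedDrivers, true)) {
        throw new RuntimeException(
            "Charset settings are not supported by driver '{$opts['driver']}'"
        );
    }
}
```

Failing loudly keeps a misconfigured DSN from silently storing data in the wrong encoding. The alternative, ignoring the setting, would be more forgiving but much harder to debug.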
 [2006-03-12 14:52 UTC] lsmith
Ok, I have implemented things:

client and connection:
- mysql
- mysqli
- oracle (client only, through setting an env variable)

client:
- pgsql

connection:
- ibase

Support for the schema management code is still missing ..
 [2006-03-13 08:10 UTC] lsmith
- removed support for setting the connection charset, since I realized this is really just a mysql-specific setting and all the other RDBMS really only support setting the client charset
 [2006-06-30 21:19 UTC] tokul at users dot sourceforge dot net (Tomas Kuliavas)
Include library version information when you add new options. I think the DSN charset option was introduced in MDB2 v2.1.0, and your docs do not mention that. There is only a short notice in the changelog about the added SetCharset function.
 [2006-06-30 21:20 UTC] lsmith (Lukas Smith)
No, this option has been available since MDB2 became stable. I just expanded its support .. but yeah, I should probably start adding @since tags. Patches welcome ..
 [2007-01-11 20:57 UTC] quipo (Lorenzo Alberton)
Does the new setCharset() method suit your needs?
 [2007-03-12 13:53 UTC] quipo (Lorenzo Alberton)
If the current solution isn't satisfactory, please reopen the bug report.