The string functions should use mb_ functions



petsagouris
07-30-2010, 04:19 PM
In functions/funcs.strings.php I see a lot of non-multibyte functions being used.
Maybe it would be wiser to use the multibyte versions (http://php.net/manual/en/book.mbstring.php) of those functions for such operations?
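
To illustrate why this matters, here is a minimal sketch (not code from Hotaru) comparing the byte-oriented functions with their mbstring counterparts on a UTF-8 string. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the sample string and variable names are made up for the example.

```php
<?php
// Illustrative only: five Greek letters, each encoded as two bytes in UTF-8.
$text = "αβγδε";

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($text);
$charLength = mb_strlen($text, 'UTF-8');

// substr() cuts at a byte offset and can split a character in half;
// mb_substr() cuts at a character offset.
$byteCut = substr($text, 0, 3);
$charCut = mb_substr($text, 0, 3, 'UTF-8');

// mb_check_encoding() reveals whether each result is still valid UTF-8.
$byteCutIsValid = mb_check_encoding($byteCut, 'UTF-8');
$charCutIsValid = mb_check_encoding($charCut, 'UTF-8');
```

The byte-based cut ends in the middle of the second letter, which is the kind of breakage a summary truncated with substr() can show on non-English content.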

petsagouris
07-30-2010, 04:39 PM
I tested the truncate function with Unicode characters and was able to get the correct length for the description summary.

Please see the attached patch.
[Attachment 1234]

Note: Please enable the ability to attach .patch and .diff files to the post in the forums ;)

Nick
07-30-2010, 10:51 PM
I recently removed the need for the mbstring extension because some web hosts don't have it enabled, e.g. HostGator. Do other CMSs check for the mbstring extension first, and fall back on non-mb functions if it's not enabled?

I'm reluctant to add mbstring functions back in if it's going to prevent people from installing Hotaru (like it did before). What do you suggest?


"Note: Please enable the ability to attach .patch and .diff files to the post in the forums ;)"

Done. :)

petsagouris
07-30-2010, 11:04 PM
Surprise (http://gr.php.net/manual/en/mbstring.overload.php)!
I did not expect this to turn out so well.... seriously I thought that I would find some function to do this in pure PHP.
Anyhow, it seems that you can transparently overload the normal string functions with their multibyte equivalents via the mbstring php.ini settings (http://php.net/manual/en/mbstring.configuration.php) if the mbstring extension is loaded.

The only thing left now is to test that this behavior actually works.

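// Only touch the mbstring settings when the extension is available.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding("UTF-8");
    // 7 = overload the mail, string and regex functions.
    ini_set("mbstring.func_overload", 7);
}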

edit: ini_set() can't change func_overload, see here (http://bugs.php.net/bug.php?id=49235). The other option is to try setting it from .htaccess, which is not a fail-safe method.
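
Since the setting cannot be relied on at runtime, one way to test whether overloading is actually in effect is to probe the behavior directly. The following is only a sketch under that assumption; the helper name is hypothetical and not part of Hotaru or PHP.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of Hotaru).
// Returns true when plain strlen() behaves like mb_strlen(), i.e. when
// mbstring function overloading is active with a UTF-8 internal encoding.
function string_functions_are_overloaded()
{
    if (!extension_loaded('mbstring')) {
        return false;
    }
    // "\xCE\xB1" is the two-byte UTF-8 encoding of the Greek letter alpha.
    // Without overloading strlen() reports 2 bytes; with it, 1 character.
    return strlen("\xCE\xB1") === 1;
}

$overloaded = string_functions_are_overloaded();
```

Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP versions this probe always returns false and calling the mb_ functions explicitly is the only option.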

edit2: It is really interesting to see how Dokuwiki is handling this (http://dev.splitbrain.org/reference/dokuwiki/nav.html?inc/utf8.php.source.html).

Nick
07-30-2010, 11:22 PM
Now that looks like a clean solution. The function_exists check should go in /libs/Initialize.php. Would you mind testing it? If it works fine, we can add it as a last-minute addition to Hotaru 1.4.

petsagouris
07-30-2010, 11:25 PM
There are some shortcomings with this approach.
Please read the edits in the previous post.

edit: Also there are: php_utf8 (http://github.com/Xeoncross/php_utf8), phputf8 (http://github.com/FSX/php-utf8)

petsagouris
07-30-2010, 11:35 PM
If it were up to me, I would use one of the libraries (dokuwiki, php_utf8, phputf8) posted in the last messages and get on with it.
I can test it once you have made a decision, but not tonight. I need to go to sleep. Goodnight from me :)

Nick
07-30-2010, 11:42 PM
Okay, we can probably use one of those 3rd party scripts, but it'll have to be for Hotaru 1.5.

I appreciate your input on this since it's not easy for me to work with different languages.

petsagouris
07-31-2010, 07:36 AM
I gave this some thought. Here are the results:

There are the following situations regarding the mbstring extension:

- If it is enabled, no issues come up.
- If it is disabled, there are two possibilities:
  - The user is probably running an English-only site, in which case there are no serious consequences.
  - The user is running a site with a specific non-UTF-8 encoding (for example Japanese, Shift JIS), in which case God knows what will happen.

Only a small set of functions is needed at this point for Hotaru's funcs.strings.php (strlen, substr, strpos, strtolower), and we can provide fallback functions for them quite easily.

There may be more places where UTF-8 strings need to be handled with mbstring functions or their fallbacks; I need to dig through the code to find them. I guess there will be some ucfirst instances, for example.

I will try to post some replacements for the functions mentioned above, later.
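
As a rough idea of what such fallbacks could look like, here is a minimal sketch. It is not the code from the patches in this thread: all function names are hypothetical, UTF-8 input is assumed throughout, and the fallbacks rely on PCRE having been built with UTF-8 support (the /u modifier).

```php
<?php
// Hypothetical fallbacks for hosts without mbstring (names are assumptions).

function fallback_strlen($str)
{
    // Count code points with a UTF-8 aware regular expression.
    return preg_match_all('/./su', $str, $matches);
}

function fallback_substr($str, $start, $length = null)
{
    // Split into an array of characters, slice it, and join it again.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_slice($chars, $start, $length));
}

function fallback_strpos($haystack, $needle)
{
    // Find the byte offset first, then convert it to a character offset.
    $bytePos = strpos($haystack, $needle);
    if ($bytePos === false) {
        return false;
    }
    return fallback_strlen(substr($haystack, 0, $bytePos));
}

function fallback_strtolower($str)
{
    // Limitation: only ASCII letters are lowercased. Bytes of multibyte
    // characters are all >= 0x80, so they are never touched.
    return strtr($str, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

// Wrappers that prefer mbstring and otherwise use the fallbacks above.
function utf8_strlen($str)
{
    return function_exists('mb_strlen') ? mb_strlen($str, 'UTF-8') : fallback_strlen($str);
}

function utf8_substr($str, $start, $length = null)
{
    return function_exists('mb_substr')
        ? mb_substr($str, $start, $length, 'UTF-8')
        : fallback_substr($str, $start, $length);
}

function utf8_strpos($haystack, $needle)
{
    return function_exists('mb_strpos')
        ? mb_strpos($haystack, $needle, 0, 'UTF-8')
        : fallback_strpos($haystack, $needle);
}

function utf8_strtolower($str)
{
    return function_exists('mb_strtolower')
        ? mb_strtolower($str, 'UTF-8')
        : fallback_strtolower($str);
}
```

The fallbacks only help with UTF-8 text. A site using another multibyte encoding such as Shift JIS would still need mbstring or a dedicated library.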

Nick
07-31-2010, 10:15 AM
Okay, thanks. There are a lot of places where string functions are used (nearly 100 plugins worth!), but focusing on the core and bookmarking plugin pack would be enough for now.

This will be a very positive step for Hotaru.

petsagouris
07-31-2010, 10:37 AM
Well for now the following patch will check if the mbstring extension is loaded and will make the truncate function operate accordingly.

[Attachment 1235]
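
For readers without access to the attachment, the idea can be sketched roughly as follows. This is not the attached patch: the function name, its parameters, and the default suffix are assumptions, and Hotaru's real truncate function may have a different signature.

```php
<?php
// Hypothetical truncate function; Hotaru's real one may differ.
function truncate_text($text, $maxChars, $suffix = '...')
{
    if (extension_loaded('mbstring')) {
        // Character-aware path: lengths and offsets are counted in characters.
        if (mb_strlen($text, 'UTF-8') <= $maxChars) {
            return $text;
        }
        return mb_substr($text, 0, $maxChars, 'UTF-8') . $suffix;
    }

    // Without mbstring, fall back to the byte-based functions. This is only
    // safe for single-byte text such as plain ASCII.
    if (strlen($text) <= $maxChars) {
        return $text;
    }
    return substr($text, 0, $maxChars) . $suffix;
}

$summary = truncate_text("hello world", 5);
```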

Nick
08-02-2010, 12:17 PM
"I recently removed the need for the mbstring extension because some web hosts don't have it enabled, e.g. HostGator."

Sorry, this is a rather big mistake. It wasn't mbstring I removed, it was bcmath. HostGator don't allow bcmath on their shared servers. In fact, Hotaru already requires mbstring in the installation script, so if you don't have mbstring, you can't install Hotaru anyway! Sorry for wasting your time on this. We can go ahead converting all string functions to multibyte without worrying about whether mbstring is enabled or not. :o

petsagouris
08-02-2010, 01:36 PM
No problem, this is great news.
Here you go with the patch for using mbstring, tested it too :)

[Attachment 1241]

While searching for a UTF-8 PHP library to use, I came across a dead SourceForge project named PHP-UTF8m, which has been revived but is still in a coma on github.com (http://github.com/FSX/php-utf8).
I have put some effort into it, with the ultimate goal of proposing its use in Hotaru CMS.
Hopefully we will bring it into a good state in the coming weeks.

At least a zombie project was revived, so there is a good part in this string of events. :)

Nick
08-02-2010, 01:44 PM
Haha, funny how things turn out.

About this patch... it's quite long. So far I've been manually copying your changes into files (since I don't use NetBeans), but this time it would be much quicker for me if you could attach the edited .php file itself. :) Thanks!

I've heard good things about NetBeans, but I paid for my text editor (EditPadPro) so am reluctant to give up on it! :o

Nick
08-02-2010, 02:34 PM
I'm afraid I was unable to apply the patch, and the filedropper link to the php file wouldn't open. I've given you edit permissions on the SVN, so if you could commit the changes or attach the .php file here, that would help a lot. Thanks.

Edit: Never mind, I downloaded NetBeans and applied the patch. :)