
building Japanese websites with php and mysql

What a headache

#1 haku

  • 日本語 Ninja
  • Group: Members
  • Posts: 652
  • Joined: 21-September 07
  • Gender:Male
  • Location:Yokohama, Japan

Posted 13 November 2007 - 10:51 AM

Coding for Japanese websites - what a headache. I had no idea.

Here is my problem:

I can input Japanese data into my database, extract it again, and it displays with no problem in my browser. But when I view that data in phpMyAdmin, it comes out garbled. The reverse also happens: if I input data directly with phpMyAdmin, it looks fine in phpMyAdmin, but when I fetch it with PHP, it comes out garbled.

So you would think that the problem is phpMyAdmin's encoding, but even when I set my browser's encoding to match, it still doesn't display properly. I also tried setting phpMyAdmin to Japanese, with no luck there either. Anyone have any ideas?
Jaypan (http://www.jaypan.com) | Dudes Japan (http://www.dudes-japan.com)

#2 sypher

  • the owner3r
  • Group: Administrators
  • Posts: 1,578
  • Joined: 04-April 06
  • Location:North Wales, UK
  • Interests:Art, Boxing, MMA, Graphic Design, Web Design etc. ;)

Posted 13 November 2007 - 11:23 AM

Hmm, I've never had to design a Japanese website before.

But take a look at your character set. Web Radiance, for example, uses
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
for Western English. Make sure yours is set to a Japanese character set.
sypher design - North Wales Web Design | Latest Work: - Scala Cinema

CSS - Can't See Sh*t

#3 haku

Posted 13 November 2007 - 11:50 AM

Thanks. I started with that though, and I don't think the problem is there.

From what I can tell, I have to take care of the encoding in each step - from the browser (user input) to php, from php to the database, from the database back to php, from php to the browser. But those steps seem to be working fine actually, as I am able to input Japanese characters into the database, and then output them to the screen later on. The problem is that I can't look in the database and see what that data is - to be able to see it I have to use php to output it to the browser.
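To illustrate why every step has to agree, here is a minimal sketch (an assumption about the likely cause, not a diagnosis of this exact setup) of what happens when one link in the chain mislabels the text: UTF-8 bytes are treated as single-byte Latin-1 characters and re-encoded, which doubles their size and produces garbage. ISO-8859-1 is used for simplicity; MySQL's "latin1" is really closer to Windows-1252, so real-world garbage looks slightly different.

```php
<?php
// Minimal sketch of "double encoding": one step in the chain treats
// UTF-8 bytes as if they were Latin-1 characters and re-encodes them.

$original = "日本語"; // this source file is saved as UTF-8

// Wrong step: every UTF-8 byte becomes its own Latin-1 character,
// and each of those is encoded again as UTF-8.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Undoing the mislabel recovers the original bytes.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

printf("original: %s (%d bytes)\n", $original, strlen($original));
printf("garbled:  %s (%d bytes)\n", $garbled, strlen($garbled));
printf("repaired: %s\n", $repaired);
```

The same thing can happen between PHP and MySQL when the connection's character set doesn't match the data, which would fit text that looks fine through one tool and garbled through the other.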

#4 sypher

Posted 13 November 2007 - 12:45 PM

Post the MySQL code you're using (export it and show us the table setups).

#5 haku

Posted 15 November 2007 - 12:49 AM

I did it! I figured it out. It was difficult as there really isn't much information in English about this. I actually ended up doing a lot of searching around in Japanese before I was able to find clues to the direction I was going. I never found anything particularly comprehensive though.


In the end, this is what happens:

Insertion-
1) text is inputted into the browser in any encoding
2) php automatically detects the encoding of the text that has been inputted
3) php then converts the text to the internal encoding that PHP is set to (which I set to EUC-JP)
4) php then inserts the text into the database (I set the database to EUC-JP to match the php internal encoding)

Retrieval-
1) first I have to tell php what language encoding php can expect when getting text from the database
2) then, I use php to retrieve the data from the database
3) php then uses its mbstring functions and internally (as well as automatically) converts the text to Shift_JIS when it outputs it to the browser. note: the mbstring functions have to be enabled in the php.ini file for this to work.
4) The <meta> tag of the document is set to Shift_JIS.
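The conversions in the two lists above can be sketched explicitly as below. This is a minimal sketch that leaves the database out; it assumes the same encodings as the post (EUC-JP internally and in the database, Shift_JIS for the browser), and the function and constant names are hypothetical. In the actual setup, mbstring does these conversions automatically through the php.ini settings.

```php
<?php
// Minimal sketch of the insertion/retrieval conversions, database left out.
// Assumed encodings: EUC-JP internally and in the DB, Shift_JIS for output.

const INTERNAL_ENCODING = 'EUC-JP';
const OUTPUT_ENCODING   = 'SJIS'; // mbstring's name for Shift_JIS

// Insertion: convert incoming text to the internal encoding. $from may be a
// single encoding or a comma-separated list for mbstring to detect from.
function to_internal(string $input, string $from = 'ASCII, UTF-8, SJIS, EUC-JP'): string
{
    return mb_convert_encoding($input, INTERNAL_ENCODING, $from);
}

// Retrieval: convert text read from the database for output to the browser.
function to_output(string $stored): string
{
    return mb_convert_encoding($stored, OUTPUT_ENCODING, INTERNAL_ENCODING);
}

// Usage:
// $stored = to_internal($_POST['comment']); // then INSERT $stored
// echo to_output($rowFromDb['comment']);    // page declares charset=Shift_JIS
```

Passing the real source encoding, when it is known, is safer than a detection list, since short Japanese strings can be valid in more than one encoding.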

There are all sorts of settings that have to be set up in the php.ini file, so if for some reason someone ever has to do this, PM me and I will tell you what settings have to be changed.

One thing to keep in mind is that I have to tell the database what encoding to expect (with SET NAMES) on each connection before running any queries, unless I set the encoding for the whole database to EUC-JP. I may also have to convert text from non-EUC-JP columns of the database to Shift_JIS before outputting it to the browser. Not sure on that yet.
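A minimal sketch of setting the connection encoding once per connection. With the 2007-era mysql extension the equivalent was running SET NAMES ujis right after mysql_connect(); that extension has since been removed from PHP, so this sketch swaps in PDO, where the charset parameter in the DSN does the same job for the whole session. "ujis" is MySQL's name for EUC-JP; both helper names are hypothetical.

```php
<?php
// Build a PDO DSN whose charset applies to every query on the connection.
function eucjp_dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=ujis', $host, $dbname);
}

// Open one EUC-JP connection; no need to repeat SET NAMES per query.
function connect_eucjp(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(eucjp_dsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

// Usage (needs a running MySQL server):
// $db = connect_eucjp('localhost', 'mysite', 'user', 'secret');
// $rows = $db->query('SELECT name FROM members')->fetchAll();
```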

As for why its set this way:

utf-8: The problem seems to be that MySQL's utf-8 collations don't handle kanji properly, so queries (searching and sorting) on utf-8 encoded tables can give unpredictable results. So even though Japanese characters display fine in utf-8, it's better not to use it.

euc-jp: The PHP documentation strongly recommends against using Shift_JIS as the internal character encoding because it is unpredictable, which is why it's better to use EUC-JP. It also says that PHP's internal character encoding and the database encoding have to be the same, which is why the database is also set to EUC-JP.
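One concrete reason Shift_JIS is awkward as an internal encoding, shown as a minimal sketch: the second byte of some Shift_JIS characters is 0x5C, the ASCII backslash, so byte-oriented functions such as addslashes() mistake half a kanji for an escape character. EUC-JP only uses bytes of 0xA1 and above inside multibyte characters, so it avoids this. 表 is just a well-known example of an affected character.

```php
<?php
// 表 encoded in Shift_JIS ends in 0x5C, which is "\" in ASCII.
$kanji = "表"; // this source file is saved as UTF-8

$sjis = mb_convert_encoding($kanji, 'SJIS', 'UTF-8');
$euc  = mb_convert_encoding($kanji, 'EUC-JP', 'UTF-8');

echo "Shift_JIS bytes: ", bin2hex($sjis), "\n"; // 955c
echo "EUC-JP bytes:    ", bin2hex($euc), "\n";

// addslashes() sees the 0x5C as a backslash and escapes it,
// which corrupts the Shift_JIS text.
$escaped = addslashes($sjis);
echo "after addslashes: ", bin2hex($escaped), "\n"; // 955c5c
```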

Shift_JIS: I found this quote on a site for foreigners living in Japan (site is not design related)

Why shift_jis?
Most people on this planet use Windows. When writing a document or text file in Windows, the encoding is shift_jis. Programmers work with translators when making web applications. Translators are not using Linux or Mac. They are the simplest of PC users, and you will have to work with them to translate your language text files. Thus shift_jis.


It seems that Shift_JIS is used to target the largest audience. I don't fully get what the guy was saying here, but since I made it work, I may as well go with it just to be safe (unless someone can think of a reason not to). Since I can't use Shift_JIS internally with PHP, I convert to it when outputting to the browser.

And that's that!

This post has been edited by haku: 10 December 2007 - 10:28 PM


#6 Catalyst

  • Codesmith
  • Group: Administrators
  • Posts: 1,049
  • Joined: 04-April 06
  • Gender:Male
  • Location:San Diego

Posted 15 November 2007 - 01:33 AM

Thanks for taking the time to write what you learned, I'm sure it'll be of use to someone.

#7 haku

Posted 15 November 2007 - 02:17 AM

I hope so!

Honestly, I expect to be running a deficit on this site, in that people will probably give me more advice than I'll be able to give back, so when I do have something to give, I'm more than willing to give it!

#8 sypher

Posted 15 November 2007 - 12:15 PM

Nice one haku, glad you got it sorted :D

#9 haku

Posted 21 November 2007 - 11:52 PM

Damn, Japanese encoding sucks! So I thought I had my problem solved, right? If only it were that simple. It turns out that while I had taken care of the dynamic side of the encoding (text coming from the database), I had not taken care of the static side (text hard-coded into the page).

Today I'm at work, and I hard-coded some Japanese into my code (a link with the Japanese word for "delete" in it). So what happens? It came out as mojibake (gibberish). Frustrated the hell out of me! I couldn't figure out why it wasn't displaying after all that setup.

I finally got it to work by doing the following two things:

1) Before the doctype, I had to add an XML declaration (XHTML is a variant of XML). It looks like this:

<?xml version="1.0" encoding="Shift_JIS" ?>

But even that was difficult, because PHP (when short_open_tag is enabled) treats the <? as the start of a PHP block rather than an XML declaration. So I had to output the declaration from PHP like this:

<?php
echo "<?xml version=\"1.0\" encoding=\"Shift_JIS\" ?>" . PHP_EOL;
?>

(note: PHP_EOL is the platform's end-of-line character, so it puts a newline after the declaration in the page source)

2) In Dreamweaver, I had to go into the preferences and set new documents to be encoded in EUC-JP by default. Even then, I had to copy everything in my document, delete the document, create a new document with the same name (so that it was EUC-JP encoded), and paste all the code back into it.
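As an alternative to recreating files by hand, an existing file can be re-saved in EUC-JP from PHP. This is a minimal sketch assuming the file is currently UTF-8; convert_file_encoding() is a hypothetical helper, and any meta tag inside the file still has to be updated separately.

```php
<?php
// Re-save a source file in another encoding (UTF-8 -> EUC-JP by default).
function convert_file_encoding(string $path, string $to = 'EUC-JP', string $from = 'UTF-8'): void
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Refuse to convert a file that isn't really in the source encoding;
    // converting it anyway would just produce more mojibake.
    if (!mb_check_encoding($data, $from)) {
        throw new RuntimeException("$path is not valid $from");
    }
    // Drop a UTF-8 byte order mark if present; EUC-JP has no BOM.
    if ($from === 'UTF-8' && strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        $data = substr($data, 3);
    }
    if (file_put_contents($path, mb_convert_encoding($data, $to, $from)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}

// Usage: convert_file_encoding('members.php');
```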


It worked though! (I'm a Japanese encoding ninja now!) I'm actually not sure that step 1 was necessary, since I did it before trying step 2, but I have read that it's good to add an XML declaration to XHTML documents because it supposedly helps them parse faster.


But now I think that this solves ALL my language encoding troubles. Dynamic text as well as static text has all been resolved!

#10 haku

Posted 10 December 2007 - 10:29 PM

Quote

But now I think that this solves ALL my language encoding troubles. Dynamic text as well as static text has all been resolved!


Famous last words! As it turns out, I hadn't actually solved all my language encoding troubles. I came across one more issue: mail! I have set up a script to send confirmation emails to people when they register for the site, but the emails were also coming out as gibberish! Freakin' encoding. Anyway, just to make this thread complete (even though I'm the only one who cares, and maybe the only one who will ever refer to it), here is what I had to do to make the emails come through properly:

I am using the PHPMailer class to send mail, as it helps keep messages from being auto-filtered as junk mail. After including the PHPMailer class in my scripts (I won't go into this, as it's not relevant to the problem, but you can find more info on PHPMailer's SourceForge page), I had to add the following lines of code:

$mail = new PHPMailer(); // creates the PHPMailer object; not specific to Japanese encoding
$subject = "some subject";
$message = "some body";
$mail->Subject = mb_encode_mimeheader($subject, "ISO-2022-JP", "B", "\n");
$mail->Body = mb_convert_encoding($message, "ISO-2022-JP", "EUC-JP"); // note: EUC-JP is the encoding I am using internally, as mentioned in an earlier post. If you aren't using EUC-JP, then change that part accordingly


More code is needed for PHPMailer to work (such as a 'to' address and a 'from' address), but as it is not relevant to the encoding issues, I have left it out. There are plenty of English tutorials on this on the net. Also, the mbstring extension has to be enabled in php.ini (which I explained in an earlier post) for mb_encode_mimeheader and mb_convert_encoding to work.
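For anyone not using PHPMailer, here is a minimal sketch of the same subject/body encoding prepared for PHP's built-in mail(). encode_japanese_mail() is a hypothetical helper; $sourceEncoding is whatever your strings are stored in (EUC-JP in the setup above, UTF-8 by default here). If you stay with PHPMailer, its CharSet property presumably also needs to be "ISO-2022-JP", since it declares iso-8859-1 by default.

```php
<?php
// Prepare a Japanese subject, body and headers as ISO-2022-JP.
function encode_japanese_mail(string $subject, string $body, string $sourceEncoding = 'UTF-8'): array
{
    // mb_encode_mimeheader() reads its input in the internal encoding,
    // so switch it temporarily and always restore it.
    $previous = mb_internal_encoding();
    mb_internal_encoding($sourceEncoding);
    try {
        // "B" = base64 encoded-word, as in the PHPMailer code above.
        $encodedSubject = mb_encode_mimeheader($subject, 'ISO-2022-JP', 'B', "\r\n");
    } finally {
        mb_internal_encoding($previous);
    }

    return [
        'subject' => $encodedSubject,
        // ISO-2022-JP is a 7-bit encoding, so the body goes out as-is.
        'body'    => mb_convert_encoding($body, 'ISO-2022-JP', $sourceEncoding),
        'headers' => "MIME-Version: 1.0\r\n"
                   . "Content-Type: text/plain; charset=ISO-2022-JP\r\n"
                   . "Content-Transfer-Encoding: 7bit",
    ];
}

// Usage:
// $m = encode_japanese_mail('登録確認', 'ご登録ありがとうございます。');
// mail($to, $m['subject'], $m['body'], $m['headers']);
```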

So maybe that's the end of my encoding worries. I sure hope so! I've had more than enough of encoding. But I sure understand it well now! I have read hundreds of pages of different stuff to find out.

#11 sypher

Posted 10 December 2007 - 10:58 PM

This is actually very useful :D thanks for posting it!

#12 Catalyst

Posted 10 December 2007 - 11:52 PM

That'll be pure gold one of these days when I need to do the same thing.

#13 Akashic

  • W.R. Private
  • Group: Members
  • Posts: 2
  • Joined: 21-January 08

Posted 21 January 2008 - 11:55 PM

Hello all.

Very useful thread, indeed.
But I don't think I understand the phpMyAdmin part, or what I should do. Could you paste part of your code, please?

I'm simply exporting from Excel to CSV, reading the file and inserting it into the database. Then I list the data on a page and generate an XML file. Everything is going fine except phpMyAdmin, which shows something like this: 祗園ÃÂ®ãˆã¹ÃÂ£ã•ã‚ââ¬Å“

I'm doing everything in utf8_general_ci, because when I use another encoding the webpage (WordPress) doesn't display it properly.

In the PHP there are no headers, only code. Something like this:

$csvcontent = file_get_contents("ap.csv");
$csvcontent = mb_convert_encoding($csvcontent, "UTF8", "auto");

Following haku's 'insertion' steps, I also tried:

$csvcontent = mb_convert_encoding($csvcontent, "EUC-JP", mb_detect_encoding($csvcontent));

and changing everything in the DB to eucjpms_japanese_ci, but that wasn't satisfactory either.
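A minimal sketch of the CSV step with the source encoding named explicitly instead of "auto" or mb_detect_encoding(). It assumes the CSV was saved by Excel on Japanese Windows, which writes Shift_JIS (Microsoft's CP932 variant, "SJIS-win" in mbstring); read_excel_csv() is a hypothetical helper. For haku's setup, pass 'EUC-JP' as $to.

```php
<?php
// Read an Excel-exported Japanese CSV into rows of converted strings.
function read_excel_csv(string $path, string $from = 'SJIS-win', string $to = 'UTF-8'): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Name the source encoding explicitly: short Japanese strings are
    // often valid in more than one encoding, so detection can guess wrong.
    if (!mb_check_encoding($raw, $from)) {
        throw new RuntimeException("$path is not valid $from");
    }

    // Convert first, then parse, so the CSV parser never sees Shift_JIS
    // bytes (some of which look like a backslash).
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, mb_convert_encoding($raw, $to, $from));
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row !== [null]) { // fgetcsv() returns [null] for blank lines
            $rows[] = $row;
        }
    }
    fclose($stream);
    return $rows;
}

// Usage: $rows = read_excel_csv('ap.csv');
```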

Just in case:
SHOW VARIABLES LIKE "char%";
Variable_name             Value
character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8
character_set_filesystem  binary
character_set_results     utf8
character_set_server      utf8
character_set_system      utf8
character_sets_dir        /usr/local/mysql-src/share/mysql/charsets/

I've spent so much time on this and have totally run out of ideas.
Any ideas?

Thank you in advance,
Akashic

This post has been edited by Akashic: 22 January 2008 - 02:55 AM


#14 smoseley

  • W.R. Private
  • Group: Members
  • Posts: 14
  • Joined: 15-January 08

Posted 22 January 2008 - 11:41 AM

If you're using multibyte characters (Unicode or another DBCS), don't use VARCHAR or TEXT fields in your database.

Instead, use NVARCHAR and NTEXT fields. This might solve your problems with data display.

This post has been edited by smoseley: 22 January 2008 - 11:42 AM


#15 haku

Posted 23 January 2008 - 12:20 AM

If the encoding is set correctly, VARCHAR and TEXT are fine (I'm using them both).

First, you need to set the character set of any text-based columns in your database to ujis (MySQL's name for EUC-JP).

Next, you have to edit your php.ini file (you won't be able to do it without changing that; talk to your provider if you don't know how to make changes to it).

Here are the php.ini settings you have to set:

;; Set HTTP header charset
default_charset = Shift_JIS 

enable_dl		= On
extension=php_mbstring.dll
;; (that line is for Windows; on Linux, PHP has to be built with --enable-mbstring instead)
;; Set default language to Japanese
mbstring.language = Japanese

;; HTTP input encoding translation is enabled
mbstring.encoding_translation = On

;; Set HTTP input encoding conversion to auto
mbstring.http_input   =  auto

;; Convert HTTP output to Shift_JIS
mbstring.http_output  = Shift_JIS

;; Set internal encoding to EUC-JP
mbstring.internal_encoding = EUC-JP

;; Do not print invalid characters
mbstring.substitute_character = none

;; Set output_handler to perform multibyte conversion
output_handler	  = mb_output_handler


After changing your php.ini to look like this, the mbstring section of phpinfo() should look like this:

[Image: phpinfo() output after the php.ini changes]

You also need to set the charset in the head of your document so it looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />


And finally, you need to save your actual PHP files with EUC-JP encoding. This part is VERY important: if you don't, the static (hard-coded) text on your pages won't display properly.

Good luck!

P.S. The email settings are different again, so if you need to know those, post in this thread again.

This post has been edited by haku: 23 January 2008 - 12:24 AM


#16 sypher

Posted 23 January 2008 - 01:33 AM

Damn, that's a lot of work :D Well done though, great guide for the future.

#17 haku

Posted 23 January 2008 - 03:09 AM

No worries! If it can save someone some of the headaches that I had to go through, then I'm happy to help.

I had to figure that all out on my own. Well, with the help of the internet of course. But I didn't find any sites (in English) dedicated to working in PHP and Japanese, and there wasn't much more to be found in Japanese. I just kept reading EVERYTHING, and trying out anything. Fortunately, I found one post on one site by a guy who knew what he was talking about, but only explained what had to be done, not how to do it. At first it seemed like a bunch of gibberish, but it was a good base to start from, and the more I started to learn, the more I realized that he really did know what he was talking about.


After having figured out how to do all the encoding, I have started to check out how other Japanese sites have done their encoding, and from what I can tell only about 40% or so have done it right. Most of them have just managed to put together something that works, but is not ideal.

Japanese is not a language that was made to be digital, and it's sure a pain in the ass to use! But I think I have all my coding woes solved (for now). My last huge challenge came a couple of weeks ago when doing emails. I really wanted to use PHPMailer so that mail sent from my site doesn't get caught by junk mail filters, but the language file enclosed in the PHPMailer bundle doesn't properly encode the subject for all email clients. It showed up properly in some, not in others. After screwing around with it for hours and hours and reading more of everything on the net, I finally managed to put together something that properly encodes both the subject and the body in Outlook, Thunderbird, Hotmail and my cell phone. I had found solutions that worked in one or some of those clients before, but it took me forever to find one that worked in all four. Maybe it will fail in something else, but I figure I have pretty good market coverage with those, and since it works in all four (which each handle email differently), I'm hoping it will work in most other email clients as well.

#18 Akashic
Posted 23 January 2008 - 04:36 AM

Hi.

I don't have access to php.ini, but the mbstring settings are almost the same (it's a Japanese server):

Directive                    Local Value   Master Value
mbstring.http_input          auto          auto
mbstring.http_output         pass          pass
mbstring.internal_encoding   UTF-8         UTF-8
mbstring.language            Japanese      Japanese

This internal encoding is worrying me a little bit.

Additionally, I have some kind of control panel for creating new databases, where I can choose the encoding (it's set to EUC-JP by default), but SHOW VARIABLES reports that everything is utf8 (as in my previous post).
The collation of the columns is "ujis_japanese_ci".

When I add <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" /> to the file and save it as EUC-JP, BBEdit automatically changes the meta tag to "charset=euc-jp".

I read on Japanese forums that there are some problems with older versions of MySQL, but that doesn't seem to be the case here. My specs are: MySQL 5.1.20, PHP 5.2.3, phpMyAdmin 2.10.1

Also, I don't have any problems at all with PHP mail. Forms are sent correctly to Thunderbird and Gmail, but when they are inserted into the database, they turn into mojibake ;)

I'm trying to connect all of this to WordPress (which does fine inserting Japanese text into the database), but there's too much code in it to analyze.


Ok, tomorrow is another day - maybe some ideas will come during the night.

Thanks,
Akashic

#19 haku

Posted 23 January 2008 - 06:11 AM

Find out from your provider how you can change the php.ini settings. That internal utf-8 encoding is going to cause you some trouble if you are going to make any fields that need to be searchable. Apparently somewhere in the mysql documentation, it says that collation of Japanese characters is not correct when using utf-8. This is why EUC-JP is important. I think that you can *maybe* override your php.ini with commands in your code, but I haven't tried it or looked into it that much. But if you really can't get at your php.ini, then that is probably your best bet.
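A minimal, untested-on-that-host sketch of overriding most of the php.ini settings from code, for hosts where php.ini can't be edited. Run it at the very top of every page (e.g. from a shared include), before any output. Assumption: mbstring.encoding_translation is a per-directory setting and can't be switched on this way, so incoming form data would still need converting by hand.

```php
<?php
// Equivalent of default_charset = Shift_JIS. Set it first: on newer PHP
// versions, changing default_charset may also reset mbstring's defaults.
ini_set('default_charset', 'Shift_JIS');

// Equivalent of mbstring.internal_encoding = EUC-JP.
mb_internal_encoding('EUC-JP');

// Equivalent of mbstring.http_output = Shift_JIS.
mb_http_output('SJIS');

// Equivalent of output_handler = mb_output_handler: buffer the page and let
// mbstring convert it from the internal encoding to the HTTP output encoding.
ob_start('mb_output_handler');
```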

Make sure you check your email in Outlook; that one (not surprisingly, as it's Microsoft) gave me the most problems.

Also, you said you aren't having trouble with PHP mail, but I was using the PHPMailer class (a free download from sourceforge.net), since regular PHP mail is sometimes caught by mail filters. PHPMailer lets you use SMTP authentication so that it doesn't get filtered.

#20 haku

Posted 29 January 2008 - 12:27 AM

Just to add another level that has to be thought about, I have discovered that for any external javascript with Japanese text in it (which is most of my AJAX/DOM scripts), a charset has to be added to the tag as follows:

<script type="text/javascript" src="javascript/functions.js" charset="EUC-JP"></script>


I've come across one final encoding issue that I'm having trouble with. I force downloads of JPEGs, PDFs, etc. through a script that sets some HTTP headers, but any time the filename is Japanese, it comes out garbled. I'm working on that one right now, and I'll post it in here if I figure it out.
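One possible approach, sketched under the assumption that the browsers involved support RFC 6266 / RFC 5987 style headers: send the Japanese name percent-encoded as UTF-8 in filename*, plus a plain ASCII fallback in filename for clients that ignore filename*. build_content_disposition() is a hypothetical helper name.

```php
<?php
// Build a Content-Disposition value that carries a non-ASCII filename.
function build_content_disposition(string $name, string $fromEncoding = 'UTF-8'): string
{
    // Work in UTF-8 regardless of the site's internal encoding (EUC-JP above).
    $utf8 = mb_convert_encoding($name, 'UTF-8', $fromEncoding);

    // ASCII fallback: one "_" per character outside a safe set.
    $fallback = preg_replace('/[^A-Za-z0-9._-]/u', '_', $utf8) ?? 'download';

    return sprintf(
        "attachment; filename=\"%s\"; filename*=UTF-8''%s",
        $fallback,
        rawurlencode($utf8)
    );
}

// Usage inside the download script, before any output:
// header('Content-Type: application/pdf');
// header('Content-Disposition: ' . build_content_disposition($nameFromDb, 'EUC-JP'));
// readfile($path);
```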

I figure this is going to be one of the most, if not the most, comprehensive threads on the internet about programming Japanese sites with PHP!