Do the file encoding of your document AND the Content-Type charset meta tag need to be set to the same thing?
Whether you're a seasoned veteran or a struggling beginner, Web Radiance is the web development and web design forum for you. You'll find answers to all your HTML, CSS, SEO, and Programming needs. Pull up a chair and stay awhile.
#1
Posted 24 March 2008 - 08:23 PM
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
Shift_JIS is a requirement for displaying pages on cellphones in Japan, but it doesn't work with PHP. Go figure.
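A likely reason for the clash, shown here as a small Python sketch (Python is used only because its codecs make the raw bytes easy to inspect; a PHP script receives the same bytes): several common kanji, such as 表, have a Shift_JIS second byte of 0x5C, which is the ASCII backslash, so byte-oriented string handling can mistake half of a character for an escape. EUC-JP keeps every byte of a multibyte character at 0x80 or above, so that confusion cannot happen.

```python
# Compare how one kanji is encoded in Shift_JIS and in EUC-JP.
text = "表"  # U+8868, a well-known "problem character" in Shift_JIS

sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")

print(sjis.hex(" "))   # 95 5c
print(eucjp.hex(" "))

# In Shift_JIS the trail byte 0x5C is the same byte as ASCII "\".
print(b"\\" in sjis)   # True

# In EUC-JP every byte of a multibyte character is >= 0x80,
# so it can never be mistaken for an ASCII character.
print(all(b >= 0x80 for b in eucjp))  # True
```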
#2
Posted 25 March 2008 - 05:09 AM
#3
Posted 25 March 2008 - 01:45 PM
<?xml version="1.0" encoding="UTF-8"?>
...or similar at the start of the document will cause IE to show the page's source code when opening the file. (I don't know about the newest IE version, though.)
TextMate and the W3C validator consider my pages valid either way, though, so it seems to be your choice whether to use that or...
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
(with the attribute values changed to whatever you need)
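Either form declares the encoding. As a rough illustration of how a tool might read back whichever declaration a page uses, here is a Python sketch; `declared_encoding` is a hypothetical helper, and regular expressions are only an approximation of real HTML/XML parsing.

```python
import re
from typing import Optional

# Hypothetical helper: report which encoding a document *declares*,
# checking an XML declaration first and then a Content-Type meta tag.
XML_DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', re.IGNORECASE)
META_CHARSET = re.compile(rb'<meta[^>]*charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def declared_encoding(doc: bytes) -> Optional[str]:
    """Return the declared charset name, or None if nothing is declared."""
    for pattern in (XML_DECL, META_CHARSET):
        match = pattern.search(doc)
        if match:
            return match.group(1).decode("ascii")
    return None


print(declared_encoding(b'<?xml version="1.0" encoding="UTF-8"?>'))
print(declared_encoding(
    b'<meta http-equiv="content-type" content="text/html; charset=utf-8" />'))
```

Note that this only tells you what the page *claims*; it says nothing about how the file was actually saved.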
#4
Posted 25 March 2008 - 06:40 PM
Or the short answer to your question: yes.
BUT, if it's a hardcoded HTML document and not a PHP document, then you can't do that. This will only work with PHP documents.
edit: I wrote that while still drinking my coffee after just waking up. I'm going to try this again so I can say it a little more clearly:
The answer to your question is yes, you can do that. However, to do it, you need your php.ini settings configured the right way. As you said, PHP doesn't work very well with Shift_JIS, so you shouldn't use Shift_JIS as the internal encoding. You can use EUC-JP instead, as PHP works well with that. Then you set the output encoding to Shift_JIS. That way, PHP does everything in EUC-JP and then, at the very end, re-encodes the output as Shift_JIS. So if your php.ini is set up the way I just described, not only CAN you do what you are asking, you actually have to, or things won't display correctly. Here is why:
The meta tag just tells the browser what charset to expect in the data that follows. This is why it has to be declared in the head of the document*. However, the document itself is processed by PHP first, so the file has to be saved in a charset PHP understands, which means the same charset PHP's internal encoding is set to. So PHP reads the document as EUC-JP and outputs it as Shift_JIS, including the meta tag, which then tells the browser to expect Shift_JIS, the encoding the rest of the data is in.
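The flow described above can be mimicked in a few lines of Python. This is only a sketch of the byte flow, not of how PHP's mbstring works internally: the file on disk is EUC-JP, the text is handled in decoded form, and everything (meta tag included) is re-encoded as Shift_JIS on the way out.

```python
# What the editor saved: EUC-JP bytes whose meta tag already names
# the *output* charset, Shift_JIS.
source_bytes = (
    '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />'
    "<p>日本語のページ</p>"
).encode("euc_jp")

# 1. Read the file using the internal encoding (EUC-JP).
text = source_bytes.decode("euc_jp")

# 2. ...any string processing would happen here, on decoded text...

# 3. Re-encode the whole page, meta tag included, as Shift_JIS.
output_bytes = text.encode("shift_jis")

# The browser trusts the meta tag, decodes as Shift_JIS, and gets the text back.
print(output_bytes.decode("shift_jis") == text)  # True
# The bytes on disk and the bytes sent are different encodings of the same text.
print(source_bytes == output_bytes)              # False
```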
You know, I'm not sure I explained that any more clearly than the first time. Need more coffee!
*I was actually reading somewhere recently that browsers read through a document until they find the declared charset, then go back to the top of the document and read through everything once more. The article I was reading said that, as a result, the charset meta tag should be the very first thing in the head of your document, even before the title.
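A rough way to check that advice, sketched in Python. The current HTML specification requires the character encoding declaration to appear within the first 1024 bytes of the document, and putting it before the title is an easy way to guarantee that. `charset_declared_early` is a hypothetical helper and only looks for the text `charset=`, so treat it as an approximation.

```python
import re

# Hypothetical checker: is the charset declared early enough?
CHARSET_RE = re.compile(rb"charset\s*=", re.IGNORECASE)
TITLE_RE = re.compile(rb"<title", re.IGNORECASE)


def charset_declared_early(doc: bytes, limit: int = 1024) -> bool:
    """True if 'charset=' starts within `limit` bytes and before <title>."""
    charset = CHARSET_RE.search(doc)
    if charset is None or charset.start() >= limit:
        return False
    title = TITLE_RE.search(doc)
    # A page with no <title> at all only needs to pass the byte-limit check.
    return title is None or charset.start() < title.start()


good = (b'<head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=Shift_JIS" /><title>Page</title>')
bad = (b'<head><title>Page</title><meta http-equiv="Content-Type" '
       b'content="text/html; charset=Shift_JIS" />')
print(charset_declared_early(good))  # True
print(charset_declared_early(bad))   # False
```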
This post has been edited by haku: 25 March 2008 - 08:16 PM
<a href="http://www.dudes-japan.com" target="_blank">Dudes Japan</a>
#5
Posted 26 March 2008 - 04:37 AM
Hello Haku
You seem to be the resident expert on Japanese coding so I hope I can learn a lot from you.
I think I follow what you say.
Basically, for PHP:
1) Document saved as EUC-JP
2) PHP internal string options set to EUC-JP
3) charset in the document set to Shift_JIS
For regular XHTML documents:
1) Must be saved as EUC-JP and have the charset in the document as EUC-JP
OR
2) Saved as Shift_JIS and have the charset in the document as Shift_JIS.
I hope that's right.
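The rule for static XHTML files can be seen directly in Python: the browser decodes the bytes with whatever charset the page declares, so if the declaration and the saved encoding disagree, the text comes out garbled. A minimal sketch:

```python
# A static file saved as EUC-JP.
original = "日本語"
saved = original.encode("euc_jp")

# Declared charset matches the saved encoding: text comes back intact.
print(saved.decode("euc_jp") == original)  # True

# Declared charset says Shift_JIS but the file is EUC-JP: mojibake.
garbled = saved.decode("shift_jis", errors="replace")
print(garbled == original)  # False
```

`errors="replace"` is used so the mismatched decode produces replacement characters instead of raising, which is roughly what a browser shows.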
#6
Posted 26 March 2008 - 07:10 AM
You get 100 yen
#7
Posted 26 March 2008 - 06:30 PM
Follow-up question: what other language specifiers are needed in your document?
For example some pages have one or both of the following.
1.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
Of course the first part of this tag is a requirement, but I'm not sure about the end.
Maybe it's only relevant for XML-driven applications like AJAX? In which case it might be
good to get in the habit of including it anyway, so we don't encounter problems if such
functionality is added down the track?
2.
<meta http-equiv="Content-Language" content="ja" />
Some sources say that this tag doesn't really achieve anything.
I'm wondering if the charset meta tag really just overrides whatever is placed in this tag.
Thank you in advance
This post has been edited by Beavis: 26 March 2008 - 06:30 PM
#8
Posted 27 March 2008 - 08:42 AM
#9
Posted 27 March 2008 - 09:10 AM
haku, on Mar 27 2008, 10:42 PM, said:
It's a deal!
I'm using a book called PHP Solutions: Dynamic Web Design Made Easy by David Powers.
Interestingly, the author lived in Japan for years and is apparently a bit of an expert at building
PHP sites in Japanese. So far I haven't been able to contact him, though.
Did you read my other post about the downloadable PDF book on CJKV encoding?
As I said, its table of contents doesn't mention PHP specifically.
Also, it was published in 1999, which is perhaps a bit long in the tooth now.
Web years are like dog years, you know.
#10
Posted 27 March 2008 - 10:37 PM
I found this great document from the W3C on the matter:
http://www.w3.org/TR...html-tech-lang/
VERY comprehensive. Anyway, it broke down to this: if you are using XHTML but serving the page as text/html, then you set your html tag to look like this:
<html lang="ja" xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">
However, if you are serving it as application/xhtml+xml (which I'm not, as IE doesn't recognize it), then you can leave out the xml:lang="ja" attribute.
The page also said that you can set this meta tag:
<meta http-equiv="Content-Language" content="ja" />
But it also said you should only do this in addition to setting the language in your html tag, never on its own, for the following four reasons:
Quote
2. The language information contained in HTTP headers is rarely used by mainstream browsers for text-processing applications, and such implementation as there is is inconsistent (see the test results). Unfortunately, we have yet to identify any user agent or application that recognizes information declared in a meta tag when it comes to text-processing. On the other hand, language information declared in the html tag is consistently recognized.
3. Since changes in the text-processing language within the document can only be done using attributes, it promotes consistency to use attributes on the html element to express the default text-processing language of the document.
4. It is important to always know the default text-processing language for the document, but if the document is not read from a server, or the author is unable to apply the necessary server settings, the HTTP content header will not be available.
There was also one more meta tag mentioned in the article:
<meta name="dc.language" content="ja" />
However, the article said that nothing is known about the 'dublin core' tag, and that this is enough reason for some people to leave it out of their documents (it was enough of a reason for me!).
Finally, I added a PHP header declaration myself, setting the accept language to Japanese. This wasn't mentioned in the article, but when checking the headers being sent to my browser, I saw that the accept language was English, so I decided to head this off at the pass.
So in the end, the top of my document, including the PHP, looks like this:
<?php header("Accept-Language: ja");?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="ja" xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
<title>Page Title</title>
<meta http-equiv="Content-Language" content="ja" />
One last point to mention is the big capital 'EN' in the doctype: this is not a problem! It refers to the language of the DTD, not of the document, and therefore should be left as is.
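One caution about the header() call above: as far as I understand HTTP, Accept-Language is a *request* header that the browser sends to say which languages the reader prefers (which would explain why it showed up as English). The response header that declares the page's own language is Content-Language, so the PHP line would presumably be `header("Content-Language: ja");`. Below is a minimal Python WSGI sketch showing each header on its proper side; `app` and `fake_start_response` are hypothetical names used only for illustration.

```python
# Minimal WSGI app: the browser's preference arrives in the request
# (HTTP_ACCEPT_LANGUAGE in environ); the page's language is declared
# in the response with Content-Language.
def app(environ, start_response):
    body = "<p>日本語</p>".encode("shift_jis")
    headers = [
        ("Content-Type", "text/html; charset=Shift_JIS"),
        ("Content-Language", "ja"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly and record what it would send.
sent = {}


def fake_start_response(status, headers):
    sent["status"] = status
    sent["headers"] = dict(headers)


# An English-preferring browser, like the one in the post.
chunks = app({"HTTP_ACCEPT_LANGUAGE": "en-US,en;q=0.5"}, fake_start_response)
print(sent["headers"]["Content-Language"])  # ja
```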
This post has been edited by haku: 27 March 2008 - 10:39 PM
#11
Posted 28 March 2008 - 03:26 AM
Cool! Nice bit of work haku san.
I will implement all of this on my site and let you know the outcome.
#12
Posted 28 March 2008 - 03:59 AM
#13
Posted 28 March 2008 - 04:27 AM
Quote
I did it on the site I'm working on now, and I can see that the headers are being sent with the accept-language as Japanese now. I don't really know how to tell for sure that the site will actually be indexed as a Japanese site though. I mean, I set the languages the way they said, but I didn't really know how to test it after the fact. Any ideas?
#14
Posted 28 March 2008 - 10:01 AM
One step at a time.
Can I double-check the php.ini settings you are using?
[mbstring]
mbstring.language = Japanese
mbstring.internal_encoding = EUC-JP
mbstring.http_input = auto
mbstring.http_output = SJIS
mbstring.encoding_translation = On
mbstring.detect_order = auto
mbstring.substitute_character = none
mbstring.func_overload = 0
And I also enabled
extension=php_mbstring.dll
Do you have the same?
#15
Posted 28 March 2008 - 12:09 PM
Hmm, I can't check right now (I'm formatting my other computer), but that looks alright for the most part. Try changing SJIS to Shift_JIS, although I don't think that should matter, as SJIS seems to be an alias of Shift_JIS.
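Encoding names commonly have several aliases. As an illustration only, Python's codec registry treats SJIS as Shift_JIS, and ujis (the MySQL name used later in this thread) as EUC-JP. PHP's mbstring keeps its own alias list, so this shows the general idea rather than PHP's exact behavior.

```python
import codecs

# Different spellings of the same encodings resolve to one codec.
for name in ("Shift_JIS", "SJIS", "EUC-JP", "ujis"):
    print(name, "->", codecs.lookup(name).name)
```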
What is the actual problem you are having? Maybe it's not your php.ini. Did you set your database encoding to EUC-JP?
I'll check my php.ini tomorrow after I've finished formatting my computer (and when I'm not drunk).
#16
Posted 28 March 2008 - 01:09 PM
#17
Posted 28 March 2008 - 01:31 PM
#18
#19
Posted 29 March 2008 - 05:43 AM
I found this old post of yours
http://www.webradiance.com/lofiversion/web...ql-lot1676.html
In particular I looked at your screenshot of the phpinfo mbstring settings.
I changed my mbstring.detect_order from "auto" to "none" to match yours. It doesn't really seem to affect anything, as far as I can tell.
Also, I changed SJIS to Shift_JIS as you suggested, and again there was no real difference.
Anyway, my site works 90% OK.
(The 10% that doesn't perhaps deserves another thread, because I think it might be more coding-related than "encoding"-related as such.)
The home page displays all right, and so does the contact form.
The contact form even sends mail in Japanese, no problem.
I found that it's important to include this code in the mail() call:
// create additional headers
$additionalHeaders = "Content-Type: text/plain; charset=EUC-JP";
// send it
$mailSent = mail($to, $subject, $message, $additionalHeaders);
Without the $additionalHeaders variable, the email won't send non-Western text.
The thing I can't seem to do (which you have done) is change THIS part of my document
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
whilst still keeping the document saved as EUC-JP.
It seems to demand that the charset and the document encoding agree.
But in any case, since it seems to do what's required, I'm left pondering: is it really necessary to change the charset to Shift_JIS in my XHTML document rather than just leaving it as EUC-JP?
The PHP in this case is just processing the contact mail, not outputting any text to the screen directly (it switches back to XHTML for confirmation/error messages, etc.).
(Forget this last theory. I added "今日は World" to the end of the script as a test. It echoed Japanese to the page and worked fine! It seems to me that Shift_JIS might not be necessary, at least for non-mobile environments.)
Have I muddied the water sufficiently?
This post has been edited by Beavis: 29 March 2008 - 08:06 AM
#20
Posted 30 March 2008 - 08:52 PM
$sql = "SET NAMES ujis";
mysql_query($sql);
$sql = "SET character_set_results = NULL";
mysql_query($sql);
For myself, I have a file that connects to the database, which I just include on any page that requires a database connection, and I added the above code to that file. So whenever I need a connection to the database, I include the file, the connection is made, and the above code runs automatically. (ujis is MySQL's name for EUC-JP.)
Although I suspect this won't solve your problem, as it only affects the dynamic text, not the static text, which is where it seems you are having trouble (is that correct?).
I would suggest having your output as Shift_JIS. So if you find that saving your files as Shift_JIS with the meta tag set to Shift_JIS works, then go with that!
Mail is a whole different thing. It's good you figured out a way to make it work! Mail was a huge headache for me. I ended up using the PHPMailer class and making some of my own personalized changes to it. But I tested it on four different mail clients (Hotmail, Outlook, Thunderbird and my cell phone) and got it to work, so that's what counts.
Check your mail function on all of those at the very least (more would probably be better). You may find that while your mail works in one client, it doesn't work in another.
Also, as a side note, mail should be encoded as ISO-2022-JP, not EUC-JP. That's the mail standard, unfortunately.
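To make the ISO-2022-JP point concrete, here is a sketch using Python's standard email package (Python stands in for whatever mail library you use on the PHP side; the address is a placeholder). ISO-2022-JP is a 7-bit encoding, so the body can travel as plain 7bit text, and the Japanese subject is carried as an RFC 2047 encoded word.

```python
from email.header import Header
from email.mime.text import MIMEText

# Japanese body and subject, encoded as ISO-2022-JP.
body = "お問い合わせありがとうございます。"
subject = "お問い合わせ"

msg = MIMEText(body, "plain", "iso-2022-jp")
msg["Subject"] = Header(subject, "iso-2022-jp")
msg["To"] = "user@example.com"  # placeholder address

print(msg.get_content_charset())         # iso-2022-jp
print(msg["Content-Transfer-Encoding"])  # 7bit
print(msg.as_string())
```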
Don't stress too much! This is all a pain in the ass, but once you get it going you don't have to do anything anymore.
I suppose I could scale all my files down to the basics, zip them up and upload them here for you if you want. Let me know if it's necessary, and if it is, I'll do that. It may even be a good idea in case I accidentally wipe both my hard drives clean again like I just did this weekend (losing ALL my pictures from the last year in the process).