Do the File Type Encoding for your doc AND the Content/charset meta tag need to be set to the same?

#1 Beavis

  • W.R. Corporal
  • Group: Members
  • Posts: 167
  • Joined: 24-March 08

Posted 24 March 2008 - 08:23 PM

For example, if I save a file as EUC-JP, can I still set the meta tag to Shift_JIS, like this?

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />

Shift_JIS is a requirement for displaying pages on cellphones in Japan, but it doesn't work with PHP. Go figure.
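To see why the two normally have to agree, here is a small illustration. It uses Python rather than PHP only because Python makes the raw bytes easy to inspect, and the sample text is arbitrary. The same Japanese characters produce different bytes in EUC-JP and Shift_JIS, so a file saved as EUC-JP but labelled Shift_JIS gets decoded from the wrong bytes. The `errors="replace"` setting is an assumption standing in for a browser that shows garbage instead of refusing the page.

```python
text = "日本語のページ"

# The bytes an editor writes depend on the encoding chosen when saving.
euc_bytes = text.encode("euc_jp")
sjis_bytes = text.encode("shift_jis")

print("EUC-JP:   ", euc_bytes.hex(" "))
print("Shift_JIS:", sjis_bytes.hex(" "))

# Declared charset matches the saved encoding: the text comes back intact.
matched = euc_bytes.decode("euc_jp")

# Declared Shift_JIS, but the bytes are really EUC-JP: mojibake.
# errors="replace" mimics a browser that shows garbage instead of failing.
mismatched = euc_bytes.decode("shift_jis", errors="replace")

print(matched)
print(mismatched)
```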

#2 marcamos

  • W.R. General
  • Group: Administrators
  • Posts: 2,849
  • Joined: 04-April 06
  • Gender:Male
  • Location:Massachusetts - USA

Posted 25 March 2008 - 05:09 AM

Another member, haku, has a lot of experience with this sort of stuff. If he doesn't answer this question soon, you might have luck sending him a personal message.

#3 temhawk

  • W.R. Private First-Class
  • Group: Members
  • Posts: 322
  • Joined: 30-August 07
  • Gender:Male
  • Interests:travel, cg art, macs, music, skateboarding, programming, discovery channel, TextMate 2

Posted 25 March 2008 - 01:45 PM

All I can tell you is that...

<?xml version="1.0" encoding="UTF-8"?>

...or similar at the start of the document will cause IE to show the page's source code when calling up the file. (I don't know about the newest IE version, though.)

TextMate and the W3C validator see my pages as valid, though, so it seems it's your choice to use either that or...

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

(with the attribute values changed to whatever you need)

#4 haku

  • 日本語 Ninja
  • Group: Members
  • Posts: 652
  • Joined: 21-September 07
  • Gender:Male
  • Location:Yokohama, Japan

Posted 25 March 2008 - 06:40 PM

All I can tell you is what I do. I have the internal encoding for PHP set to EUC-JP, as PHP doesn't like Shift_JIS. Since all my documents pass through PHP (it's PHP that processes them), I save the documents themselves in EUC-JP so that PHP can read them properly. I then have PHP set to output the document in Shift_JIS (I set this in my php.ini). So the meta tag has to be set to Shift_JIS.

Or the short answer to your question: yes.

BUT, if it's a hardcoded HTML document and not a PHP document, then you can't do that. This only works with PHP documents.
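As a rough illustration of the order of operations described above, here is a Python stand-in. It is not PHP and does not touch php.ini. The template text, the `{name}` placeholder and the `render()` helper are hypothetical. Python decodes to Unicode instead of keeping EUC-JP bytes internally, so the "internal encoding" only shows up as the encoding used to read the saved file.

```python
# Hypothetical Python stand-in for the PHP setup described above:
# file saved as EUC-JP, processed, then re-encoded as Shift_JIS for output.
SOURCE_ENCODING = "euc_jp"     # how the template file is saved on disk
OUTPUT_ENCODING = "shift_jis"  # what the browser receives (and what the meta tag declares)

TEMPLATE = (
    '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />\n'
    "<p>こんにちは、{name}さん</p>\n"
)


def render(saved_bytes: bytes, name: str) -> bytes:
    # 1. Read the file using the encoding it was actually saved in.
    page = saved_bytes.decode(SOURCE_ENCODING)
    # 2. Do the "PHP work" (here: just fill in a placeholder).
    page = page.replace("{name}", name)
    # 3. Re-encode the finished page, meta tag included, for output.
    return page.encode(OUTPUT_ENCODING)


saved_on_disk = TEMPLATE.encode(SOURCE_ENCODING)
sent_to_browser = render(saved_on_disk, "山田")
print(sent_to_browser.decode(OUTPUT_ENCODING))
```

If step 1 used the wrong encoding (for example, reading the EUC-JP file as Shift_JIS), the text would be damaged before the output conversion could help. That is why the saved file and PHP's internal encoding need to match.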


edit: I wrote that while still drinking my coffee after just waking up. I'm going to try this again so I can say it a little clearer:

The answer to your question is yes, you can do that. However, to be able to do it, you need your php.ini settings set the right way. As you said, PHP doesn't work very well with Shift_JIS, so you shouldn't use Shift_JIS as the internal encoding. You can use EUC-JP instead, as PHP works well with that. Then you set the output encoding to Shift_JIS. This way, PHP does everything in EUC-JP and then, at the very end, re-encodes the output as Shift_JIS. So if you have your php.ini set up the way I just described, not only CAN you do what you are asking, you actually have to, or things won't display correctly. Here is why:

The meta tag just tells the browser what charset to expect in the data that follows. This is why it has to be placed in the head of the document*. However, the document itself is processed by PHP first. So the file has to be saved in a charset that PHP understands, which means the same charset that PHP's internal encoding is set to. So PHP reads the document in EUC-JP and outputs it in Shift_JIS, including the meta tag, which is then sent to the browser, telling it to expect Shift_JIS, which is what the rest of the data is encoded in.

You know, I don't know that I explained that any clearer than the first time. Need more coffee!


*I was actually reading somewhere recently that browsers read through a document until they find the declared charset, then go back and start again at the top of the document and read through everything once more. The article I was reading said that, as a result, the charset meta tag should be the very first thing in the head of your document, even before the title.
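Current HTML specifications formalize this idea as a browser "prescan" that looks for a charset declaration in the first 1024 bytes of the page. This is one reason to keep the meta tag near the top of the head. The sketch below is a heavily simplified, regex-based stand-in for that algorithm; the real one tokenizes tags and skips comments properly. `sniff_charset()` is a hypothetical helper.

```python
import re

# Very simplified: the real prescan algorithm tokenizes tags, skips comments,
# and handles attribute quoting properly. This regex only illustrates the idea.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def sniff_charset(document: bytes, limit: int = 1024):
    """Return the charset named in a <meta> tag within the first `limit` bytes, else None."""
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii") if match else None


meta = b'<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />'

early = b"<head>" + meta + b"<title>...</title>"
late = b"<head><title>" + b"x" * 2000 + b"</title>" + meta

print(sniff_charset(early))  # Shift_JIS
print(sniff_charset(late))   # None: the declaration is past the first 1024 bytes
```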

This post has been edited by haku: 25 March 2008 - 08:16 PM

<a href="http://www.jaypan.com" target="_blank">Jaypan</a>
<a href="http://www.dudes-japan.com" target="_blank">Dudes Japan</a>

#5 Beavis

Posted 26 March 2008 - 04:37 AM

Hello Haku

You seem to be the resident expert on Japanese coding so I hope I can learn a lot from you.

I think I follow what you say.

Basically, for PHP:

1) Document saved as EUC-JP
2) PHP internal string options set to EUC-JP
3) charset in the document set to Shift_JIS

For regular XHTML documents:
1) Must be saved as EUC-JP and have the charset in the document as EUC-JP
OR
2) Saved as Shift_JIS and have the charset in the document as Shift_JIS.

I hope that's right.

#6 haku

Posted 26 March 2008 - 07:10 AM

Bingo!

You get 100 yen :D

#7 Beavis

Posted 26 March 2008 - 06:30 PM

Thanks Haku!

Follow-up question: what other language specifiers are needed in your document?

For example, some pages have one or both of the following.

1.
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
Of course the first part of this statement is a requirement, but I'm not sure about the end.
Maybe it's only relevant for XML-driven applications like AJAX? In which case it might be
good to get in the habit of including it anyway, so we don't encounter problems if such
functionality is added down the track?


2.
<meta http-equiv="Content-Language" content="ja" />
Some sources say that this tag doesn't really achieve anything.
I'm wondering if the charset meta tag really just overrides whatever is placed in this tag.

Thank you in advance :notworthy:

This post has been edited by Beavis: 26 March 2008 - 06:30 PM


#8 haku

Posted 27 March 2008 - 08:42 AM

To tell the truth I don't use either of those, but on that note my sites probably aren't indexed as Japanese, so I maybe should be using them. But on pure speculation alone, I would say that the first one would be necessary, not the second one. But I don't really see a big issue in including them both - try it out and let me know how it goes! You are pretty much the first person I've found that is facing the same issues I am, so maybe we can work on them together :D

#9 Beavis

Posted 27 March 2008 - 09:10 AM

haku, on Mar 27 2008, 10:42 PM, said:

You are pretty much the first person I've found that is facing the same issues I am, so maybe we can work on them together :D


It's a deal!

I'm using a book called PHP Solutions: Dynamic Web Design Made Easy by David Powers.
Interestingly, the author lived in Japan for years and apparently is a bit of an expert at building
PHP sites in Japanese. So far I haven't been able to contact him, though.

Did you read my other post about the downloadable PDF book on CJKV encoding?
As I said, it doesn't say anything specifically about PHP in the table of contents.
Also, it was published in 1999, which is perhaps a bit long in the tooth now.
Web years are like dog years, you know.

#10 haku

Posted 27 March 2008 - 10:37 PM

Well your questions on setting the language (not charset) of a document prompted me to look deeper into the situation, as I both didn't know the answers, and hadn't even thought of the questions!

I found this great document from the W3C on the matter:

http://www.w3.org/TR...html-tech-lang/

VERY comprehensive. Anyways, it broke down to this. If you are using XHTML but are serving the page as text/html, then you set your html tag to look like this:

<html lang="ja" xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">


However, if you are serving it as application/xhtml+xml (which I'm not, as IE doesn't recognize it), then you can leave out the lang="ja" attribute and rely on xml:lang alone.

The page also said that you can set this meta tag:

<meta http-equiv="Content-Language" content="ja" />


But that you should only do this as an addition to setting the language in your html tag, and not by itself exclusively, for the following four reasons:

Quote

1. HTTP and meta declarations allow you to specify more than one language value. This is inappropriate for labeling the text-processing language, which must be done one language at a time. On the other hand, multiple language values are appropriate when declaring language for documents that are aimed at speakers of more than one language. Attribute-based language declarations can only specify one language at a time, so they are less appropriate for specifying the language of the intended audience, but they are perfect for labeling the text-processing language for text.
2. The language information contained in HTTP headers is rarely used by mainstream browsers for text-processing applications, and such implementation as there is is inconsistent (see the test results). Unfortunately, we have yet to identify any user agent or application that recognizes information declared in a meta tag when it comes to text-processing. On the other hand, language information declared in the html tag is consistently recognized.
3. Since changes in the text-processing language within the document can only be done using attributes, it promotes consistency to use attributes on the html element to express the default text-processing language of the document.
4. It is important to always know the default text-processing language for the document, but if the document is not read from a server, or the author is unable to apply the necessary server settings, the HTTP content header will not be available.


There was also one more meta tag mentioned in the article:

<meta name="dc.language" content="ja" />


However, the article said that nothing is known about the 'Dublin Core' tag, and that this is enough reason for some people to leave it out of their documents (it was enough of a reason for me!).

Finally, I added a PHP header declaration myself to set the language to Japanese. This wasn't mentioned in the article, but when checking the headers being exchanged with my browser, I saw that the Accept-Language was English, so I decided to head this off at the pass. Note that Accept-Language is a request header: the browser sends it to state the reader's preferences, so sending it from the server has no effect. The response header that declares the document's language is Content-Language, so that is the one to send.

So in the end, the top of my document, including the PHP, looks like this:

<?php header("Content-Language: ja"); ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="ja" xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">
	
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
		<title>Page Title</title>
		<meta http-equiv="Content-Language" content="ja" />


One last point to mention: the big capital 'EN' in the doctype is not a problem! It refers to the language of the DTD, not the document, and should be left as is.
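The two meta tags above have real HTTP header counterparts, Content-Type and Content-Language. A server can send these alongside the page. As a sketch for anyone serving pages from something other than PHP, here is a hypothetical minimal Python WSGI app using only the standard WSGI calling convention. It encodes the page as Shift_JIS and sends both headers. The page content and the `call_app()` helper are illustrative only.

```python
# Hypothetical minimal WSGI app (standard library only) that sends the same
# information as the meta tags above, but as real HTTP response headers.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="ja" xml:lang="ja" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
<title>ページタイトル</title>
<meta http-equiv="Content-Language" content="ja" />
</head>
<body><p>こんにちは</p></body>
</html>
"""


def app(environ, start_response):
    body = PAGE.encode("shift_jis")  # bytes must match the declared charset
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=Shift_JIS"),
        ("Content-Language", "ja"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app():
    """Call the app the way a WSGI server would and return (headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    body = b"".join(app({}, start_response))
    return captured, body


headers, body = call_app()
print(headers["Content-Type"], headers["Content-Language"])
```

For local testing, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would serve it in a browser.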

This post has been edited by haku: 27 March 2008 - 10:39 PM


#11 Beavis

Posted 28 March 2008 - 03:26 AM

Cool! Nice bit of work haku san.

I will implement all of this on my site and let you know the outcome. :notworthy:

#12 sushil

  • W.R. Private
  • Group: Members
  • Posts: 7
  • Joined: 28-March 08

Posted 28 March 2008 - 03:59 AM

Absolutely nice work by haku.....

#13 haku

Posted 28 March 2008 - 04:27 AM

Thanks guys!

Quote

I will implement all of this on my site and let you know the outcome.


I did it on the site I'm working on now, and I can see that the language header is now being sent as Japanese. I don't really know how to tell for sure that the site will actually be indexed as a Japanese site, though. I mean, I set the languages the way they said, but I don't really know how to test it after the fact. Any ideas?

#14 Beavis

Posted 28 March 2008 - 10:01 AM

Things are still a bit of a train wreck at this end, unfortunately.
One step at a time.
Can I double-check the php.ini settings you are using?

[mbstring]

mbstring.language = Japanese

mbstring.internal_encoding = EUC-JP

mbstring.http_input = auto

mbstring.http_output = SJIS

mbstring.encoding_translation = On

mbstring.detect_order = auto

mbstring.substitute_character = none

mbstring.func_overload = 0

And I also enabled
extension=php_mbstring.dll

Do you have the same?

#15 haku

Posted 28 March 2008 - 12:09 PM

Hmm, I can't check right now (I'm formatting my other computer), but that looks alright for the most part. Try changing SJIS to Shift_JIS, although I don't think that should matter, as SJIS seems to be an alias of Shift_JIS.

What is the actual problem you are having? Maybe it's not your php.ini. Did you set your database encoding to EUC-JP?

I'll check my php.ini tomorrow after I've finished formatting my computer (and when I'm not drunk :D)
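The "SJIS is just an alias" point can be checked quickly in Python, whose codec registry maps several spellings to one codec. It also covers `ujis`, the name MySQL uses for EUC-JP. This is only an illustration. PHP's mbstring keeps its own alias list, and Python's `shift_jis` codec is the strict JIS X 0208 variant (Windows' extended variant is `cp932`), so don't read it as a statement about PHP's behaviour.

```python
import codecs

# Different spellings that Python's codec registry maps to the same codec.
for name in ["SJIS", "Shift_JIS", "shift-jis", "EUC-JP", "eucjp", "ujis"]:
    print(f"{name:10} -> {codecs.lookup(name).name}")
```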

#16 marcamos

Posted 28 March 2008 - 01:09 PM

Just wanted to second (or, third?) what others have said; Great post, there, haku. WebRadiance is extremely fortunate to have you around.

#17 haku

Posted 28 March 2008 - 01:31 PM

Thanks! I feel fortunate to have WebRadiance around, so I guess it's a win-win situation for us all :D

#18 sushil

Posted 29 March 2008 - 05:22 AM

haku, on Mar 28 2008, 01:31 PM, said:

Thanks! I feel fortunate to have webradiance around, so I guess its a win-win situation for us all :D


Yeah, you guys are fortunate, you know that?

#19 Beavis

Posted 29 March 2008 - 05:43 AM

Haku

I found this old post of yours
http://www.webradiance.com/lofiversion/web...ql-lot1676.html
In particular I looked at your screenshot of the phpinfo mbstring settings.

I changed my mbstring.detect_order to "none" from "auto" to match yours. It doesn't really seem to affect anything, as far as I can tell.
Also, I changed SJIS to Shift_JIS as you suggested, and again no real difference.

Anyway, my site works 90% OK.
(The 10% that doesn't perhaps deserves another thread, because I think it might be more coding-related than "encoding" as such.)

The home page displays alright, and so does the contact form.
The contact form even sends mail in Japanese, no problem.
I found that it's important to include this code in the mail() function.

// create additional headers
	$additionalHeaders = "Content-Type: text/plain; charset=EUC-JP";

// send it
	$mailSent = mail($to, $subject, $message, $additionalHeaders);


Without the $additionalHeaders variable, the email won't send non-Western text.

The thing that I can't seem to do (which you have done) is change THIS part of my document
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
while still keeping the document saved as EUC-JP.
It seems to demand that the charset and the document encoding agree :confused1:

But in any case, since it seems to do what's required, I'm left wondering: is it really necessary to change the charset to Shift_JIS in my XHTML document rather than just leave it as EUC-JP?
The PHP in this case is just processing the contact mail, not outputting any text to the screen directly (it switches back to XHTML for confirmation/error messages etc.)

Perhaps if it were sending PHP text (using the Shift_JIS output setting) to an EUC-JP charset document that would be a problem, but I'm not, so that may be why it's OK.
(Forget this last theory. I added a "今日は World" to the end of the script as a test. It echoed Japanese to the page and worked fine! It seems to me that Shift_JIS might not be necessary for non-mobile environments, at least.)

Have I muddied the water sufficiently? :rolleyes1:

This post has been edited by Beavis: 29 March 2008 - 08:06 AM


#20 haku

Posted 30 March 2008 - 08:52 PM

Hmm, that's strange. I just realized though, I left out one important piece of the puzzle! You need to add this before the first query to the database on any page:

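// Tell MySQL this connection sends and receives EUC-JP (MySQL calls it "ujis")
$sql = "SET NAMES ujis";
mysql_query($sql);
// Then ask MySQL to return results without converting them to another character set
$sql = "SET character_set_results = NULL";
mysql_query($sql);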


For myself, I have a file that connects to the database that I just include on any page that requires a database connection. I just added the above code to that page. So if I need a connection to the database, I include the page and the connection is made and the above code is executed automatically.

Although I'm suspecting this won't solve your problem, as this only refers to the dynamic text, not the static text, which is where it seems you are having troubles (is that correct?)

I would suggest having your output as Shift_JIS. So if you find that saving your files as Shift_JIS with the meta tag set to Shift_JIS works, then go with that!

Mail is a whole different thing. It's good you figured out a way to make it work! Mail was a huge headache for me. I ended up using the PHPMailer class and making some of my own personalized changes to it. But I tested it on four different mail clients (Hotmail, Outlook, Thunderbird and my cell phone) and I got it to work, so that's what counts.

Check your mail function on all of those at the least (and more would probably be better). You may find that while your mail worked on one thing, it didn't work on another.

Also, as a side note, mail should be encoded as ISO-2022-JP, not EUC-JP. That's the mail standard unfortunately.
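To make the ISO-2022-JP point concrete, here is a Python sketch using the standard `email` package. The addresses, subject and `build_japanese_mail()` helper are made up for illustration. The sketch only builds the message; actually sending it (for example with `smtplib`) is left out. It shows both halves: the body encoded as ISO-2022-JP, and the non-ASCII subject MIME-encoded so that it survives mail transport.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_japanese_mail(subject: str, body: str, to_addr: str, from_addr: str) -> MIMEText:
    # Body: encoded as ISO-2022-JP; since that encoding is 7-bit,
    # the email package marks it Content-Transfer-Encoding: 7bit.
    msg = MIMEText(body, "plain", "iso-2022-jp")
    # Non-ASCII headers must be MIME-encoded, e.g. =?iso-2022-jp?b?...?=
    msg["Subject"] = Header(subject, "iso-2022-jp")
    msg["To"] = to_addr
    msg["From"] = from_addr
    return msg


msg = build_japanese_mail(
    "お問い合わせ",
    "お問い合わせありがとうございます。",
    "customer@example.com",
    "info@example.com",
)
print(msg.as_string())
```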

Don't stress too much! This is all a pain in the ass, but once you get it going you don't have to do anything anymore.

I suppose I could scale all my files down to the basic needs, zip them up and upload them here for you if you want. Let me know if it's necessary, and if it is, I will do that. It may even be a good idea in case I accidentally wipe both my hard drives clean again like I just did this weekend (losing ALL my pictures from the last year in the process :o )
