Encoding - What is the exact difference between Windows-1252(1/3/4) and ISO-8859-1?

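We host PHP applications on a Debian-based LAMP installation.
Everything works well, performance-, administration- and management-wise.
However, being fairly new developers (we are still in high school), we have run into some issues with character encodings for Western character sets.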

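After a lot of research, I have come to the conclusion that the information online is somewhat confusing. It talks about Windows-1252 being ANSI and being fully ISO-8859-1 compliant.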

So what is the difference between Windows-1252(1/3/4) and ISO-8859-1?
And where does ANSI come into this?

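What encoding should we use on our Debian servers (and workstations) to ensure that clients receive all information as intended and that we do not lose any characters along the way?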

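I'd like to answer this in a more web-oriented way, and to do that we need a little history. Joel Spolsky has written a very good introductory article on the absolute minimum every developer should know about Unicode and character encodings.
Bear with me, because this is going to be a rather long answer. 🙂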

For the history, I'll point to some quotes from it (thank you very much, Joel! :)):

The only characters that mattered were good old unaccented English letters, and we had a code for them called ASCII which was able to represent every character using a number between 32 and 127. Space was 32, the letter “A” was 65, etc. This could conveniently be stored in 7 bits. Most computers in those days were using 8-bit bytes, so not only could you store every possible ASCII character, but you had a whole bit to spare, which, if you were wicked, you could use for your own devious purposes.

And all was good, assuming you were an English speaker.
Because bytes have room for up to eight bits, lots of people got to thinking, “gosh, we can use the codes 128-255 for our own purposes.” The trouble was, lots of people had this idea at the same time, and they had their own ideas of what should go where in the space from 128 to 255.

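So now "OEM character sets" were distributed with PCs, and they were all still different and incompatible. And, to our contemporary amazement, that was fine! There was no internet back then, and people rarely exchanged files between systems with different locales.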

Joel goes on to say:

In fact as soon as people started buying PCs outside of America all kinds of different OEM character sets were dreamed up, which all used the top 128 characters for their own purposes.
Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages.
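
A quick way to see this incompatibility for yourself is to decode one and the same byte under several code pages. The following is only a small sketch in Python (my choice of language for illustration, not something from Joel's article); the byte value 0xE9 and the helper name `decode_everywhere` are arbitrary, while the codec names are the ones in Python's standard library.

```python
import unicodedata

# One byte above 127, interpreted under several code pages.
RAW = bytes([0xE9])  # arbitrary example value

# Standard Python codec names: DOS US, Windows Cyrillic,
# Windows Western European, and ISO-8859-1.
CODE_PAGES = ["cp437", "cp1251", "cp1252", "latin-1"]


def decode_everywhere(raw, encodings):
    """Return {encoding name: decoded text} for the same raw bytes."""
    return {name: raw.decode(name) for name in encodings}


for name, text in decode_everywhere(RAW, CODE_PAGES).items():
    # Print the code point and its Unicode name instead of the glyph,
    # so the output does not depend on the terminal's own encoding.
    print(f"{name:8} -> U+{ord(text):04X} {unicodedata.name(text)}")
```

The same byte comes out as a Greek letter, a Cyrillic letter, or an accented Latin letter depending on the code page, which is exactly why files exchanged between locales turned into garbage.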

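And this is how the "Windows code pages" were eventually born. They were actually "parented" by the DOS code pages. And then Unicode was born! :) UTF-8 is "another system for storing your string of Unicode code points", and in it "every code point from 0-127 is stored in a single byte", exactly as in ASCII. I won't go into any more detail about Unicode and UTF-8, but you should read up on the BOM, endianness, and character encoding in general.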
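
To make the "0-127 in a single byte" point concrete, here is a minimal Python sketch. The sample characters and the helper name `utf8_bytes` are my own picks for illustration.

```python
def utf8_bytes(ch):
    """Encode a single character as UTF-8 and return the raw bytes."""
    return ch.encode("utf-8")


SAMPLES = [
    "A",       # U+0041, plain ASCII
    "\u00e9",  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",  # U+20AC, EURO SIGN
]

for ch in SAMPLES:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s): {encoded.hex(' ')}")

# Every code point below 128 encodes to the identical single ASCII byte.
assert all(utf8_bytes(chr(cp)) == bytes([cp]) for cp in range(128))
```

Anything outside the ASCII range takes two or more bytes, so a pure-ASCII file is already valid UTF-8, while a Windows-1252 or ISO-8859-1 file containing accented letters is not.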

@Jukka K. Korpela is "right on the money" in saying that you are most probably referring to Windows-1252.

As for "the ANSI conspiracy", Microsoft actually admits the mislabeling in a glossary of terms:

The so-called Windows character set (WinLatin1, or Windows code page 1252, to be exact) uses some of those positions for printable characters. Thus, the Windows character set is NOT identical with ISO 8859-1. The Windows character set is often called “ANSI character set”, but this is SERIOUSLY MISLEADING. It has NOT been approved by ANSI.

So "ANSI", when it refers to Windows character sets, is not ANSI-certified! 🙂

As Jukka pointed out (credit goes to you for the nice answer):

Windows-1252 differs from ISO Latin 1, also known as ISO-8859-1 as a character encoding, so that the code range 0x80 to 0x9F is reserved for control characters in ISO-8859-1 (so-called C1 Controls), whereas in Windows-1252, some of the codes there are assigned to printable characters (mostly punctuation characters), others are left undefined.
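
The difference Jukka describes can be checked directly. Below is a small Python sketch that relies on Python's built-in `cp1252` and `iso-8859-1` codecs; the function names are hypothetical, and note that it reflects how Python's codec treats the undefined positions (other decoders may handle those bytes differently).

```python
def compare_c1_range():
    """Decode bytes 0x80-0x9F with both codecs.

    Returns (printable, undefined): printable maps each byte value to its
    Windows-1252 character, undefined lists the byte values that Python's
    cp1252 codec refuses to decode.
    """
    printable, undefined = {}, []
    for value in range(0x80, 0xA0):
        raw = bytes([value])
        # ISO-8859-1 maps every byte to the code point with the same
        # number, so this range lands on the C1 controls U+0080..U+009F.
        assert ord(raw.decode("iso-8859-1")) == value
        try:
            printable[value] = raw.decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(value)
    return printable, undefined


def identical_outside_c1():
    """True if both codecs agree on every byte outside 0x80-0x9F."""
    return all(
        bytes([v]).decode("cp1252") == bytes([v]).decode("iso-8859-1")
        for v in range(256)
        if not 0x80 <= v <= 0x9F
    )


printable, undefined = compare_c1_range()
for value, ch in printable.items():
    print(f"0x{value:02X} -> U+{ord(ch):04X}")
print("undefined in cp1252:", [f"0x{v:02X}" for v in undefined])
print("identical elsewhere:", identical_outside_c1())
```

In practice this means text typed on Windows with curly quotes or a euro sign will lose or corrupt exactly those characters if it is labelled as ISO-8859-1 and handled strictly.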

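However, my personal opinion and technical understanding is that neither Windows-1252 nor ISO-8859-1 is a web encoding! :) So:

>For web pages, use UTF-8 as the encoding of your content.
So store your data as UTF-8 and "spit it out" with the HTTP header: Content-Type: text/html; charset=utf-8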

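There is also a thing called the HTML content-type meta tag:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Now, what browsers actually do when they encounter this tag is start again from the beginning of the HTML document, so that they can reinterpret it in the declared encoding. This should happen only if there is no 'Content-Type' header.
>Use other, specific encodings if the users of your system need files generated from it.
For example, some Western users may need Excel-generated files, or CSVs, in Windows-1252. If that is the case, encode the text in that encoding, store it on the file system, and serve it as a downloadable file.
>There is one more thing to be aware of in the design of HTTP:
The mechanism for negotiating the encoding of content is supposed to work like this.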

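I. The client requests a web page in specific content types and encodings via the 'Accept' and 'Accept-Charset' request headers.

II. The server (or web application) then returns the content transcoded to that encoding and character set.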

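This is NOT the case in most modern web applications. What actually happens is that web applications serve content as UTF-8 (forcing it on the client). This works because browsers interpret the documents they receive based on the response headers, not on what they asked for.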
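
The first two recommendations above (UTF-8 for pages, a specific legacy encoding only for downloadable files) can be sketched as a tiny web application. The question is about PHP, so treat this Python WSGI version purely as an illustration of the headers involved; the names `app`, `html_page`, `csv_export`, the sample `ROWS`, and the `/export.csv` path are all hypothetical.

```python
import csv
import io

# Hypothetical sample data containing characters outside ASCII.
ROWS = [["name", "price"], ["Caf\u00e9 cr\u00e8me", "\u20ac3.50"]]


def html_page():
    """Return (body bytes, Content-Type) for a page served as UTF-8."""
    text = (
        "<!DOCTYPE html><html><head><title>Menu</title></head>"
        "<body><p>Caf\u00e9 cr\u00e8me: \u20ac3.50</p></body></html>"
    )
    return text.encode("utf-8"), "text/html; charset=utf-8"


def csv_export(rows):
    """Return (body bytes, Content-Type) for a Windows-1252 CSV download."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # errors="replace" turns characters missing from Windows-1252 into "?"
    # instead of raising, so the export never fails half-way.
    data = buffer.getvalue().encode("cp1252", errors="replace")
    return data, "text/csv; charset=windows-1252"


def app(environ, start_response):
    """Minimal WSGI app: always declares the encoding it actually used."""
    if environ.get("PATH_INFO") == "/export.csv":
        payload, content_type = csv_export(ROWS)
        extra = [("Content-Disposition", 'attachment; filename="export.csv"')]
    else:
        payload, content_type = html_page()
        extra = []
    headers = [
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ] + extra
    start_response("200 OK", headers)
    return [payload]
```

To try it locally, the app can be handed to `wsgiref.simple_server.make_server` from the standard library. Note that it ignores 'Accept-Charset' entirely and simply states what it sends in the response header, which is the behaviour described above.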

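We should all go Unicode, so please, please, please use UTF-8 to distribute your content wherever possible and, above all, wherever applicable. Or else the elders of the Internet will haunt you! 🙂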

P.S.
Some more nice articles on using MS Windows characters in web pages can be found here and here.

http://stackoverflow.com/questions/19109899/what-is-the-exact-difference-between-windows-12521-3-4-and-iso-8859-1
