We might have to use something like UTF-7 to encode URLs for the Japanese pages or for other languages. But for the more common ones that fit within the normal ISO-8859-1 range (Spanish, French, German, et al.) it might be better to anglicise titles for the purpose of searching. I can imagine, for example, English-speaking people who might want to read a French page about Paris, or a German page about Kurt Gödel. I can imagine people (perhaps even Germans used to English web sites) searching for "Godel" or "Goedel", but I can't imagine them ever searching for "G+APY-del" (the UTF-7 rendering). Of course, the software could do that conversion in the background, but I think taking advantage of 8-bit ISO would generate more links and hits.
Although I don't have a library routine I can whip out for you, I'm sure you can find one somewhere to hack up for Wikipedia. - Flavor
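For what it's worth, here is a rough sketch of what such a routine might look like, in Python using only the standard library. The function names (`strip_accents`, `expand_digraphs`, `search_keys`, `utf7_title`) and the `DIGRAPHS` table are made up for illustration; nothing here is existing Wikipedia code, and the digraph table only covers the German cases mentioned above.

```python
import unicodedata

# Hypothetical table of German-style expansions (an assumption, not exhaustive).
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue",
            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def strip_accents(title):
    """Drop diacritics by decomposing and removing combining marks."""
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_digraphs(title):
    """Replace umlauts with two-letter spellings, then strip any other accents."""
    expanded = "".join(DIGRAPHS.get(ch, ch) for ch in title)
    return strip_accents(expanded)


def search_keys(title):
    """Return the spellings a page title could be indexed under."""
    return {title, strip_accents(title), expand_digraphs(title)}


def utf7_title(title):
    """Return the UTF-7 rendering of a title as plain ASCII text."""
    return title.encode("utf-7").decode("ascii")


if __name__ == "__main__":
    print(sorted(search_keys("Gödel")))  # ['Godel', 'Goedel', 'Gödel']
    print(utf7_title("Gödel"))           # G+APY-del
```

The idea would be to index each page under every key from `search_keys`, so a search for "Godel" or "Goedel" lands on the Gödel page, while the UTF-7 form stays a background detail that nobody has to type.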