The current algorithm is essentially
A better strategy might be
This keeps coming up, so let's chat.
What I would most enjoy hearing is an example of two distinct Free Links that, by my suggestion, would canonize to the same URL, but really shouldn't.
I don't think there are any. Indeed, canonizing different things to the same thing is often good, even when it leads to ambiguity. The ambiguity can be resolved on the canonical page.
It is unfortunate but true that we have to be careful to support what has been done in the past. We only used CrammedTogetherWords links for a month or so, and we still haven't gotten rid of them completely; I expect to keep finding them a few months from now.
$NonEnglish = 0
), or treat them as letters (if $NonEnglish = 1
). I would rather not try to recognize accented characters just to remove the accents. Also, some conversions are not that simple: for "Gödel", the preferred English version is "Goedel".
I was, indeed, suggesting that we do the wrong thing, and turn [[Gödel]] into [[Godel]]. I still believe that doing so would make for fewer user mistakes than not doing so.
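To make the two positions concrete, here is a minimal Python sketch (not code from the wiki software). It assumes accents are removed by Unicode decomposition; the `PREFERRED` table and both function names are hypothetical, invented only to show where a "Goedel"-style exception would have to live.

```python
import unicodedata

# Hypothetical exception table: names whose preferred English spelling
# is not just the unaccented form.
PREFERRED = {"Gödel": "Goedel"}

def strip_accents(text: str) -> str:
    """Decompose characters and drop combining marks, so 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def anglicize(text: str, use_overrides: bool = False) -> str:
    """Return the override spelling if requested and known, else strip accents."""
    if use_overrides and text in PREFERRED:
        return PREFERRED[text]
    return strip_accents(text)
```

With plain stripping, `anglicize("Gödel")` gives "Godel" (the "wrong thing" suggested above); with `use_overrides=True` it gives "Goedel", at the cost of maintaining a table by hand.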
WikiName
links to be allowed in the Free Link format.
The development version of UseModWiki now converts all separate "words" in a page name to start with uppercase letters. Words are separated by spaces/underscores or punctuation. (In this version the 5 punctuation characters (),.- are allowed in page titles.)
Links to pages (either within pages or in URLs) will be able to use lowercase words. For instance, [[naming conventions]], [[naming Conventions]], [[Naming conventions]], and [[Naming Conventions]] will all link to the same page (which will be titled "Naming Conventions"). One could link to http.../naming_conventions or http.../naming_Conventions, etc.
The only drawback of this solution that I can find is that irrelevant words (like "and", "of", "or", "a") will also be capitalized in page titles. For instance, the page "The Canon of Scripture" will be titled "The Canon Of Scripture". The page title (fully capitalized) is what is shown on the Recent Changes page, the search results page, and the top of the actual wiki page itself.
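The rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual UseModWiki code: the function name `page_title` is made up, underscores are assumed to stand for spaces, and only ASCII letters are handled.

```python
import re

# Assumption: words are separated by whitespace, underscores, or the five
# punctuation characters (),.- mentioned above. Only ASCII letters are handled.
_WORD_START = re.compile(r"(^|[\s_(),.\-])([a-z])")

def page_title(link_text: str) -> str:
    """Return the fully capitalized title that a link would resolve to."""
    # Assumption: underscores in URLs stand for spaces in titles.
    name = link_text.replace("_", " ")
    # Uppercase any lowercase letter at the start or right after a separator.
    return _WORD_START.sub(lambda m: m.group(1) + m.group(2).upper(), name)
```

Here `page_title("naming conventions")` and `page_title("naming_Conventions")` both give "Naming Conventions", and `page_title("The Canon of Scripture")` gives "The Canon Of Scripture", which shows the drawback with small words.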
Before the code can be changed, a conversion script must be run to convert all the old pages with lowercase words to uppercase names. In most cases this will be easy, but there are a few pages with names that differ only in case. Links will not have to be converted--they will work as-is.
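Finding the awkward cases before running the conversion could look something like the sketch below. It assumes the page names are available as a plain list of strings; `find_case_collisions` and the sample names are hypothetical.

```python
from collections import defaultdict

def find_case_collisions(page_names):
    """Group page names that differ only in case; return groups with 2+ names."""
    groups = defaultdict(list)
    for name in page_names:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# Made-up page names, for illustration only.
collisions = find_case_collisions(
    ["Naming conventions", "Naming Conventions", "Plato"]
)
```

Each group returned would need a manual decision (merge or rename) before the pages can be converted.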
Does anyone strongly object to this plan? Eventually further canonization may occur, but this simple step should solve some recent problems. --CliffordAdams
There are two separate functions here that are currently closely tied, but that perhaps could be separated at some point: (1) the page "address" and (2) the displayed page title. The address should be such that accidental or ad hoc links are maximized. For example, if I'm typing some text on another page and use the word "republic", I should be able to put brackets around it on a whim and hope it goes somewhere useful. It would be nice to do the same thing with "Plato's ''Republic''", "Kurt Gödel", and "War and Peace". Removing punctuation, extra spaces, and markup; anglicizing foreign characters (or maybe encoding them); and standardizing case achieves this purpose nicely. But doing the same thing to the displayed title of a page makes it ugly and (more seriously) less accurate.
I can imagine two ways to produce cleaner titles and simple ad hoc links: (1) Rather than simply removing punctuation and foreign characters, encode them in a way that allows them to be reproduced for the title, perhaps URL-encoding. That takes care of "Plato%27s" and "G%F6del", but not the capitalization of "War and Peace". (2) Allow the page itself to override the address-title with its own display-title. This takes care of all of the above, at the risk of possibly confusing users by allowing titles that don't relate to the address at all. Perhaps allow only certain changes; for example, the given title must canonize to the same address. Also, I assume this is harder to implement.
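Both ideas can be sketched in Python. This is a rough illustration, not a proposal for the actual implementation. It assumes Latin-1 URL-encoding (which is what yields "G%F6del") and an aggressive, made-up `canonize` rule that strips wiki quote markup, accents, and punctuation and lowercases the result. All function names are hypothetical.

```python
import re
import unicodedata
from urllib.parse import quote, unquote

def encode_name(title: str) -> str:
    """Option 1: keep punctuation and accents recoverable by URL-encoding them."""
    return quote(title.replace(" ", "_"), safe="_", encoding="latin-1")

def decode_name(address: str) -> str:
    """Recover the displayable title from an encoded address."""
    return unquote(address, encoding="latin-1").replace("_", " ")

def canonize(title: str) -> str:
    """Aggressive address rule (assumed): drop markup, accents, punctuation, case."""
    text = title.replace("'''", "").replace("''", "")  # wiki bold/italic quotes
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^A-Za-z0-9\s_]", "", text)        # remove punctuation
    text = re.sub(r"[\s_]+", "_", text.strip())        # collapse spaces
    return text.lower()

def title_allowed(address: str, display_title: str) -> bool:
    """Option 2: accept a display title only if it canonizes to the page address."""
    return canonize(display_title) == address
```

Under these assumptions, `encode_name("Plato's")` is "Plato%27s", and `title_allowed("war_and_peace", "War and Peace")` is true while an unrelated title is rejected.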
If you decide to eventually adopt one of these schemes or a similar one, it might be a good idea to use a near-term solution that does not conflict with its later adoption.
Also, it is unclear from your description of the proposed capitalization scheme whether "Nirvana (band)" would remain as is or become "Nirvana (Band)". --Lee Daniel Crocker
http://es.wikipedia.com/wiki.cgi?action=edit&id=Cómo_se_edita_una_página
; then you can copy the text into the new page. A pain in the ass, but since there are only a dozen pages or so, it's not that bad. --LDC
So does this mean that the articles at the English Wikipedia will convert to all leading uppercase?
I think that page-name (the 'address') and page-title must be decoupled sooner or later. Delay only makes the task more difficult. These have different functions and it seems that much of the discontent with current practice is caused by trying to balance "good" page-names against "good" page-titles, causing both to be less than ideal.
Titles can default to the page-name in some form, but only on creation of the page. After that, they should be subject to editing--after all, you trust the community to maintain the rest of the page's content. Requiring the title to canonify (?) to the address could be a possible 'feature'.
Page-names/free-links should be canonified with an aggressive algorithm. Keep in mind that the primary goals of the page-name are to provide linkage into the physical implementation of the Wiki and to provide a minimally ambiguous (sorry) 'address' based on arbitrarily complex titles.
A non-goal for page-names is to provide a 'user-friendly' alternative to entering a search string or selecting a free link within a page. A page-name that 'looks' similar to the page-title would be nice, but is not a strict requirement.
Granted, this would make the software more complex, but I think the users would be happier. As the project grows, I suspect most of the users will be casual accessors who will be somewhat put off by strangely formatted titles and links.
--loh
What finally happened with the wiki canonization discussed here? The new scheme was implemented on the non-English wikis, but the new version (Magnus's one) used the old scheme. What will happen? (Especially since the links and article names in the non-English Wikipedias are made using the all-uppercase scheme.) AN