Wikipedia: Wikipedia PHP script

This is a new PHP script that could eventually replace the current Wikipedia software. It uses MySQL and has other advantages, and it is running "live" at [wikipedia.sourceforge.net].

UPDATE:

A copy of the tarball from the German wikipedia was installed at the test site. Please go there and help find bugs! Also, the conversion from the tarball lost the article histories. Help in that area would be very much appreciated!

You can see the CVS archive of the current script version at [1] (see also the "wikipedia" project at sourceforge).

Why is this replacing UseModWiki?

Isn't Perl more robust and faster than PHP? (Anyone who can answer this should overwrite the question.)

: It's largely a religious question, though there's no denying that Perl has many more features than PHP, and it's probably faster too when properly configured. I'm personally sad to see the Perl wikipedia being replaced by PHP code: Perl is more widely known, more stable between versions, has a much larger development community, and I know it ;) Having said that, there is still a need for a better wiki to be written for wikipedia, one that supports namespaces along with many other features, and it's probably time to move from flat-file storage to MySQL, too. Magnus has volunteered his time and code, and no Perl enthusiast did the same; that, I suspect, is the reason. --AV
: It seems that, of all the volunteers for writing articles, none volunteered to write a single line of code for the PHP wiki. So far, Clifford Adams has been the most helpful, giving me a short Perl script for the conversion, which I was able to translate into PHP. Anyway, I am sure Perl has more features, but I doubt that it has real advantages for an application like wikipedia; on the contrary, as PHP was designed especially for WWW interfaces, it is probably better suited to this purpose. And the PHP program doesn't read like a segmentation fault ;) --Magnus Manske

Magnus, I'm not anti-PHP, I'm merely pro-Perl ;) Anyway, I would like to contribute code. If there are significant areas of code in the new wiki that you need help with, could you please email me and tell me what they are? This is a serious offer ;) --AV

: Now that sounds nice ;) AFAIK, despite some effort, the parser in my script is still flawed. If you click through the (German) pages, you should sooner or later spot some text that isn't displayed correctly. Also, I only filter out <SCRIPT> tags, and I don't check for matching tags. Help with that kind of thing (which is probably easier in Perl!) would be welcome.
: Also, the conversion script (.db file to MySQL) works for the latest article version, but loses the history of the articles. A brief PHP (or Perl) script that generates the different versions of an article, together with the information about the change like author, minor change, etc., directly from the .db file would help a lot. --Magnus Manske
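
The missing check for matching tags mentioned above could look roughly like the following. This is only a sketch in Python, not the PHP of the actual script; the function name `unmatched_tags` and the short list of tags that need no closing tag are assumptions made for illustration.

```python
import re

# Tags that never take a closing tag. This short list is an assumption
# for the sketch, not the set the real parser would use.
VOID_TAGS = {"br", "hr", "img"}

# Matches opening and closing tags, tolerating spaces such as "< SCRIPT >".
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def unmatched_tags(text):
    """Return the tags that are closed without being open ("/name")
    followed by the tags that are opened but never closed ("name")."""
    stack = []      # currently open tags, innermost last
    problems = []   # closing tags that did not match the innermost open tag
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name in VOID_TAGS:
            continue
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append("/" + name)
    return problems + stack
```

An empty result means every tag was closed in the right order; anything else could be shown to the editor as a warning before the page is saved.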

What do you want? (I know, Kosh, never ask that question...)

How should I handle case sensitivity in titles? There are two versions of this wiki.cgi, so should it be this one (100% compatible), or the new one (more possibilities in the future), or something completely different?
- Even though the new UseModWiki version (which the English-language Wikipedia hasn't implemented) lists all page titles in upper case, I actually like things the way they are now. --LMS
What HTML tags should I support? All of them? Just blank out JavaScript? Or insist on wiki-style, outlawing <b> and <i>?
- At least all of them that are supported in this wiki... --LMS
What special pages would you like? Currently, I have detailed up-to-the-minute statistics, random page, upload files, empty topics (with no content), all topics, demanded topics (non-existing topics, sorted by how many articles try to link there), central topics (existing topics, sorted by how many articles link there), and lonely topics (that exist but no other article links there). All of these pages are created "on-the-fly", of course! :)
- Sounds groovy! See feature requests for more ideas.
- I'd expand the definition of "lonely topics" to include any topic unreachable from the HomePage. --Damian Yerrick
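
A rough sketch of how the link-based special pages listed above (demanded, central, and lonely topics, plus the HomePage reachability idea) could be computed from a table of links. It is in Python rather than the script's PHP, and the function name `topic_reports` and the dictionary layout are assumptions for illustration, not a description of how the actual script works.

```python
from collections import Counter, deque


def topic_reports(links, home="HomePage"):
    """links maps each existing title to the list of titles it links to."""
    existing = set(links)

    # Count how many distinct articles link to each title (self-links ignored).
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                inbound[target] += 1

    def ranked(titles):
        # Most-linked first, ties broken alphabetically.
        return sorted(titles, key=lambda t: (-inbound[t], t))

    demanded = ranked(t for t in inbound if t not in existing)
    central = ranked(t for t in existing if inbound[t] > 0)
    lonely = sorted(t for t in existing if inbound[t] == 0)

    # Breadth-first walk from the home page to find everything reachable.
    reachable = set()
    queue = deque()
    if home in existing:
        reachable.add(home)
        queue.append(home)
    while queue:
        page = queue.popleft()
        for target in links[page]:
            if target in existing and target not in reachable:
                reachable.add(target)
                queue.append(target)

    return {
        "demanded": demanded,
        "central": central,
        "lonely": lonely,
        "unreachable": sorted(existing - reachable),
    }
```

The "unreachable" list is the wider definition of lonely topics: two pages that link only to each other have inbound links, so they are not lonely in the narrow sense, yet nobody starting from the HomePage would ever find them.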

Known bugs:

The parser (to convert the source text into readable stuff) is very basic. I am currently rewriting it.
There is currently no way to convert wikipedia to MySQL automatically.

Magnus Manske is the author of the basic script here. Other programmers are strongly urged to get involved. Some discussion of the script and its requirements is ongoing on Wikipedia-L.

Which database?

Are there any reasons for choosing MySQL over PostgreSQL? Some benchmarks show that one or the other performs much better. I think that the best way would be to allow both and test which performs better. --Taw

: I'll stick to the MySQL version for now, since I barely have the time to maintain that one alone right now. Once it is up at this site (probably early January) and has no real bugs anymore, I could try PostgreSQL at a test site and convert it if it is really faster. --Magnus Manske

If the application relies much on complex joins, then PostgreSQL will probably be able to outperform MySQL. If it's mostly simple selects from a single table, then probably not. But as the number of users increases, PostgreSQL handles locking issues much better through its Multi-Version Concurrency Control, which is essentially better than row-level locking. So if you get the chance and time permits, I think it would be worthwhile to give PostgreSQL a try. --Wesley

Other scripts for PHP wikis can be found at:

: http://sourceforge.net/projects/tavi/
: http://sourceforge.net/projects/sfwiki/
: http://sourceforge.net/projects/phpwiki/

Wishes:

The more HTML is supported, the better. Some people write naturally in HTML, others prefer the simple wiki style -- why restrict anyone? The only issue with unrestricted HTML is that malicious people could screw up the overall layout of the page, but who cares? We can always just revert back to the previous version.
- Just don't permit frames, please. Actually, I never missed any tags that aren't implemented in the current version. What other tags are needed? Actually, I would oppose using <a> tags and I would prefer (contrary to an earlier opinion of mine) marking external links with [ and ] in order to see that they are in fact external links. This way we can tell easily when someone is linking within Wikipedia and when they're abusing the system by adding external links in inappropriate places. Generally speaking, let's keep the markup simple so that the people who can add content, but who don't necessarily know how to code, will feel 100% comfortable contributing. --LMS
  - I still need some more eyeballs to look at the parser. The question about the HTML tags was: should I have a list of allowed tags, or a list of "dangerous" ones and allow all others? Right now, all tags are allowed, except <SCRIPT>, to prevent JavaScript or similar diseases ;)
  - I second rejecting SCRIPT, FRAME, and IFRAME, but I don't oppose all A elements. For instance, named targets (<a name="">) make it easy to move around an entry that has not yet been broken up into subsections. --Damian Yerrick
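
To make the "list of allowed tags" option concrete, here is a minimal sketch in Python (not the script's PHP). The contents of `ALLOWED_TAGS` and the function name `filter_tags` are assumptions chosen for illustration; which tags belong on the list is exactly what is being discussed above. Allowed tags are rebuilt without attributes, and everything else is escaped so that it shows up as plain text instead of being interpreted by the browser.

```python
import html
import re

# Hypothetical allow list; the real choice of tags is still under discussion.
ALLOWED_TAGS = {"b", "i", "em", "strong", "cite", "p", "br", "hr",
                "ul", "ol", "li", "pre"}

# Matches opening and closing tags, tolerating spaces such as "< SCRIPT >".
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def filter_tags(text, allowed=ALLOWED_TAGS):
    """Keep allowed tags (stripped of attributes) and escape all others."""
    def replace(match):
        slash, name = match.group(1), match.group(2).lower()
        if name in allowed:
            # Rebuild the tag so attributes like onclick cannot slip through.
            return "<" + slash + name + ">"
        # Unknown tag: show it literally instead of letting it run.
        return html.escape(match.group(0))
    return TAG_RE.sub(replace, text)
```

With an allow list, a tag nobody thought about (FRAME, IFRAME, or something newer) is harmless by default, whereas a list of "dangerous" tags has to be extended every time a new one turns up. Because this sketch drops all attributes, named targets such as <a name=""> would need extra handling if A elements were allowed.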

XML is the de facto standard. If we want to attract users from Slashdot, Kuro5hin, Everything, etc., we really need support for a language that feels like HTML. Perhaps we could act like h2g2.com and implement HTML support through XML, requiring all entities to be properly closed and automatically changing submitted pages to be well-formed when submitted in "XHTML mode". XML would also help solve the WYSIWYG vs. WYSIWYM problem by separating content from presentation, with multiple ways to italicize something (<em> for emphasis, <cite> for citing names of works, etc.) and to bold something (<strong> for strong emphasis, <hw> for headwords at the beginning of paragraph, etc.). Could we use some sort of filter to let users edit a page as wikistyle, as XHTML, as UBBCode, etc? --Damian Yerrick
- I disagree; implementing XML within a wiki context would be a disaster. We can add XML markup later, to more complete versions of the articles. One of the main reasons why there is this pared-down markup code is so that it is easy to use. Remember that. Your mother should feel like she can figure out how to contribute. --LMS
  - My idea was to store text as XML on the server, translate it to wikistyle or UBBCode in the edit page, and translate the submitted text back to XML. XML would also make using Wikipedia content with non-Wiki sites easier. --Damian Yerrick
    - I am going to implement different output modes (e.g., printable page [without the links], maybe RTF or PDF), so I could make an XML output function too. That could also be used for generating daily XML tarballs. But, all of this will take place after wikipedia is running on the new script. --Magnus Manske
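
The store-as-XML, edit-as-wikistyle idea from this thread can be illustrated with a very small round trip. This is a sketch in Python under strong assumptions: it handles only the ''italic'' and '''bold''' quote markup, treats the whole text as one paragraph, and the function names `wiki_to_xml` and `xml_to_wiki` are made up for the example.

```python
import re
from xml.sax.saxutils import escape, unescape


def wiki_to_xml(text):
    """Translate submitted wikistyle text into a small XML fragment."""
    xml = escape(text)  # turn &, < and > into entities first
    # Bold (three quotes) must be handled before italic (two quotes).
    xml = re.sub(r"'''(.+?)'''", r"<strong>\1</strong>", xml)
    xml = re.sub(r"''(.+?)''", r"<em>\1</em>", xml)
    return "<p>" + xml + "</p>"


def xml_to_wiki(xml):
    """Translate the stored XML back into wikistyle for the edit page."""
    text = re.sub(r"</?p>", "", xml)
    text = re.sub(r"</?strong>", "'''", text)
    text = re.sub(r"</?em>", "''", text)
    return unescape(text)  # restore &, < and > last
```

Because the text is escaped before any tags are added, whatever a user typed can never be confused with the markup generated by the translation, which is what makes the trip back to wikistyle safe.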

How about a reverse query on "pages that link to this page"? And how about making the search function also search titles of pages? --Damian Yerrick
- My search function covers search in titles, but it is case-sensitive, due to some setting in MySQL. I'll look into a "reverse link" feature.
  - It doesn't pick up on the latest titles because somebody seems not to be running the update script very often. Reverse links would automate "see also" sections much as the other site's soft links do. Of course, the ideal would be to run it from a cron job.
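
A sketch of the two requests in this thread, a "pages that link to this page" index and a title search that ignores case, written in Python rather than the script's PHP. It assumes links are written as [[Title]] in the article text; the names `build_backlinks` and `search_titles` are hypothetical, and the real script would presumably do this with MySQL queries instead of in-memory dictionaries.

```python
import re
from collections import defaultdict

# Assumes links look like [[Title]] or [[Title|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)")


def build_backlinks(pages):
    """pages maps title -> article text; returns target -> set of linking titles."""
    backlinks = defaultdict(set)
    for title, text in pages.items():
        for target in LINK_RE.findall(text):
            backlinks[target.strip()].add(title)
    return backlinks


def search_titles(pages, query):
    """Case-insensitive substring search over page titles."""
    needle = query.lower()
    return sorted(title for title in pages if needle in title.lower())
```

Rebuilding the index from a scheduled job keeps it cheap to read, at the cost of lagging behind the newest edits until the next run, which is the staleness described above.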

In general, see feature requests, particularly the top priorities.