How to mirror Wikipedia
From WikiPaul - Paul Swanson's wiki
My new apartment doesn't have Internet access, and I wanted Wikipedia. I then concocted a plan. I've set up Mediawiki before (duh), and Wikipedia has database dumps... hmm.
Make your computer a LAMP
Install
LAMP: Linux Apache MySQL PHP. Indeed. Courtesy of this Ubuntu LAMP setup guide, I came up with these installs:
sudo apt-get update
sudo apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
That will run for a while. Once it finishes, point your web browser of choice toward http://localhost/ and see your /var/www/ directory in your browser!
Setup
You need to set the root password in MySQL.
$ mysql
mysql> USE mysql;
mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
mysql> FLUSH PRIVILEGES;
You also need to create a database for your incoming Wikipedia. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser and click check all, then Go.
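If you would rather skip the phpMyAdmin clicking, the same database and user can probably be created from the command line. This is only a sketch: the helper name wiki_db_sql is made up for illustration, the password is a placeholder, and the GRANT ... IDENTIFIED BY form assumes a MySQL 5.x server like the one installed above (newer MySQL releases want a separate CREATE USER).

```shell
# Hypothetical helper: print the SQL that creates the wiki database and
# grants a user full rights on it.
# $1 = database name, $2 = user name, $3 = password (all chosen by you)
wiki_db_sql() {
    db="$1"; user="$2"; pass="$3"
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost' IDENTIFIED BY '${pass}';
FLUSH PRIVILEGES;
EOF
}

# Review the output first, then pipe it to MySQL as root:
#   wiki_db_sql wikidb wikiuser 'choose-a-password' | mysql -u root -p
```

Printing the SQL instead of running it directly lets you read it over before anything touches the server.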
Download MediaWiki software
Go to the MediaWiki download page. On the right, under Download a package and Current version, download the .tar.gz file (e.g., mediawiki-1.10.0.tar.gz). As of 5-22-07, that would be:
wget http://download.wikimedia.org/mediawiki/1.10/mediawiki-1.10.0.tar.gz
Then decompress it and move it to /var/www/
tar xzf mediawiki-1.10.0.tar.gz
mv mediawiki-1.10.0 w
sudo mv w /var/www/
(I traditionally install it under the w directory.)
Make the w/config/ directory writable, so the installer can write LocalSettings.php into it.
cd /var/www/w/
chmod a+w config/
Now navigate to http://localhost/w/. From here, the only things you need to fill in are:
- Site name (I chose Wikipauldia)
- WikiSysop's password
- DB password
Now you need to move LocalSettings.php out of config.
mv config/LocalSettings.php .
Now you can go to http://localhost/w/ and see your virgin MediaWiki install!
Wikipedia's DB dump
Download it
Subscribe to pages-articles and get the link to the newest dump. As of 5-22-07, it was
wget http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2
It's 2.44 GB. Divide that by however fast your connection is. Took me 5 hours or so. Great term paper topic, by the way.
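To actually do that division, here is a rough sketch. The function name download_hours and the 140 KB/s rate are assumptions for illustration; plug in whatever your connection really sustains.

```shell
# Estimate download time in hours.
# $1 = file size in GB, $2 = sustained download rate in KB/s (assumed inputs)
download_hours() {
    awk -v gb="$1" -v kbps="$2" \
        'BEGIN { printf "%.1f\n", gb * 1024 * 1024 / kbps / 3600 }'
}

# 2.44 GB at an assumed 140 KB/s
download_hours 2.44 140   # prints 5.1
```

At that rate the estimate lands right around the 5 hours it took me.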
Decompress it
Once that finishes, you need to decompress it.
bunzip2 enwiki-20070402-pages-articles.xml.bz2
Its uncompressed size is over 9 GB, so if you're using FAT32, which caps files at 4 GB (why are you using FAT32?), you're out of luck. Decompressing took me about 20 minutes.
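Before decompressing, it may be worth checking that the disk has room for the result. A minimal sketch, assuming a POSIX df; the function name has_space and the 10 GB threshold are my own choices, not anything the dump requires.

```shell
# Succeeds if the filesystem holding directory $1 has at least $2 GB free.
# $2 must be a whole number.
has_space() {
    # df -P guarantees one line per filesystem; column 4 is free space in KB
    free_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    need_kb=$(( $2 * 1024 * 1024 ))
    [ "$free_kb" -ge "$need_kb" ]
}

if has_space . 10; then
    echo "enough room to decompress here"
else
    echo "less than 10 GB free in this directory"
fi
```

This only checks free space. It will not warn you about a per-file size limit like the FAT32 one.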
Import it
The longest portion. Download mwimport. You can also get it from me:
wget http://modzer0.cs.uaf.edu/~dev2c/mwimport.sh
chmod +x mwimport.sh
Run it like this:
cat enwiki-<date>.xml | mwimport | mysql -f -u <admin name> -p <database name>
Or, assuming you used the wget links above:
cat enwiki-20070402-pages-articles.xml | ./mwimport.sh | mysql -f -u <admin name> -p wikidb
This will take about 6-12 hours, depending on the speed of your hard drive, and to a lesser extent, your processor.
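Since the import runs for hours, it helps to know how many pages to expect. One rough way, sketched below, is to count the <page> elements in the dump. The function name count_pages is made up, and it assumes each <page> tag sits on its own line, which is how the dumps I have seen are laid out.

```shell
# Count <page> elements in a MediaWiki XML dump.
# $1 = path to the uncompressed .xml file
count_pages() {
    # grep -c counts matching lines; closing </page> tags do not match
    grep -c '<page>' "$1"
}

# Example (scanning the full dump takes a while):
#   count_pages enwiki-20070402-pages-articles.xml
```

While the import is running, you can compare that number against the row count reported by mysql -u <admin name> -p wikidb -e 'SELECT COUNT(*) FROM page;' from a second terminal. That assumes the default MediaWiki schema with no table prefix.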
As a giant script
Download it here.
wget http://modzer0.cs.uaf.edu/~dev2c/makemewikipedia.sh
# Install the LAMP stack
sudo apt-get update
sudo apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
# Download MediaWiki and install it under /var/www/w/
mkdir ~/wiki/
cd ~/wiki/
wget http://download.wikimedia.org/mediawiki/1.10/mediawiki-1.10.0.tar.gz
sudo tar xzf mediawiki-1.10.0.tar.gz -C /var/www/
sudo mv /var/www/mediawiki-1.10.0 /var/www/w
cd /var/www/w/
sudo chmod a+w config
# Pause for the manual MySQL and wiki setup steps
echo "set up mysql"
echo "set up wiki"
echo "Press enter to continue"
read
sudo mv config/LocalSettings.php .
# Download, decompress, and import the Wikipedia dump
cd ~/wiki/
wget http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2
bunzip2 enwiki-20070402-pages-articles.xml.bz2
wget http://modzer0.cs.uaf.edu/~dev2c/mwimport.sh
chmod +x mwimport.sh
echo "MySQL admin name: "
read ADMINNAME
cat enwiki-20070402-pages-articles.xml | ./mwimport.sh | mysql -f -u "$ADMINNAME" -p wikidb