How to mirror Wikipedia

From WikiPaul - Paul Swanson's wiki


My new apartment doesn't have Internet access, and I wanted Wikipedia. I then concocted a plan. I've set up Mediawiki before (duh), and Wikipedia has database dumps... hmm.


Make your computer a LAMP

Install

LAMP: Linux Apache MySQL PHP. Indeed. Courtesy of this Ubuntu LAMP setup guide, I came up with these installs:

sudo apt-get update
sudo apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin

That will run for a while. Once it finishes, point your web browser of choice toward http://localhost/ and see your /var/www/ directory in your browser!

Setup

You need to set the root password in MySQL.

$ mysql -u root
mysql> USE mysql;
mysql> UPDATE user SET Password=PASSWORD('new-password') WHERE user='root';
mysql> FLUSH PRIVILEGES;

You also need to create a database for your incoming Wikipedia. Go to http://localhost/ and click on phpmyadmin. Log in using your new root password. Under Create new database, enter wikidb and click Create. On the new page, click on Privileges, add the new user wikiuser and click check all, then Go.
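If you'd rather skip phpMyAdmin, the same database and user can probably be created from the command line. This is only a sketch: the helper name wikidb_sql is made up for illustration, and the GRANT ... IDENTIFIED BY form is an assumption that holds for the MySQL 5.x servers of this era (newer servers want a separate CREATE USER).

```shell
# Print the SQL that creates a database and a user with full rights on it.
# wikidb_sql is a hypothetical helper, not part of MySQL or MediaWiki.
wikidb_sql() {
    db=$1
    user=$2
    pass=$3
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost' IDENTIFIED BY '$pass';
FLUSH PRIVILEGES;
EOF
}

# To apply it, pipe the output into mysql as root:
#   wikidb_sql wikidb wikiuser 'choose-a-password' | mysql -u root -p
```

Printing the SQL first lets you read it over before anything touches the server.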

Download MediaWiki software

Go to the MediaWiki download page. On the right, download the .tar.gz file (e.g., mediawiki-1.10.0.tar.gz) under Download a package and Current version. As of 5-22-07, that would be

wget http://download.wikimedia.org/mediawiki/1.10/mediawiki-1.10.0.tar.gz

Then decompress it and move it to /var/www/

tar xzf mediawiki-1.10.0.tar.gz
mv mediawiki-1.10.0 w
sudo mv w /var/www/

(I traditionally install it under the w directory.) Make the w/config/ directory writable so the installer can save its settings there.

cd /var/www/w/
chmod a+w config/

Now navigate to http://localhost/w/ and start the installer. From here, the only things you need to put in are

  • Site name (I chose Wikipauldia)
  • WikiSysop's password
  • DB password

Now you need to move LocalSettings.php out of config.

mv config/LocalSettings.php .

Now you can go to http://localhost/w/ and see your virgin MediaWiki install!


Wikipedia's DB dump

Download it

Subscribe to pages-articles and get the link to the newest dump. As of 5-22-07, it was

wget http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2

It's 2.44 GB. Divide that by however fast your connection is. Took me 5 hours or so. Great term paper topic, by the way.
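To make that division concrete, here is a rough sketch in shell arithmetic. The function name estimate_minutes and the 140 KB/s rate are assumptions for illustration; plug in your own speed.

```shell
# Estimate download time in whole minutes (rounded down).
#   $1 = file size in MB, $2 = download rate in KB/s (1 MB = 1024 KB)
estimate_minutes() {
    echo $(( $1 * 1024 / $2 / 60 ))
}

# 2.44 GB is about 2499 MB; 140 KB/s is an assumed connection speed.
estimate_minutes 2499 140    # prints 304, i.e. about 5 hours
```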

Decompress it

Once that finishes, you need to decompress it.

bunzip2 enwiki-20070402-pages-articles.xml.bz2

Its uncompressed size is over 9 GB, which is more than FAT32's 4 GB limit on a single file, so if you're using FAT32 (why are you using FAT32?), you're out of luck. Decompressing took me about 20 minutes.
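Before decompressing, it may be worth checking that the disk actually has room for the 9 GB file. A minimal sketch, assuming a POSIX df; the function name has_free_kb and the 10 GB headroom figure are my own choices, not anything the dump requires.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 KB free.
has_free_kb() {
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # the 4th column of the 2nd line is the available space.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$2" ]
}

# 10 GB expressed in KB, leaving some headroom over the 9 GB figure.
if has_free_kb . $(( 10 * 1024 * 1024 )); then
    echo "enough room to decompress here"
else
    echo "not enough free space here"
fi
```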

Import it

The longest portion. Download mwimport. You can also get it from me:

wget http://modzer0.cs.uaf.edu/~dev2c/mwimport.sh
chmod +x mwimport.sh

Run it like this:

cat enwiki-<date>.xml | mwimport | mysql -f -u <admin name> -p <database name>

Or, assuming you used the wget links above

cat enwiki-20070402-pages-articles.xml | ./mwimport.sh | mysql -f -u <admin name> -p wikidb

This will take about 6-12 hours, depending on the speed of your hard drive, and to a lesser extent, your processor.
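With a run that long, it helps to know how far along it is. One rough approach is to count the pages in the dump and compare that against the row count of the page table while the import runs. The sketch below assumes each page in the XML starts with a <page> tag, and that the wiki was installed without a table prefix; count_pages is a name I made up.

```shell
# Count the <page> elements in a MediaWiki XML dump.
# The closing tag is </page>, so it does not match the pattern.
count_pages() {
    grep -c '<page>' "$1"
}

# Usage (scanning a 9 GB file takes a while):
#   count_pages enwiki-20070402-pages-articles.xml
# Then, in another terminal while the import runs:
#   mysql -u <admin name> -p wikidb -e 'SELECT COUNT(*) FROM page;'
```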

As a giant script

Download it here.

wget http://modzer0.cs.uaf.edu/~dev2c/makemewikipedia.sh

sudo apt-get update
sudo apt-get install apache2 php5 libapache2-mod-php5 mysql-server mysql-client php5-mysql phpmyadmin
mkdir -p ~/wiki/
cd ~/wiki/
wget http://download.wikimedia.org/mediawiki/1.10/mediawiki-1.10.0.tar.gz
# Unpack into /var/www/ and rename the directory to w
sudo tar xzf mediawiki-1.10.0.tar.gz -C /var/www/
sudo mv /var/www/mediawiki-1.10.0 /var/www/w
cd /var/www/w/
# The web installer has to be able to write LocalSettings.php into config/
sudo chmod a+w config
echo "set up mysql"
echo "set up wiki at http://localhost/w/"
echo "Press enter to continue"
read DUMMY
sudo mv config/LocalSettings.php .
cd ~/wiki/
wget http://download.wikimedia.org/enwiki/20070402/enwiki-20070402-pages-articles.xml.bz2
# The dump is a plain bzip2 file, not a tar archive
bunzip2 enwiki-20070402-pages-articles.xml.bz2
wget http://modzer0.cs.uaf.edu/~dev2c/mwimport.sh
chmod +x mwimport.sh
echo "MySQL admin name: "
read ADMINNAME
cat enwiki-20070402-pages-articles.xml | ./mwimport.sh | mysql -f -u "$ADMINNAME" -p wikidb