Link Posted: 8/11/2011 1:41:41 PM EDT
[#1]
Quoted:
"Trust, but verify" then?

I use it from time to time, but I have found inaccuracies after chasing a topic through several other reputable sources. Still, it is good for a quick-and-dirty "WTF is this" answer.


That's pretty much my policy.  It's an okay place to start looking into a topic and develop leads for further research.
Link Posted: 8/11/2011 1:46:21 PM EDT
[#2]
Quoted:
Most Linux distros also have a command/app called "wget". If you run a recursive wget against http://wikipedia.com, it will go and get EVERYTHING in the specified folder and below. I haven't used it in years, but if you point it at the 'root' of a domain... shit... hurry up and wait. Just something else to throw out there for y'all. You could also search Google for 'web page download or caching', and that should give you a few more to go on. These are popular with people who want to 'mirror' a website for hosting.

RTFM.
Wikipedia has explicit measures in place to prevent a recursive wget as you describe.
You'll end up rate-limited first and then with your IP blocked next.
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Please_do_not_use_a_web_crawler

ar-jedi
Link Posted: 8/12/2011 11:44:02 AM EDT
[#3]
I've downloaded the wiki tarball 6 times. All have had problems uncompressing. We have a 64Gbit pipe here, so the connection doesn't seem to be the problem. I think they have posted a broken tarball.
Link Posted: 8/12/2011 11:53:16 AM EDT
[#4]
Quoted:
I've downloaded the wiki tarball 6 times.

Full link to what you are downloading?

Quoted:
All have had problems uncompressing.

What application are you uncompressing it with?

Quoted:
We have a 64Gbit pipe here, so the connection doesn't seem to be the problem.

I can say with reasonable certainty (I work in this field) that your internet connection is not 64 Gbit/s.

ar-jedi

Link Posted: 8/12/2011 12:02:03 PM EDT
[#5]
Yeah, sorry. It's 64 Mbit. We all make mistakes. I have used WinRAR and WinZip, plus the one inside WikiTaxi. I am using http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2, the link you posted.

WinRAR says:
! H:\WiKi\enwiki-latest-pages-articles.xml.bz2: CRC failed in H:\WiKi\enwiki-latest-pages-articles.xml.bz2. The file is corrupt

File takes 22 minutes to get. Size is 7.387 GB.
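For anyone hitting the same "CRC failed" message, it is worth testing the archive independently of WinRAR/WinZip before blaming the server. A minimal Python sketch (not something posted in this thread) that stream-decompresses a .bz2 file and discards the output, roughly what `bzip2 -t` does. A truncated download shows up as an early end of stream, and a damaged block as an invalid-data error. The helper name `bz2_is_intact` is hypothetical.

```python
import bz2
import sys


def bz2_is_intact(path: str, chunk_size: int = 1 << 20) -> bool:
    """Stream-decompress a .bz2 file and throw the output away.

    Returns True if every block decompresses cleanly (so every block CRC
    checked out), False if the data is invalid or the file ends before
    the end-of-stream marker (i.e. it was truncated).
    """
    try:
        # bz2.open also handles multi-stream files, one chunk at a time,
        # so a multi-gigabyte dump never has to fit in memory.
        with bz2.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
    except OSError:
        # Raised for "Invalid data stream" (bad header, bad block, CRC error).
        return False
    except EOFError:
        # Raised when the compressed file ends early (truncated download).
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_bz2.py FILE [FILE ...]  (script name is hypothetical)
    for name in sys.argv[1:]:
        print(name, "OK" if bz2_is_intact(name) else "CORRUPT or truncated")
```

On a ~7 GB dump this takes a while, since every block really is decompressed; if it reports OK, the problem is more likely in the extraction tool than in the download.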
Link Posted: 8/12/2011 12:10:32 PM EDT
[#6]
Quoted:
I am using http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2, the link you posted.

After download, what is the size of the resultant file on your system?

PS:
Note that only the more recent versions of WinZip and WinRAR will handle bzip2'd files.
Nevertheless, if WikiTaxi_Importer is complaining, the file is broken anyway.

ar-jedi
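Matching file sizes only shows the byte count is right, not the bytes themselves. A stronger check is to hash the download and compare it with the checksum files that Wikimedia's dump directories generally include (look in the directory for the actual file name; none is assumed here). A minimal Python sketch, where the hypothetical helper `file_matches` takes the expected size and MD5 digest as plain arguments you copy in yourself:

```python
import hashlib
import os
from typing import Optional


def file_matches(path: str,
                 expected_size: Optional[int] = None,
                 expected_md5: Optional[str] = None,
                 chunk_size: int = 1 << 20) -> bool:
    """Compare a downloaded file against a published size and/or MD5 digest.

    The size check is cheap, so it runs first. The MD5 is computed in
    chunks, so a multi-gigabyte dump never has to fit in memory.
    """
    if expected_size is not None and os.path.getsize(path) != expected_size:
        return False
    if expected_md5 is not None:
        h = hashlib.md5()
        with open(path, "rb") as f:
            # iter() with a sentinel keeps reading until f.read returns b"".
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        # Tolerate stray whitespace/case from copy-pasting the digest.
        if h.hexdigest() != expected_md5.strip().lower():
            return False
    return True
```

For example, `file_matches(r"H:\WiKi\enwiki-latest-pages-articles.xml.bz2", expected_md5="...")` with the digest pasted from the dump directory. A size match with a hash mismatch points at corruption in transit or on disk; a hash match with a failed extraction points at the extraction tool.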
Link Posted: 8/12/2011 12:11:39 PM EDT
[#7]
Same size as yours. That's OK, man, I'm sure there is no way they would care if their file was corrupt. I'll get it some other day, when they've replaced it. Thanks, man.
Link Posted: 8/12/2011 12:12:42 PM EDT
[#8]
Quoted:
Same size as yours.

What OS and version?

Link Posted: 8/12/2011 12:18:20 PM EDT
[#9]
Quoted:
RTFM.
Wikipedia has explicit measures in place to prevent a recursive wget as you describe.
You'll end up rate-limited first and then with your IP blocked next.
http://en.wikipedia.org/wiki/Wikipedia:Database_download#Please_do_not_use_a_web_crawler

ar-jedi

Sorry... I was mainly pointing out the usefulness of wget. I actually haven't used it in quite a few years, so I am a little rusty on it... I don't use the wiki enough to read through the tech pages. I guess Scrapbook wouldn't work on it either. Fukit. I'll do my homework better next time.