How
the Internet Archive |
What's 404 Fury, you ask?
That's when you click on a content link (say, to a report title or to a specific directorate in a government department) that you've saved or found on Canadian Social Research Links or a similar website. Instead of the page you wanted, you see a "404 Error - Page not found" page. "ARGH", you might say, "I sure wish I'd saved that report before it got vapourized." I say that all the time, along with a few choice colourful phrases. Governments keep changing their sites, almost as if they're deliberately trying to confuse/frustrate people who are looking for information.
How to beat 404 Fury?
Wayback Machine to the Rescue!
The Wayback Machine - Archive.org (Internet Archive)
"Browse through billions and billions of web pages archived from 1996 to a few months ago. To start surfing the Wayback Machine, type in the web address of a site or page where you would like to start, and press enter. Then select from the archived dates available."
The Wayback Machine lets you revisit bygone versions of websites and browse the extensive archived content of their pages. Paste a URL in the box on the home page, and the Wayback Machine will retrieve as many copies of that page as it has archived. If you paste http://www.canadiansocialresearch.net/ into the Wayback Machine, for example, you'll see (as of April 19, 2009) links to 211 separate versions of this entire website (not just the home page) going right back to December 2000, two months after I purchased my own domain name ("canadiansocialresearch.net").
Use the Wayback Machine to access other sites right back to 1996...
Play with it - you don't have to register or anything, and you can't break it.
And don't miss the special collections of historical links!
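If you'd rather skip the text box, the paste-a-URL lookup described above can be sketched as a tiny Python helper that builds the Wayback Machine address directly. This is only a sketch: the function names are mine (hypothetical), the prefix is copied from the archived link format used elsewhere on this page, and I'm assuming the "*" form lists every archived copy while a timestamp form points at the copy closest to that date.

```python
# Prefix taken from the archived-copy URL format shown on this page.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_list_url(page_url: str) -> str:
    """Return the address that lists all archived copies of page_url.

    Assumption: a "*" in place of the timestamp asks for every capture.
    """
    return WAYBACK_PREFIX + "*/" + page_url


def wayback_snapshot_url(page_url: str, timestamp: str) -> str:
    """Return the address of one archived copy of page_url.

    timestamp is a 14-digit YYYYMMDDhhmmss string, the same shape as the
    number in an archived link such as .../web/20050518172022/...
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return WAYBACK_PREFIX + timestamp + "/" + page_url


list_url = wayback_list_url("http://www.canadiansocialresearch.net/")
print(list_url)
```

Paste the printed address into your browser and you should land on the same list of versions you'd get by typing the URL into the Wayback Machine's home page.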
What does the Wayback Machine look like? (photo)
The Wayback Machine contains (as of April 2009) 150 billion archived pages in a 20' by 8' by 8' box that sits in Santa Clara, courtesy of Sun Microsystems.
(Click the link to see the actual physical Wayback Machine - it looks like one of those transatlantic shipping containers.)
It serves about 500 queries per second from the approximately 4.5 petabytes (4.5 million gigabytes) of archived web data.
Where does the name Wayback Machine originate?
From the Rocky and Bullwinkle Show (a Saturday morning cartoon show from the 1960s) - it's the name of Mister Peabody's time-travel machine.
| Here's a practical example of how Archive.org works. |
In 2005, the Ontario Ministry of Community and Social Services created a page to celebrate its 75th anniversary. The page, which included some very interesting historical articles on welfare, was summarily deleted a year or so later, because, well, because the 75th anniversary had come and gone, and who cares about how welfare operated in Ontario in 1915 or 1920? Not the Ontario government webmaster, apparently.
Ministry of Community and Social Services:
Supporting Ontario's communities since 1930
The year 2005 was the 75th anniversary of the Ontario Ministry of Community and Social Services. To mark the occasion, the Ministry posted to its website a collection of six historical factoids and vignettes about welfare as it existed in the first quarter of the 20th century and even before. When I checked the link in the summer of 2007, not only had the page disappeared from the MCSS website, but the above URL now (still in 2009) takes the cyber-visitor to "Thriving Communities", the ministry's framework for a contemporary approach to supporting Ontarians. That's all well and good, but six historical accounts of welfare in Ontario were simply discarded like yesterday's trash, without so much as a "does-anybody-even-care-about-history-out-there" warning.
Solution:
I went to Archive.org and pasted the Ministry's URL into the Wayback Machine (the text box near the top of the page). Then, on the Archive.org results page, I selected the link to the October 2004 site snapshot. Then, on the archived MCSS home page that appeared, I simply clicked on the 75th anniversary button and found the "missing" page and all its secondary links, all live.
Here's the URL of the archived copy of this page from Archive.org:
http://web.archive.org/web/20050518172022/www.mcss.gov.on.ca/CFCS/en/Celebrating75Years/default.htm
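That long archived address is less mysterious than it looks: the run of digits is the date and time of the snapshot, and everything after it is the original address. Here's a minimal Python sketch that pulls the two apart. The function name and the regular expression are my own (hypothetical), and I'm assuming the digits always follow the 14-digit YYYYMMDDhhmmss pattern seen in this example.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumed shape of an archived link: prefix, 14-digit timestamp, original address.
SNAPSHOT_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(archived_url: str) -> Tuple[datetime, str]:
    """Split an archived link into (capture date/time, original address)."""
    match = SNAPSHOT_PATTERN.match(archived_url)
    if match is None:
        raise ValueError("not a recognizable archived-copy URL: " + archived_url)
    timestamp, original_url = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original_url


captured_at, original_url = parse_snapshot_url(
    "http://web.archive.org/web/20050518172022/"
    "www.mcss.gov.on.ca/CFCS/en/Celebrating75Years/default.htm"
)
print(captured_at.isoformat())  # 2005-05-18T17:20:22
print(original_url)  # www.mcss.gov.on.ca/CFCS/en/Celebrating75Years/default.htm
```

In other words, the copy linked above was captured on May 18, 2005.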
TIP: scroll down to "Stories from our Past" for links [you have to click on the word "more" in each case] to the following six short historical bits about welfare and social services in Ontario in the last century:
* Origins of the welfare department (1930)
* Breaking 650 lbs. of rocks to qualify for welfare in 1915
* Houses of refuge
* The Mothers' Allowance Act (1920)
* The first foray into the field of day care in the mid-40s
* The Soldiers' Aid Commission (est. 1915)
TIP: you can use this same technique to retrieve many (but sadly, not all) "404" pages that have disappeared from the Web.
Sites that are database-driven, generate dynamic web pages or have robots.txt exclusions can't be archived.
(Long Live HTML!!)
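To give a feel for what a robots.txt exclusion looks like, here's a small Python sketch using the standard library's robots.txt parser. The sample file is made up for illustration, and I'm assuming the Archive's crawler identifies itself as "ia_archiver" (the name I've usually seen used for this purpose); example.org and "SomeOtherBot" are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: shuts out the (assumed) archive crawler entirely,
# and keeps every other crawler out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "ia_archiver", "http://example.org/report.htm"))  # False
print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "http://example.org/report.htm"))  # True
print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "http://example.org/private/x.htm"))  # False
```

A site with rules like the first two lines of that sample is telling the archive crawler to stay away, which is one reason a "404" page may have no archived copy to rescue.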
| Put the Wayback Machine right in your browser: The Wayback Machine Bookmarklet |
Drag this link up to your browser's Links or Bookmarks bar:
Wayback
When you're on a web page and you want to find an older version of that page, just click the toolbar link - you'll be transported to any existing archived versions in the Wayback Machine.
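If dragging the link doesn't work in your browser, a bookmarklet like this one can be made by hand. The Python sketch below just assembles the text you'd paste into the address field of a new bookmark. It's my own guess at how such a bookmarklet works (the function name is hypothetical, and the link on this page may be written differently): it takes the address of the page you're viewing and sticks the Wayback Machine's "list all copies" prefix in front of it.

```python
# Assumed "list all archived copies" prefix, based on the archived-link
# format shown elsewhere on this page.
WAYBACK_LIST_PREFIX = "http://web.archive.org/web/*/"


def make_wayback_bookmarklet(prefix: str = WAYBACK_LIST_PREFIX) -> str:
    """Build the javascript: address for a Wayback Machine bookmarklet.

    When clicked, the bookmark sends the browser to prefix + current page
    address. The JavaScript is stored as plain text; Python never runs it.
    """
    return "javascript:location.href='" + prefix + "'+location.href"


bookmarklet = make_wayback_bookmarklet()
print(bookmarklet)
```

To use it, create a new bookmark in your Links or Bookmarks bar, call it "Wayback", and paste the printed text in as the bookmark's address.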
| More info about The Internet Archive from Wikipedia |
The Internet Archive (IA) - from Wikipedia:
"The Internet Archive (IA) [also called the "Wayback Machine"] is a nonprofit organization dedicated to building and maintaining a free and openly accessible online digital library, including an archive of the Web. With offices and data centers located in California, the archive includes snapshots of the World Wide Web - archived copies of pages, taken at various points in time, along with software, movies, books, and audio recordings. To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt. The IA makes its collections available at no cost to researchers, historians, scholars, and the general public. It is a member of the American Library Association and is officially recognized by the State of California as a library."
| The Government of Canada Web Archive |
Government of Canada Web Archive:
http://www.collectionscanada.gc.ca/webarchives/index-e.html
Since the Fall of 2007, Library and Archives Canada has been harvesting the web domain of the Federal Government of Canada (starting in December 2005). Client access to the content of the Government of Canada Web Archive is provided through searching by keyword, by department name, and by URL. At the time of its launch in Fall 2007, approximately 100 million digital objects (over 4 terabytes) of archived Federal Government website data was made accessible via the LAC website. The GC WA currently contains over 170 million digital objects and more than 7 terabytes of data.
Source: Library and Archives Canada
Comments:
1. This site is definitely worth closer examination if you're looking for a federal government report or other resource that has disappeared from the Internet since early 2006. As the blurb above states, you can search through superseded versions of federal websites by keyword, department name or URL. I highly recommend that you consider using both the Government of Canada Web Archive and the Internet Archive as complementary tools; the former contains only three years' worth of digital objects (reports, tables, etc.), whereas the Internet Archive's "Wayback Machine" contains digital objects going right back to 1996.
2. The Canadian government archive is a spring chicken and a lightweight compared to the Internet Archive.
To put everything into perspective, the government archive only goes back to the end of 2005, and it includes *only* sites that belong to the Government of Canada. As per the above blurb, it currently (in 2009) contains "over 170 million digital objects and more than 7 terabytes of data". According to Wikipedia (see the article above), "As of April 2009, the Wayback Machine contained about 4.5 Petabytes (4.5 million gigabytes) of archived web data, and it was growing at a rate of 100 terabytes per month."
[Snarky factoid: The Canadian Government site boasts of "more than 7 terabytes of data", which is about the average size of the home collection of real audiophiles and video collectors.]
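For anyone who wants to see just how lopsided the comparison in Comment 2 is, here's the back-of-the-envelope arithmetic as a short Python sketch. All the figures are the 2009 numbers quoted above; the only assumptions of mine are the decimal conversion (1 petabyte = 1,000 terabytes, which matches "4.5 Petabytes (4.5 million gigabytes)"), a 30-day month, and treating "more than 7 terabytes" as a round 7.

```python
TB_PER_PB = 1000      # decimal units, matching the 4.5 PB = 4.5 million GB figure
DAYS_PER_MONTH = 30   # rough assumption, for illustration only

wayback_tb = 4.5 * TB_PER_PB         # Wayback Machine, April 2009
gc_archive_tb = 7                    # Government of Canada Web Archive ("more than 7")
wayback_growth_tb_per_month = 100    # Wayback Machine growth rate

# How many Government of Canada archives would fit inside the Wayback Machine?
size_ratio = wayback_tb / gc_archive_tb

# How long does the Wayback Machine take to grow by one whole GC archive?
days_to_add_one_gc_archive = gc_archive_tb / wayback_growth_tb_per_month * DAYS_PER_MONTH

print(f"Wayback Machine is roughly {size_ratio:.0f} times larger")
print(f"It adds 7 terabytes about every {days_to_add_one_gc_archive:.1f} days")
```

So, on these numbers, the Wayback Machine is several hundred times the size of the government archive, and it grows by the equivalent of the entire government archive every couple of days.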
| TIP: How to Search for a Word or Expression on a Single Web Page |
Open any web page in your browser, then hold down the Control ("Ctrl") key on your keyboard and type the letter F to open a "Find" window. Type or paste in a key word or expression and hit Enter - your browser will go directly to the first occurrence of that word (or those exact words, as the case may be). To continue searching using the same keyword(s) throughout the rest of the page, keep clicking on the FIND NEXT button. Try it. It's a great time-saver!