How/where does the web-archive page get old and lost web page contents?
The page(s) must obviously be stored by someone _before_ they disappear.
I mean, does some person or piece of software randomly store web pages and check whether they disappear? Or do people save web pages on their own behalf and later contribute them to the archive?
And how do they know in advance which pages will disappear?
Greetings
Peter
How does webarchive get content
Re: How does webarchive get content
The web archive uses a program to download the pages and their embedded assets (I believe those programs are called "spiders"), and then they have a ton of storage somewhere to keep all of it. Obviously they don't know in advance which sites are going to disappear; they just sample pages by some algorithm. If they happened to capture the page you want before it got memory-holed, then you are in luck. You can also request that a specific page be added.
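To make the idea concrete, here is a toy sketch of such a crawler in Python. It is not how the Internet Archive's real crawler works: the names `LinkExtractor` and `crawl`, the injected `fetch` function, and the uniform random choice of the next page (standing in for "some algorithm") are all assumptions made for illustration.

```python
import random
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Optional
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects page links (<a href>) and embedded assets (img/script/link)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.assets: List[str] = []

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "a" and values.get("href"):
            self.links.append(self._resolve(values["href"]))
        elif tag in ("img", "script") and values.get("src"):
            self.assets.append(self._resolve(values["src"]))
        elif tag == "link" and values.get("href"):
            self.assets.append(self._resolve(values["href"]))

    def _resolve(self, url: str) -> str:
        # Turn relative URLs into absolute ones and drop any "#fragment".
        return urldefrag(urljoin(self.base_url, url))[0]


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical: URL -> content or None
    max_pages: int = 10,
    seed: int = 0,
) -> Dict[str, str]:
    """Toy crawler: stores pages plus their assets, picking the next page at random."""
    rng = random.Random(seed)
    frontier = list(dict.fromkeys(seeds))  # URLs discovered but not yet fetched
    seen = set(frontier)
    archive: Dict[str, str] = {}  # URL -> stored content
    pages = 0
    while frontier and pages < max_pages:
        # "Sample" the next page uniformly at random from the frontier.
        url = frontier.pop(rng.randrange(len(frontier)))
        html = fetch(url)
        if html is None:  # page is already gone (or fetch failed): nothing to store
            continue
        archive[url] = html
        pages += 1
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        # Store embedded assets right away so the snapshot can be rendered later.
        for asset in parser.assets:
            if asset not in archive:
                body = fetch(asset)
                if body is not None:
                    archive[asset] = body
        # Queue newly discovered links for a later (random) visit.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return archive
```

Injecting `fetch` keeps the sketch testable offline (a plain `dict.get` over a fake site works) and makes the point of the thread visible: a page for which `fetch` already returns `None` simply cannot be archived any more.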
Carpe diem!
- xenos
Re: How does webarchive get content
One can also manually save pages with web.archive.org/save/URL.
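A minimal Python sketch of using that endpoint from a script. The function names `save_page_url` and `request_save` are hypothetical, and the idea that a plain GET to the save URL triggers a capture is an assumption; the service may rate-limit, redirect, or require you to log in, so check the Internet Archive's own documentation before relying on it.

```python
import urllib.request

# Prefix from the post above; the https scheme is an assumption.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_url(target: str) -> str:
    """Build the save URL for `target` (the full URL, scheme included)."""
    return SAVE_PREFIX + target


def request_save(target: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to archive `target` and return the HTTP status.

    Assumption: a plain GET to the save URL triggers a capture. Needs network
    access; error statuses raise urllib.error.HTTPError instead of returning.
    """
    req = urllib.request.Request(
        save_page_url(target),
        headers={"User-Agent": "save-page-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `request_save("https://example.com/")` sends the request and returns the status code of the final response after any redirects that `urllib` follows.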