Cloudflare and the Internet Archive (aka Wayback Machine) announced a partnership that will help the Internet achieve greater reliability. The Internet Archive obtains more copies of websites while Cloudflare benefits from having sites backed up to the Internet Archive.
Publishers will also benefit from the arrangement between Cloudflare and the Internet Archive, even publishers who don’t use Cloudflare.
Internet Archive Partners with Cloudflare
The partnership between the Internet Archive and Cloudflare is symbiotic. A symbiotic relationship is one in which two different organisms, people or groups both benefit.
What makes this relationship beneficial is that Cloudflare customers gain increased reliability while the Internet Archive gains access to more data.
The Internet Archive, also known as the Wayback Machine, is a 501(c)(3) non-profit dedicated to keeping snapshots of the Internet. Even if a website disappears, users can still locate their favorite content as it existed years before.
The Internet Archive’s nickname is a whimsical reference to a 1960s cartoon about a dog (Mr. Peabody) and his human boy (Sherman) who time traveled in a Wayback Machine. The Internet Archive was founded in 1996 and currently contains over 20 years of web documents.
While the Internet Archive also saves other kinds of documents like books, videos and radio shows, its main focus remains preserving and archiving a snapshot of the Internet.
The Internet Archive is currently archiving over 330 billion web pages.
Cloudflare is a security and content delivery network that helps websites speed up delivery of web pages and mitigate the effects of distributed denial-of-service (DDoS) attacks.
Cloudflare hosts its customers’ static content on servers around the world so that site visitors receive their content from a server that is located close to them. Dynamic content is retrieved from the original web host server.
How the Internet Archive and Cloudflare Partnership Works
Cloudflare has a feature called Always Online. Always Online is a crawler that crawls customers’ websites once every three or seven days, depending on the plan.
The purpose of the crawl is to download a copy of the site and create a cache that can be used in the event that the website goes offline.
- Free customers are crawled once a week.
- Pro customers are crawled every three days.
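The crawl schedule above can be sketched in a few lines of Python. This is only an illustration of the stated intervals, not Cloudflare's implementation; the plan keys, the `next_crawl` and `is_due` helpers, and the dates are all assumptions made up for the example.

```python
from datetime import datetime, timedelta

# Crawl intervals as described in the article. The lowercase plan keys are
# hypothetical labels for this sketch, not Cloudflare identifiers.
CRAWL_INTERVAL_DAYS = {"free": 7, "pro": 3}


def next_crawl(last_crawl: datetime, plan: str) -> datetime:
    """Return the earliest time the next crawl would be due for a plan."""
    key = plan.lower()
    if key not in CRAWL_INTERVAL_DAYS:
        raise ValueError(f"no crawl interval known for plan {plan!r}")
    return last_crawl + timedelta(days=CRAWL_INTERVAL_DAYS[key])


def is_due(last_crawl: datetime, plan: str, now: datetime) -> bool:
    """Return True once the plan's crawl interval has fully elapsed."""
    return now >= next_crawl(last_crawl, plan)
```

For example, a site on the Pro plan last crawled on the first of the month would be due again on the fourth, while the same site on the Free plan would wait until the eighth.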
The Internet Archive will receive a copy of the web pages that are crawled by Cloudflare. This benefits the Internet Archive because now it doesn’t need to use resources to crawl Cloudflare customer websites.
Cloudflare will benefit because the partnership allows the snapshots stored at the Internet Archive to be used if a host website should go down.
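The fallback idea can be illustrated with a small sketch: try the origin server first, and only use an archived snapshot if the origin does not respond. The function name `serve_page` and the two fetcher callables are hypothetical stand-ins invented for this example; they do not represent Cloudflare's or the Internet Archive's actual interfaces.

```python
from typing import Callable, Optional, Tuple

# A fetcher takes a URL and returns the page body, or None when nothing
# could be retrieved. Both fetchers are placeholders for this sketch.
Fetcher = Callable[[str], Optional[str]]


def serve_page(url: str, fetch_origin: Fetcher, fetch_snapshot: Fetcher) -> Tuple[str, str]:
    """Return (source, body), preferring the live origin over the archive."""
    body = fetch_origin(url)
    if body is not None:
        return ("origin", body)

    # The origin is unreachable, so fall back to the archived copy.
    body = fetch_snapshot(url)
    if body is not None:
        return ("archive", body)

    raise LookupError(f"no live or archived copy available for {url}")
```

The snapshot fetcher is only called when the origin fails, which mirrors the article's description of the archive being used if the host website goes down.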
It’s a win-win-win for the Internet Archive, for Cloudflare and its customers, and for site visitors wishing to visit a site that might be temporarily down.
According to the announcement:
“The Internet Archive’s Wayback Machine has an impressive infrastructure that can archive the web at scale,” said Matthew Prince, co-founder and CEO of Cloudflare. “By working together, we can take another step toward making the Internet more resilient by stopping server issues for our customers and in turn from interrupting businesses and users online.”
This partnership means that more web pages will be archived than ever before. New sites that have never been archived will be included in the Internet Archive.
This is how it was explained:
“An additional source of URLs we will preserve now originates from customers of Cloudflare’s Always Online service. As new URLs are added to sites that use that service they are submitted for archiving to the Wayback Machine. In some cases this will be the first time a URL will be seen by our system and result in a “First Archive” event.
In all cases those archived URLs will be available to anyone who uses the Wayback Machine.”
All Web Publishers Benefit
The Internet Archive has been a useful resource for web publishers for over twenty years. It comes in handy when there’s a dispute about which website is the first publisher of specific content.
This is useful for Digital Millennium Copyright Act (DMCA) disputes because it makes it easier to determine which website published the content first.
Websites sometimes remove content and later decide that the content needs to be online again. The Internet Archive offers a way to go back in time and recover content that was previously published but has since been lost, so that it can be published again.
Sites that have been hacked and don’t have a backup can at least resort to the Internet Archive.
This partnership with Cloudflare makes more content accessible for storage. That’s a win for site visitors, publishers, Cloudflare and the Internet Archive.
Cloudflare and the Wayback Machine, Joining Forces for a More Reliable Web