Internet Archive expands OCA book digitizing effort

100,000 books now available on Internet Archive

The Internet Archive has received a grant from the Alfred P. Sloan Foundation to expand its book-digitizing efforts, which so far have resulted in the scanning of about 100,000 books now available on the group's Web site.

The grant will also benefit the Open Content Alliance, an initiative launched in October 2005 and backed by the Internet Archive, Yahoo and others to digitize books and multimedia material and make them available online, the Internet Archive announced Wednesday.

The scanned works hosted by the Internet Archive are also available for indexing by any search engine that adheres to the OCA's open-access terms for the content. These principles include providing "the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors," according to the OCA Web site.

The Sloan Foundation awarded the grant to support the digitization of historical collections from five major libraries by the Internet Archive, a nonprofit organization building an online library of texts, audio, video, software and Web pages.

The US$1 million grant will be used in part to scan the complete personal library of founding father and U.S. President John Adams, housed at the Boston Public Library. Meanwhile, the Getty Research Institute in Los Angeles is making available art, architecture and performing arts books.

The archive of publications issued by New York City's Metropolitan Museum of Art will also be digitized, as well as California Gold Rush primary texts from the University of California at Berkeley's Bancroft Library. Finally, the Internet Archive will scan the James Birney Collection of Anti-Slavery materials from Johns Hopkins University libraries in Baltimore.

Scanning books to make them available online has become a controversial practice primarily due to Google Inc.'s approach. The search engine giant is digitizing library collections that include copyrighted books without always asking for permission from the copyright owners. It indexes the full text of these works and makes them searchable through its Book Search service.

Google faces lawsuits alleging that this practice violates copyright law. Google claims it is protected by the fair use principle, because it displays only snippets of text from copyrighted works.

The Internet Archive has refrained from digitizing copyrighted books, although it wants to see copyright issues worked out because its ultimate goal is to provide access to as many works as possible for the benefit of people worldwide, said Brewster Kahle, Internet Archive founder.

For example, Kahle is interested in sorting out the issue of books whose copyright owners can't be found, often called "orphan works," as well as the issue of copyrighted works that are out of print. In these two cases, Kahle believes that libraries should take a leading role in finding "the right path through it." In the case of in-print copyrighted books, a collaboration between libraries and publishers could generate significant progress, he said.

While others are criticizing Google for its wholesale scanning of copyrighted works, Kahle finds fault with the agreements the company is hammering out with its partner libraries. In his opinion, the contracts put too many restrictions on how libraries and people may use and share digital copies of public-domain works. "Google has bound the libraries pretty tightly," he said. "Public domain works should stay in the public domain."

For public domain books, Google provides full access and allows users to download and print them out for free, while restrictions apply to copyrighted works, a Google spokeswoman said via e-mail.

Google's library partners can use the digital copies Google gives them "to serve their students, faculties and partners," she said. Restrictions in the agreements regarding the use of the works are mutual, she added.

"Our agreements with libraries are nonexclusive, and several of our partners are working with other organizations," she said, likely referring to the University of California. "We encourage the digitization of more books by more organizations."

In addition to Yahoo and the aforementioned libraries, participants in the OCA include Microsoft, Adobe Systems, Columbia University, Hewlett-Packard, the University of Toronto, Xerox and the University of North Carolina at Chapel Hill.
