Using the IA Client

From Digitize the Planet

The IA Client as Best Tool for Uploading

Besides the upload page at the Internet Archive, there is also an S3-like interface. It is slightly non-intuitive to use, but it is the best choice for scripted or batch uploads into the Archive's collections. Luckily, extensive work has gone into a client for the Internet Archive's S3-like and Metadata API interfaces, which makes uploading, and scripting uploads, extremely easy. If you are uploading more than a few dozen items, and especially if you expect to be doing regular uploads going forward, learning to use the IA client will save you a lot of time, pressure and effort.

This page is meant to give a quick introduction and ramp-up in the use of the IA client. There is a formal documentation page being maintained as well.

Some Basic Concepts

It helps to understand some of the aspects of working with the Internet Archive.

  • To upload items to the Internet Archive, you need a "Library Card" (user account). On the upper right side of the archive.org site is a link for registering if you don't already have an account.
  • An item is a single digital object or set of objects, representing a single "thing" in the site. For example, you might have an issue of a magazine, or a podcast's mp3 file, or a .avi file of a home movie.
  • While the Internet Archive has various ways to deal with these items, it's best if each item holds a single "thing". If you put a bunch of "things" in one item (like an entire run of a newsletter, or the entire year of a podcast), the Archive will do the best it can, but it becomes slightly harder to find materials going forward.
  • "Collections" are sets of these "items". A collection can only be created by an administrator at the Archive. That said, there is a set of open, world-access collections at the Archive, with names like movies, opensource, and so on.
  • Every item has an identifier (also called an item name), which is a unique name (up to 60 characters) for the item. This is the permanent location it will hold at the Archive. Some examples of item names might be 2016-01-10-green-newsletter or WhenThingsGoWrongBook1998. Choose an identifier that is informative without being too hard to remember, although sometimes that's unavoidable for large-scale upload operations.
  • When you upload something to the Archive, you are likely the last person who will do any describing or curation of the item. As a result, please, please do your absolute best to put as much context and description as you can into everything you upload. Otherwise, it becomes hard to find things.
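The identifier rules above can be checked before an upload is attempted. What follows is a minimal sketch, not part of the IA client: valid_identifier is a hypothetical helper, the 60-character limit is taken from the text above, and the allowed character set (letters, digits, underscore, hyphen, period) is an assumption made for this example.

```shell
# Hypothetical helper: succeed only if the candidate identifier looks usable.
# The 60-character limit comes from this page; the allowed character set
# (letters, digits, '_', '-', '.') is an assumption of this sketch.
valid_identifier() {
    id=$1
    # Reject empty names and names longer than 60 characters.
    [ -n "$id" ] || return 1
    [ "${#id}" -le 60 ] || return 1
    # Reject any character outside the assumed safe set.
    case $id in
        *[!A-Za-z0-9_.-]*) return 1 ;;
    esac
    return 0
}
```

A script could then guard each upload with something like: valid_identifier "$item" || echo "bad identifier: $item"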

A Special Note About The Archive's Post-Upload Work:

Once your item is uploaded, scripts will run a lot of tests on the item, trying to pull out information, convert it to other open formats, and generate thumbnails. This takes some time, so you might upload an item and then find a thumbnail shows up 15 minutes later, or an hour later, or half a day later. It all depends on how things are being uploaded. This is probably the most non-intuitive aspect of the Archive: things don't happen instantly. Once your item is up, the work will continue over time, even though you can already see the item in the stacks.

Prerequisites and Pre-Flight Check

  • If your item or items are something you acquired from elsewhere, please take the time to search the Internet Archive and make sure it isn't already there. Hundreds of gigabytes of items are uploaded every day to the Archive and it is not unusual for multiple people to hear of a new digital offering and reflexively transfer a copy into the Archive. If you are uploading an item or items you created yourself, thank you. Contributions from Internet Archive's users are a major source of excellent material.
  • If you intend to upload a large number of items and know they will want a collection (a good rule of thumb is "more than 50 items"), please mail the Internet Archive with a request for the collection to be made. You can upload to the general collections in the meantime and then ask for the items to be moved to the new collection later.
  • The IA Client requires Python and a Unix-like operating system (Linux, FreeBSD, etc.). It does not have a graphical interface.

Acquiring and Installing the IA Client

As described in the documentation, you should be able to install as root with this command:

pip install internetarchive

After the program installs, you should have a new command at your disposal:

ia --help

This gives you a quick overview of the commands of the ia client.

You will then want to set the credentials on your ia client so you can upload to the Internet Archive with your account.

ia configure

It will ask you for your e-mail address and password, and then acquire the S3 keys for the client. You can then use these keys indefinitely; you will not have to configure the client again unless you remove your keys or otherwise change your Internet Archive account.
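Before running a long batch, a script may want to confirm that the client is actually installed. This is a small sketch under that assumption: have_cmd and check_ia are hypothetical helper names, and the check only looks for an ia command on the PATH. It does not verify that ia configure has been run.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the ia client looks installed, without contacting the Archive.
check_ia() {
    if have_cmd ia; then
        echo "ia client found"
    else
        echo "ia client not found; try: pip install internetarchive" >&2
        return 1
    fi
}
```

Calling check_ia || exit 1 at the top of an upload script stops it early instead of failing once per file.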

A Quick Set of Upload Commands

What follows are some basic recipes for adding items to the archive with this client.

Uploading a single PDF file called lettermagazine_1996-01.pdf and setting the title, description, date, subject list, mediatype, and collection.

ia upload 1996_01_letter_magazine lettermagazine_1996-01.pdf -m "title:Letter Magazine January 1996" \
-m "description:The January 1996 issue of letter magazine. Includes stories on better telegram writing, \
a roundup of the 1995 Winter Letters Conference, and the 1995 Better Letter Awards." -m "date:1996-01" \
-m "mediatype:texts" -m "collection:opensource" -m "subject:Letter Magazine;Magazine;Writing;Periodicals;Conferences"

Uploading all the *.mp3 files in the current directory into an item, and setting the title, description, date, subject list, mediatype, and collection.

ia upload 2000_musician_roundup_austin_tx *.mp3 -m "title:The 2000 Musician Roundup, Austin, TX." -m "date:2000-05-13" \
-m "description:A set of performances for the 2000 Musician Roundup in Austin, TX. Held on May 13, 2000. Recorded from \
the mezzanine area." -m "subject:Austin;Music;Musician Roundup;Texas;Concert;Songs" -m "mediatype:audio" -m "collection:audio"

Better Explanation of the Upload Command

THE UPLOAD COMMAND

ia upload <itemname> <files> <settings>

The client can upload a single file or a set of files; simply use a wildcard if you want to upload multiple files into one item. The most relevant setting is the metadata setting, which allows you to set information pairs on your uploaded item. The format is the metadata name and the data, separated by a colon.
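Since each item works best when it holds one "thing", a batch job is usually a loop that calls ia upload once per file, rather than one wildcard upload into a single item. The sketch below assumes a directory of PDFs. The function names, the IA_CMD variable, the identifier scheme (file name without .pdf) and the placeholder title are all illustrative assumptions, not features of the client.

```shell
# IA_CMD defaults to the real client. Set IA_CMD=echo for a dry run that
# only prints the arguments each upload would receive.
IA_CMD=${IA_CMD:-ia}

# Upload one PDF as its own item. The identifier and title are both derived
# from the file name, which is an assumption made for this example.
upload_pdf() {
    file=$1
    item=$(basename "$file" .pdf)
    "$IA_CMD" upload "$item" "$file" \
        -m "title:$item" \
        -m "mediatype:texts" \
        -m "collection:opensource"
}

# Upload every PDF in the current directory, one item per file.
upload_all_pdfs() {
    for file in ./*.pdf; do
        # With no PDFs present the pattern stays unexpanded, so skip it.
        [ -e "$file" ] || continue
        upload_pdf "$file" || echo "upload failed: $file" >&2
    done
}
```

Running IA_CMD=echo first and reading the printed commands is a cheap way to catch a bad identifier before anything is sent. For real uploads, replace the placeholder title with a proper title, date and description for each item.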

UNDERSTANDING METADATA PAIRS

There are a number of metadata pairs you should set, depending on whether you have the information about the item. They all take the form:

-m "key:data"

Here is a quick list of the most vital (and also suggested) metadata pairs you can set.

  • title: The title of your item.
  • date: The date your item comes from, as best as you can manage.
  • description: A description of your item, including contents and context. Be as complete as possible.
  • mediatype: One of a limited number of mediatypes. They include texts, audio, movies, software, and data.
  • collection: If you have requested a collection, you would put it here. Otherwise, default to opensource.

Bear in mind that you can make custom metadata pairs as well - as long as the key name is not otherwise used by the Archive's system. For example, if you know the reading length of a book at standard reading speed and want that in every item you upload, you could add -m "readingspeed:3m" and it would set that pair as well.
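In the recipes earlier on this page, several subjects go into one pair, separated by semicolons. When the subjects come from a script, they can be joined before being passed to -m. This is a minimal sketch; join_subjects and meta_pair are hypothetical helper names, not part of the client.

```shell
# Join all arguments with semicolons, matching the subject lists used in
# the upload recipes on this page.
join_subjects() {
    # "$*" joins the arguments with the first character of IFS. The
    # subshell keeps the IFS change from leaking into the caller.
    ( IFS=';'; printf '%s\n' "$*" )
}

# Build one "key:data" pair suitable for passing to -m.
meta_pair() {
    printf '%s:%s\n' "$1" "$2"
}
```

For example, -m "$(meta_pair subject "$(join_subjects Austin Music Texas)")" passes the pair subject:Austin;Music;Texas to the client.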

THE METADATA COMMAND

After something is uploaded, if you want to make additional changes to any metadata settings in the item, you can do so with the metadata command.

ia metadata <itemname> -m "key:data"

This allows you to fix typographical errors, add additional layers of metadata, and so on.
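The same fix often applies to many items at once, for example after a batch went up with a misspelled key. The sketch below assumes a plain text file with one identifier per line. fix_metadata and IA_CMD are hypothetical names introduced here, and IA_CMD=echo gives a dry run.

```shell
# IA_CMD defaults to the real client; IA_CMD=echo prints instead of running.
IA_CMD=${IA_CMD:-ia}

# Apply one "key:data" pair to every identifier listed in a file.
# Usage: fix_metadata "key:data" identifiers.txt
fix_metadata() {
    pair=$1
    list=$2
    while IFS= read -r item; do
        # Skip blank lines in the list.
        [ -n "$item" ] || continue
        # Redirect stdin so the command cannot consume the rest of the list.
        "$IA_CMD" metadata "$item" -m "$pair" </dev/null \
            || echo "metadata update failed: $item" >&2
    done < "$list"
}
```

Failures are reported per item and the loop carries on, so one bad identifier does not stop the rest of the list.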
