If you download a grouped dataset, you will currently find four documents. A number of enhancements are also planned, such as including a PDF with a full set of statistics for each project. The grouped datasets are updated daily.
{project_name}_matched.csv
Contains the data that can be ingested directly, after the relevant spot checks are completed (see below). The criteria for a matched record are as follows:
- Two or more volunteers selected the same WorldCat record.
- Two or more volunteers entered the same shelfmark.
- No additional comments were given.
- The OCLC number is not duplicated in another task.
- The shelfmark is not duplicated in another task.
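As a rough illustration, the criteria above could be checked along the following lines. This is only a minimal sketch and not the actual LibCrowds code: the input structure (a dict of task IDs mapped to lists of volunteer answers) and the keys `oclc`, `shelfmark` and `comments` are assumptions made for the example, as is the decision to look for duplicates only among tasks that already pass the first three criteria.

```python
from collections import Counter


def most_common_with_count(values):
    """Return (value, count) for the most frequent non-empty value, or (None, 0)."""
    counts = Counter(v for v in values if v)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def find_matched(tasks):
    """Return {task_id: (oclc, shelfmark)} for tasks meeting all five criteria.

    `tasks` is a hypothetical structure: task_id -> list of answer dicts,
    each with the (assumed) keys 'oclc', 'shelfmark' and 'comments'.
    """
    candidates = {}
    for task_id, answers in tasks.items():
        oclc, oclc_n = most_common_with_count(a.get("oclc") for a in answers)
        shelfmark, sm_n = most_common_with_count(a.get("shelfmark") for a in answers)
        has_comments = any(a.get("comments") for a in answers)
        # Criteria 1-3: two or more agree on the record and on the shelfmark,
        # and nobody left a comment.
        if oclc_n >= 2 and sm_n >= 2 and not has_comments:
            candidates[task_id] = (oclc, shelfmark)

    # Criteria 4-5: neither the OCLC number nor the shelfmark appears in
    # another candidate task (an assumption about how duplicates are scoped).
    oclc_counts = Counter(o for o, _ in candidates.values())
    sm_counts = Counter(s for _, s in candidates.values())
    return {
        task_id: (o, s)
        for task_id, (o, s) in candidates.items()
        if oclc_counts[o] == 1 and sm_counts[s] == 1
    }
```

Collecting candidates first and filtering duplicates in a second pass keeps the duplicate check independent of the order in which tasks are processed.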
{project_name}_spot_checks.pdf
Contains a random sample of 20 cards taken from the matched dataset. Each page contains an image of the card, the associated shelfmark, a link to the WorldCat record, and some additional data.
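Drawing the sample itself is straightforward. The snippet below is a small sketch of how 20 cards might be picked at random from the matched rows; the function name and the optional `seed` parameter are hypothetical and are only there to make a given sample reproducible.

```python
import random


def spot_check_sample(matched_rows, size=20, seed=None):
    """Pick up to `size` distinct rows at random from the matched dataset."""
    rng = random.Random(seed)  # a fixed seed gives a repeatable sample
    rows = list(matched_rows)
    # Never ask for more rows than exist, otherwise sample() raises ValueError.
    return rng.sample(rows, min(size, len(rows)))
```

If a project has fewer than 20 matched records, this sketch simply returns all of them in random order.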
{project_name}_not_matched.pdf
Contains those records where three people failed to locate a matching WorldCat record. This data is formatted in such a way that it can be directly ingested back into the LibCrowds system and will provide the tasks for an alternative type of project.
{project_name}_ambiguous.pdf
Contains those records that fall into none of the previous categories. This includes records where people have selected different WorldCat records, the shelfmarks don't match, there are additional comments to be considered, or there are duplicates involved. This data will require further checking, which will be performed by the relevant British Library curators. The data is formatted in such a way that it can be ingested back into the LibCrowds system and will provide the tasks for a special staff project. This project will allow the curators to compare each card image against all possible matches and accept any that are appropriate. The output from this will be a further set of successful and unsuccessful matches.
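To summarise how a single task ends up in one of the three data files, here is a hedged sketch of the routing logic. The `TaskSummary` class and its field names are invented for this example and assume that the per-task facts (agreement, comments, duplicates, number of "not found" answers) have already been worked out elsewhere.

```python
from dataclasses import dataclass


@dataclass
class TaskSummary:
    """Hypothetical per-task facts, computed before routing."""
    oclc_agreed: bool        # two or more volunteers chose the same WorldCat record
    shelfmark_agreed: bool   # two or more volunteers entered the same shelfmark
    has_comments: bool       # any volunteer left an additional comment
    has_duplicates: bool     # OCLC number or shelfmark also appears in another task
    not_found_count: int     # volunteers who could not find a WorldCat record


def route(summary):
    """Return which dataset a task belongs to: matched, not_matched or ambiguous."""
    if (summary.oclc_agreed and summary.shelfmark_agreed
            and not summary.has_comments and not summary.has_duplicates):
        return "matched"
    if summary.not_found_count >= 3:
        return "not_matched"
    # Everything else needs a curator to look at it.
    return "ambiguous"
```

Treating "ambiguous" as the fallback mirrors the description above: anything that is neither a clean match nor a clear failure goes to the staff project.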
We are very close to ingesting the first new records into our database, while ensuring we have a solid process in place to reduce the time between project completion and visible results, so watch this space!