ArchEc plind

Plind stands roughly for PayLoad INDex. In practice, it covers only PDF payloads. Plind data is organized by RePEc series, one file per series. The data is JSON. RePEc handles act as keys. The values are

b: start of the payload
f: length of the payload
F: relative file name
o: PDF status
m: according to the MIME type
a: the string "%PDF" occurs within the first 100 bytes
p: the futli contains "PDF" (important for FTP)
f: the URL starts with "ftp://"
r: the payload is from a WARC resource record that contains a payload, i.e. one that is not preceded by a WARC metadata record, or not concurrent with another record.
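
As an illustration of how these fields might be used, here is a minimal Python sketch that loads one plind file and reads a payload. It rests on several assumptions that this page does not confirm: that "b" is a byte offset and the length a byte count into the file named by "F", that this file is stored uncompressed under a local directory (the hypothetical `root` argument), and that both values are integers. Because the list above gives "f" for two different fields, the key holding the length is left as a parameter. The function names are made up for this example.

```python
import json
from pathlib import Path


def load_plind(path):
    """Load one plind file: a JSON object keyed by RePEc handle."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def read_payload(entry, root, length_key="f"):
    """Return the payload bytes described by one plind entry.

    Assumption: "b" and the length are plain byte positions in the
    uncompressed file named by "F", resolved relative to `root`.
    """
    start = int(entry["b"])
    length = int(entry[length_key])
    with open(Path(root) / entry["F"], "rb") as fh:
        fh.seek(start)
        return fh.read(length)


def looks_like_pdf(payload):
    """Repeat the "a" check: "%PDF" somewhere in the first 100 bytes."""
    return b"%PDF" in payload[:100]
```

A caller would look up a handle in the dictionary returned by `load_plind` and pass the resulting entry to `read_payload`. If the referenced files turn out to be compressed, the seek-and-read step would need to change.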

The plind is accessible via public rsync. A typical command line use would be

mkdir -p plind
rsync -av rsync://archec.repec.org/plind/ plind
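
Once the copy is in place, a quick sanity check is to count the handles in each series file. The following Python sketch assumes that every plind file is a single JSON object; files that do not parse as JSON are skipped. The function name `count_handles` is hypothetical, and the layout of the synced directory is not described here, so the sketch simply walks everything under it.

```python
import json
from pathlib import Path


def count_handles(plind_dir):
    """Map each JSON file under plind_dir to its number of top-level keys."""
    counts = {}
    for path in sorted(Path(plind_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            with open(path, encoding="utf-8") as fh:
                data = json.load(fh)
        except ValueError:
            # Covers invalid JSON and undecodable bytes: not a plind file.
            continue
        if isinstance(data, dict):
            counts[str(path.relative_to(plind_dir))] = len(data)
    return counts
```

Calling `count_handles("plind")` after the rsync command above would return a dictionary from relative file name to handle count.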

The ArchEc plind is maintained by Thomas Krichel. His work was supported by a €3000 subsidy from the Banque de France Foundation through their support of the Losheim Proposal.

ArchEc is a project of the RePEc digital library for economics. ArchEc aims to provide long-term archiving of RePEc templates and full-text files.
