This DerridasMarginsWebArchive_readme.txt file was generated on 2021-05 by Rebecca Sutton Koeser

GENERAL INFORMATION

1. Title: Derrida’s Margins web archive


2. Author Information
        A. Principal Investigator Contact Information
                Name: Katie Chenoweth
                Institution: Princeton University


        B. Associate or Co-investigator Contact Information
                Name: Rebecca Sutton Koeser
                Institution: Princeton University
                Email: rkoeser@princeton.edu


3. Date of data collection (single date, range, approximate date): 2021-11; 2024-05


4. Information about funding sources that supported the collection of the data:


Web archiving for Derrida’s Margins was supported by Princeton University’s Center for Digital Humanities.


SHARING/ACCESS INFORMATION


1. Licenses/restrictions placed on the data: Creative Commons Attribution 4.0 International (CC BY 4.0)


2. Links to publications that cite or use the data: N/A


3. Links to other publicly accessible locations of the data: N/A


4. Links/relationships to ancillary data sets: 
https://doi.org/10.34770/2ezk-1104 : the Derrida’s Margins datasets, which are based on the same underlying content


5. Was data derived from another source? yes
        A. Chenoweth, Katie, Alexander Baron-Raiffe, and Rebecca Sutton Koeser (eds.). Derrida’s Margins, version 1.3.3. Center for Digital Humanities at Princeton, 2018. http://derridas-margins.princeton.edu. 


DATA & FILE OVERVIEW

1. File List: 
derridas-margins.wacz: web archive of the public portions of the Derrida’s Margins web application
derrida-20240514-v1.3.3.csv: comma-separated values file of URLs generated by crawling the Derrida’s Margins site


2. Relationship between files, if important: 
The CSV file describes the same content that is included in the web archive package.


3. Additional related data collected that was not included in the current data package: N/A


4. Are there multiple versions of the dataset? no


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: 


The web archive files were created in 2021 with Browsertrix Crawler <https://github.com/webrecorder/browsertrix-crawler>; custom behaviors were used to capture specific interactive portions of the site. The web archive files were compiled into a single WACZ file in 2024-05 with the Python wacz library <https://github.com/webrecorder/py-wacz>.


The CSV file was generated using Caliper <https://github.com/Princeton-CDH/caliper-scrapy>, a custom web crawling tool developed by the Center for Digital Humanities at Princeton.


2. Methods for processing the data: N/A


3. Instrument- or software-specific information needed to interpret the data: 
WACZ is an open web archive format; we recommend using Webrecorder tools <https://webrecorder.net/> to work with it.
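
For readers who want a quick look inside the package without installing anything, the sketch below relies on the fact that a WACZ file is a ZIP archive that carries a datapackage.json manifest. The function names list_wacz_members and read_datapackage are hypothetical helpers written for this note, not part of any Webrecorder tool, and the exact set of files inside this particular package is not documented here.

```python
import json
import zipfile


def list_wacz_members(wacz_path):
    """Return the sorted names of the files stored in a WACZ (ZIP) package."""
    with zipfile.ZipFile(wacz_path) as package:
        return sorted(package.namelist())


def read_datapackage(wacz_path):
    """Parse the datapackage.json manifest stored at the top level of a WACZ."""
    with zipfile.ZipFile(wacz_path) as package:
        with package.open("datapackage.json") as handle:
            return json.load(handle)


# Example use (assumes the file is in the current directory):
#   for name in list_wacz_members("derridas-margins.wacz"):
#       print(name)
```

This is only useful for inspection; replaying the archived site still requires a web archive replay tool.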


4. Standards and calibration information, if appropriate: N/A


5. Environmental/experimental conditions: N/A


6. Describe any quality-assurance procedures performed on the data: 
The web archive was manually inspected and checked against the CSV of URLs. Capturing the project completely required multiple captures and testing of the custom behaviors.
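
A comparable check can be approximated in code. The sketch below is not the procedure that was actually used, which was manual. It assumes that the archived URLs are available as CDXJ index lines (a URL key, a timestamp, and a JSON block containing a "url" field) and that the CSV has url and status_code columns; header names are lowercased on read because their exact capitalization in the file is an assumption. The function names are hypothetical.

```python
import csv
import json


def urls_from_cdxj(lines):
    """Collect the "url" value from the JSON block of each CDXJ index line."""
    urls = set()
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # skip lines without a JSON block
        record = json.loads(line[start:])
        if "url" in record:
            urls.add(record["url"])
    return urls


def urls_missing_from_archive(csv_file, archived_urls, ok_only=True):
    """List crawled URLs from the CSV that are absent from archived_urls.

    With ok_only=True, only rows whose status_code is "200" are checked.
    """
    missing = []
    for raw in csv.DictReader(csv_file):
        row = {(key or "").lower(): value for key, value in raw.items()}
        if ok_only and row.get("status_code") != "200":
            continue
        if row["url"] not in archived_urls:
            missing.append(row["url"])
    return missing
```

An empty result would suggest that every successfully crawled URL has a capture; a non-empty result lists candidates for manual review.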


7. People involved with sample collection, processing, analysis and/or submission: 
Web archiving work was performed and validated by Kevin McElwee and Rebecca Sutton Koeser.


DATA-SPECIFIC INFORMATION FOR: derrida-20240514-v1.3.3.csv

1. Number of variables: 8


2. Number of cases/rows: 10454


3. Variable List: 
url: URL for an item crawled and described by the web crawler
status_code: HTTP status code returned for the URL
content_type: response Content-Type header for the url, e.g. text/html or image/png
last_modified: response Last-Modified header for the content, if returned
content_length: response Content-Length header for the content, if returned
size: size of the response as calculated by Caliper, for resources that do not provide a Content-Length header (should be equivalent to content_length if set)
referrer: referring URL where Caliper found this URL (not the only place it is referenced, just the first discovered)
timestamp: date and time when the URL was accessed


4. Missing data codes: missing values are left empty
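
As an illustration of how the variables and the empty-value convention above might be handled, here is a minimal sketch using only the Python standard library. The helper names load_rows and count_by are hypothetical; header names are lowercased on read because their exact capitalization in the file is an assumption, and empty fields are converted to None.

```python
import csv
from collections import Counter


def load_rows(csv_file):
    """Read CSV rows as dicts, lowercasing headers and mapping "" to None."""
    rows = []
    for raw in csv.DictReader(csv_file):
        rows.append(
            {
                (key or "").lower(): (value if value != "" else None)
                for key, value in raw.items()
            }
        )
    return rows


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row.get(column) for row in rows)


# Example use (assumes the file is in the current directory):
#   with open("derrida-20240514-v1.3.3.csv", newline="", encoding="utf-8") as f:
#       rows = load_rows(f)
#   print(count_by(rows, "status_code"))
#   print(count_by(rows, "content_type"))
```

All values are kept as strings; numeric columns such as content_length and size would need explicit conversion before any arithmetic.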


5. Specialized formats or other abbreviations used:  N/A