
Web Archiving

by Jan Hutař last modified 2011-10-18 09:35

The long-term preservation of, and access to, the most fragile documents (archived Czech websites) is the second major priority of the NDK project. These documents are published solely in digital form, which makes them highly dependent on technology; they are prone to degradation and technical obsolescence. Within the scope of the project we will harvest approximately 4 billion files.

The amounts of data

The harvested data will be stored in the ARC or WARC format, in files of about 100 MB each. The project expects to run two complete harvests of the Czech web per year. Over the five years of the project, at 220 working days per year, this will produce 173 TB of data, i.e. 1,730,000 files, or about 1,572 files (roughly 0.15 TB) per working day. Ingest of the data into the long-term preservation repository will not take place every day but in batches. The actual approach to the long-term preservation of this type of data has yet to be decided. So far the data from web archiving are only backed up; no active preservation is in place.
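The volume estimate above can be reproduced with a few lines of arithmetic. The following is only a minimal sketch: the function name `estimate_volumes` and its parameter names are hypothetical, and it assumes decimal units (1 TB = 1,000,000 MB), which is the convention under which the figures in the text agree with each other.

```python
# Figures are taken from the text; decimal units are an assumption.
MB_PER_TB = 1_000_000


def estimate_volumes(total_tb=173, file_size_mb=100, years=5,
                     working_days_per_year=220):
    """Return total file count and per-working-day throughput."""
    # Total number of ~100 MB container files needed for the whole volume.
    total_files = total_tb * MB_PER_TB // file_size_mb
    # Number of working days over the whole project.
    working_days = years * working_days_per_year
    # Whole files per working day (rounded down).
    files_per_day = total_files // working_days
    # Data volume per working day, in TB.
    tb_per_day = total_tb / working_days
    return {
        "total_files": total_files,
        "working_days": working_days,
        "files_per_day": files_per_day,
        "tb_per_day": tb_per_day,
    }


if __name__ == "__main__":
    for name, value in estimate_volumes().items():
        print(name, value)
```

With the default values this gives 1,730,000 files over 1,100 working days, 1,572 files per day, and a little under 0.16 TB per day, which the text states as 0.15 TB.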

More about the technology used for web archiving and the legal conditions for access to the archived content can be found on the project portal: http://webarchiv.cz/

 
