- User Since
- Jul 27 2015, 11:58 AM
Aug 31 2015
Ah ok Herbert! I was thinking about it because it gives about double the speed... probably something for the future...
Aug 26 2015
Is CFast 2.0 not an option?
Jul 28 2015
yoyotta offers a checksum read of the source as an option to verify data integrity, which sounds like a good solution to me.
Copying more than one job/file in parallel is not a good solution, because then your frames are not stored physically in sequence, and you can run into read-speed problems during playback because the files may be fragmented, especially with 4K frames. On set I use a 12-bay SAS storage as the main storage, which provides about 1200 MB/s read/write, plus 6-bay Thunderbolt/USB 3 Areca RAIDs, which provide about 700 MB/s.
With picture sequences, as with Arri Raw or DNG etc., you get a high load of I/O operations, since that is a huge number of single files.
An option for in-camera MXF wrapping of the raw files would be very nice!
Jul 27 2015
For example, on a shooting day where you have to deal with, say, 1.5 TB of data, reading the source twice costs a lot of time.
You also need to do an "optical" check of the material (hashes cannot detect problems with the picture) and then process it, for example to offline material.
So you end up reading the source files on set at least three times; without parallel processing in the copy task, that would be a fourth time.
All the copy programs I have used at work, including yoyotta, do this. Bear in mind that on a multi-camera show you may need to handle two or three times the data I mentioned above. It could be less, but I would strongly recommend parallel processing in the copy task.
From my perspective, the copy and verification task should run as read once, write to multiple destinations in parallel.
I played with Python a little and read the files in binary mode, so the source checksums could be generated in parallel with the copy task, which is essential.
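A minimal sketch of that read-once idea: one read pass feeds both the source checksum and every destination copy. The names (`copy_with_hash`, `CHUNK`) are illustrative, not from any real tool.

```python
import hashlib
from pathlib import Path

CHUNK = 8 * 1024 * 1024  # 8 MiB read buffer

def copy_with_hash(source, destinations):
    """Copy `source` into each directory in `destinations`, returning
    the MD5 of the data as it was read -- no second pass over the source."""
    md5 = hashlib.md5()
    outs = [open(Path(d) / Path(source).name, "wb") for d in destinations]
    try:
        with open(source, "rb") as src:
            while chunk := src.read(CHUNK):
                md5.update(chunk)      # source checksum in parallel with the copy
                for out in outs:       # each chunk goes to every destination
                    out.write(chunk)
    finally:
        for out in outs:
            out.close()
    return md5.hexdigest()
```

In this simple form the per-chunk write loop is sequential, so the slowest destination throttles the whole copy, which matches the point about the slowest destination setting the pace; a real tool would put each destination behind its own thread and queue.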
The slowest destination will then set the copy speed... I thought of a dynamic buffer size per destination write speed, or an SSD as an additional buffer, to compensate for large differences in storage speed. The buffer is then written to all destinations in parallel.
When the files are read back for destination checksum generation, it would be good to have a choice of how many files to process in parallel. With picture sequences it would be nice to process at least 4 files in parallel per backup. If there is a container format in the future, processing one file after another could be faster. I have used md5deep/hashdeep very often, and it is very fast.
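The read-back step could look something like this: a configurable number of worker threads (4 here, as suggested for picture sequences) hash the destination files and report any that no longer match the source checksums. `verify_backup` and `md5_of` are hypothetical names, not part of any existing tool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def md5_of(path):
    """Hash one file in chunks; returns (path, hex digest)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(8 * 1024 * 1024):
            md5.update(chunk)
    return path, md5.hexdigest()

def verify_backup(expected, workers=4):
    """expected: dict mapping file path -> source checksum.
    Returns the paths whose read-back checksum does not match."""
    mismatches = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, digest in pool.map(md5_of, expected):
            if digest != expected[path]:
                mismatches.append(path)
    return mismatches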
After verification, if there are checksum mismatches, missing files, etc., it should offer to re-copy/re-verify those files.
A copy report in PDF format would also be nice, with or without thumbnails from the beginning, middle, and end of each clip.
I uploaded a sample here.
A text file with all checksums should be stored with every backup at the destinations, along with the report.
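That per-backup checksum file could be a plain text manifest in the two-space `md5sum`/md5deep style, so standard tools can re-verify the backup later. A minimal sketch; `write_manifest` is a hypothetical name:

```python
def write_manifest(checksums, manifest_path):
    """checksums: dict mapping relative file path -> MD5 hex digest.
    Writes one 'digest  path' line per file, sorted for stable diffs."""
    with open(manifest_path, "w") as out:
        for path in sorted(checksums):
            out.write(f"{checksums[path]}  {path}\n")
```

Running `md5sum -c checksums.md5` at the destination then re-checks the backup against this file.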