I have approximately 600 GB of photos collected over 13 years, now stored on a FreeBSD ZFS server. The photos come from family computers, from several partial backups to different external USB HDDs, from images reconstructed after disk disasters, and from various photo management programs (iPhoto, Picasa, HP and many others :( ), all in deeply nested subdirectories. In short: a TERRIBLE MESS with many duplicates.

As a first pass I searched the tree for files of the same size (fast) and computed an MD5 checksum only for those candidates; same size + same MD5 = duplicate (see the sketch below).
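
A minimal sketch of that first pass, assuming GNU findutils/coreutils (installable from FreeBSD ports; stock FreeBSD find has no -printf, and md5 -r replaces md5sum); /tank/photos is a hypothetical path:

    # Pass 1: record size and path of every JPEG (fast, metadata only).
    find /tank/photos -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) \
        -printf '%s\t%p\n' > sizes.txt

    # Pass 2: checksum only files whose size occurs more than once.
    awk -F'\t' 'NR==FNR { n[$1]++; next } n[$1] > 1 { print $2 }' sizes.txt sizes.txt \
        | tr '\n' '\0' | xargs -0 md5sum | sort > md5s.txt

    # Lines sharing the first 32 characters (the hash) are byte-identical duplicates.
    uniq -w32 -D md5s.txt > duplicates.txt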

This helped a lot, but there are still MANY, MANY duplicates left:

- photos that differ only in the EXIF/IPTC data added by some photo management software, but where the image is the same (or at least "looks the same" and has the same dimensions)
- resized versions of the original image
- "enhanced" versions of originals, etc.

So I want to filter out the photo duplicates that differ only in EXIF tags while the image itself is identical. File checksumming therefore doesn't work here, but image checksumming could. How can I find duplicates by checksumming only the "pure image bytes" of a JPEG, without the EXIF/IPTC and similar meta information?
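
For the "pure image bytes" idea itself, one possible approach (my assumption, using ImageMagick, which is not named in the original setup) is to decode each JPEG to raw pixels and hash only those, so files that differ merely in EXIF/IPTC metadata hash identically:

    # photo.jpg is a placeholder. Decode to a metadata-free raw format
    # and hash the pixel data only.
    convert photo.jpg ppm:- | md5sum

    # ImageMagick can also print a pixel-data signature directly
    # (%# is a hash of the pixel values), which scales to the whole tree:
    find /tank/photos -type f -iname '*.jpg' -print0 \
        | xargs -0 identify -format '%# %i\n' | sort > pixelhash.txt
    uniq -w64 -D pixelhash.txt    # the signature is 64 hex chars

Note that any exact pixel hash only catches identical image data: the resized and "enhanced" copies would still hash differently and need a similarity comparison instead. Decoding 600 GB of JPEGs is also CPU-heavy, so this is best run directly on the server rather than over the LAN.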

I can use FreeBSD/Linux utilities directly on the server, and OS X over the network (though working with 600 GB over the LAN is not the fastest way). I can write complex scripts in bash and more or less :) know Perl.
