RE: Some limitations that I probably should have mentioned.

in photomag •  7 years ago 

It sounds like a great project. I didn't mean to criticize you; I just wondered about the details. Even if it doesn't grow into something big, it will still be a great project to work on and learn from.

Finding similar images would be key though, as people tend to adjust 'stolen' images a little to make them look like their own.


Well, it works to a certain extent. At the moment the problem is the database structure: the hashes are saved to SQL, where I can only check whether two hashes are exactly the same. To look for similar images I would need to compute the Hamming distance, which is very slow because I have to compute it against every other hash in the database. Therefore I need to experiment with B-trees.
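To make the cost concrete, here is a rough sketch of the two options, assuming each hash is stored as a plain 64-bit integer. The brute-force scan is the slow approach described above. The `BKTree` class is one possible index for Hamming-distance lookups; note that it is a BK-tree (a metric tree), which is a different structure from a B-tree, and it is only an illustration, not what the project actually uses. All names here are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes stored as integers."""
    return bin(a ^ b).count("1")


def linear_search(hashes, query, max_dist):
    """Brute force: one distance computation per stored hash."""
    return [h for h in hashes if hamming(h, query) <= max_dist]


class BKTree:
    """Metric tree keyed on Hamming distance (illustrative assumption)."""

    def __init__(self):
        self.root = None  # each node is [hash_value, {edge_distance: child_node}]

    def add(self, value):
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            if d == 0:
                return  # hash is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child

    def search(self, query, max_dist):
        results = []
        stack = [self.root] if self.root else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= max_dist:
                results.append(value)
            # Triangle inequality: a match can only sit under an edge
            # whose label lies within max_dist of d.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return results
```

With a small `max_dist` the tree can skip large parts of the data, while the linear scan always touches every hash. How much that helps in practice depends on the data and the threshold, so it would need to be measured.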

Not sure if it is possible, as I have not tried anything like this before. But if you could somehow save just the middle of the image, you might be able to make a good comparison.

Well, to produce a hash, images are scaled down to a picture whose pixel count is a power of 2 (64 pixels being the smallest with good results). Before and after resizing, certain operations are applied to get better results. Depending on the operations used, the accuracy and the time to compute change. Sample operations are conversion to grayscale, discrete wavelet transform, discrete cosine transform, etc. There is a Python library that I am using: https://github.com/JohannesBuchner/imagehash
That GitHub repo also links to webpages on how the algorithms work and how effective they are.
Thanks to these algorithms, slight changes like JPEG compression artifacts, rescaling, and slight cropping do not affect the hash that much. Cropping still affects the hash the most; I'll try out whether your idea or similar techniques work.
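As a rough illustration of the idea, here is a simplified average hash in plain Python. It is a sketch, not the imagehash library's implementation: it assumes the image is already grayscale, given as a 2D list of pixel values, with dimensions divisible by `hash_size`. It shrinks the image to 8x8 (64 pixels) by block averaging and sets one bit per pixel depending on whether it is brighter than the mean.

```python
def average_hash(pixels, hash_size=8):
    """Simplified average hash of a grayscale image given as a 2D list.

    Assumes height and width are divisible by hash_size.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size

    # Downscale to hash_size x hash_size by averaging each block.
    small = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [pixels[y][x]
                     for y in range(by * block_h, (by + 1) * block_h)
                     for x in range(bx * block_w, (bx + 1) * block_w)]
            small.append(sum(block) / len(block))

    # One bit per downscaled pixel: 1 if brighter than the mean.
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# A 16x16 left-to-right gradient and a uniformly brightened copy.
image = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[value + 3 for value in row] for row in image]

print(hamming(average_hash(image), average_hash(brighter)))  # 0
```

Because each bit only records whether a region is brighter than the image's own mean, a uniform brightness change leaves the hash untouched, which is the kind of robustness described above.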

Keep in mind I am just thinking out loud here. My idea is that changes made to an image mostly happen at the top and bottom. If it is possible to check just some area in the middle, you could find both identical and adjusted versions.
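A minimal sketch of that idea, assuming the image is a 2D list of grayscale values and that only the central region is passed on to the hash. The `center_crop` helper and its `fraction` parameter are made up for illustration.

```python
def center_crop(pixels, fraction=0.5):
    """Return the central part of a 2D pixel grid.

    fraction is the share of height and width to keep (hypothetical parameter).
    """
    height, width = len(pixels), len(pixels[0])
    crop_h = max(1, int(height * fraction))
    crop_w = max(1, int(width * fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in pixels[top:top + crop_h]]


# An 8x8 image and a copy whose top and bottom two rows were overwritten.
original = [[y * 8 + x for x in range(8)] for y in range(8)]
edited = [row[:] for row in original]
for y in (0, 1, 6, 7):
    edited[y] = [0] * 8

print(center_crop(original) == center_crop(edited))  # True
```

This only helps when the edit paints over the borders. If rows are actually cut off, the image size changes and the center shifts, so the two crops no longer line up exactly.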

Yep, I'm just trying to explain how it works. I will test this on my database when I have time. At the moment I am waiting for new hardware to arrive so that it will run a bit faster (6 times as fast), because right now I am using a Raspberry Pi 2 Model B as a 24/7 server, and that's just not strong enough for database + image hashing.