It could be more exact than Google in certain cases because the approach is different. I calculate an image hash (pHash in my case), like TinEye. Google's approach is not public, but it probably uses machine learning and pattern matching. In my own experience the algorithm I'm using is very fast: approx. 1-2 s on a 1 GHz single-core CPU for one hash, plus approx. 0.2 s for listing similar hashes from a database containing approx. 0.5 million hashes (the database is subject to change and is missing many pictures from Steemit). The downside is that I can only find identical pictures and slightly edited pictures, whereas Google is very good at finding similar pictures thanks to machine learning. Note that I do this project just for fun, to learn database handling, picture processing and multiprocessing.
RE: Some limitations that I probably should have mentioned.
It sounds like a great project. I didn't mean to criticize you, just wondered about the details. Even if it is not growing into something big, it will still be a great project to work on and learn from.
Finding similar images would be key though, as people tend to adjust 'stolen' images a little to make them look like their own.
Well, it works to a certain extent. At the moment the problem is the database structure: the hashes are saved to SQL, where I can only check whether they are exactly the same. To look for similar ones I would need to compute the Hamming distance against every other hash in the database, which would be very slow. Therefore I need to experiment with B-trees.
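To make the similarity lookup concrete, here is a minimal sketch of one structure that is often used for this kind of search. Note that it is a BK-tree (a metric tree over Hamming distance), not the B-tree mentioned above, so treat it as one option to experiment with rather than the plan for this project. It assumes the hashes are stored as plain integers; `BKTree` and `hamming` are hypothetical names.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Metric tree keyed on Hamming distance (a BK-tree, not a B-tree)."""

    def __init__(self):
        # Each node is [hash_value, {distance_to_parent: child_node}].
        self.root = None

    def add(self, value: int) -> None:
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            if d == 0:
                return  # hash is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child

    def search(self, query: int, max_dist: int):
        """Return sorted (distance, hash) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


# Tiny demo with 4-bit "hashes".
tree = BKTree()
for h in (0b0000, 0b0001, 0b0111, 0b1111):
    tree.add(h)
near = tree.search(0b0011, 1)
```

The pruning step is what avoids comparing the query against every stored hash. How much it saves depends on the distance threshold and on how the hashes are distributed, so it would need measuring on the real database.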
Not sure if it is possible, as I have not tried anything like this before. But if you could just save the middle of the image somehow, you might be able to make a good comparison.
Well, to produce a hash, images are scaled down to a picture whose pixel count is a power of 2 (64 pixels being the smallest that gives good results). Before and after resizing, certain operations are applied to get better results. Depending on the operations used, the accuracy and the time to compute change. Sample operations are: convert to grayscale, discrete wavelet transform, discrete cosine transform, etc. There is a Python library that I am using: https://github.com/JohannesBuchner/imagehash
On that GitHub repo there are also links to web pages on how the algorithms work and how effective they are.
Thanks to these algorithms, slight changes like JPEG compression artifacts, rescaling and slight cropping do not affect the hash that much. Cropping still affects the hash the most; I'll try out whether your idea or similar techniques work.
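The steps described here (grayscale, downscale, transform, threshold) can be sketched in plain Python for the DCT variant. This is not the imagehash library's code: `dct_1d`, `dct_2d` and `phash_sketch` are hypothetical names, the input is assumed to be an already downscaled 32x32 grayscale grid, and the median rule is simplified.

```python
import math


def dct_1d(values):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(grid):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def phash_sketch(gray, hash_size=8):
    """Perceptual hash of a square grayscale grid, as an integer.

    Assumption: `gray` is the image after grayscale conversion and
    downscaling (e.g. 32x32 values in 0..255).
    """
    freq = dct_2d(gray)
    # Keep only the low-frequency top-left block: the coarse structure.
    low = [v for row in freq[:hash_size] for v in row[:hash_size]]
    median = sorted(low)[len(low) // 2]  # simplified median
    bits = 0
    for v in low:
        # One bit per coefficient: is it above the median?
        bits = (bits << 1) | (v > median)
    return bits


# Hypothetical test pattern standing in for a downscaled image.
grid = [[(x * x * 3 + y * 7 + x * y) % 256 for x in range(32)] for y in range(32)]
print(hex(phash_sketch(grid)))
```

Because only the low-frequency coefficients end up in the hash, fine detail such as compression noise has little influence, whereas cropping shifts the coarse structure itself.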
Keep in mind I am just thinking out loud here. My idea is that changes made to an image mostly happen at the top and bottom. If it is possible to check just some area in the middle, you could find both identical and adjusted versions.
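As a rough sketch of the "only check the middle" idea, the helper below computes the central region of an image as a crop box. `center_box` and its `keep` parameter are hypothetical; nothing here has been tested against real images.

```python
def center_box(width: int, height: int, keep: float = 0.5):
    """Return (left, upper, right, lower) for the central part of an image.

    `keep` is the fraction of each dimension to retain (an assumed
    parameter for this sketch, not taken from any library).
    """
    if not 0 < keep <= 1:
        raise ValueError("keep must be in (0, 1]")
    crop_w = max(1, round(width * keep))
    crop_h = max(1, round(height * keep))
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return (left, upper, left + crop_w, upper + crop_h)


# Central half of an 800x600 picture.
box = center_box(800, 600, keep=0.5)
```

The tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the cropped region could then be hashed in the usual way. This should help when the borders are edited (captions, watermarks) but the dimensions stay the same. If the copy has actually been cropped, the same fraction covers a different region of the picture, so it remains an experiment.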
Yep, I'm just trying to explain how it works. I will test this on my database when I have time. At the moment I am waiting for new hardware to arrive so that it will run a bit faster (6 times as fast), because right now I am using a Raspberry Pi 2 Model B as a 24/7 server and that's just not strong enough for database + image hashing.