[HIRING] upgrade an existing computer vision Python script


I'm severely disabled, without the use of my hands, and have already worked with a developer to create a webcam system that uses a couple of neural nets to identify facial gestures and sends that data elsewhere via OSC, which lets me move my mouse, create MIDI notes and play games. You can see an example of it in action in the video above.

This system means a lot to me and I want to keep improving its accuracy and reliability. Recognition of some facial gestures can be pretty hit-or-miss, and certain edge cases in particular cause it problems. If I could collect and annotate more training images of these edge cases, I should be able to improve its performance in the future. So right now I would simply like to find a developer to create a new training-image collection system.

The previous annotation process was handled by the last developer and was time-consuming and costly. I would like a system that makes capturing and annotating images easy enough that even a cripple like me can do it in my spare time. That way I can keep generating training data indefinitely, hammering out inconsistencies as I identify them.

I generally leave this gesture recognition program running on my computer at all times, so I think the best thing to do would be to augment it with this annotation system I have in mind. Anytime I feel the gesture recognition is misbehaving and there's something useful I could teach it, I would like to be able to hit a hotkey and have it begin taking webcam images once or twice per second. Hitting the hotkey again would stop this.
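To make the idea concrete, here is a rough sketch of how the capture toggle might work, assuming OpenCV for the webcam and pynput for the global hotkey. The F8 key, the captures/ folder and the half-second interval are placeholders we would settle on together:

```python
import os
import time

import cv2
from pynput import keyboard

capturing = False  # flipped on/off by the hotkey

def on_press(key):
    global capturing
    if key == keyboard.Key.f8:  # placeholder toggle key
        capturing = not capturing
        print("capture on" if capturing else "capture off")

keyboard.Listener(on_press=on_press).start()  # runs in a background thread

os.makedirs("captures", exist_ok=True)
cam = cv2.VideoCapture(0)  # default webcam
frame_id = 0
try:
    while True:
        if capturing:
            ok, frame = cam.read()
            if ok:
                cv2.imwrite(f"captures/frame_{frame_id:06d}.png", frame)
                frame_id += 1
        time.sleep(0.5)  # roughly twice per second
finally:
    cam.release()
```

In practice this would live inside the existing program rather than run as a separate loop, but it shows the whole behavior I'm after: hotkey on, periodic saves, hotkey off.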

Then later on, I could enter a different mode of the program and it would open up the images it had captured and allow me to annotate them. I'm imagining the UI would look a lot like the viewer window in the video above, except it would only present static images. The annotations themselves would appear on the images similar to the way the dots and words already do in the viewer window.
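As a sketch of that annotation view: OpenCV can draw dots and labels on a still image, much like the live viewer presumably does. The point names and coordinates here are just made-up examples:

```python
import cv2

image = cv2.imread("captures/frame_000000.png")
points = {
    "left eye": (212, 180),
    "right eye": (305, 178),
    "lips": (258, 290),
    "tongue": (258, 310),
}

for name, (x, y) in points.items():
    cv2.circle(image, (x, y), 3, (0, 255, 0), -1)  # dot at the landmark
    cv2.putText(image, name, (x + 6, y - 6),
                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)

cv2.imshow("annotator", image)
cv2.waitKey(0)
```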

There are 3 eye gestures the machine learning model is trained to recognize (left wink, right wink and left brow raise), as well as 3 mouth gestures (tongue out, pucker lips and tongue in cheek). So if I could simply hit a corresponding hotkey for each gesture present in the image, that would probably be nice and easy for me.
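For example (the key bindings here are pure placeholders), the annotator window could read single-letter hotkeys and toggle each gesture on or off for the current image:

```python
import cv2

GESTURE_KEYS = {  # placeholder key bindings
    ord("q"): "left wink",
    ord("w"): "right wink",
    ord("e"): "left brow raise",
    ord("a"): "tongue out",
    ord("s"): "pucker lips",
    ord("d"): "tongue in cheek",
}

image = cv2.imread("captures/frame_000000.png")
present = set()  # gestures marked in the current image
while True:
    cv2.imshow("annotator", image)
    key = cv2.waitKey(0) & 0xFF
    if key in GESTURE_KEYS:
        present ^= {GESTURE_KEYS[key]}  # toggle the gesture on/off
    elif key == 13:  # Enter: accept and move to the next image
        break
print("gestures present:", present)
```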

There is also an "intensity" score (0-1) associated with these gestures, which denotes how fully they are being expressed. After hitting a hotkey, perhaps a slider of some kind could be used to choose the desired intensity?
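One way that could look (just a sketch, assuming OpenCV's built-in trackbar; trackbars are integer-only, so a 0-100 range would map onto the 0-1 score):

```python
import cv2
import numpy as np

state = {"gesture": "left wink", "intensity": 1.0}  # hypothetical selection

def on_slide(value):
    state["intensity"] = value / 100.0  # trackbars only handle integers

cv2.namedWindow("annotator")
cv2.createTrackbar("intensity", "annotator", 100, 100, on_slide)
cv2.imshow("annotator", np.zeros((120, 400, 3), np.uint8))  # stand-in image
cv2.waitKey(0)
print(state)
```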

There are also 4 points on screen that need to be identified: left eye, right eye, tongue and lips. I use an eye tracker to move my mouse on screen, so I can do that pretty freely. I could then click or hit a hotkey to mark the location of the mouse pointer with the corresponding annotation. I'm told by the previous developer that these locations need to be pretty precise, so after marking them I'm thinking I could use the arrow keys to adjust their positions pixel by pixel if necessary.
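A sketch of how that might work with OpenCV's mouse callback and extended key codes. The arrow-key codes below are the common X11/Linux values and differ by platform, so that is one detail to verify:

```python
import cv2

point = [160, 120]  # the landmark currently being placed

def on_mouse(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        point[0], point[1] = x, y  # drop the point under the cursor

cv2.namedWindow("annotator")
cv2.setMouseCallback("annotator", on_mouse)

image = cv2.imread("captures/frame_000000.png")
while True:
    view = image.copy()
    cv2.circle(view, tuple(point), 3, (0, 0, 255), -1)
    cv2.imshow("annotator", view)
    key = cv2.waitKeyEx(20)
    if key == 65361:    # left arrow: nudge one pixel left
        point[0] -= 1
    elif key == 65363:  # right arrow
        point[0] += 1
    elif key == 65362:  # up arrow
        point[1] -= 1
    elif key == 65364:  # down arrow
        point[1] += 1
    elif key == 27:     # Esc: done placing this point
        break
```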

These are just my ideas right now about what would be easiest for me to accomplish, so we should discuss further what would be feasible and ideal.

The annotations are stored in CSV files, so that is how all the selected annotations should be saved. I can show you examples of the previous files so you know exactly what format they use.
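Just to illustrate the shape of it (the column names below are hypothetical; the real schema would be copied from the previous developer's files):

```python
import csv

FIELDS = ["image", "gesture", "intensity", "x", "y"]  # placeholder schema

rows = [{"image": "frame_000000.png", "gesture": "left wink",
         "intensity": 0.8, "x": 212, "y": 180}]

with open("annotations.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if f.tell() == 0:  # empty file: write the header row once
        writer.writeheader()
    writer.writerows(rows)
```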

Another important thing to note: since the current gesture recognition is pretty good, I think it would be essential, in order to streamline the annotation process, to have the captured images pre-annotated by default with what the system initially guessed. Then I would simply need to correct any inaccuracies. So when the images are captured, this would need to be done either through the existing viewer window (seen in the video above) or by otherwise consulting the machine learning model to generate the default annotations.
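In other words, something roughly like this at capture time. The predict() call below is a hypothetical stand-in for however the existing model in mouthMusicStreamer exposes its per-frame output:

```python
import cv2

def capture_with_defaults(cam, model, frame_id):
    """Save one frame plus the model's current guess as its default annotation."""
    ok, frame = cam.read()
    if not ok:
        return None
    path = f"captures/frame_{frame_id:06d}.png"
    cv2.imwrite(path, frame)
    # Hypothetical call: the real code would reuse whatever inference the
    # viewer window already runs (gestures, intensities, and the four
    # landmark points), then write the result out in the existing CSV format.
    defaults = model.predict(frame)
    return path, defaults
```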

You can look over all the existing code at https://github.com/ahpalmerUNR/mouthMusicStreamer.


We can discuss an appropriate deadline for the project, but I am most concerned simply that we stay in constant communication about its progress.

All code will need to be clean, organized and very well-commented. Please contact me if you have any questions or would like clarification about anything! If you see a better way to do something than what I'm suggesting, please bring it up!
