Computer Vision Game basic components

Last week I came up with an idea: to make a computer vision game. I built the game itself with pygame (a Python library for making games) and integrated body-movement tracking with OpenCV. For tracking the hand, which will be the joypad of our game, I used MediaPipe.

MediaPipe is a ready-to-use, multi-purpose library created by Google. For our project, a few calls are enough to get 21 points of the hand. If you want more details on hand tracking, I suggest reading the official MediaPipe Hands guide.

[Image: Computer vision game, MediaPipe hand]

I have already built another project with MediaPipe, and it might help you come up with new ideas: Facial Landmarks Detection | with Opencv, Mediapipe and Python

In this tutorial we will see how to build the game step by step and how to integrate it with our computer vision project.

Create the game with pygame

The game consists of mosquitoes and bees flying in the center of the screen. Every time you swat a mosquito you gain 1 point, and every time you swat a bee you lose 1 point, all within a fixed number of seconds. In the second part, we will see how to integrate pygame with MediaPipe to get our computer vision game. Here is an image of the result.

[Image: Computer vision game example]
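The time limit can be handled with a simple countdown. The following is a minimal sketch, not the code from the downloadable files: `RoundTimer` is a hypothetical helper, and the 30-second duration in the usage lines is an arbitrary value chosen for illustration. Inside a pygame loop you could pass `lambda: pygame.time.get_ticks() / 1000` as the clock, since `get_ticks()` returns the milliseconds elapsed since `pygame.init()`.

```python
import time
from typing import Callable


class RoundTimer:
    """Hypothetical helper: counts down a fixed-length game round."""

    def __init__(self, duration_s: float, clock: Callable[[], float] = time.monotonic):
        self.duration_s = duration_s
        self.clock = clock        # any function returning the current time in seconds
        self.start = clock()      # remember when the round began

    def remaining(self) -> float:
        # Seconds left in the round, never negative.
        return max(0.0, self.duration_s - (self.clock() - self.start))

    def is_over(self) -> bool:
        return self.remaining() == 0.0


# Usage with a fake clock so the example is deterministic.
fake_now = [100.0]
timer = RoundTimer(30, clock=lambda: fake_now[0])
fake_now[0] = 112.5
print(timer.remaining())  # 17.5
fake_now[0] = 131.0
print(timer.is_over())    # True
```

Injecting the clock keeps the timer independent of pygame, so the same class works with `time.monotonic()` or with pygame's tick counter.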

Install Pygame

As a first step, install the pygame library; it takes just this one command from the terminal:

pip install pygame

Create the main file of the computer vision game

I suggest downloading all the necessary files through the link at the bottom of the page, because the explanation of the game components below is based on them.
This file contains the main pygame startup functions, and it is the one to run to start the game. If you just want to change some settings, open the file
This is the file to manage the events of the game and it includes in addition to various libraries, also the files for the management of the hand such as a cursor,, and

I drew a rectangle around each element to better explain the concept. When two rectangles overlap (the hand's and a mosquito's or a bee's), an event is triggered.

[Image: Computer vision game with a rectangle around each element]
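The rectangle-overlap rule can be sketched in plain Python. This is an illustration under assumptions, not the code from the downloadable files: `Box`, `swat` and `POINTS` are hypothetical names, in the real game pygame's own `pygame.Rect.colliderect` does the overlap test, and I assume a swatted insect disappears from the screen.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle (x, y = top-left corner), like pygame.Rect."""
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: Box) -> bool:
        # Mirrors pygame.Rect.colliderect: rectangles that only touch
        # along an edge do not count as overlapping.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


# Scoring from the game rules: +1 for a mosquito, -1 for a bee.
POINTS = {"mosquito": 1, "bee": -1}


def swat(hand: Box, insects: list[tuple[str, Box]], score: int) -> tuple[int, list[tuple[str, Box]]]:
    """Apply one swat: insects under the hand are scored and removed (hypothetical)."""
    remaining = []
    for kind, box in insects:
        if hand.overlaps(box):
            score += POINTS[kind]
        else:
            remaining.append((kind, box))
    return score, remaining


hand = Box(100, 100, 50, 50)
insects = [("mosquito", Box(120, 120, 20, 20)), ("bee", Box(300, 300, 20, 20))]
score, insects = swat(hand, insects, score=0)
print(score, [kind for kind, _ in insects])  # 1 ['bee']
```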

Add hand as controller (Computer Vision)

As I mentioned in the introduction, this part of the project was made with MediaPipe. First, install the library; this command is enough:

pip install mediapipe

According to the MediaPipe documentation, it can infer the 3D coordinates of 21 landmark points of the hand.

[Image: MediaPipe hand landmark points]
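To move a cursor on the game screen, a landmark has to be turned into pixel coordinates. With the `mp.solutions.hands` API used below, each point is available as `hand_landmarks.landmark[i]`, whose `.x` and `.y` are normalized to [0, 1] by the image width and height. The helper below is a hypothetical sketch, not part of MediaPipe or of the downloadable files; the clamping is an assumption to guard against points that fall slightly outside the frame.

```python
from __future__ import annotations


def landmark_to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> tuple[int, int]:
    """Convert a normalized landmark (0.0 to 1.0) to integer pixel coordinates."""
    # Clamp first: a landmark may lie slightly outside [0, 1]
    # when part of the hand leaves the frame.
    x_norm = min(max(x_norm, 0.0), 1.0)
    y_norm = min(max(y_norm, 0.0), 1.0)
    # Scale, then keep the result inside the valid pixel range [0, size - 1].
    px = min(int(x_norm * width), width - 1)
    py = min(int(y_norm * height), height - 1)
    return px, py


print(landmark_to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
```

In the webcam loop you would call it as `landmark_to_pixel(lm.x, lm.y, w, h)`, where `lm` is one entry of `hand_landmarks.landmark` and `h, w` come from `image.shape[:2]`.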

Using the example code from the MediaPipe page dedicated to hands, almost unchanged, you already get a working result. Here is the code, which you can also find on the MediaPipe Hands page:

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For webcam input:
cap = cv2.VideoCapture(0)

with mp_hands.Hands(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as hands:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # Flip the image horizontally for a later selfie-view display, and convert
    # the BGR image to RGB.
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    results = hands.process(image)

    # Draw the hand annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
      for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())
    cv2.imshow('MediaPipe Hands', image)
    # Press Esc (key code 27) to quit.
    if cv2.waitKey(5) & 0xFF == 27:
      break

# Release the webcam and close the preview window.
cap.release()
cv2.destroyAllWindows()

Running the code, this is what you see:

[Image: MediaPipe Hands running]

How do I give the command?

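The idea is to use the landmark points. As the landmark map shows, the highest point of the hand is number 12 (the tip of the middle finger), and just above the palm there is number 9 (the base of the middle finger). When point 12 is lower than point 9, we consider the hand closed; in every other case it is open. Keep in mind that in image coordinates y grows downward, so "lower" means a larger y value.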

Put more simply: as you can see in the image, when the red dot is lower than the green one, the hand is closed, so we send the click command.
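The open/closed rule fits in a few lines of Python. This is a minimal sketch, assuming the 21 landmarks have already been extracted as (x, y) pairs (for example from `hand_landmarks.landmark[i].x` and `.y`); `is_hand_closed` and `ClickDetector` are hypothetical names, not taken from the downloadable files. The detector fires a click only on the transition from open to closed, an assumption I add so that keeping the fist closed does not click on every frame.

```python
from __future__ import annotations

# MediaPipe Hands landmark indices.
MIDDLE_FINGER_MCP = 9   # base of the middle finger, just above the palm
MIDDLE_FINGER_TIP = 12  # tip of the middle finger


def is_hand_closed(landmarks: list[tuple[float, float]]) -> bool:
    """True when point 12 is lower on screen than point 9.

    y grows downward in image coordinates, so "lower" means a larger y.
    """
    return landmarks[MIDDLE_FINGER_TIP][1] > landmarks[MIDDLE_FINGER_MCP][1]


class ClickDetector:
    """Hypothetical helper: emits one click per open-to-closed transition."""

    def __init__(self) -> None:
        self.was_closed = False

    def update(self, landmarks: list[tuple[float, float]]) -> bool:
        closed = is_hand_closed(landmarks)
        clicked = closed and not self.was_closed
        self.was_closed = closed
        return clicked


# Fake landmark frames: only points 9 and 12 matter for this rule.
open_hand = [(0.5, 0.5)] * 21
open_hand[MIDDLE_FINGER_MCP] = (0.5, 0.6)
open_hand[MIDDLE_FINGER_TIP] = (0.5, 0.3)    # tip above point 9: open
closed_hand = list(open_hand)
closed_hand[MIDDLE_FINGER_TIP] = (0.5, 0.7)  # tip below point 9: closed

detector = ClickDetector()
print([detector.update(f) for f in (open_hand, closed_hand, closed_hand, open_hand, closed_hand)])
# [False, True, False, False, True]
```

The comparison works the same with normalized or pixel coordinates, because scaling by the image height does not change which point is lower.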

Final computer vision game

Putting all these elements together, the game is finished, and a webcam is all you need to play it.