A side project inspired by this thread on Reddit.
The concept is simple: using a QVGA resolution, subtract the pixel data of the current frame from the pixel data of the previous frame, and display the difference on an ILI9341 TFT display.
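The per-pixel subtraction at the heart of the idea can be sketched in plain C++ (the function and constant names here are mine, not from the sketch below, which works directly on camera framebuffers):

```cpp
#include <cstdint>
#include <cstdlib>
#include <vector>

// One greyscale QVGA frame is 320 * 240 = 76800 bytes (1 byte per pixel).
constexpr int FRAME_PIXELS = 320 * 240;

// Absolute per-pixel difference between two frames: 0 where nothing
// changed, up to 255 where a pixel went from black to white (or back).
std::vector<uint8_t> frameDiff(const std::vector<uint8_t>& prev,
                               const std::vector<uint8_t>& curr) {
    std::vector<uint8_t> diff(prev.size());
    for (size_t i = 0; i < prev.size(); i++) {
        diff[i] = static_cast<uint8_t>(std::abs(prev[i] - curr[i]));
    }
    return diff;
}
```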
As you can see from the video, the frame time hovers around 113ms, giving an approximate rate of 8.8 FPS.
Due to some lazy coding, the output is displayed as 32 shades of grey. This could be expanded to 192 colours using an algorithm similar to the one tested on the LiDAR night vision camera project.
This page includes all the un-optimized code that comes from a curious evening of tinkering.
* This is about motion detection, not motion tracking. I’ve tried motion tracking here, using a similar scheme, with ‘meh’ results.
Software
Hardware & The User_Setup.h File
The setup pictured above uses an AI-Thinker ESP32-CAM that has been modified to disable the flash and break out GPIO 33, in order to use the resistive touchscreen on the ILI9341 display. These modifications have been documented here, along with a CTRL+C & CTRL+V version of the relevant User_Setup.h file.
I keep meaning to add a resource page for a standard AI-Thinker ESP32-CAM and the ILI9341 TFT display, so if you would find that useful then drop me a message on the contact page and I’ll sort it out. Likewise, I’ve been meaning to write a bit of an overview of the AI-Thinker module, including various hints, tips & tricks.
Main Sketch
I’m trying to get better at adding comments to the code – mainly because my memory is getting worse, and the excitement of trying to fathom where the magic numbers come from was wearing thin.
Nonetheless, the sketch in all of its raw beauty is below, for your delectation. It only utilises one core at the moment, so there would certainly be improvements to be had if you want to battle tête-à-tête with the RTOS.
#include "esp_camera.h"
#include <TFT_eSPI.h> // CHECK YOUR User_Setup.h File.
#include <SPI.h>
#define CAMERA_MODEL_AI_THINKER
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
#define Y9_GPIO_NUM 35
#define Y8_GPIO_NUM 34
#define Y7_GPIO_NUM 39
#define Y6_GPIO_NUM 36
#define Y5_GPIO_NUM 21
#define Y4_GPIO_NUM 19
#define Y3_GPIO_NUM 18
#define Y2_GPIO_NUM 5
#define VSYNC_GPIO_NUM 25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
TFT_eSPI tft = TFT_eSPI();
TFT_eSprite spr = TFT_eSprite(&tft);
camera_config_t config;
camera_fb_t * fb;
//PSRAM pointers
int *previousFrame;
int *newFrame;
//Threshold
const int threshold = 64; //(2^6)
Libraries, Definitions, Constructors and Globals.
Standard guff, really.
The ‘threshold’ integer is a vestigial constant from an original black & white version.
void setup() {
  Serial.begin(115200);
  //PSRAM allocations
  psramInit();
  previousFrame = (int *) ps_malloc(76800 * sizeof(int));
  newFrame = (int *) ps_malloc(76800 * sizeof(int));
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.frame_size = FRAMESIZE_QVGA;
  config.pixel_format = PIXFORMAT_GRAYSCALE;
  //config.pixel_format = PIXFORMAT_RGB565;
  config.grab_mode = CAMERA_GRAB_LATEST;
  config.fb_location = CAMERA_FB_IN_PSRAM;
  config.jpeg_quality = 12;
  config.fb_count = 2;
  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("Camera init failed with error 0x%x\n", err);
    return;
  }
  sensor_t * s = esp_camera_sensor_get();
  s->set_brightness(s, 0);                 // -2 to 2
  s->set_contrast(s, 0);                   // -2 to 2
  s->set_saturation(s, 0);                 // -2 to 2
  s->set_special_effect(s, 0);             // 0 to 6
  s->set_whitebal(s, 1);                   // 0 or 1
  s->set_awb_gain(s, 1);                   // 0 or 1
  s->set_wb_mode(s, 0);                    // 0 to 4
  s->set_exposure_ctrl(s, 1);              // 0 or 1
  s->set_aec2(s, 0);                       // 0 or 1
  s->set_ae_level(s, 0);                   // -2 to 2
  s->set_aec_value(s, 300);                // 0 to 1200
  s->set_gain_ctrl(s, 1);                  // 0 or 1
  s->set_agc_gain(s, 0);                   // 0 to 30
  s->set_gainceiling(s, (gainceiling_t)0); // 0 to 6
  s->set_bpc(s, 0);                        // 0 or 1
  s->set_wpc(s, 1);                        // 0 or 1
  s->set_raw_gma(s, 1);                    // 0 or 1
  s->set_lenc(s, 1);                       // 0 or 1
  s->set_hmirror(s, 0);                    // 0 or 1
  s->set_vflip(s, 0);                      // 0 or 1
  s->set_dcw(s, 1);                        // 0 or 1
  s->set_colorbar(s, 0);                   // 0 or 1
  tft.init();
  tft.setRotation(3);
  tft.fillScreen(TFT_BLACK);
  spr.createSprite(320, 240); //QVGA
  spr.setTextColor(TFT_WHITE, TFT_BLACK);
  tft.setTextColor(TFT_WHITE, TFT_BLACK);
  tft.drawString("Loading...", 105, 105, 2);
  delay(1000);
}
Setup ( )
The PSRAM is allocated here for the two frames. Note that free( ) is never called, because the data is constantly overwritten – but this could cause consternation in your own projects.
The important camera parameters here are the frame size, QVGA (320×240, to match the display), and the pixel format, greyscale, which gives 1 byte per pixel.
A full frame sprite is created to handle the display graphics – this will put a strain on the RAM if you intend to do anything else.
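As a rough sanity check on the memory involved (my arithmetic, not code from the sketch), the per-frame buffer sizes work out as:

```cpp
#include <cstddef>

constexpr int WIDTH = 320, HEIGHT = 240; // QVGA

// Greyscale camera frame: 1 byte per pixel.
constexpr size_t greyFrameBytes = WIDTH * HEIGHT * 1; // 76800 bytes

// Full-frame RGB565 sprite: 2 bytes per pixel.
constexpr size_t spriteBytes = WIDTH * HEIGHT * 2;    // 153600 bytes
```

That ~150 KB sprite is why the display buffer leans so heavily on the available RAM.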
As always, a small loading screen is added to verify that the display is working. If you cannot see this then I would suggest checking your wiring and User_Setup.h files.
void loop() {
  // delay(5000); //uncomment to add a 5-second delay
  long initialTime = millis();
  //get initial frame
  camera_fb_t * fb = NULL;
  fb = esp_camera_fb_get();
  if (!fb) return; //skip this pass if the capture failed
  //transfer camera buffer to new frame
  for (int i = 0; i < 76800; i++) {
    newFrame[i] = fb->buf[i];
  }
  //free up memory on camera
  esp_camera_fb_return(fb);
  //do comparison between the two frames and process
  for (int k = 0; k < 76800; k++) {
    //uncomment for threshold colouring
    //int pix_diff = abs(previousFrame[k] - newFrame[k]);
    int pix_diff = (255 + (previousFrame[k] - newFrame[k]));
    int pix_x = k % 320;
    int pix_y = k / 320;
    //8-bit greyscale to RGB565 conversion
    int pix_lum = pix_diff / 16;
    uint16_t B = pix_lum;
    uint16_t G = pix_lum * 2;
    uint16_t R = pix_lum;
    uint16_t pix_colour = ((R << 11) + (G << 5) + B);
    spr.drawPixel(pix_x, pix_y, pix_colour);
    /*
    //threshold colouring
    if (pix_diff > 64) {
      spr.drawPixel(pix_x, pix_y, TFT_WHITE);
    } else {
      spr.drawPixel(pix_x, pix_y, TFT_BLACK);
    }
    */
    //move current frame to previous
    previousFrame[k] = newFrame[k];
  }
  //frame timing
  long frameTime = millis() - initialTime;
  spr.setCursor(140, 200, 2);
  spr.print(frameTime); spr.println(" ms");
  //push sprite to display
  spr.pushSprite(0, 0);
}
Loop ( )
The meat and potatoes of this program lies in the second for( ) loop.
This finds the difference between the two frames, offset about a midpoint of 255. As the maximum pixel value is 255 and the minimum is 0, the difference has a spread of 510 (from -255 to +255, which is then shifted to 0 – 510).
As each camera frame is a one-dimensional array, it’s necessary to work out the relative X & Y co-ordinates for the display – this is done with the pix_x and pix_y calculations. I’m sure a nested for loop would be faster, but I’m lazy.
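Pulled out of the sketch, the midpoint shift and the index-to-coordinate mapping look like this in standalone C++ (the function names are mine):

```cpp
// Shift a signed per-pixel difference (-255..+255) into 0..510,
// centred on a midpoint of 255 (no change -> 255).
int shiftedDiff(int previous, int current) {
    return 255 + (previous - current);
}

// Map a linear index into a 320-pixel-wide frame to display coordinates.
void indexToXY(int k, int& x, int& y) {
    x = k % 320; // column
    y = k / 320; // row
}
```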
After the frame differences are calculated, coloured and added to the sprite, the sprite is then pushed to the display and the process repeats itself.
The alternative code for threshold colouring has been included as comments.
Colouring
The display expects a 16-bit RGB565 pixel format, the camera outputs 8-bit greyscale, and the range of difference values is 9-bit (0 to 510).
A grey colour is achieved when the R, G and B channels all hold the same value (with the exception of green, which is doubled because its field is 6 bits wide rather than 5). As the R and B channels are limited to 5 bits, the number of gradients is limited to 2^5 = 32.
Dividing the total value range by the number of gradients gives the bandwidth of each gradient. In this instance 2^9 / 2^5 = 2^4 = 16, so the value is divided by 16 to obtain the intensity of grey. This intensity is then attributed to the R, G and B channels and bit-shifted to build the final RRRRRGGGGGGBBBBB value.
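The same conversion, written as a standalone function (the function name is mine; the arithmetic mirrors the loop body above):

```cpp
#include <cstdint>

// Convert a shifted difference (0..510) to a grey RGB565 value.
// 510 / 16 caps out at 31, giving 32 shades; green is doubled to
// span its 6-bit field while R and B use their 5 bits directly.
uint16_t greyToRGB565(int shifted_diff) {
    uint16_t lum = shifted_diff / 16;        // 0..31
    uint16_t r = lum, g = lum * 2, b = lum;  // g spans 0..62
    return (r << 11) | (g << 5) | b;         // RRRRRGGGGGGBBBBB
}
```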
The video to the right shows the threshold colouring with the threshold value set to 64 (2^6).
Initial experiments used a threshold value of 127 (2^7 - 1), but only major movement would be shown (which may be beneficial for some uses).
The abs( ) function used when calculating the difference always returns a positive integer, so this simply shows whether there is a change between the frames, implying movement of some sort.
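The threshold variant boils down to a single moved/not-moved test per pixel. A standalone sketch of the commented-out logic (the function name is mine):

```cpp
#include <cstdlib>

// True when the absolute frame-to-frame change at a pixel exceeds
// the threshold: 64 matches the video; 127 shows only major movement.
bool pixelMoved(int previous, int current, int threshold = 64) {
    return std::abs(previous - current) > threshold;
}
```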
A benefit of the greyscale display is that it can indicate the difference between light->dark changes, and dark->light changes.
Future Plans
This was a fun little side quest, and will probably remain as such.
However, should there ever be a Version 2, I would like to implement a full-colour display. In addition, experiments with edge detection suggest that a Gaussian blur filter would also remove a large amount of the noise.
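For what it’s worth, the sort of 3×3 Gaussian blur that might tame the noise can be sketched in a few lines (purely illustrative, and untested on the ESP32-CAM itself):

```cpp
#include <cstdint>
#include <vector>

// 3x3 Gaussian blur with the classic 1-2-1 kernel (weights sum to 16),
// skipping the 1-pixel border for simplicity. Run over each greyscale
// frame before differencing, it smooths away single-pixel noise.
std::vector<uint8_t> gaussianBlur3x3(const std::vector<uint8_t>& in,
                                     int w, int h) {
    static const int k[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    std::vector<uint8_t> out(in); // border pixels copied unchanged
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int sum = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += k[dy + 1][dx + 1] * in[(y + dy) * w + (x + dx)];
            out[y * w + x] = static_cast<uint8_t>(sum / 16);
        }
    }
    return out;
}
```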
Page created: 04/01/2026
Last updated: 05/01/2026