ESP32CAM – Digital Panning using a Joystick and ST7789 Display.

As with most ideas, this one came at the inconvenient time of 11:13pm – so I sent a WhatsApp message to myself to remind me in the morning. As you can see – the premise is simple – have the camera take an image which is larger than the display, and control the window with a joystick.

The video below shows the final* result. Cross hairs have been added to show how the camera pans, regardless of any motion within the frame.

*The definition of “final” being “final for me, as I can’t be bothered to work on it any more”


Hardware

If you want to build this contraption for yourself, then you’ll need an ESP32CAM with an OV2640 camera sensor, ST7789 TFT display, an analogue joystick, some wires and an FTDI adapter (or other USB link). The wiring for this experiment is shown in the photo below.

Please ignore the circular GC9A01 display and push button – these are for other experiments not featured here. The ST7789 display was chosen because it can operate with only 4 connections to the ESP32CAM, unlike the GC9A01 which requires 5 connections.

The pinout for this setup is given in the list below. I’ve not shown the FTDI connections – these can be viewed here:

  • ST7789 VCC –> 3.3V
  • ST7789 GND –> GND
  • ST7789 SCL –> ESP32CAM GPIO 14
  • ST7789 SDA –> ESP32CAM GPIO 13
  • ST7789 RES –> ESP32CAM GPIO 15
  • ST7789 DC –> ESP32CAM GPIO 12
  • Joystick 5V –> 3.3V (NOT 5V)
  • Joystick GND –> GND
  • Joystick VRx –> ESP32CAM GPIO 2
  • Joystick VRy –> ESP32CAM GPIO 4
  • Joystick SW –> [not connected]

The eagle-eyed might have noticed a number of capacitors on the 3.3V rail. These are to help stabilise the voltage as there were a number of brown-out issues. The values of the capacitors used are 330uF electrolytic, 4.7uF electrolytic and 0.1uF ceramic.


Software – User_Setup.h

This code uses the TFT_eSPI library, so the User_Setup.h file must be correctly configured to the display. Below is the relevant setup code:

#define ST7789_DRIVER      // Full configuration option, define additional parameters below for this display
#define TFT_SDA_READ      // This option is for ESP32 ONLY, tested with ST7789 and GC9A01 display only
#define TFT_RGB_ORDER TFT_BGR  // Colour order Blue-Green-Red
#define TFT_WIDTH  240 // ST7789 240 x 240 and 240 x 320
#define TFT_HEIGHT 240 // ST7789 240 x 240

//#define TFT_MISO 19
#define TFT_MOSI 13		//assumed sda pin
#define TFT_SCLK 14		
//#define TFT_CS   15  // Chip select control pin
#define TFT_DC    12  // Data Command control pin
#define TFT_RST   15  // Reset pin (could connect to RST pin)
//#define TFT_RST  -1  // Set TFT_RST to -1 if display RESET is connected to ESP32 board RST

#define LOAD_GLCD   // Font 1. Original Adafruit 8 pixel font needs ~1820 bytes in FLASH
#define LOAD_FONT2  // Font 2. Small 16 pixel high font, needs ~3534 bytes in FLASH, 96 characters
#define LOAD_FONT4  // Font 4. Medium 26 pixel high font, needs ~5848 bytes in FLASH, 96 characters
#define LOAD_FONT6  // Font 6. Large 48 pixel font, needs ~2666 bytes in FLASH, only characters 1234567890:-.apm
#define LOAD_FONT7  // Font 7. 7 segment 48 pixel font, needs ~2438 bytes in FLASH, only characters 1234567890:-.
#define LOAD_FONT8  // Font 8. Large 75 pixel font needs ~3256 bytes in FLASH, only characters 1234567890:-.
//#define LOAD_FONT8N // Font 8. Alternative to Font 8 above, slightly narrower, so 3 digits fit a 160 pixel TFT
#define LOAD_GFXFF  // FreeFonts. Include access to the 48 Adafruit_GFX free fonts FF1 to FF48 and custom fonts
#define SMOOTH_FONT

// #define SPI_FREQUENCY  20000000
#define SPI_FREQUENCY  27000000
// #define SPI_FREQUENCY  40000000
// #define SPI_FREQUENCY  80000000
#define SPI_READ_FREQUENCY  20000000
#define USE_HSPI_PORT
#define SUPPORT_TRANSACTIONS

If you want to save on program space then you can get rid of the fonts (apart from Font 2, which is used to show the frame refresh time). Changing the SPI_FREQUENCY to 40000000 works, but doesn’t change the frame speed. Changing the SPI_FREQUENCY to 80000000 just causes instability issues.


Software – Main Sketch

Full honesty here: this program was a bastardised version of the original GC9A01 display sketch. But after issues with running both cores, the code was simplified – so there are likely to be unused variables as well as many other inefficiencies.

#include "esp_camera.h"
#include <TFT_eSPI.h> // Hardware-specific library
#include <SPI.h>

#define CAMERA_MODEL_AI_THINKER
#define PWDN_GPIO_NUM     32
#define RESET_GPIO_NUM    -1
#define XCLK_GPIO_NUM      0
#define SIOD_GPIO_NUM     26
#define SIOC_GPIO_NUM     27
#define Y9_GPIO_NUM       35
#define Y8_GPIO_NUM       34
#define Y7_GPIO_NUM       39
#define Y6_GPIO_NUM       36
#define Y5_GPIO_NUM       21
#define Y4_GPIO_NUM       19
#define Y3_GPIO_NUM       18
#define Y2_GPIO_NUM        5
#define VSYNC_GPIO_NUM    25
#define HREF_GPIO_NUM     23
#define PCLK_GPIO_NUM     22

const int xPin = 2;
const int yPin = 4;
volatile int x = 200;   //200 for VGA
volatile int y = 120;   //120 for VGA

TFT_eSPI tft = TFT_eSPI();       // Invoke custom library
TFT_eSprite spr = TFT_eSprite(&tft);

camera_config_t config;

uint16_t *scr;
long initalTime = 0;
long frameTime = 1;
volatile bool screenRefreshFlag = true;

Libraries, Definitions, Globals and Constructors

This is the standard setup for the camera and display rigs that have been featured on the edge detection experiments.

The joystick input pins and initial x & y window offsets are declared.

VGA resolution is 640pix wide, so a centred 240pix wide window leaves 200pix of padding on the left & right. Similarly, the 480pix height of a VGA frame leaves 120pix of padding top & bottom.
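
For anyone who wants to check the sums, here is a minimal sketch of the offset arithmetic. The constant and function names are my own for illustration and don't appear in the sketch itself. It also works out the largest offset the window can take before it runs off the edge of the frame.

```cpp
// Illustrative names only: none of these appear in the original sketch.
constexpr int FRAME_W = 640;  // VGA width in pixels
constexpr int FRAME_H = 480;  // VGA height in pixels
constexpr int WINDOW  = 240;  // the display is 240 x 240

// Offset that centres a window of `window` pixels inside `frame` pixels.
constexpr int centredOffset(int frame, int window) {
  return (frame - window) / 2;
}

// Starting offsets: these match the initial values of x and y in the sketch.
constexpr int START_X = centredOffset(FRAME_W, WINDOW);
constexpr int START_Y = centredOffset(FRAME_H, WINDOW);

// Largest offsets that still keep the whole window inside the frame.
constexpr int MAX_X = FRAME_W - WINDOW;
constexpr int MAX_Y = FRAME_H - WINDOW;
```

START_X and START_Y come out as 200 and 120, while MAX_X and MAX_Y come out as 400 and 240, which are the limits used in the joystick checks in the loop.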

void setup() {
  Serial.begin(115200);

  pinMode(xPin, INPUT);
  pinMode(yPin, INPUT);

  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM;
  config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM;
  config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM;
  config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM;
  config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk = XCLK_GPIO_NUM;
  config.pin_pclk = PCLK_GPIO_NUM;
  config.pin_vsync = VSYNC_GPIO_NUM;
  config.pin_href = HREF_GPIO_NUM;
  config.pin_sscb_sda = SIOD_GPIO_NUM;
  config.pin_sscb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn = PWDN_GPIO_NUM;
  config.pin_reset = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.frame_size = FRAMESIZE_VGA;
  config.pixel_format = PIXFORMAT_RGB565;
  config.grab_mode = CAMERA_GRAB_LATEST;   
  config.fb_location = CAMERA_FB_IN_PSRAM;
  config.jpeg_quality = 12;
  config.fb_count = 2;       

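  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    // Without a working camera the sensor pointer below would be NULL
    Serial.printf("Camera init failed with error 0x%x\n", err);
    return;
  }
  sensor_t * s = esp_camera_sensor_get();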
  s->set_brightness(s, 0);     // -2 to 2
  s->set_contrast(s, 0);       // -2 to 2
  s->set_saturation(s, 0);     // -2 to 2
  s->set_special_effect(s, 0); // 0 to 6 (0 - No Effect, 1 - Negative, 2 - Grayscale, 3 - Red Tint, 4 - Green Tint, 5 - Blue Tint, 6 - Sepia)
  s->set_whitebal(s, 1);       // 0=disable, 1=enable
  s->set_awb_gain(s, 1);       // 0=disable, 1=enable
  s->set_wb_mode(s, 0);        // 0 to 4 - if awb_gain enabled (0 - Auto, 1 - Sunny, 2 - Cloudy, 3 - Office, 4 - Home)
  s->set_exposure_ctrl(s, 1);  // 0=disable, 1=enable
  s->set_aec2(s, 0);           // 0=disable, 1=enable
  s->set_ae_level(s, 0);       // -2 to 2
  s->set_aec_value(s, 300);    // 0 to 1200
  s->set_gain_ctrl(s, 1);      // 0=disable, 1=enable
  s->set_agc_gain(s, 0);       // 0 to 30
  s->set_gainceiling(s, (gainceiling_t)0);  // 0 to 6
  s->set_bpc(s, 0);            // 0=disable, 1=enable
  s->set_wpc(s, 1);            // 0=disable, 1=enable
  s->set_raw_gma(s, 1);        // 0=disable, 1=enable
  s->set_lenc(s, 1);           // 0=disable, 1=enable
  s->set_hmirror(s, 0);        // 0=disable, 1=enable
  s->set_vflip(s, 0);          // 0=disable, 1=enable
  s->set_dcw(s, 1);            // 0=disable, 1=enable
  s->set_colorbar(s, 0);       // 0=disable, 1=enable

  tft.init();
  tft.setRotation(3);
  tft.fillScreen(TFT_BLACK);
  tft.setTextColor(TFT_BLACK, TFT_WHITE);
  scr = (uint16_t*)spr.createSprite(240, 240);
  tft.drawString("Loading...", 105, 105, 2);

  delay(1000);
}

The Setup( )

The Serial monitor was used to look at the timings.

As noted above, this uses VGA resolution, rather than the intended full UXGA. The reason is that the pixel format is set to RGB565, so each camera pixel requires 2 bytes.

Therefore a single 1600×1200 UXGA image would occupy 3.84MB and the ESP32 would just shit a brick.
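
The sum behind that figure, as a minimal sketch. The function and constant names are made up for illustration. It only covers the raw frame buffers and ignores everything else competing for memory.

```cpp
#include <stdint.h>

// Illustrative names only: not part of the original sketch.
constexpr uint32_t BYTES_PER_PIXEL_RGB565 = 2;

// Raw size in bytes of one RGB565 frame buffer.
constexpr uint32_t frameBytes(uint32_t width, uint32_t height) {
  return width * height * BYTES_PER_PIXEL_RGB565;
}

constexpr uint32_t UXGA_BYTES = frameBytes(1600, 1200);  // 3,840,000 bytes
constexpr uint32_t VGA_BYTES  = frameBytes(640, 480);    //   614,400 bytes

// The sketch asks for two frame buffers (config.fb_count = 2).
constexpr uint32_t FB_COUNT = 2;
constexpr uint32_t UXGA_TOTAL = UXGA_BYTES * FB_COUNT;   // 7,680,000 bytes
constexpr uint32_t VGA_TOTAL  = VGA_BYTES * FB_COUNT;    // 1,228,800 bytes
```

So with fb_count set to 2, UXGA would want 7.68MB of buffer against roughly 1.23MB for VGA.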

As always, the setup( ) ends with displaying a simple “Loading…” screen for a second to verify the display works.

void loop() {

  initalTime = millis();

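  camera_fb_t  * fb = NULL;
  fb = esp_camera_fb_get();
  if (!fb) {
    // No frame available: skip this pass rather than read through a NULL pointer
    Serial.println("Camera capture failed");
    return;
  }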
  long split_1 = millis() - initalTime;

  for (int y_count = 0; y_count < 240; y_count++) {
    int yPart = (y_count + y) * 1280;
    for (int x_count = 0; x_count < 240; x_count++) {
      int xPart = (x_count + x) * 2;
      byte first_byte = fb->buf[yPart + xPart];
      byte second_byte = fb->buf[yPart + xPart + 1];
      scr[(y_count * 240) + x_count] = (second_byte << 8) + first_byte;
    }
  }

  esp_camera_fb_return(fb);
  long split_2 = millis() - initalTime;

  int xValue = analogRead(xPin);
  int yValue = analogRead(yPin);
  // Each limit check guards the direction the offset is about to move in,
  // so the window never leaves the 640x480 frame.
  if ((yValue == 0) && (y < 240)) {
    //move up
    y = y + 5;
  }
  if ((yValue > 4000) && (y > 0)) {
    //move down
    y = y - 5;
  }
  if ((xValue == 0) && (x < 400)) {
    //move left
    x = x + 5;
  }
  if ((xValue > 4000) && (x > 0)) {
    //move right
    x = x - 5;
  }

  spr.drawLine(120, 0, 120, 240, TFT_BLACK);
  spr.drawLine(0, 120, 240, 120, TFT_BLACK);
  spr.drawString(String(frameTime), 100, 220, 2); //print frame time in milliseconds
  spr.drawString("ms", 125, 220, 2);
  spr.pushSprite(0, 0);
  long split_3 = millis() - initalTime;

  frameTime = millis() - initalTime;
  Serial.println(initalTime);
  Serial.println(split_1);
  Serial.println(split_2);
  Serial.println(split_3);

}

The Loop( )

The concept of the sketch is simple:

  • Camera takes a photo and saves to frame buffer.
  • Nested for( ) loops grab the necessary pixels from the frame buffer to create a 240×240 window.
  • The joystick values are read and the window offset is adjusted accordingly.
  • The display shows the contents of the window.
  • Repeat ad infinitum.

The magic number of “1280” is the number of bytes in a single line of VGA frame (640pixels x 2bytes per pixel).
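
If you'd rather not hard-code 1280, the same copy can be written with the row stride worked out from the frame width. This is only a sketch under my own assumptions: copyWindow and rowStrideBytes are hypothetical names, and the function works on a plain byte array rather than the camera's frame buffer so that it can be tried on a PC.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

constexpr int BYTES_PER_PIXEL = 2;  // RGB565

// Bytes in one row of the frame: 640 * 2 = 1280 for VGA.
constexpr size_t rowStrideBytes(int frameWidth) {
  return static_cast<size_t>(frameWidth) * BYTES_PER_PIXEL;
}

// Copies a win x win window whose top-left corner is (offX, offY)
// from a 2-bytes-per-pixel frame into a 16-bit output buffer.
void copyWindow(const uint8_t *frame, int frameWidth, int frameHeight,
                int offX, int offY, uint16_t *out, int win) {
  // The window must sit entirely inside the frame.
  assert(offX >= 0 && offX + win <= frameWidth);
  assert(offY >= 0 && offY + win <= frameHeight);

  const size_t stride = rowStrideBytes(frameWidth);
  for (int row = 0; row < win; row++) {
    // Start of this window row inside the frame.
    const uint8_t *src = frame + (offY + row) * stride + offX * BYTES_PER_PIXEL;
    for (int col = 0; col < win; col++) {
      const uint8_t first  = src[col * BYTES_PER_PIXEL];
      const uint8_t second = src[col * BYTES_PER_PIXEL + 1];
      // Same byte order as the sketch: the second byte becomes the high byte.
      out[row * win + col] = static_cast<uint16_t>((second << 8) | first);
    }
  }
}
```

In the sketch, the equivalent call would take fb->buf, 640, 480, x, y, scr and 240 as its arguments.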

As we all know, the ADC of the ESP32 is dire at best, and not helped by the display and camera pulling current. Therefore the joystick control was kept simple: either fully off (reading 0) or convincingly on (reading over 4000 – the maximum 12-bit reading is 4095).
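
Here is a minimal sketch of that two-threshold idea as a standalone function, with the result clamped so the window can't wander outside the frame. The names (stepOffset, clampInt and the constants) are hypothetical. The direction follows the sketch, where a reading of 0 increases the offset and a reading over 4000 decreases it.

```cpp
// Illustrative names only: not part of the original sketch.
constexpr int ADC_HIGH_THRESHOLD = 4000;  // "convincingly on"
constexpr int STEP = 5;                   // pixels moved per frame

// Limits v to the range lo..hi.
constexpr int clampInt(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

// Returns the new window offset for one axis.
// adc == 0 steps the offset up, adc > 4000 steps it down,
// anything in between is treated as "joystick centred".
constexpr int stepOffset(int offset, int adc, int maxOffset) {
  return clampInt(adc == 0 ? offset + STEP
                           : (adc > ADC_HIGH_THRESHOLD ? offset - STEP : offset),
                  0, maxOffset);
}
```

For the y axis this would be used as y = stepOffset(y, analogRead(yPin), 240), and for the x axis with a maximum of 400.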

The “split” variables are for timing purposes. (More about this in the ‘conclusion’)


Troubleshooting

There are a few quirks with this setup and I’ve noticed a couple of common error messages.

Serial data stream stopped

If you get this error regarding “Possible serial noise or corruption”, then remove the joystick VCC and GND connections from the ESP32CAM when uploading the sketch. Once the sketch is uploaded, you can reinsert the wires and reset the device.

GPIO error

I’ve seen this error flag up on the serial monitor. I can only assume it’s because the system isn’t happy about the joystick pins being used as inputs, but it doesn’t prevent the system from working. So just ignore it.

Other Errors

There are issues with core 1 panicking, but I found that a good old-fashioned reset will usually sort that problem.


Conclusion

I’m always genuinely surprised when these ideas actually work. Although, truth be told, I’m disappointed at the frame rate of 3.1 FPS, and this seems to be limited by the camera.

The timing markers are split in three places:

  1. After camera has obtained frame.
  2. After the window has been created from the frame.
  3. After the window has been shown on the display.

Markers 2 and 3 consistently produce values of 18ms and 32ms respectively. However, the time taken for the camera to obtain a VGA frame is 270ms!! I’ve tested a number of resolutions, timed them with the same code and these are the results:

Resolution   Size                                            Time to get frame   Pixels per millisecond
UXGA         1600 x 1200 pixel                               [error]             [nul]
SVGA         800 x 600 pixel                                 270 ms              1,777 pix/ms
VGA          640 x 480 pixel                                 270 ms              1,137 pix/ms
QVGA         320 x 240 pixel                                 29 ms               2,648 pix/ms
CIF          400 x 296 pixel (according to the datasheet)    28 ms               4,228 pix/ms
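
The throughput column and the overall frame rate can be reproduced with a few lines of arithmetic. This is a sketch with made-up function names, and it assumes the 18ms and 32ms figures are the time spent in each stage rather than running totals.

```cpp
// Illustrative names only: not part of the original sketch.

// Whole pixels delivered per millisecond for a frame of the given size.
constexpr long pixelsPerMs(long width, long height, long grabMs) {
  return (width * height) / grabMs;
}

// Frames per second when the three stages run one after another.
constexpr double framesPerSecond(long grabMs, long copyMs, long pushMs) {
  return 1000.0 / static_cast<double>(grabMs + copyMs + pushMs);
}

constexpr long SVGA_RATE = pixelsPerMs(800, 600, 270);
constexpr long VGA_RATE  = pixelsPerMs(640, 480, 270);
constexpr long QVGA_RATE = pixelsPerMs(320, 240, 29);
constexpr long CIF_RATE  = pixelsPerMs(400, 296, 28);

// 270 ms grab + 18 ms window copy + 32 ms display push = 320 ms per frame.
constexpr double VGA_FPS = framesPerSecond(270, 18, 32);
```

With those assumptions a VGA frame takes 320ms in total, which works out at just over 3.1 FPS, and the camera grab accounts for about 84% of it.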

There seems to be something happening within the camera that’s causing a massive step down in pixel transfer speeds. I did look at utilising the second core for obtaining and displaying the window, while the original core is farting about with the camera…

….But I can’t be arsed.