Secure Video Doorbell

After speaking with a friend who was installing security cameras at his home, and after reading about Ring’s disregard for privacy regarding cloud storage of security footage, I was inspired to find an alternative. I had already purchased a Jetson Nano, which seemed like an excellent SBC (single-board computer) for a project like this: it can not only capture and process a camera feed, but also run inference for face and package detection.

I also purchased a Raspberry Pi HQ Camera, but later found that it wasn’t compatible with my Jetson Nano without a hardware modification, which in turn would make it incompatible with my Raspberry Pi 4. RidgeRun may have since solved this in software with a driver, but I have not yet tried it on my board.

The camera needed lenses, and I ended up trying a few. I have a couple of Canon DSLR lenses and found a CS-to-EOS lens adapter (the Raspberry Pi HQ Camera has a C/CS mount), but it was much too expensive a path for the concept of the project. There was also no way to control the aperture of the lenses without some manual hacks (e.g. with the lens on the DSLR camera, set the aperture and then remove the lens while the camera is still powered on :O). I settled on Arducam’s 5-pack.

Because of the camera’s incompatibility with the Jetson, I connected the camera to a Raspberry Pi 4 and streamed the camera data across my network to the Jetson Nano via GStreamer. A fun workaround. GStreamer is a fantastic framework if you don’t already know about it. If I were to finally deploy this device, I would modify the camera and connect it directly to the Jetson Nano, or possibly try the aforementioned RidgeRun driver.
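To make the Pi-to-Jetson streaming idea concrete, here is a minimal sketch that builds two gst-launch-1.0 pipelines: one to capture and send H.264 over RTP/UDP from the Raspberry Pi, and one to receive and display it on the Jetson. It is not my actual configuration. The IP address, port, resolution, bitrate, and the use of v4l2src as the capture element are all assumptions for illustration, and your camera stack may need a different source element.

```python
import shlex
from typing import List

# All values below are illustrative assumptions, not settings from this project.
JETSON_HOST = "192.168.1.50"  # hypothetical address of the Jetson Nano
RTP_PORT = 5000               # hypothetical UDP port for the stream

# Caps that tell the receiver how to interpret the incoming RTP packets.
RTP_CAPS = (
    "application/x-rtp,media=video,encoding-name=H264,"
    "payload=96,clock-rate=90000"
)


def sender_pipeline(host: str = JETSON_HOST, port: int = RTP_PORT) -> str:
    """Pipeline for the Raspberry Pi: capture, encode to H.264, send as RTP/UDP."""
    return " ! ".join([
        "v4l2src device=/dev/video0",  # assumed capture element and device
        "video/x-raw,width=1280,height=720,framerate=30/1",
        "videoconvert",
        "x264enc tune=zerolatency speed-preset=ultrafast bitrate=2000",
        "rtph264pay config-interval=1 pt=96",
        f"udpsink host={host} port={port}",
    ])


def receiver_pipeline(port: int = RTP_PORT) -> str:
    """Pipeline for the Jetson Nano: receive RTP/UDP, decode, and display."""
    return " ! ".join([
        f"udpsrc port={port} caps={RTP_CAPS}",
        "rtph264depay",
        "h264parse",
        "avdec_h264",  # software decoder, chosen here for portability
        "videoconvert",
        "autovideosink sync=false",
    ])


def gst_launch_command(pipeline: str) -> List[str]:
    """Turn a pipeline description into an argument list for subprocess.run."""
    return ["gst-launch-1.0", *shlex.split(pipeline)]


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without GStreamer.
    print("On the Pi:     gst-launch-1.0 " + sender_pipeline())
    print("On the Jetson: gst-launch-1.0 " + receiver_pipeline())
```

Start the receiver first, then the sender. Building the pipelines as strings keeps the sketch free of GStreamer bindings; the same descriptions could also be handed to Gst.parse_launch in a Python application.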

The Jetson Nano comes with the NVIDIA JetPack SDK, which provides some plugins for GStreamer (DeepStream). GStreamer lets you stream your camera data through a chain of plugins and ultimately into a file on disk or elsewhere. The plugins can process the data in a number of ways, one of which is to send it through an inference model, or several models of your choosing (e.g. face detection, face recognition, package delivery detection). It is fairly straightforward to grab pre-trained models from the web and convert them to NVIDIA TensorRT format, which the GStreamer plugin can then execute. It’s also straightforward enough to train and use your own models. The Jetson Nano performs well, although the DRAM is somewhat limited at 2GB; a 4GB model is also available.
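As a rough illustration of those two steps, the sketch below builds a trtexec command that converts an ONNX model into a TensorRT engine, and a DeepStream pipeline description that chains one nvinfer element per model. This is not my project code. The file names, port, and resolution are hypothetical, and each nvinfer config file (which is where the engine path and the primary or secondary role of a model would be set) is assumed to exist already.

```python
from typing import List, Sequence


def trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True) -> List[str]:
    """Argument list for converting an ONNX model to a TensorRT engine.

    Assumes trtexec is on PATH; on a Jetson it may live elsewhere.
    """
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # half precision is usually a good fit for the Nano
    return cmd


def deepstream_pipeline(
    infer_configs: Sequence[str],
    port: int = 5000,   # hypothetical UDP port of the incoming RTP stream
    width: int = 1280,  # assumed stream resolution
    height: int = 720,
) -> str:
    """gst-launch-1.0 description that runs each model in turn on the stream."""
    if not infer_configs:
        raise ValueError("at least one nvinfer config file is required")
    caps = (
        "application/x-rtp,media=video,encoding-name=H264,"
        "payload=96,clock-rate=90000"
    )
    # Receive and hardware-decode the stream, then link into the muxer's sink pad.
    source = " ! ".join([
        f"udpsrc port={port} caps={caps}",
        "rtph264depay",
        "h264parse",
        "nvv4l2decoder",
        "mux.sink_0",
    ])
    # nvstreammux batches frames for inference; one nvinfer element per model.
    stages = [
        f"nvstreammux name=mux batch-size=1 width={width} height={height} live-source=1"
    ]
    stages += [f"nvinfer config-file-path={cfg}" for cfg in infer_configs]
    # Draw detection results on the frames and show them on screen.
    stages += ["nvvideoconvert", "nvdsosd", "nvegltransform", "nveglglessink sync=false"]
    return source + " " + " ! ".join(stages)


if __name__ == "__main__":
    # Hypothetical model and config names, for illustration only.
    print(" ".join(trtexec_command("face_detector.onnx", "face_detector.engine")))
    print("gst-launch-1.0 " + deepstream_pipeline(
        ["face_detector_config.txt", "package_detector_config.txt"]
    ))
```

Swapping the final display elements for an encoder and a filesink would write the annotated video to disk instead of showing it.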

At the time of this writing (May 2022), the cost of the Raspberry Pi HQ Camera, the Arducam lens kit, and the Jetson Nano is around $59 + $125 + $50 = $234 + tax, or around $134 + tax if you only need 1 lens instead of 5. All the software is free or included. That is somewhat more expensive than the Ring doorbell at around $99, but I think it is well worth it for the sake of privacy alone. Is your privacy worth $35?
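For anyone who wants to check the numbers, here is the arithmetic as a small script. The prices are the ones listed above; the single-lens price is an assumption that one lens costs a fifth of the 5-lens kit.

```python
# Prices in USD as listed in the text (May 2022), in the order given there.
listed_prices = [59, 125, 50]
lens_kit_price = 125
lenses_in_kit = 5
ring_price = 99

# Full build with the whole lens kit.
full_total = sum(listed_prices)

# Assumption: a single lens costs one fifth of the kit price.
single_lens_price = lens_kit_price // lenses_in_kit
single_lens_total = full_total - lens_kit_price + single_lens_price

# Extra cost of the single-lens build over the Ring doorbell.
privacy_premium = single_lens_total - ring_price

print(f"Full kit:    ${full_total}")         # $234
print(f"Single lens: ${single_lens_total}")  # $134
print(f"Over Ring:   ${privacy_premium}")    # $35
```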

This project is a work in progress. I may release all my source code (mostly Python) in the future, but for now this is a decent blueprint for what can be built with probably only a modicum of expertise.

Some interesting articles for future direction:

https://www.influxdata.com/blog/nvidia-jetson-series-part-1-jetson-stats/
https://www.influxdata.com/blog/nvidia-jetson-series-part-2-vision-ai-pipeline/
https://medium.com/pytorch/accelerating-pytorch-inference-with-torch-tensorrt-on-gpus-896e06ff1637

Update (2022-11-29): “Anker’s Eufy Cameras Caught Uploading Content to the Cloud Without User Consent” — Their marketing slogan: “No Clouds or Costs: This means that no one has access to your data but you, plus you never have to pay a monthly fee for cloud services.” Just another reason for a DIY, open source, secure video doorbell system.