Alex Haslehurst

Make movies

23 April 2024
Alex Haslehurst
8 min read

Make Movies is my (arguably) over-engineered home lab project for making family movie night more accessible to my less-than-technical family members.

By the way, if you couldn't tell (lol just don't look at the appendages), I've been messing around with Stable Diffusion XL.

A disclaimer:

We currently subscribe to Netflix, Amazon Prime & Disney+, pay our TV licence and will go to the cinema when there's something we want to watch. The legality of using this project is questionable, but morally we are happy to use it for niche & otherwise unavailable content, or for content that I already own on physical media but CBA to get out of the loft.

"I'd like to watch this random movie that's not on Netflix"

-- Non-technical family member

Me:

1. Find my laptop and connect to Tor
2. Search for a decent-quality torrent
3. Add it to Transmission
4. Wait for the download to complete
5. Move the files to our movie library
6. Refresh Jellyfin

Surely this can be automated, right?

There were some challenges.

The only place I would be comfortable running this app is my home network, which puts it behind the UK ISP firewall, and that likely blocks all of the (relatively speaking) best torrent sites. So I am forced to tunnel connections through a VPN.

I first had a go at using a Tor proxy, but found that the sites were often behind some sort of DDoS protection that forces Tor exit traffic through a browser check. Bypassing that is beyond the scope of this project, so instead I looked at proxying through a free-tier VPN. For this I used a SOCKS5 proxy (Dante) running on top of an OpenVPN connection to Proton VPN.

I opted to run this in a container to simplify configuration, i.e. all traffic that needs tunnelling goes out through one container. For the convenience of tooling and health checks, I run it on Kubernetes. Here's the Helm chart, which you can install with:

```
# Create the namespace
kubectl create namespace make-movies

# Create a secret for OpenVPN from a local openvpn.conf
kubectl --namespace make-movies create secret generic openvpn \
  --from-file=openvpn.conf=./openvpn.conf

# Install the openvpn-proxy chart from GitHub packages
helm upgrade --install v1 oci://ghcr.io/axle-h/make-movies/openvpn-proxy \
  --namespace make-movies \
  --version 0.0.3-9cf8b7b
```
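Any client that speaks SOCKS5 can point at the Dante proxy, and its traffic then leaves via the OpenVPN tunnel. To show what that involves on the wire, here's a minimal Python sketch of the SOCKS5 CONNECT handshake from RFC 1928 (no authentication). The proxy address `localhost:1080` is an assumption, not necessarily what the chart exposes, and in practice you'd set your HTTP client's proxy option rather than hand-roll this.

```python
import socket
import struct

def build_greeting() -> bytes:
    # VER=5, NMETHODS=1, METHOD=0x00 ("no authentication required")
    return b"\x05\x01\x00"

def build_connect_request(host: str, port: int) -> bytes:
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), so DNS is
    # resolved by the proxy, on the far side of the VPN tunnel
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("hostname too long for SOCKS5")
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack(">H", port)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf

def open_tunnel(host: str, port: int, proxy=("localhost", 1080)) -> socket.socket:
    """Return a socket connected to host:port via SOCKS5 (proxy address is hypothetical)."""
    sock = socket.create_connection(proxy)
    try:
        sock.sendall(build_greeting())
        if recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy refused the no-auth method")
        sock.sendall(build_connect_request(host, port))
        ver, rep, _rsv, atyp = recv_exact(sock, 4)
        if ver != 5 or rep != 0:
            raise ConnectionError(f"SOCKS5 CONNECT failed with reply code {rep}")
        # Skip the bound address the proxy reports; its length depends on ATYP.
        if atyp == 1:
            recv_exact(sock, 4 + 2)
        elif atyp == 4:
            recv_exact(sock, 16 + 2)
        elif atyp == 3:
            recv_exact(sock, recv_exact(sock, 1)[0] + 2)
        else:
            raise ConnectionError(f"unknown address type {atyp}")
        return sock
    except Exception:
        sock.close()
        raise
```

After `open_tunnel("yts.mx", 443)` returns, the socket behaves like a direct connection (you'd wrap it in TLS for HTTPS).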

Torrent services are unreliable, so consuming them live would mean:

- Availability issues would propagate into the app.
- There would be no fallback if they disappeared from the internet.

The best option, then, is to batch scrape all the data I need to build this app. Luckily I don't even need to sift through HTML (jQuery selectors, anyone?), as one site, yts.mx, has a decent REST API.
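A batch scrape against a paginated JSON API is mostly a loop. This Python sketch assumes the YTS v2 `list_movies.json` endpoint with `limit`/`page` query parameters and a `data.movies` array in each response; treat those details as assumptions and check them against the live API. `fetch_json` is injectable so the paging logic can be tested offline; in the real app it would go through the VPN proxy.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint and response shape; verify against the current YTS API.
LIST_MOVIES_URL = "https://yts.mx/api/v2/list_movies.json"

def http_get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def scrape_all_movies(
    fetch_json: Callable[[str], dict] = http_get_json,
    page_size: int = 50,
    delay_seconds: float = 1.0,
) -> Iterator[dict]:
    """Yield every movie, page by page, stopping at the first short or empty page."""
    page = 1
    while True:
        query = urllib.parse.urlencode({"limit": page_size, "page": page})
        body = fetch_json(f"{LIST_MOVIES_URL}?{query}")
        movies = body.get("data", {}).get("movies") or []
        yield from movies
        if len(movies) < page_size:
            return
        page += 1
        time.sleep(delay_seconds)  # be polite between pages
```

Because it's a generator, the caller can write results to the store in batches as they arrive.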

We'll need somewhere to store the scraped data along with app metadata, e.g. whether we already have the movie in our library. There's actually quite a lot of data (on the order of 50k movies), and it needs to be both searchable and mutable in big batches (when scraping).

If I were running this app commercially, I'd need to consider scalability, availability and OLTP vs OLAP. I'd opt to store the data in a DBMS, preferably MongoDB, as the data isn't that relational. Then, since MongoDB's full-text search is both deprecated and terrible, I'd keep a copy of the key fields in Elasticsearch, maintained asynchronously.

But since I'm running this at home and don't plan to scale it past a single instance, I opted to just keep the data in memory 😱🤓! Searching can then be done within the app process, using whichever natural-language search library on NuGet works best (and I think I tried them all). I keep it persistent with a background thread that queues up updates to a JSON file and blocks the app from quitting until it has caught up. Here's the implementation.
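The linked implementation is .NET; as a language-neutral illustration of the same idea, here's a Python sketch (not the author's code). Reads and writes hit an in-memory dict, each mutation enqueues a "dirty" signal, and a single background thread writes a JSON snapshot. Draining queued signals means a big scrape batch costs a few writes rather than one per movie, and `close()` blocks until the writer has caught up. The write-to-temp-then-rename step is my own addition for crash safety.

```python
from __future__ import annotations

import json
import os
import queue
import tempfile
import threading

class MovieStore:
    """In-memory movie store persisted to a JSON file by one background thread."""

    _STOP = object()

    def __init__(self, path: str):
        self._path = path
        self._lock = threading.Lock()
        self._movies: dict[str, dict] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self._movies = json.load(f)
        self._dirty: queue.Queue = queue.Queue()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def upsert_many(self, movies: dict[str, dict]) -> None:
        with self._lock:
            self._movies.update(movies)
        self._dirty.put(True)  # one signal per batch, not per movie

    def get(self, movie_id: str) -> dict | None:
        with self._lock:
            return self._movies.get(movie_id)

    def _snapshot(self) -> None:
        with self._lock:
            data = json.dumps(self._movies)
        # Write a temp file then rename it over the target, so readers never see half a file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self._path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(data)
        os.replace(tmp, self._path)

    def _write_loop(self) -> None:
        while True:
            stop = self._dirty.get() is self._STOP
            # Drain any queued signals so a burst of updates becomes one write.
            while not self._dirty.empty():
                stop = (self._dirty.get_nowait() is self._STOP) or stop
            self._snapshot()
            if stop:
                return

    def close(self) -> None:
        """Block until every update made before this call is on disk."""
        self._dirty.put(self._STOP)
        self._writer.join()
```

Calling `close()` from the app's shutdown hook gives the "don't quit until caught up" behaviour.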

Honestly, I reckon we can apply this YAGNI practice to scalability in more places than we realise. The cost of maintaining the infrastructure, team and licenses to do this "right" is not insignificant.

It takes quite a few moving parts.

UI: Family- and mobile-friendly UI written in Next.js with:
- Start a scrape session
- Movie search
- Display movie metadata like poster image, year of release, review score
- Download a movie
- Monitor movie download progress
API: JSON REST API written in .NET for gluing everything together
- Movie torrent scraper
- Movie API
- Otherwise a proxy for local services like Transmission & Jellyfin.
nginx: Or any reverse proxy, for binding these onto the same port with path-based routing.

VPN: Something to tunnel through the great firewall of London.
Transmission: Downloads the movie torrents, and we can control it via its bespoke JSON RPC API 🤢.
Jellyfin Server: A local movie library server with a vast REST API, the hardest part of which is figuring out what to call.
Jellyfin Client: Something to stream the movies to various devices around our house.
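For what it's worth, Transmission's RPC protocol is JSON over HTTP with one quirk: a CSRF guard. A request without a valid session id gets a 409 carrying an `X-Transmission-Session-Id` header, which must be echoed on subsequent requests. Here's a minimal Python sketch of adding a magnet link and reading progress; `http://localhost:9091/transmission/rpc` is Transmission's usual default, auth is omitted, and the `post` parameter exists only so the handshake can be exercised without a server.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request
from typing import Callable

# post(url, headers, body) -> (status, response_headers, response_body)
Post = Callable[[str, dict, bytes], tuple[int, dict, bytes]]

def urllib_post(url: str, headers: dict, body: bytes) -> tuple[int, dict, bytes]:
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, dict(resp.headers), resp.read()
    except urllib.error.HTTPError as err:  # the 409 handshake lands here
        return err.code, dict(err.headers), err.read()

class TransmissionClient:
    def __init__(self, url: str = "http://localhost:9091/transmission/rpc",
                 post: Post = urllib_post):
        self._url, self._post, self._session_id = url, post, ""

    def call(self, method: str, arguments: dict) -> dict:
        body = json.dumps({"method": method, "arguments": arguments}).encode()
        for _ in range(2):  # at most one retry, after learning the session id
            headers = {"Content-Type": "application/json",
                       "X-Transmission-Session-Id": self._session_id}
            status, resp_headers, resp_body = self._post(self._url, headers, body)
            if status == 409:
                self._session_id = next(
                    (v for k, v in resp_headers.items()
                     if k.lower() == "x-transmission-session-id"), "")
                continue
            if status != 200:
                raise RuntimeError(f"{method}: HTTP {status}")
            reply = json.loads(resp_body)
            if reply.get("result") != "success":
                raise RuntimeError(f"{method} failed: {reply.get('result')}")
            return reply.get("arguments", {})
        raise RuntimeError("could not obtain a Transmission session id")

    def add_magnet(self, magnet: str, download_dir: str) -> dict:
        return self.call("torrent-add", {"filename": magnet, "download-dir": download_dir})

    def progress(self) -> list[dict]:
        # percentDone is a 0..1 fraction per torrent
        return self.call("torrent-get", {"fields": ["id", "name", "percentDone"]})["torrents"]
```

The UI's "monitor download progress" feature can then be a poll of `progress()` through the API.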

yts.mx: The dodgy site that hosts movie torrents; this is what the VPN is for.
TMDB: Source of most movie images. It has a decent, free API; we just need to make sure our usage doesn't look like scraping.
OMDb: Mops up any movie images not available on TMDB.
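Using TMDB first and OMDb as a mop-up is a fallback chain, and "not looking like scraping" mostly means spacing requests out. Here's a Python sketch with both lookups injected. The URL shapes in the comments (TMDB's `/3/find/{imdb_id}?external_source=imdb_id` plus the `image.tmdb.org/t/p/w500` image base, and OMDb's `?i=...&apikey=...` with a `Poster` field of `"N/A"` when missing) are my understanding of those APIs, so verify them before relying on this. The one-second interval is an arbitrary assumption.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Optional

class Throttle:
    """Enforce a minimum interval between calls, shared across threads."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self._min, self._clock, self._sleep = min_interval, clock, sleep
        self._lock = threading.Lock()
        self._next = 0.0

    def wait(self) -> None:
        with self._lock:
            now = self._clock()
            if now < self._next:
                self._sleep(self._next - now)
                now = self._next
            self._next = now + self._min

# imdb_id -> poster URL, or None if this provider has nothing
Lookup = Callable[[str], Optional[str]]

def poster_url(imdb_id: str, providers: list[tuple[Lookup, Throttle]]) -> Optional[str]:
    # Hypothetical providers, tried in order:
    #   TMDB: GET https://api.themoviedb.org/3/find/{imdb_id}?external_source=imdb_id,
    #         then "https://image.tmdb.org/t/p/w500" + poster_path
    #   OMDb: GET https://www.omdbapi.com/?i={imdb_id}&apikey=..., Poster "N/A" = missing
    for lookup, throttle in providers:
        throttle.wait()  # each provider gets its own request spacing
        url = lookup(imdb_id)
        if url:
            return url
    return None

# Example wiring (lookup functions are hypothetical):
# providers = [(tmdb_lookup, Throttle(1.0)), (omdb_lookup, Throttle(1.0))]
```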

TL;DR: scraped movies are searchable and downloadable via the UI. Once downloaded, they are available to watch via Jellyfin.

Everything needed to run this project is available on GitHub. As a sanity-saving measure, I opted to deploy most of the services on systemd. This was actually simpler than I thought it would be, and the .NET bit even supports Type=notify via Microsoft.Extensions.Hosting.Systemd.
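With `Type=notify` in the unit's `[Service]` section, systemd waits for the process to report `READY=1` before treating it as started, so anything ordered after the API doesn't race its startup. Microsoft.Extensions.Hosting.Systemd does this for .NET (e.g. via `UseSystemd()`). To show what it's doing under the hood, here's the sd_notify protocol sketched in Python: a datagram sent to the Unix socket named in `$NOTIFY_SOCKET`.

```python
import os
import socket

def sd_notify(state: str = "READY=1") -> bool:
    """Send a state string to systemd; returns False when there is no notify socket."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not started by systemd with Type=notify: nothing to do
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # '@' denotes the Linux abstract socket namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

A service would call `sd_notify("READY=1")` once it is listening, and can send `STOPPING=1` on shutdown.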

Install the VPN on k3s as above
Install the API via documentation
Install the UI via documentation
Install nginx
```
sudo apt update
sudo apt install nginx
```

Configure nginx for path-based routing over the API & UI:

```
# /etc/nginx/sites-enabled/default
server {
    listen 8080;
    listen [::]:8080;

    # Everything else goes to the Next.js UI
    location / {
        proxy_pass http://localhost:3000;
    }

    # API and movie image requests go to the .NET API
    location ~ ^/(api|movie-images)/ {
        proxy_pass http://localhost:5000;
    }
}
```

Restart nginx:
```
sudo systemctl restart nginx
```
Browse to http://localhost:8080

This could easily have been a massive project spanning multiple microservices, complex messaging, an even more complex DBMS and a ridiculously complex search engine, all deployed via some horribly complex Helm chart.

Trust me, I tried it.

But after having fun learning all this stuff, and the immense satisfaction of watching it deploy to k8s with a single command, I ran into many issues that were much more painful to fix than a project 'for fun' should ever be. Then I hit some nasty breaking changes in major version updates. I'd already blown way past any time this project could have saved me, so I'd had enough.

I took the approach of stripping it back to an MVP and minimizing dependencies. For example, I used to run scraping asynchronously over a MassTransit bus; now I just lock on a Mutex in a single service. The result is a system that, yes, wouldn't scale to thousands of concurrent users, would be a bit of a pain to decouple in places and is mostly devoid of technical showcase, but that... works.
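The "just lock a mutex" replacement for the message bus is roughly this much code. A Python sketch, with `run_scrape` as a hypothetical stand-in for the real scraper: a non-blocking acquire means a second "start scrape" from the UI is rejected immediately instead of queueing behind the first. The post doesn't say whether the real service rejects or waits, so rejecting is my assumption.

```python
import threading
from typing import Callable

class ScrapeRunner:
    """Allow at most one scrape at a time within a single process."""

    def __init__(self, run_scrape: Callable[[], None]):
        self._run_scrape = run_scrape  # hypothetical: the actual scrape job
        self._lock = threading.Lock()

    def try_start(self) -> bool:
        """Start a scrape in the background; return False if one is already running."""
        if not self._lock.acquire(blocking=False):
            return False

        def worker() -> None:
            try:
                self._run_scrape()
            finally:
                self._lock.release()  # a plain Lock may be released by another thread

        threading.Thread(target=worker, daemon=True).start()
        return True

    @property
    def running(self) -> bool:
        return self._lock.locked()
```

The API's "start scrape" endpoint could return a conflict status whenever `try_start()` is False.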

Honestly, I am proud of it for its simplicity. I have been on the path towards simple solutions to complex problems for years now, but this experience has been yet another confirmation that (at least) starting with simplicity is fundamental to successful projects.
