SightX: Forging the Environment on Apple Silicon

Days 4 & 5

They say a craftsman never blames their tools, but they also never start building without setting them up properly. Before a single model weight gets initialized, the environment that will carry all of that work needs to be solid, reproducible, and hardware-aware. That was the entire focus of these two days.


Why Bother With Setup First?

My first instinct, honestly, was to skip straight to writing model code. I already understood the theory from Days 2 and 3 (CNNs, ResNet50, transfer learning), so the urge to just start building was real. But I have seen enough projects collapse under their own environmental chaos to know better. A broken dependency, a mismatched Python version, a library that silently falls back to CPU when it should be using the GPU: these are the bugs that waste hours and have nothing to do with the actual problem you are trying to solve. So I did it right, once, and locked it in.


The Environment: Miniconda and Python 3.10

The foundation is a dedicated Conda environment called SightX, running Python 3.10. Conda is doing something important here that pip alone does not: it resolves the full dependency graph, including the non-Python native libraries that packages link against. The SightX environment is completely isolated from everything else on the machine. Nothing installed here can break another project, and nothing external can silently break SightX.

Python 3.10 was a deliberate choice: stable enough that every library in this stack has been battle-tested against it, modern enough to support the syntax and performance improvements that keep the codebase clean. Reproducibility is locked in immediately by running pip freeze > requirements.txt after installation, which pins the exact version of every installed package. Any future deployment or collaborator can rebuild the same Python dependency set from that file.


The Dependency Stack: What Was Installed and Why

Every library here earns its place. Nothing was installed casually.

PyTorch, TorchVision, TorchAudio: These three are the computational core of the entire inference engine. PyTorch handles tensor operations, automatic differentiation, and the neural network primitives. TorchVision adds the computer vision transforms and pre-built architectures, including ResNet50, which is the backbone of SightX. TorchAudio is included for potential future expansion. The critical detail here is that PyTorch 2.x ships with a native MPS backend (available since PyTorch 1.12), which is what connects PyTorch to the GPU on the M4 chip.

FastAPI and Uvicorn: Once the model is trained, it needs to be callable from the outside world. FastAPI wraps the inference logic into a validated REST API with auto-generated documentation, using Python type hints. Uvicorn is the high-performance server that runs it. These two together are the serving layer.

Pillow: Every image file on disk passes through Pillow before it ever reaches the neural network. It handles reading, resizing, colour-space conversion, and format normalisation. It is the unglamorous but essential interface between raw retinal scan images and the tensors PyTorch actually trains on.

NumPy: The common language of the scientific Python ecosystem. Pillow, scikit-learn, and PyTorch all speak NumPy arrays as a shared intermediate format. It is used directly for data manipulation and label encoding throughout the preprocessing pipeline.

Scikit-Learn: Not every step needs a neural network, and evaluation is one that does not. Scikit-learn provides train/test splitting, stratified sampling, and the confusion matrices that will be essential for evaluating a 5-grade classification model.
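What stratified sampling buys you is easiest to see in a few lines. The sketch below is a standard-library stand-in, not the project's code: in SightX itself, scikit-learn's train_test_split with its stratify argument would do this job. The function name, the 20% test fraction, and the toy label counts are all illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split sample indices so every grade keeps its share in both sets.

    Hypothetical helper for illustration; returns (train_idx, test_idx).
    """
    # Group sample indices by their grade label.
    by_grade = defaultdict(list)
    for idx, grade in enumerate(labels):
        by_grade[grade].append(idx)

    rng = random.Random(seed)  # seeded so the split is repeatable
    train_idx, test_idx = [], []
    for grade in sorted(by_grade):
        indices = by_grade[grade]
        rng.shuffle(indices)
        # Take the same fraction from every grade, not from the pool as a whole.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    # Made-up, deliberately imbalanced labels for grades 0 to 4.
    toy_labels = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 5 + [4] * 5
    train, test = stratified_split(toy_labels)
    print("train:", Counter(toy_labels[i] for i in train))
    print("test: ", Counter(toy_labels[i] for i in test))
```

A plain random split could easily leave the rarest grades out of the test set entirely; splitting per grade is what prevents that.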

Matplotlib: Training without visualisation is flying blind. Loss curves, accuracy plots, and sample image grids are the instruments that tell you whether the model is learning or just memorising.

Tqdm: A single line wrapping the training loop transforms a stalled-looking terminal into a live progress bar with per-batch timing. During long training runs on thousands of retinal images, knowing whether you are 4% through an epoch or 94% through changes every decision about whether to intervene or let it run.

Jupyter: The scratchpad where hypotheses become experiments before they become code. New preprocessing strategies, learning rate variations, and architecture tweaks all get prototyped here before being formalised into the project's Python scripts.
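With the stack installed, a quick sanity check can confirm that the interpreter and packages are what the environment is supposed to contain. This is a minimal sketch using only the standard library; the function names are hypothetical, and the package list uses the distribution names I would expect on PyPI, which may differ from what your installer reports.

```python
import sys
from importlib import metadata

EXPECTED_PYTHON = (3, 10)

# Assumed distribution names; adjust to match your own install.
PACKAGES = [
    "torch", "torchvision", "torchaudio",
    "fastapi", "uvicorn", "Pillow", "numpy",
    "scikit-learn", "matplotlib", "tqdm", "jupyter",
]


def python_matches(expected=EXPECTED_PYTHON, actual=None):
    """Return True if the major.minor version equals the expected one."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(expected)


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    status = "OK" if python_matches() else "unexpected version"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
    for name, version in installed_versions().items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Running it inside the activated SightX environment gives a one-screen summary that can be compared against requirements.txt.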


The M4 ARM Chip: Why This Hardware Is a Genuine Advantage

This is the part of the setup that deserves more than a checkbox.

After running the hardware verification script, the confirmation came back:

Using Apple Silicon GPU via MPS

That line matters. The Apple M4 is built on ARM architecture with a unified memory design, meaning the CPU, GPU, and Neural Engine all share the same physical memory pool. 

There is no PCIe bus, no separate VRAM, no overhead from copying data between processor and accelerator. In a traditional laptop setup, every training step would involve shuttling data from system RAM across a bus to a discrete GPU's dedicated memory. On the M4, that bottleneck simply does not exist.

PyTorch accesses this through MPS (Metal Performance Shaders), Apple's library of GPU compute kernels built on Metal. When a tensor is moved to the mps device, all operations on it execute on the M4's GPU cores with full unified memory bandwidth. The result is a speedup over CPU training of roughly 4–6x, though the exact figure depends on the model and batch size, with no NVIDIA dependency, no CUDA installation complexity, and none of the power and thermal demands of a discrete GPU.
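The verification script itself is not shown in this post, so here is a rough sketch of what such a check might look like. Only the MPS message comes from the output above; the function names, the fallback order, and the other two messages are my assumptions. It relies on torch.backends.mps.is_available() and torch.cuda.is_available(), and falls back to CPU if PyTorch is not importable at all.

```python
def choose_device(mps_available, cuda_available=False):
    """Pick the best device name given which backends are available."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device():
    """Ask PyTorch which backends exist; report 'cpu' if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return choose_device(mps_ok, torch.cuda.is_available())


# Only the "mps" message is taken from the post; the others are placeholders.
MESSAGES = {
    "mps": "Using Apple Silicon GPU via MPS",
    "cuda": "Using NVIDIA GPU via CUDA",
    "cpu": "Using CPU",
}

if __name__ == "__main__":
    print(MESSAGES[detect_device()])
```

In training code, the returned name would typically be wrapped as torch.device(detect_device()) and passed to .to(...) on both the model and each batch, so that nothing quietly stays on the CPU.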

For SightX specifically, this matters because the training dataset is large and the training loop will be running ResNet50, a 50-layer network with significant computational depth. Having dedicated silicon for matrix multiplication and convolution operations, accessible directly from PyTorch, means the iteration cycle between experiment and result stays tight. No spinning up a cloud instance. No per-hour GPU cost. Just local hardware doing exactly what it was architecturally designed to do.


The Project Structure

The folder layout is intentional:

inference-engine/
├── data/                  ← Not tracked in Git
│   ├── train/
│   ├── test/
│   └── trainLabels.csv
├── checkpoints/           ← Saved model weights
├── notebooks/             ← Exploration and prototyping
├── model.py               ← Architecture definition
├── preprocessing.py       ← Image pipeline
├── train.py               ← Training loop
├── main.py                ← FastAPI server
├── Dockerfile             ← Containerization
└── requirements.txt       ← Locked dependencies


Raw training data stays out of Git: it is large, it does not change between code commits, and it would bloat the repository. Each Python file has a single, defined responsibility, a concept I learned in my object-oriented programming class. I would like to give a shout-out to my professor for making me understand why it is important.

When the preprocessing pipeline gets built next, it already has a designated home and its relationship to every other component is already clear.
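For anyone wanting to reproduce the layout, a small script can create the empty skeleton. This is a sketch rather than how the project was actually set up: the scaffold function is hypothetical, and the .gitignore entry is my assumption about how data/ is kept out of Git, since the post does not say. The dataset files themselves, including trainLabels.csv, are not created.

```python
from pathlib import Path

DIRS = ["data/train", "data/test", "checkpoints", "notebooks"]
FILES = [
    "model.py", "preprocessing.py", "train.py",
    "main.py", "Dockerfile", "requirements.txt",
]


def scaffold(root):
    """Create the project skeleton under root without touching existing content."""
    root = Path(root)
    for directory in DIRS:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for filename in FILES:
        # touch() creates an empty file but leaves existing contents alone.
        (root / filename).touch(exist_ok=True)
    # Assumption: data/ is excluded from Git through a .gitignore entry.
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("data/\n")
    return root


if __name__ == "__main__":
    created = scaffold("inference-engine")
    print(f"Skeleton ready at {created.resolve()}")
```

Because every step checks for existing files first, the script is safe to rerun on a project that already has code in it.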


What's Next?

The environment is solid. The hardware is confirmed. The folder structure is in place. The next phase is building the preprocessing pipeline.
