SightX: We Shipped It (The Journey Comes to an End)

Day 17 - 21


The last blog ended with V2 sitting in a file on my laptop. A hundred megabytes of learned opinions about retinas, trained entirely on a MacBook Air that never once sounded like it was preparing for takeoff. It could look at an eye and tell you how bad the damage was. But it could not actually "talk" to anyone. It was a brain in a jar. Impressive at dinner parties. Useless in a clinic.


The next phase was simple in theory and chaotic in practice: give the brain a body. Build the server. Build the interface. Wire everything together. Ship it.


The Brain Needed a Safety Net


Here is something I did not expect. The model's raw opinion is actually kind of dangerous to use directly.


Think about it. The model trained on a dataset where 73% of images were healthy retinas. Remember the 73% problem from way back in the data exploration blog? It never went away. The model "learned" that bias. Deep in its 24.6 million parameters, it has a quiet preference toward saying "you are fine." In a medical setting, that quiet preference is the kind of thing that costs people their eyesight.


So before I could serve the model's predictions to anyone, I had to build what I started calling the safety net. Three layers of math between the raw model output and the final recommendation, each solving a different problem.


The first layer fixes the model's confidence. Neural networks are notorious liars when it comes to how sure they are. The model says "I am 80% confident" when it is actually right about 60% of the time. There is a surprisingly simple fix, usually called temperature scaling: divide the model's raw scores (the logits) by a single number before they are turned into probabilities, and find that number by fitting it on your validation set. One number. No retraining. Dividing every score by the same positive number never changes which class ranks highest, so the model's accuracy stays the same, but its confidence estimates start matching reality. It learned humility.
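Here is a rough sketch of that first layer, assuming it is plain temperature scaling with the temperature picked by a grid search over validation loss. The function names, the grid range, and the toy data are my own illustration, not the actual SightX code.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest validation NLL (simple grid search)."""
    if candidates is None:
        # Hypothetical search range: 0.5 to 5.0 in steps of 0.05.
        candidates = [0.5 + 0.05 * i for i in range(91)]
    return min(candidates, key=lambda t: nll(logit_rows, labels, t))

# Toy validation set: the model is very sure of class 0 every time,
# but it is only right on 3 of the 5 examples.
val_logits = [[4.0, 0.0, 0.0]] * 5
val_labels = [0, 0, 0, 1, 2]
temperature = fit_temperature(val_logits, val_labels)
calibrated = softmax([4.0, 0.0, 0.0], temperature)
```

On this toy data the fitted temperature comes out above 1, which softens the overconfident prediction toward the 60% the model actually earns, while class 0 stays the top pick.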


The second layer undoes the dataset bias. Remember, 73% healthy in training, but real clinical populations look different. Published medical literature says roughly 35% of screened diabetic patients have some form of retinopathy, which puts the healthy share closer to 65%. There is a formula for this, literally just Bayes' theorem, the same thing you learn in intro statistics. You multiply each of the model's class probabilities by the ratio of how common that class is in the real world to how common it was in the training data, then rescale so the probabilities add up to one again. The model stops over-predicting healthy.
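A minimal sketch of that correction, collapsed to two classes (healthy vs. any retinopathy) to keep it readable. The 73% and 35% figures come from the text above; the two-class simplification and the function name are assumptions for illustration only.

```python
def correct_prior_shift(probs, train_priors, target_priors):
    """Reweight each class probability by target prior / training prior, then renormalize."""
    weighted = [p * (target / train)
                for p, train, target in zip(probs, train_priors, target_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Class order: [healthy, any retinopathy].
TRAIN_PRIORS = [0.73, 0.27]    # share of each class in the training data
TARGET_PRIORS = [0.65, 0.35]   # assumed share in a screened clinical population

# A coin-flip output from the model leans toward disease after correction.
adjusted = correct_prior_shift([0.5, 0.5], TRAIN_PRIORS, TARGET_PRIORS)
```

When the two sets of priors are identical the ratios are all 1 and the probabilities pass through unchanged, which is a handy sanity check.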


The third layer is the one I am most proud of, and it is the reason I can sleep at night. It encodes a belief that most machine learning projects never bother with: not all mistakes are equal. Telling a healthy person they need a specialist visit is annoying. Telling a person with sight-threatening disease that they are fine is catastrophic. So I built a system where the cost of missing serious disease is weighted *fifty times* higher than the cost of a false alarm. When the model is uncertain, it errs on the side of caution. Every single time.


That is not a bug. That is the entire point.
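To make the asymmetry concrete, here is a small sketch of a minimum-expected-cost decision. Only the 50-to-1 ratio comes from this post; the two-class, two-action setup, the zero costs for correct calls, and every name in the code are simplifying assumptions.

```python
def min_expected_cost_action(probs, cost_matrix):
    """Return (best_action_index, expected_costs).

    cost_matrix[action][true_class] is the cost of taking `action`
    when the patient's true class is `true_class`.
    """
    expected = [sum(cost * p for cost, p in zip(row, probs)) for row in cost_matrix]
    best = min(range(len(expected)), key=expected.__getitem__)
    return best, expected

# Classes: [healthy, serious disease]. Actions: 0 = no referral, 1 = refer.
COSTS = [
    [0, 50],  # no referral: free if healthy, 50 if disease is missed
    [1, 0],   # refer: 1 for a false alarm, free if disease is present
]

# Even at a 5% chance of serious disease, referring is the cheaper choice.
action, expected = min_expected_cost_action([0.95, 0.05], COSTS)
```

With these numbers the system refers whenever the disease probability is above 1/51, roughly 2%, which is what "errs on the side of caution" looks like in arithmetic.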


108 Second Opinions


I also made the model look at every image 108 times.


This sounds excessive. It is. But here is why: if you rotate an image slightly, crop it differently, flip it horizontally, and the model changes its mind, then the model was never that confident to begin with. So for every uploaded image, the system runs 108 passes. The first one is a clean, deterministic look. The other 107 are slight variations: random crops, flips, rotations, small color shifts. Then it takes a vote.


The final prediction is whatever the model said most often across all 108 looks. The confidence is averaged. The noise washes out. You get the same answer whether you run the full ensemble once or ten times.
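The voting step can be sketched like this, assuming each pass produces one probability list per image. The augmentation itself is left out, and the function name and tie-breaking rule are my own choices rather than the real implementation.

```python
from collections import Counter

def aggregate_passes(pass_probs):
    """Majority-vote the predicted class and average the probabilities across passes.

    Ties go to whichever class was seen first, because Counter.most_common
    keeps insertion order among equal counts.
    """
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in pass_probs)
    winner, _ = votes.most_common(1)[0]
    n_passes = len(pass_probs)
    n_classes = len(pass_probs[0])
    mean_probs = [sum(p[k] for p in pass_probs) / n_passes for k in range(n_classes)]
    return winner, mean_probs[winner], mean_probs

# Three passes instead of 108, to keep the example small.
passes = [[0.6, 0.4], [0.7, 0.3], [0.2, 0.8]]
winner, confidence, mean_probs = aggregate_passes(passes)
```

Here two of the three passes vote for class 0, so it wins, and the reported confidence is the mean probability for that class rather than the single most flattering pass.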


Does it take a few seconds instead of milliseconds? Yes. But this is someone's eyesight. I will take clinical confidence over speed every time.


Giving the Brain a Mouth


With the safety net built, wrapping the model in a server was almost anticlimactic. You send it an image of a retina. It sends back a JSON response that says, in plain English, what to do.


Not "Grade 2" or a raw probability like 0.7823. It says "Doctor Visit Required" with a yellow indicator, "Schedule ophthalmology appointment," and a confidence score. Everything a clinician needs, nothing they do not.
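As a purely illustrative sketch of that translation step: only the yellow tier's wording is taken from this post. The green and red labels, the grade-to-tier mapping, and all the field names below are placeholders I made up to show the shape of the idea.

```python
import json

# Only the "yellow" wording comes from the post; everything else is hypothetical.
TRIAGE_TIERS = {
    "green": {"label": "Follow-up Optional", "action": "Routine re-screening"},
    "yellow": {"label": "Doctor Visit Required", "action": "Schedule ophthalmology appointment"},
    "red": {"label": "Doctor Visit Mandatory", "action": "Urgent ophthalmology referral"},
}

# Hypothetical mapping from a 0-4 severity grade to a triage tier.
GRADE_TO_TIER = {0: "green", 1: "green", 2: "yellow", 3: "red", 4: "red"}

def build_response(grade, confidence):
    """Turn a severity grade and a 0-1 confidence into a plain-English payload."""
    tier = GRADE_TO_TIER[grade]
    return {
        "indicator": tier,
        **TRIAGE_TIERS[tier],
        "confidence": round(confidence * 100, 1),
    }

payload = json.dumps(build_response(2, 0.832))
```

The point of the design is that the grade and the probability never leave the server on their own; the caller only ever sees a tier, an instruction, and a percentage.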


The model runs on a FastAPI server that boots up, auto-detects whether you have a GPU or Apple Silicon or just a regular CPU, loads the model once, and holds it in memory. When an image arrives, it validates the file, runs the 108-pass ensemble, pushes everything through the safety net, and returns a clean answer. The whole thing is about 170 lines of Python.


Three Boxes, One Mission


SightX ships as three separate services, each in its own Docker container.


The frontend is a React app. It is the diagnostic interface. Upload zone, real-time status animations, triage results color-coded by severity. It blocks mobile devices entirely because this is not a consumer app. This is a diagnostic workstation built for desktop monitors in clinical settings.


The backend is a thin Node.js gateway. It receives the image from the frontend, forwards it to the AI engine, and relays the answer back. One job. Clean handoff. It does not process images. It does not make predictions. It orchestrates.


The inference engine is the Python service with the model. This is the only part of the system that knows what a retina looks like.


One command starts everything: docker-compose up --build. Three containers. Internal networking. Done.

Here is a peek at the system design > ~ <

[Figure: SightX system design diagram]




The Interface: Making Trust Visible


I spent an honestly unreasonable amount of time on the user interface. Here is why: if it looks like a homework project, no clinician will trust it. Medical software has a credibility bar, and that bar is set by how professional and intentional the design feels.


The main page has a drag-and-drop upload zone that transforms into a clinical confirmation form after the AI runs. There is a live status card that animates between three states, idle, processing, and complete, with an hourglass that spins and a shimmer bar that runs across the bottom during inference. When results arrive, they slide in with severity-colored accent bars. Green for optional follow-up. Yellow for required. Red for mandatory.


And here is the part that matters most: after the AI gives its opinion, the clinician enters the patient's name and ID, can override the AI diagnosis with a dropdown, and hits "Save Patient Record." That record goes to the database with both the AI's prediction and the doctor's final call attached.


The AI assists. It does not decide. That distinction is the entire design philosophy.


The Moment It All Connected


I remember the first end-to-end test. All three containers running. Frontend loaded.


I dragged a retinal image onto the upload zone. The hourglass spun. The shimmer bar did its thing for about four seconds. Then:


🟡 Doctor Visit Required - "Schedule ophthalmology appointment" - Confidence 83.2%


I sat there for a second. The image went from my cursor to a clinical recommendation in four seconds. I typed a patient name, hit save, and the record was in Supabase.


That was the moment it stopped being a project and started being a system.


Where SightX Actually Sits


Let me be honest about what this is and what it is not.


FDA-approved systems run at 90%+ accuracy. They diagnose autonomously. SightX is not that.


Clinical assist tools run at 80-90%. Doctors use them as a second opinion. SightX is not quite there yet either.


SightX is a screening tool. 70% accuracy, κ=0.6454. It flags at-risk patients for specialist referral. In a rural clinic where the alternative is *no* diabetic retinopathy screening at all, a system that catches most at-risk patients and sends them to an ophthalmologist is genuinely, measurably useful.


And here is the thing I keep coming back to: the model is one swappable file. Someone with better hardware could retrain on more data at higher resolution, drop in their model, and the entire system, the server, the safety net, the interface, the database, just works. The model is the engine. SightX is the car.


The Closing Thought


Day 1 of this project, I did not know what a learning rate scheduler was. I spent twenty minutes wondering why Jupyter could not find pandas because I forgot to activate my conda environment. The EyePACS documentation said the dataset was 35GB. It was 100GB. I ate noodles and questioned my life choices more times than I can count.


And now here is what exists: a system where you drag a photo of someone's eye onto a screen, and four seconds later, it tells a doctor whether that person needs help. It is not perfect. It is not FDA-approved. But it works. And every architectural decision, from the cost matrix that penalizes missed disease fifty times harder than a false alarm, to the 108-pass ensemble that refuses to be impulsive, to the clinician override that keeps a human in the loop, all of it was built around one idea.


This is someone's eyesight. Be careful.


Diabetes runs in my family. I have watched relatives navigate the constant anxiety of complications that arrive silently. Every line of code in SightX carries that weight. It always has. From Day 1 to Day 21. "From a confused conda environment to a three-container clinical system."


SightX is shipped. Built with care to serve people.


That is all for SightX for now. I will be back once I get better at this stuff. While that happens:

Be HealthTechy, Code On, Build for the People, Stay healthy 'n


Forgot! Thanks to the Artemis II mission for being the key motivator behind this project.




Let's build more for the peeps on that beautiful blue ball, but that's all for now.


Bibs Out!

Sayonara


