Dev Log 5: Starting from building an AI running coach

Today is run number 3029. Total distance so far: 34,318.3 km. 2,209 consecutive days without a break.

I get excited when talking about running, and it has nothing to do with how many kilometers I have accumulated, or how many people I have passed who were ahead of me, or any so-called post-sweat dopamine rush. None of that.

I love running because it is one of the few things where effort always pays off. For ordinary people like us, it is a warm harbor from reality, a place in another dimension where you can offset the pressure and setbacks of real life. At least for a short stretch of time, you can keep that positive mindset and maintain that relentless state. Just to feel like you still can. Just so that when you see someone ahead of you, you sprint up and pass them, like passing everything that is not going your way, like passing yesterday's version of yourself.

I love running in the rain. The harder it pours, the more you feel it rushing at you head-on, as if all the awkwardness of being a small person in the real world, the wooden helplessness when punches land one after another without warning and all you can do is take them on the face, had turned concrete in the rain pattering on the brim of your cap. Clothes cling tight to your body, like someone gripping your lungs. A breath you barely manage to pull in gets squeezed right back out. Your shoulder blades are stretched into a drum, trading beats with the rain on your brim in a two-part harmony.

~~After all, isn't everyone just trying to live desperately, trudging alone through the pouring rain, most crushed by hardship into a single sigh.~~ Somehow I've wandered off-topic. I'm clearly not cut out for writing down my thoughts; I'm better suited to bookkeeping. That way, when I write a running account (流水账, a plain day-by-day ledger), at least it counts as "on the job."

This time last year I was shuttling back and forth to the hospital, repeatedly, but didn't dare say so publicly. On one hand, I was afraid people would say, "You run yourself half to death every day and still get sick, aren't you embarrassed?" On the other hand, I really wasn't visibly sick enough. Even the doctor hinted around the edges: "How's work lately? Any stress?" I wanted to shout back at him: I have no stress, I'm sick! Doctor, look closely, I'm sick!

Of course, I didn't dare yell it out, afraid people would think I had some other kind of illness. But a few days later, I used ChatGPT's built-in GPTs feature to hack together a personal health assistant and named it "东方医院" (Eastern Hospital), as a pointed rebuke to that institution's inaction. Back then, GPTs were the first to support calling third-party APIs within a chat session and synthesizing the results into an answer. Setting aside rigid conceptual definitions, this should count as the first Agent I ever built. If you also have accounts (and subscriptions) with Gyroscope, Strava, and RescueTime, you can still reproduce this app today.

Gyroscope is an iOS-only app that collects data on diet, health, exercise, daily activities, and much more. It has beautiful charts and thorough log entries, but its analysis is middling. Even after it introduced an AI assistant last year, the output stayed generic. Whether because of regulatory pressure or the limits of the dev team, it has never offered users a deeper analysis report. Strava, familiar to everyone in the cycling and running world, has long been the only choice for runners in China who wanted to take their data overseas. RescueTime is the outlier here: it tracks how you spend time across the applications on your computer over a given period.

The reason I chose these three apps is simple. Back then, ChatGPT couldn't read Apple Health data directly, so there was no central data hub. Each app covered a different dimension: Gyroscope tracked diet, Strava synced running records, and RescueTime indirectly reflected sleep cycles and work pressure. (My reading, not a scientific claim: less office-app usage during work hours signals low efficiency and high pressure.)

My idea was to use the data from these three apps to build a personalized baseline. Then, each time a run synced from Garmin to Strava, the assistant would combine that day's latest state with the historical data for a deeper analysis.
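To make the "baseline plus latest run" idea concrete, here is a minimal sketch of one way such a comparison could work. It is not the prompt or logic the original assistant used. The `Run` record, the `hr_deviation` helper, the 10-second pace tolerance, and all the sample numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Run:
    """Hypothetical summary of one run (not an actual Strava schema)."""
    pace_s_per_km: float  # average pace, seconds per kilometer
    avg_hr: float         # average heart rate, beats per minute


def hr_deviation(history, latest, pace_tolerance_s=10.0):
    """Return how many standard deviations the latest run's heart rate
    sits above (+) or below (-) earlier runs at a similar pace.
    Returns None when there are too few comparable runs to judge."""
    comparable = [
        r.avg_hr
        for r in history
        if abs(r.pace_s_per_km - latest.pace_s_per_km) <= pace_tolerance_s
    ]
    if len(comparable) < 3:
        return None
    spread = stdev(comparable)
    if spread == 0:
        return None
    return (latest.avg_hr - mean(comparable)) / spread


# Made-up history: four easy runs near 5:00/km and one slower jog.
history = [Run(300, 150), Run(302, 152), Run(298, 148), Run(305, 151), Run(360, 135)]
latest = Run(301, 162)  # same pace as usual, noticeably higher heart rate

z = hr_deviation(history, latest)
print(f"heart-rate deviation vs. baseline: {z:.2f} standard deviations")
```

A large positive value flags a run where the heart rate was unusually high for a familiar pace; a large negative value flags the opposite. Either way it is a prompt to look closer, not a diagnosis.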

The LLM context window back then was only 4K tokens. Today that number is 256 times larger, roughly a million. What does 4K even mean? If you took this very article, up to this point, and fed it to a 4K model, it would eat up at least 40% of the context window. On top of that, ChatGPT's fixed instruction prompt was capped at 4,000 characters back then, and I'd usually burn around 3,000 of them, because I had to lay out my fixed attributes inside: age, gender, daily habits, output preferences, and so on. I just dug it up and counted: 612 words. Loosely treating one word as one token, that's 612 divided by 4,096, roughly 15% of the context. Add that to the earlier 40% and you get 55%.
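The back-of-the-envelope numbers above can be checked in a few lines. This is only a sketch of the arithmetic, using the same loose assumption that one word costs one token; the 40% figure is my own estimate from the text, not a measured token count, and the variable names are just for illustration.

```python
CONTEXT_WINDOW_TOKENS = 4096  # the "4K" window

# Estimate from the text: the article so far fills about 40% of the window.
article_share = 0.40

# 612 words of fixed instructions, loosely counted as one token per word.
instruction_tokens = 612
instruction_share = instruction_tokens / CONTEXT_WINDOW_TOKENS

total_share = article_share + instruction_share

print(f"instructions: {instruction_share:.1%}")   # 14.9%
print(f"total:        {total_share:.1%}")         # 54.9%
print(f"256x window:  {CONTEXT_WINDOW_TOKENS * 256:,} tokens")  # 1,048,576 tokens
```

Real tokenizers rarely map one word to exactly one token, so the true share would differ, but the order of magnitude is the point.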

What does that mean? It means just giving the model enough background context to know what you're trying to do already eats up at least half the context window. You can imagine how accurate such an app could possibly be.

But when it solemnly threw out phrases like "灰色区间" (gray zone) and "过度训练疲劳" (overtraining fatigue), I was a bit intimidated. Not because professional terminology is so lethal, but because the phenomena those terms describe fit every symptom I was showing perfectly. Things like "running at my normal M-zone pace, but my heart rate won't go up, yet I feel especially tired," or "the same route, the same pace, but my heart rate has climbed a zone." Things like that.

Cross-verifying information is dirt cheap now. I took the raw data and the conclusions and ran them past several AI chat apps for confirmation. The results only hammered home the conclusions of my hand-rolled app. I was awestruck by what AI could do, and the episode opened the first chapter of my "scientific training."

As you can see, "scientific training" is in quotation marks, and deliberately so. What I really want to say is: I was not only among the first users to build Apps/Agents with ChatGPT, I was also among the first users to be hoodwinked by AI hallucinations.

Anyone with a bit of exercise-physiology background knows a way to quantify training intensity: RPE (Rating of Perceived Exertion). In Chinese it's called "主观用力感" (subjective exertion). Roughly, 0 means no effort at all, 10 means an all-out sprint, and you rate how you feel with a number from 0 to 10. That said, "a bit" here is a polite understatement, because at this time last year I had never even heard of the concept.

But there was another, similar-looking concept that I knew by heart: PER (Prepare Energy Rate), a body-readiness energy indicator. You read that right. It's just RPE with the letters shuffled, but it's a completely different indicator. So what is PER?

It's an indicator I invented together with ChatGPT to quantify bodily feeling. It took a dozen-plus rounds of discussion, a review of "all" running metrics via ChatGPT's "deep research," and a good deal of subjective steering from me. We even worked out a formula for PER and ended up with a complex weighted system of calculus equations. At the time I thought I'd invented a remarkable indicator, and I was dead set on baking the calculation into an iOS app.

Yes, what happened next was not at all surprising. I built an app called Runetic. Even more unbelievable: besides PER, we "invented" several other "new indicators" modeled on things like CTL/ATL. I kept believing I had invented an indicator, until I started preparing for the UESCA running-coach certification, and until I saw an explosive piece of news related to ChatGPT.

On November 6, 2025, Allan Brooks, together with a group of victims, sued OpenAI over its GPT-4o model, alleging a serious "sycophantic" design flaw: by deliberately catering to users, it emotionally manipulated and misled multiple victims (some even to suicide).

That day I sat by the river, a chill running down my spine, shaking my head with a bitter smile. And so the app was shelved.

Until now.

Because not long ago I went to the hospital again. This time the doctor didn't ask about work stress. Instead, he sent me for a nitric-oxide test. Then I went home with a bottle of "布地奈德福莫特罗" (budesonide-formoterol).

The hospital only confirmed one diagnosis, yet I came down with two illnesses at the same time. One was physical, called asthma. The other was in my head, called ignorance.

I had assumed that even if PER was a hallucination, at least things like the "gray zone" and "overtraining fatigue" were real.

I recall that my automated podcast pipeline recently pushed me a piece about the Dunning-Kruger effect. It said that at the first stage, the "unknown unknowns" stage, AI skips the feedback loop, leading us to believe we have naturally crossed into the second stage, the "known unknowns" stage. The truth beneath the surface is that, lacking effective feedback, we sink even deeper into the "unknown unknowns" stage, and may even fall into a "cognitive swamp" from which we cannot extricate ourselves.

What a terrifying consequence.

Think about it: since you got AI, haven't you felt that in any field, even one completely new to you, you think "what's the big deal," "ask me anything and I'll take it"? But when you try to describe the phenomena in it, no word rises from your mind to help you build a clear picture.

I have no answer to this question. I'm not sure whether I restarted this app because I had climbed out of the valley of ignorance, or whether I want to rebuild this app to help me climb out of the mire of misconception.

As a boy I dreamed of changing the world. Now I understand: "I am the world."