Dev Log 4: Making a Podcast for Myself
I have listened to AI-generated speech with extremely high production quality. Even a Beijing accent is mimicked down to the finest detail, and you can hear which syllables get emphasis and which get a pause. The price is not bad either: roughly 3 yuan per ten thousand characters. At a normal human speaking rate of 180–240 characters per minute, an hour-long script runs 10,800–14,400 characters and costs no more than 5 yuan. The key thing is: no unwanted accent, no verbal slip-ups, and you can even pick whatever voice timbre you like.
Then I look at a show I am subscribed to. It is supposedly about AI, but I would rather call it an AI barometer, because content producers today seem to have reached some unspoken consensus: shock or die trying. Every time I hear that some big name in the field has pulled off another extraordinary feat, or that some model has once again surpassed humans at some capability, I cannot help feeling quietly ashamed, that genuine "born human, and sorry for it" feeling. At the same time I think: as we push new-intelligence productivity forward on every front, someone like me, who is both old and not intelligent, really is about to be swept onto the beach by the incoming tide.
At the time, the concept of entropy from another class was a big inspiration for me. It is originally a physics concept, describing the degree of disorder in a system. But applied to personal growth, it takes on a more concrete meaning. A person who does not take in new information, who only uses accumulated experience to face an ever-renewing era, will see their personal cognitive entropy keep growing—thinking becomes more chaotic and disordered. Like a southern summer with the windows shut and no AC blowing in cool air, things just keep getting hotter and hotter; or a lawn nobody tends for years, slowly turning into a wild thicket.
I figured: since I cannot count on others, and AI's powerful capabilities are right at my fingertips, why not hand-roll a podcast for myself? I could not only customize the tone I want, but also pick viewpoints I find credible and add a gate to filter out misinformation. It would basically be the ideal information secretary. But, going by the steps from the last installment, a good podcast obviously cannot be done in one shot.
Nothing comes from nowhere. If a podcast is a dish, the most important thing is rounding up the ingredients you need.
Before AI existed, Python-based web scrapers were already everywhere, so there is no need to explain them. The difference now is that you can go further and let automated coding tools write and debug these scripts for you. You just need to be clear about which sites you want content from. Or you can simply name the field you are interested in and let AI research which heavyweights in that field run personal blogs, which leading companies have official RSS feeds, and so on. But if you cook at home often, you naturally understand the importance of fresh ingredients. If AI fetches you articles from months ago, that is probably fine for, say, a paper in a slow-moving field. But at today's breakneck pace, news from a week or even a day ago is already obsolete, let alone news from months ago.
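To make the freshness point concrete, here is a minimal sketch of the kind of script a coding tool might write for you. It is not my actual workflow: the sample feed, the function name fresh_items, and the seven-day window are all made up for illustration, and it assumes a plain RSS 2.0 feed with pubDate fields. It uses only the Python standard library and parses a string instead of hitting the network.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Hypothetical feed content; a real script would download this text first.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>New model released</title><link>https://example.com/a</link>
<pubDate>Tue, 02 Jan 2024 08:00:00 GMT</pubDate></item>
<item><title>Old announcement</title><link>https://example.com/b</link>
<pubDate>Fri, 01 Sep 2023 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def fresh_items(rss_xml: str, now: datetime, max_age: timedelta) -> list[dict]:
    """Return items no older than max_age relative to now, newest first."""
    root = ET.fromstring(rss_xml)
    kept = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # without a date, freshness cannot be judged
        try:
            published = parsedate_to_datetime(pub)
        except (TypeError, ValueError):
            continue  # unparseable date: skip rather than guess
        if published.tzinfo is None:
            # Assumption: treat dates without a timezone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if now - published <= max_age:
            kept.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": published,
            })
    kept.sort(key=lambda entry: entry["published"], reverse=True)
    return kept


now = datetime(2024, 1, 3, tzinfo=timezone.utc)
recent = fresh_items(SAMPLE_FEED, now, timedelta(days=7))
```

With the sample above, only the January item survives a seven-day window; the September one is exactly the kind of months-old ingredient you do not want in the wok.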
The gathering step needs some basic tool-handling skills, but no real tricks. Besides, I did not strictly record my own workflow, so I cannot provide prompts you can paste straight in. There is also a paradox here: if I knew how to do it, I would not need to teach; if I needed someone to teach me, I would not need to make a podcast like this myself. So my approach is like ancient guqin scores: teaching by example, not by words. I do it, you listen, and that is all.
At this stage, an ordinary cook would think it is time to fire up the wok and dump in the ingredients. Indeed, in normal cases, once you have collected the materials, you can have any AI tool organize them and generate a script suited for spoken delivery. With ChatGPT's reported 1 billion monthly active users, I would guess this step is a piece of cake for everyone. And needless to say, these three steps, collecting, synthesizing, and converting to speech (TTS, text-to-speech), are the soul of any podcast.
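As a rough sketch of that three-step skeleton, and nothing more: the function make_episode and the three stub stages below are hypothetical names of my own, and the "speech" step just encodes text to bytes where a real TTS service would return audio.

```python
from typing import Callable


def make_episode(
    collect: Callable[[], list[str]],
    synthesize: Callable[[list[str]], str],
    speak: Callable[[str], bytes],
) -> bytes:
    """Run the three steps in order: collect, synthesize, convert to speech."""
    materials = collect()
    script = synthesize(materials)
    return speak(script)


# Placeholder stages so the sketch runs on its own.
def collect_stub() -> list[str]:
    return ["A lab shipped a new model.", "A startup open-sourced its tools."]


def synthesize_stub(materials: list[str]) -> str:
    return "Today's stories: " + " ".join(materials)


def speak_stub(script: str) -> bytes:
    # A real TTS call would return audio; here the "audio" is just UTF-8 text.
    return script.encode("utf-8")


audio = make_episode(collect_stub, synthesize_stub, speak_stub)
```

Each stage is passed in as a plain function, so swapping one AI tool for another only touches one argument.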
But if every cook used that kind of sloppy soul-of-the-dish method, you would be eating nothing but scrambled eggs with tomatoes every day—tomato scrambled with egg or egg scrambled with tomato. When you sit down at a restaurant, the first thing you do is grab the menu. And a menu is not just a display page for the dishes; it is a relationship map between ingredients and flavors—a ledger that consumes ingredients and produces flavor.
You need a ledger to record which information sources you have tapped
For the diner, the menu is a ledger of dopamine and food; for the chef, it constrains what dishes can be made and sold, and what ingredients can be bought. When making a podcast for yourself, as a listener you first need to know what kinds of news you want to hear and how you want them presented. For example, I want tech-related content, but not mechanical recitation; I want storytelling. "Tech-related plus storytelling" is the dish you want to eat, and translated into a recipe it becomes: search for recent tech news, then turn it into a verbatim script in a storytelling style.
Simply put, we now have two fields, industry: tech and target format: storytelling, plus the information sources and freshness requirements AI found earlier, which are rather like the kitchen's suppliers and delivery specs. With this ledger, this contract, in hand, write it up as structured text: JSON, YAML, whatever format looks good to you. This unassuming little file connects the two most important parts of the whole podcast production process. If AI is a gun, this file specifies which targets the gun shoots at, and how often each target gets shot. If you are like me, chasing AI news every day and always feeling that missing a day means you can never catch up again, the word "Harness" will not be unfamiliar. And looking back at the contract we just made: if this is not a Harness, then what is?
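Here is one way such a contract file might look, as a sketch rather than the file I actually use. Every field name (industry, target_format, max_age_days, sources, items_per_episode) and the example.com feed are assumptions made up for illustration. The small loader checks that the fields exist, then turns the contract back into the recipe sentence from earlier.

```python
import json

# Hypothetical contract; in practice this would live in its own JSON file.
CONTRACT_JSON = """
{
  "industry": "tech",
  "target_format": "storytelling",
  "max_age_days": 1,
  "sources": [
    {"name": "example-lab-blog",
     "feed": "https://example.com/feed.xml",
     "items_per_episode": 2}
  ]
}
"""

# Field name -> expected JSON type after parsing.
REQUIRED_FIELDS = {
    "industry": str,
    "target_format": str,
    "max_age_days": int,
    "sources": list,
}


def load_contract(text: str) -> dict:
    """Parse the contract and fail loudly if a required field is missing."""
    contract = json.loads(text)
    for key, expected in REQUIRED_FIELDS.items():
        if key not in contract:
            raise ValueError(f"contract is missing field: {key}")
        if not isinstance(contract[key], expected):
            raise ValueError(f"field {key} should be of type {expected.__name__}")
    if not contract["sources"]:
        raise ValueError("contract needs at least one source")
    return contract


def to_prompt(contract: dict) -> str:
    """Turn the contract into the instruction handed to the AI."""
    return (
        f"Search for {contract['industry']} news from the last "
        f"{contract['max_age_days']} day(s), then turn it into a verbatim "
        f"script in a {contract['target_format']} style."
    )


contract = load_contract(CONTRACT_JSON)
prompt = to_prompt(contract)
```

The sources list is where "which targets" lives, and a per-source count like items_per_episode is one possible way to express "how often each target gets shot".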
Prompt Engineering, Agent Engineering, Harness Engineering, and more recently the suddenly rising Loop Engineering: these are all much-hyped concepts in the world of AI use. But if you are patient enough to break down the projects you have done and then look back at these familiar-yet-strange bogeymen, you will suddenly realize that they are just problems that arise naturally when we use AI. Prompt solves the early-AI problem of semantic understanding not meeting the bar. Agent solves the problem of multiple AI threads coordinating with each other. Harness solves the problem of regulating AI behavior. And Loop solves the problem of getting AI to produce stably once all of the above are solved.
But at this point, we notice another problem: beyond the ledger, do we not also need an inventory log? When multiple suppliers deliver the ingredients you need every day, how do you make sure, without proper management, that every ingredient's supply and demand is just right?
The contract is not just a ledger; it is also a status log
The fields we recorded earlier are a bit like the big-V (verified influencer) accounts you follow on Douyin or Xiaohongshu: they constrain what content you can see (let us not even mention platform recommendations). But have you noticed something you probably never cared about? The platform never recommends stuff you have already seen (homogeneous content does not count). Why? Because the platform has its own status management.
Come to think of it, this really fits the saying "an iron camp, flowing soldiers": the camp stays while the soldiers come and go. Every batch of AI on this assembly line needs to know clearly what was fetched before, what was used for which episode, what is unused and still fresh, and what the latest data is. After that round of analysis and recording, you can (almost) guarantee that every episode uses different material and has something new.
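A status log like this can be tiny. The sketch below is a guess at a reasonable shape, not a record of my own files: the StatusLog class, its method names, and the JSON layout are all hypothetical. It remembers every fetched item by URL, notes which episode used it, and lists what is still unused. Freshness checks are left out to keep it short.

```python
import json
from pathlib import Path


class StatusLog:
    """Tracks fetched items and which episode, if any, has used each one."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        if self.path.exists():
            self.items = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.items = {}  # url -> {"title", "fetched_on", "used_in"}

    def record_fetched(self, url: str, title: str, fetched_on: str) -> None:
        # setdefault keeps the first record, so re-fetching an item is a no-op.
        self.items.setdefault(
            url, {"title": title, "fetched_on": fetched_on, "used_in": None}
        )

    def unused(self) -> list[str]:
        """URLs that no episode has used yet, in the order they were fetched."""
        return [url for url, entry in self.items.items() if entry["used_in"] is None]

    def mark_used(self, url: str, episode: int) -> None:
        self.items[url]["used_in"] = episode

    def save(self) -> None:
        self.path.write_text(
            json.dumps(self.items, ensure_ascii=False, indent=2), encoding="utf-8"
        )
```

A typical round would be: load the log, record whatever was fetched today, build the episode from unused(), call mark_used for each item that made it into the script, then save. The next batch of AI reads the same file and picks up where the last one stopped.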
In the old days, any system with even a little scale used a database. As I see it, databases just record the information itself, plus the state and boundaries of using that information. The former is the passport for data flow; the latter is the fence for data compliance. But for a podcast of this scale, a few text files are plenty. In fact, you do not even need to worry about where the files are stored or what is written in them. You just need to know the rule "wear gloves when going out in winter" and tell AI; it will wait for winter, find the gloves, and put them on by itself.
I heard recently that Hu Yanbin (胡彦斌) used AI to build a community site for his fans, and was even fixing bugs at the airport. Some say the wall between code and programming has been torn down. I think it is just that we are now using energy more efficiently.
From the invention of the computer to now, from assembly to high-level languages, from procedural to object-oriented, all of this has served one purpose: communication between humans and computers. We encode our thoughts into language; the computer decodes binary into its own circuit switches; and the path from language to binary is the whole evolution of programming languages. Naturally, as the computer's encoding capabilities keep improving, it has taken on more and more of the middle steps, until now natural language itself has become that bridge. So it is not that the wall of code has been torn down; it is that the computer has stepped out from behind the wall and gotten closer to humans.
So, coming back to the podcast: we can call this assembly-line process programming too. In the Chinese word for programming, 编程 (biancheng), 编 (bian) means to make and 程 (cheng) means procedure. We made a procedural assembly line that automatically produces podcasts, so why would that not be called programming?
I used to say, irresponsibly, that code is dead. More accurately, code is just the middle step in information transmission (bidirectional, between humans and computers); its only role is carrying information. Whether code lives or dies depends only on whether that middle step still needs an intermediary. As for how to actually get something done through a computer, that is the essence of engineering.