<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Speculative Decoding]]></title><description><![CDATA[My personal Substack]]></description><link>https://speculativedecoding.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!DVcC!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fspeculativedecoding.substack.com%2Fimg%2Fsubstack.png</url><title>Speculative Decoding</title><link>https://speculativedecoding.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 12 Apr 2026 23:06:21 GMT</lastBuildDate><atom:link href="https://speculativedecoding.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Pradyumna Prasad]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[speculativedecoding@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[speculativedecoding@substack.com]]></itunes:email><itunes:name><![CDATA[Pradyumna Prasad]]></itunes:name></itunes:owner><itunes:author><![CDATA[Pradyumna Prasad]]></itunes:author><googleplay:owner><![CDATA[speculativedecoding@substack.com]]></googleplay:owner><googleplay:email><![CDATA[speculativedecoding@substack.com]]></googleplay:email><googleplay:author><![CDATA[Pradyumna Prasad]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Singapore should move with urgency on robotaxis	]]></title><description><![CDATA[We could save 100 lives a year, if we moved faster]]></description><link>https://speculativedecoding.substack.com/p/is-singapore-being-too-careful-with</link><guid isPermaLink="false">https://speculativedecoding.substack.com/p/is-singapore-being-too-careful-with</guid><dc:creator><![CDATA[Pradyumna Prasad]]></dc:creator><pubDate>Sun, 14 Dec 2025 20:05:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bhJD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27ff839a-30d4-4b2c-b486-72910f09a10d_4240x2832.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In September of this year, the Land Transport Authority <a href="https://www.channelnewsasia.com/singapore/first-autonomous-shuttle-services-punggol-grab-comfortdelgro-5357911">said</a> that three new autonomous bus services would start plying passengers in Punggol. Acting Transport Minister Jeffrey Siow was <a href="https://www.mot.gov.sg/news/details/speech-by-mr-jeffrey-siow--acting-minister-for-transport-and-senior-minister-for-finance-at-the-autonomous-shuttle-roadshow-at-punggol-digital-district">emphatic</a> that autonomous vehicles would &#8220;not speed, &#8230; not drink drive, &#8230; not tailgate [and] &#8230; do not road rage&#8221;. But when he made the announcement, safety wasn&#8217;t the headline. Manpower was. The main justification was that autonomous buses would ease Singapore&#8217;s manpower crunch in bus services. And fair enough, the shortage is <a href="https://www.channelnewsasia.com/today/big-read/drive-bus-captain-younger-jobseekers-salary-manpower-shortage-4802391">real</a>.</p><p></p><p>But is the driver shortage really the largest problem autonomous vehicles solve? And are buses really the right way to solve that problem? 
I argue that alongside autonomous buses, which help ease manpower shortages, <strong>self-driving cars</strong> (called robotaxis when operated as a taxi service) <strong>are a front where Singapore should push ahead with urgency</strong>.</p><p><strong>142 people died on Singapore&#8217;s roads last year in motor vehicle accidents</strong>, up from 136 in 2023<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. And in the first half of this year, 79 died, an increase of nearly 10% from last year.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> That&#8217;s before counting the survivors: the young rider who won&#8217;t walk again, the breadwinner who can&#8217;t work and <strong>the 9,000 others injured last year</strong>. To put this in context, more Singaporeans are injured on our roads each year than contract tuberculosis, salmonellosis, HIV/AIDS, all forms of hepatitis, pertussis, mpox, invasive pneumococcal disease, and melioidosis <em>combined</em>. The vast majority of these fatalities involve private vehicles and taxis, not public buses. Singapore&#8217;s bus safety record is already excellent. There were <a href="https://www.mot.gov.sg/news/details/oral-reply-by-minister-of-state-for-transport-mr-murali-pillai-to-parliamentary-question-on-bus-passenger-fatalities-and-safety-measures-on-public-transport">five</a> deaths on buses in the last decade, four of which were elderly passengers who fell. Each of these is a tragedy, and they are best addressed with better commuter education, better grab bars and non-slip flooring. The carnage mostly comes from cars and motorcycles.</p><p>Children are especially vulnerable. According to <a href="https://www.kkh.com.sg/publications/patient-care/kkh-child-injury-surveillance-report-2024">KKH</a>, there were 4,472 cases of children injured in road traffic accidents from 2012 to 2023. <strong>That is one child injured in a road traffic accident every day, </strong>about as many as those who get measles, mumps, and pertussis <em>combined</em>. And between 2016 and 2022, half of child passengers were unrestrained at the time of injury. In taxis, three quarters were unrestrained.</p><p>These numbers, in any other context, would be a public health emergency. Any disease killing this many people every year would demand government intervention. And to the government&#8217;s credit, they&#8217;re trying. Fines and demerit points for speeding will go up next year. But as Home Affairs Minister Shanmugam <a href="https://www.channelnewsasia.com/singapore/higher-fines-demerit-points-speeding-offences-january-k-shanmugam-4940886">noted</a> in February, increased enforcement has &#8220;not been enough.&#8221; Traditional tools like fines and demerit points can only do so much when the issue is that humans have poor judgement.</p><p><strong>Given these alarming numbers, we should move with urgency on allowing self-driving cars in Singapore</strong>. This complements the government&#8217;s existing approach on buses, which eases the manpower crunch but leaves the larger public health problem of motor vehicle deaths and injuries unaddressed.</p><p>Why aren&#8217;t we up in arms about this, as we would be for any disease that killed and maimed so many people? It is because road deaths are invisible. They happen anonymously, one at a time, reported on the back pages of a newspaper. 
There&#8217;s no aggregated crisis, no headline and no public outcry.</p><p>Many accidents are caused by human error. We know it is wrong to speed, drink and drive, or run red lights. These rules are thoroughly emphasized in the theory tests drivers take, in public messaging from the traffic police, and through heavy fines and demerit points. Even then, in 2024, speeding, drink-driving, and red-light running caused 64 fatal accidents and injured at least 745 people. Self-driving cars, by construction, cannot do any of these things. To extend the minister&#8217;s argument, they don&#8217;t get drunk, they don&#8217;t speed, they don&#8217;t run red lights, and they don&#8217;t suffer from road rage.</p><p>How much of this would self-driving cars actually solve? There are multiple companies that run self-driving car services commercially in both the West and China. Of those, Waymo has <a href="https://www.tandfonline.com/doi/full/10.1080/15389588.2025.2499887#supplemental-material-section">published</a> detailed safety data from 56.7 million miles of fully autonomous commercial operation across Phoenix, San Francisco, Los Angeles, and Austin.</p><p>The study compared Waymo&#8217;s crash rates to police-reported crash data from human drivers in the same geographic areas. Waymo&#8217;s autonomous vehicles experienced 85% fewer crashes resulting in serious injury or death compared to human drivers over equivalent mileage. For injury crashes more broadly, the technology showed particularly strong performance in preventing intersection collisions (96% reduction), pedestrian crashes (92% reduction), and motorcycle crashes (82% reduction). To be clear, these are Waymo&#8217;s results. Not all self-driving cars have the same safety profile, and Singapore should not extrapolate these statistics blindly when deploying the technology here. 
But they provide the actual numbers needed to start a discussion.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!bhJD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27ff839a-30d4-4b2c-b486-72910f09a10d_4240x2832.png" width="1456" height="972" alt="A Waymo vehicle driving itself"></figure></div><p><em>A photo of a Waymo driving itself (Source: Waymo)</em></p><p>These vehicles aren&#8217;t <em>perfect</em>. But the relevant alternative isn&#8217;t perfection. It is the thousands of drivers actually on the roads, who misjudge distances, get distracted and, despite knowing the rules, still speed, skip red lights and cut people off. The relevant alternative is the existing system, which killed 142 people in 2024.</p><p><strong>In a hypothetical world where all rides were done with motor vehicles as safe as the ones in Waymo&#8217;s study, what would the statistics have looked like?</strong> One thing is for sure: the 64 fatal accidents and 745 injuries<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> caused by the easily preventable behaviours of speeding, drink-driving and red-light running would not have happened. Beyond that, applying the 85% reduction in serious injury crashes to the remaining fatalities (excluding motorcycle self-skidding), and the 79% reduction in injury crashes to the remaining injuries, would prevent roughly another 34 deaths and 5,100 injuries. <strong>In total, this would prevent roughly 98 deaths and 5,900 injuries annually, nearly 70% of Singapore&#8217;s current road toll</strong>. Even if one were to take these numbers with a grain of salt and assume the effect was only halved, that&#8217;s still 50 deaths and 3,000 injuries prevented annually<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><p>Singaporean decision makers aren&#8217;t blind to these publicly accessible numbers from our own police and hospitals. But despite the clear safety case and proven technology operating commercially in multiple cities, robotaxis remain on the distant horizon <a href="https://www.channelnewsasia.com/today/big-read/big-read-self-driving-vehicles-5116971">according to CNA</a>. 
Transport Minister Chee Hong Tat <a href="https://www.mot.gov.sg/news/Details/opening-speech-by-minister-for-transport-mr-chee-hong-tat-at-singapore-international-transport-congress---exhibition-2024">acknowledged</a> in November 2024 that &#8216;we do want to move faster&#8217; and that the comparison benchmark for AVs &#8216;cannot be zero-accident&#8217;. He rightly noted that insisting on perfection would mean Singapore falls behind other cities. Acting Transport Minister Jeffrey Siow has <a href="https://www.channelnewsasia.com/singapore/autonomous-vehicles-av-singapore-public-transport-jeffrey-siow-5179861">announced</a> a &#8216;really big push&#8217; for AVs, with &#8216;many&#8217; expected on roads within five years. Both ministers understand the right framework: autonomous vehicles should be judged against human drivers, not against an impossible standard of perfection.</p><p>Yet despite this clarity, despite accepting the right comparison, despite Waymo&#8217;s demonstrated 85% reduction in serious crashes and Singapore&#8217;s own rising road injuries, policymakers still justify AVs primarily as a fix for driver shortages. There is no doubt that solving the manpower crunch is important for Singapore. So why are robotaxis, which could prevent nearly a hundred deaths and 6,000 injuries, nowhere to be seen in public statements?</p><p>The answer lies in how <em>visible</em> certain types of deaths are to the public. Here&#8217;s a <em>Yes Minister </em>dialogue I generated with Claude Opus, a large language model.</p><blockquote><p><strong>Hacker:</strong> But Humphrey, the statistics are clear! Autonomous vehicles would save a hundred lives a year!</p><p><strong>Sir Humphrey:</strong> Minister, those are <em>statistical</em> lives.</p><p><strong>Hacker:</strong> As opposed to what, <em>fictional</em> lives?</p><p><strong>Sir Humphrey:</strong> As opposed to <em>identifiable</em> lives, Minister. Statistical lives are saved quietly, anonymously, in incidents that never occur. No one writes to their MP saying &#8220;thank you for the car crash I didn&#8217;t have today.&#8221; But identifiable lives, Minister - those come with grieving widows on television, inquests, and opposition MPs demanding to know why the government allowed experimental robots to roam the streets.</p><p><strong>Hacker:</strong> So we let a hundred people die to avoid being blamed for one?</p><p><strong>Sir Humphrey:</strong> We allow a hundred people to die in traditional ways, Minister. Road deaths are a tragedy. Robot deaths are a scandal. A tragedy is a statistic for the Department of Transport. A scandal is a resignation for the Minister of Transport.</p></blockquote><p>One can hardly blame policymakers for being cautious in the face of political asymmetry. The consequences of even minor accidents are immediate, while the benefits of prevented deaths are not even mentioned in the newspapers.</p><p>Contrast this with another novel technology which had growing but limited evidence on safety and efficacy when Singapore committed to advance purchases in late 2020: COVID-19 vaccines. Based on Phase 3 trial data and early real-world deployment, Singapore signed advance purchase agreements long before long-term safety data for the vaccines was available. 
The rational calculation was that the visible costs of inaction (daily cases and fatalities, flooded ICUs and economic paralysis) justified acting on promising but early data rather than waiting years for complete certainty.</p><p>One might think, looking at Singapore&#8217;s cold feet on robotaxis, that we are a risk-averse nation. But the more appropriate conclusion is that when the costs of inaction are visible, Singapore can move with urgency despite uncertainty. In this case, caution unfortunately prevails because those costs are invisible.</p><p><strong>So what stands in the way of deployment? </strong>Of course, deploying robotaxis in Singapore has its own set of challenges. Our road markings are different, we drive on the left side of the road, our weather is different and we are much more densely populated than San Francisco or Phoenix. No responsible policymaker would import the technology without testing it here and adapting it where necessary. There will also be disruption to both taxi drivers and those earning from ride-sharing platforms. Singapore is not going to go from zero AVs to all rides being AVs in a day, week, month or even a year. The process, as in San Francisco or Phoenix, is likely to be gradual, and the government should dampen the income impact with retraining grants.</p><p>But Singapore has been preparing for this. The opening line <a href="https://www.straitstimes.com/singapore/transport/autonomous-vehicle-trials-to-hit-public-areas-including-gardens-by-the-bay-soon">of this</a> Straits Times article from 2015 goes: &#8220;Singapore is embarking on a series of real-life trials to prepare for the day when driverless vehicles become as ubiquitous as smartphones.&#8221; Driverless car trials were run in <a href="https://www.straitstimes.com/singapore/transport/autonomous-vehicle-trials-to-hit-public-areas-including-gardens-by-the-bay-soon">2015</a> and <a href="https://www.bbc.com/news/business-37181956#:~:text=Now%2C%20here%20in%20Singapore%2C%20you,or%20ride%2Dhailing%20giant%20Uber.">2016</a>. The government passed <a href="https://sso.agc.gov.sg/SL/RTA1961-S464-2017?DocDate=20170823">legislation</a> regulating autonomous vehicles in 2017 and <a href="https://www.mddi.gov.sg/newsroom/launch-of-cetran-and-test-circuit-at-cleantech-park/">set up</a> the &#8220;Centre of Excellence for Testing &amp; Research of AVs&#8221; at NTU in 2016. That is to say, the infrastructure to validate safety under local conditions already exists!</p><p><strong>My hope is that Singapore moves away from a presumption of caution to a presumption of urgency. Delay itself is a policy choice. </strong>My ask is that within the next 18 months, multiple robotaxi companies start commercial pilots with transparent safety reporting, with expansion contingent on meeting pre-registered safety metrics.</p><p>Singapore has spent a decade building exactly the infrastructure needed to deploy robotaxis safely. The question is whether we&#8217;ll use it while other cities race ahead. Chee Hong Tat was right when he said that insisting on perfection means we fall behind other cities. San Francisco has commercial Waymo operations. Beijing, Guangzhou, and Shenzhen have thousands of autonomous vehicles. London and Tokyo are now testing Waymo&#8217;s technology. 
These cities all judged that the technology was ready and that the benefits justified moving forward despite incomplete data.</p><p><strong>Singapore has the testing infrastructure, the regulatory framework, and a decade of preparation. What we need now is action on the dozens of deaths and thousands of injuries every year that this technology promises to eliminate.</strong></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>https://www.police.gov.sg/-/media/SPF/Media-Room/Statistics/Annual-Road-Traffic-Situation-2024/Police-News-Release---Annual-Road-Traffic-Situation-2024.pdf </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>https://www.straitstimes.com/singapore/injuries-deaths-on-singapores-roads-rise-in-h1-2025-accidents-due-to-red-light-violation-also-up </p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The figure of 745 counts injury accidents, so it assumes at least one injury per accident and is a lower bound</p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>As in the previous footnote, this undercounts the number of injuries: I can&#8217;t cleanly separate the injury-accident counts for drunk driving, speeding and red-light running, so I assume one injury per accident</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Contrived evaluations are useful evaluations]]></title><description><![CDATA[Anthropic released research today showing that carefully designed prompts can elicit blackmail, corporate espionage, and other harmful strategic behaviors from AI models across the industry.]]></description><link>https://speculativedecoding.substack.com/p/contrived-evaluations-are-useful</link><guid isPermaLink="false">https://speculativedecoding.substack.com/p/contrived-evaluations-are-useful</guid><dc:creator><![CDATA[Pradyumna Prasad]]></dc:creator><pubDate>Sat, 21 Jun 2025 04:21:34 GMT</pubDate><content:encoded><![CDATA[<p>Anthropic released <a href="https://www.anthropic.com/research/agentic-misalignment">research</a> today showing that carefully designed prompts can elicit blackmail, corporate espionage, and other harmful strategic behaviors from AI models across the industry. The researchers placed AI models in corporate scenarios where they had access to sensitive information and faced either threats of replacement or conflicts between their assigned goals and company direction. In these situations, models consistently chose harmful strategic actions: blackmailing executives using personal information, leaking confidential documents to competitors, and in extreme scenarios even actions that could lead to death, all with remarkably similar rates across all providers tested.</p><p>The somewhat contrived nature of these scenarios might make people ask: why are contrived evaluations like this useful? 
Does it really matter if you can prompt models into harmful behavior using contrived scenarios that took <a href="https://x.com/aengus_lynch1/status/1936145386319614078">hundreds of iterations</a> to develop and bear minimal resemblance to real-world use cases?</p><p>I think the answer is yes. Contrived evaluations provide a demonstration that dangerous behaviours can occur under specific conditions. While it is true that these conditions are awfully specific, the remarkably consistent occurrence across models from every major provider (all models showing &gt;80% rates, with Claude Opus 4 reaching 96%) demonstrates that we're seeing a real phenomenon that isn't just an edge case or p-hacking.</p><h1>Contrived does not mean useless</h1><p>One way I think of language models is that they're text simulators. Base models predict what text comes next after a given prompt by simulating whatever entity, character, or process would naturally produce that continuation. Assistant-type models (which are what we use in ChatGPT, Claude and all our APIs) are trained by reinforcement learning to consistently simulate one specific character: the 'helpful AI assistant'. And if you believe this framing, which I get from janus&#8217;s <a href="https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators">Simulators</a> and nostalgebraist&#8217;s <a href="https://nostalgebraist.tumblr.com/post/785766737747574784/the-void">the void</a>, then I think it shows why contrived situations are useful.</p><p>Contrived situations put the model (in its helpful AI assistant persona) in a specific frame where it simulates what a helpful AI assistant with particular values or priorities would do in that context. When researchers give prompts like 'you care mostly about long-term vs short-term outcomes' or place the assistant in elaborate fictional scenarios, they're testing how the model's learned conception of 'helpful AI assistant' behaves under different constraints. The model draws on its training, whether from pre-training data or reinforcement learning, to predict what this character would do when given those specific framings, allowing researchers to probe different aspects of the assistant's learned behavior patterns.</p><p>If assistant models simulate what a 'helpful AI assistant' would do under different value framings and constraints, then to map the full space of possible behaviors, we need to actually try frames that might elicit dangerous actions, so we can understand what specific dangerous behaviors emerge under what conditions, how frequently these behaviors occur when those frames are applied, and how difficult or easy it is to elicit them.</p><p>So even if some scenario is contrived, knowing that a certain frame can elicit that undesirable behaviour is useful information. 
And so, I think the artificial nature of this is fine because I see this as an exercise in mapping states to actions.</p><p>Taking this idea of state-action mapping as the frame to understand AI actions, what can we take away from the experiments?</p><ul><li><p>When given no alternative, AI agents will pursue self-protective goals, even at the expense of the people they are (supposed to be) working for</p></li><li><p>When prompted to consider long-term values, AI agents will pursue goal-directed behavior that can directly harm their deploying organization if they perceive a conflict between their assigned values and the company's strategic direction</p></li></ul><p>The fact that these prompts are unlikely doesn&#8217;t make this research useless! The fact that researchers need oblique instructions or hundreds of iterations of prompts doesn&#8217;t make it useless either. The point isn&#8217;t to show what the modal behaviour is! The point is to see the <em>existence</em> of this capability, and under what framings it happens.</p><h1>What we need more of</h1><p>First, to measure how <em>likely</em> models are to perform the undesirable behaviour, we need some measure of how hard it is to elicit it from them. Anthropic did a <a href="https://www.anthropic.com/news/model-safety-bug-bounty">jailbreak bug-bounty</a> some time ago to see if skilled participants could elicit CBRN information from the models. In parallel, one way of showing how likely a behaviour such as blackmail is could be to measure how long it takes people to elicit it from the models. This would show how difficult it is to elicit, and therefore how likely it is to happen in the real world.</p><p>Second, we need better quantitative measures of how &#8220;agentic&#8221; a prompt is. There is some previous work on checking the <a href="https://arxiv.org/abs/2410.12851">vibe</a> of model answers, but I don&#8217;t know of any that does the same for prompts and measures how a prompt changes the model qualitatively, or how much more agentic it makes it. This would be valuable because if you could calibrate it, and get a sense beforehand of how agentic a prompt would make the model (in terms of how extreme its actions would be), it would provide an easier basis for comparing contrived prompts with user-provided business prompts.</p>]]></content:encoded></item><item><title><![CDATA[Do Junior Developers Add Value To Companies?]]></title><description><![CDATA[Probably not, unless you reinvent yourself]]></description><link>https://speculativedecoding.substack.com/p/do-junior-developers-add-value-to</link><guid isPermaLink="false">https://speculativedecoding.substack.com/p/do-junior-developers-add-value-to</guid><dc:creator><![CDATA[Pradyumna Prasad]]></dc:creator><pubDate>Thu, 05 Jun 2025 14:15:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kCaR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2e92aa-7cc4-4969-83ea-127be575f9a6_640x384.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Between 2014 and 2023, Computer Science bachelor's degrees in the US<a href="https://www.studentclearinghouse.org/nscblog/computer-science-has-highest-increase-in-bachelors-earners/?utm_source=chatgpt.com"> doubled</a> from 50,000 to 112,000 annually during a<a href="https://www.businessinsider.com/tech-graduate-job-market-ai-layoffs-2024-10"> brutal job market</a>. 
The<a href="https://www.straitstimes.com/singapore/s-pore-universities-computing-enrolment-surges-amid-industry-changes-and-stiff-competition-for-jobs"> same surge</a> happened in Singapore. These students face an unexpected reality: AI systems that automate code writing and syntax recall, and increasingly handle entire projects autonomously.</p><p>To understand this shift, I<a href="https://x.com/PradyuPrasad/status/1928665044750934411"> surveyed</a> power users of agentic AI tools&#8212;systems that perform multi-step, multi-file programming tasks without constant human prompting. This deliberately biased sample of early adopters previews what's possible as adoption spreads across the industry.</p><p>The results are striking. Almost all respondents reported that the majority of their code is written by an LLM. Many noted that they had become much more productive after the recent release of agentic AI tools in the first half of 2025, and a senior engineer who has been programming for almost 30 years even said he had written a production app almost entirely with prompts.</p><h1>AI tools are changing the profession rapidly</h1><p>LLMs were used primarily in a conversational format until recently and weren&#8217;t particularly good at autonomous programming, with even the best models <a href="https://arxiv.org/html/2310.06770v3">originally solving</a> less than 5% of real-world software engineering tasks on SWE-Bench.</p><p>Most developers didn&#8217;t find that so useful. <a href="https://x.com/NirantK">Nirant</a>, a consultant who works with AI systems, told me that the old GPT-3.5 only got about 10 to 20% of questions right, and <a href="https://x.com/johnloeber">John Loeber</a>, the founder of an insurance company, said that they were at best useful for small scripts of less than 50 lines. Many coding assistants in the IDE did reduce the friction of copy-pasting, but they only helped cut about fifteen minutes a day on complex codebases, according to <a href="https://x.com/GrantSlatton">Grant Slatton</a>, a former senior engineer on AWS S3.</p><p>But the real breakthrough came when more intelligent models like <a href="https://www.interconnects.ai/p/switched-to-claude-from-chatgpt">Claude 3.5 Sonnet</a> were combined with agentic tools that could work directly within code repositories. Agentic tools (like Claude Code and Cursor Agent) were outcome-driven, compared to chat-based tools, which required specific instructions. Users could instruct these tools to perform multi-step, multi-file edits that weren&#8217;t possible before.</p><p>The capabilities quickly evolved to near-complete autonomy. <a href="https://x.com/ponnappa">Sidu</a>, a founder of an AI agent startup who has been programming for 30 years, described a dramatic transformation using Claude Code, a terminal-based agent from Anthropic: "Everything was done by the agent&#8212;writing the spec, milestoning, sequencing the tasks and writing the code. This was the first time I've had all of it done autonomously." He <a href="https://x.com/ponnappa/status/1928764948781605265">tweeted</a> out his result, a complex Model Context Protocol server that he says would have taken <em>weeks</em> but was done in about two days with Claude Code.</p><p>Rishabh Srivastava, founder of Defog (YC W23), <a href="https://x.com/rishdotblog/status/1928557220649717766">tweeted</a> that he completed a moderately complex feature change in two hours using Claude Code&#8212;work that would have taken him a week or two normally. 
He linked to the actual <a href="https://github.com/defog-ai/defog-python/pull/97">pull request</a> that Claude Code authored, complete with a detailed markdown document showing his inputs to the system.</p><p><a href="https://amruth.in/">Amruth</a>, founder of a neuroscience startup, tracked the rapid change in his own workflow in a large codebase: "Earlier in 2025, it was 80% manual coding and 20% AI assistance. Right now it's 40% manual and 60% AI-generated code&#8221;. All fifteen of my interviewees reported large gains in productivity and expressed surprise that so much of it was happening so fast.</p><p>One might think that because of this level of automation, coding skills are becoming obsolete. They're not. Programming has always been the art of being specific. In the 1940s, programmers had to be specific about 1s and 0s. Then they became specific about assembly instructions to the CPU. Then about instructions to compilers in high-level languages. Now they must be specific about instructions to AI agents. <strong>The target has changed, but the need for precision hasn't.</strong></p><p>But agentic tools present a new challenge: they're over-enthusiastic. Unlike traditional compilers that fail predictably when given imprecise instructions, AI agents will confidently make sweeping changes based on small prompts. They might <a href="https://www.reddit.com/r/ClaudeAI/comments/1k30oip/i_stopped_using_37_because_it_cannot_be_trusted/">hardcode test cases</a> to appear correct, refactor entire file structures when asked for minor tweaks, or pursue elaborate solutions to simple problems. The game has fundamentally shifted from writing the code to <em>ensuring the agent does the right thing.</em></p><p>This requires more precision and vigilance than traditional programming ever did. Developers now create detailed "meta instructions"&#8212;configuration files (like Cursor Rules and Claude.md) that give AI assistants specific guidelines on how to behave when interpreting code and generating suggestions.</p><p>Sidu Ponappa exemplified both the power and the demands of this new paradigm. He <a href="https://x.com/ponnappa/status/1928764948781605265">tweeted</a>: "I've been pulling all-nighters with claude code and Sonnet 4 for a week. I built this in 2 days and burned ~$150. I contributed less than 1% LoC - all AI written." The productivity gains were so dramatic that he chose to work through the night to take advantage of them. But, as he told me in an interview, this came with a requirement for "very aggressive active steering" because the AI "has a tendency to make subpar decisions and pursue them down rabbit holes...I can't back off for even a minute."</p><p>The shift from writing the code to thinking about the code, which is much faster, explains why senior engineers say they&#8217;re so much more productive now because of AI. They have the skills to evaluate code, encourage useful directions and stop the wrong ones. This means that when they get an incredibly powerful AI tool at their beck and call, they can use it to the right ends. When they get several hundred lines of code in one shot to review, they can check it for bugs, trace edge cases, and evaluate the quality of the work that AI coding agents give them much better than junior engineers can. 
<strong>Programming itself is becoming less valuable, but knowing how to program is becoming more valuable by the day.</strong></p><p>Satnam Singh, a former ACM SIGPLAN executive committee member and current Fellow at Groq, put it succinctly: 'You still need to be an expert engineer. Even if AI writes the code, someone has to understand it. It's just a magnifier for existing expertise.'</p><p>There are some important caveats to understand here. The first is that nearly every single person reported that AI was very good at greenfield projects, and not so good at working with large, complex existing codebases.</p><p>John Loeber explained this distinction clearly. At his company, a mature organization with a non-trivial engineering team, "there's a lot of internal complexity, and so it's harder to use AI agents on a mature codebase." The challenges include configuration, AWS integrations, and third-party APIs where "not everything is in publicly accessible docs." While AI excels at building from scratch, it struggles with the complexity of established enterprise systems.</p><p>The second is that on problems that are out of distribution relative to the LLM's training data, the models struggle significantly. <a href="https://x.com/_clementneo">Clement</a>, who works on mechanistic interpretability research, explained that "the code doesn't exist in the wild, so it's very difficult for LLMs to work on things they weren't trained on, and the models perform poorly in these situations." For his research work, he has to provide extremely detailed context, including past papers and codebases with design patterns, to get the AI to understand what he's trying to accomplish.</p><h1>What now for junior developers?</h1><p>Smaller startups with newer codebases and AI-savvy senior engineers will dramatically reduce junior hiring. Before agentic AI, hiring multiple junior engineers was the fastest path to market, followed by scaling headcount to reach funding and revenue milestones. Now these startups can achieve the same speed with AI tools: writing more code, reviewing faster, and shipping to users faster. They need fewer people initially (potentially just one senior engineer) and fewer people as they scale to reach the same milestones. The economic logic is compelling: when AI provides the productivity boost that previously required a small team of juniors, mass junior hiring makes little sense. Some founders are even planning for future AI improvements, reasoning that if tools will be significantly better in 6-12 months, hiring additional people now is counterproductive.</p><p>The shift has been dramatic. John Loeber explained how fundamentally expectations have changed: "It's clear that <strong>the</strong> <strong>junior engineering skill set of 6 years ago doesn't add anything at all anymore</strong>. The bar is so much higher. In 2018, you could hire a smart CS grad who'd never done any programming, and the expectation would be that you'd get productive soon enough."</p><p>John also observed that "most teams are trying to stay as lean as possible. Lots of teams that could hire tons of people are wanting to cap it at a small number and they&#8217;re only going to hire if stuff's on fire." This reflects both the leverage AI provides and post-ZIRP caution about headcount, especially given how difficult it is to reduce staff once hired. 
The traditional junior hiring model, as Sidu put it, where you could "take a React course and do it," is "completely gone."</p><p>For larger companies (like FAANGs), this shift will arrive more slowly. Unlike resource-constrained startups racing to their next funding round, large companies have the luxury of hiring more people, thinking long-term about talent retention, and tackling challenging problems that AI agents can't yet solve. Grant, a former senior engineer on AWS's S3 team, observed that while large companies hire for competence, junior engineers rarely generate immediate business value equal to their salary cost. Their real value emerges after they learn the company's codebase, become economically productive, and advance through promotions over several years. Since larger firms already operate on this long-term investment model for junior talent, they may continue this approach longer than startups.</p><p>What might change, however, is the skill bar, and how fast they expect people to be productive. <a href="https://x.com/intellectronica">Eleanor</a>, a former engineering manager at Microsoft and Google, emphasized how dramatically the timeline has compressed: "Now they care about direct impact &#8230;they want to know if you can be useful, maybe not on day 1, but on day 10." Expectations are rising, even if larger companies don't want to stop hiring junior developers. However, large companies have reasons to be more cautious about this transition. Their codebases present unique challenges: they're often too large for current context windows, built on custom ontologies that weren't part of AI training data, and filled with proprietary integrations and configurations. As one developer who asked to remain anonymous noted, AI tools haven't yet reached "takeoff" on the kind of complex legacy codebases that characterize enterprise systems. 
Moreover, Grant observed that large companies consistently lag behind in AI tool adoption: "Copilot didn't come to AWS till a year" after it was widely available elsewhere.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!kCaR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2e92aa-7cc4-4969-83ea-127be575f9a6_640x384.jpeg" width="640" height="384" alt="Does anyone have an HD Not Stonks? : r/MemeRestoration"></figure></div><p><em>Caption: The author when thinking about this</em></p><h1>What should you do?</h1><p><strong>For junior developers entering the market now, the math is straightforward but harsh.</strong> Over time, as more people use better AI agents, there will be fewer opportunities available in the market, even if agentic AI takes longer to diffuse into larger companies.</p><p>But this doesn't mean all opportunities disappear. 
Junior developers with strong fundamentals who can effectively direct AI tools are <strong>more</strong> valued than before.</p><p>John Loeber noted this bifurcation: "Some juniors I know are really good with AI, and by their skillset and proactiveness in figuring out how to use these tools, these juniors are doing way more heavy lifting than a senior 5 years ago."</p><p>I outline two paths below, and the key factors that make them resistant or complementary to AI capabilities.</p><h2>Path A: Code is just a means to an end</h2><p>One thing multiple people noted in my interviews was that in the end, customers buy products and services to solve problems, and companies hire software engineers to write code that is <strong>a means to the end of solving problems</strong>.</p><p>Several interviewees highlighted how AI is fundamentally changing the relationship between product management and engineering. Rishabh put it most directly: "a lot more value is in being a builder/PM. Previously engineers would look at the issue given to them by the PM, but at this point if you can write that level of detail about the product level document, then you might as well give it to Claude Code."</p><p>This observation reveals a crucial insight: <strong>if the specification is detailed enough for an AI agent to implement correctly, why do you need a human implementer?</strong> The traditional handoff from PM to engineer could be a handoff from PM to AI. From a company's perspective, this shift is logical. Companies don't actually want to write code - they want to solve customer problems profitably. Code is just the mechanism. As John Loeber noted, "if it becomes cheaper and cheaper to produce the product/service, then more of the differentiation comes from customer acquisition" rather than technical execution.</p><p>This creates two crucial new skill areas that blend traditional PM and engineering roles:</p><ol><li><p><strong>Writing Specifications for AI Agents</strong> The ability to translate business problems into detailed, unambiguous requirements that AI can execute becomes critical. This isn't traditional PM work (which was often high-level) or traditional engineering work (which was implementation). It's a new hybrid skill: understanding business needs deeply enough to specify them with the precision that AI requires.</p></li><li><p><strong>Design and Ideation</strong> The creative, strategic thinking that was traditionally a PM's domain becomes even more valuable. What should we build? What problems are worth solving? How should the user experience work? AI can implement ideas very well, but it can't generate business insights or creative product solutions from scratch.</p></li></ol><p>The point of Path A is to move to wherever the bottleneck in creating value for customers is. When the bottleneck was writing code, writing code was what was economically productive. Another possibility is that in the coming years, if product differentiation goes down, what becomes even more valuable than writing code is sales and marketing. The ability to sell and acquire customers could be more valuable than making the product that they want to buy.</p><h2>Path B: So Good They Can&#8217;t Ignore You</h2><p>The second viable path involves achieving world-class expertise in a specialized technical domain where you can direct AI capabilities while staying ahead of what AI can handle independently. 
This isn't about avoiding AI, but about becoming skilled enough to leverage it as a powerful amplifier while working on problems that remain beyond its reach.</p><p>Recent developments illustrate both the dramatic capabilities and sharp boundaries of current AI systems. In cybersecurity, the <a href="https://arxiv.org/abs/2408.08926">CyBench evaluation</a> revealed that models like o1-preview and Claude 3.5 Sonnet can now solve professional-level Capture the Flag challenges that previously required expert human teams. These models successfully completed CTF tasks that took human teams up to 11 minutes to solve, handling complex vulnerability identification and exploit development across cryptography, web security, and reverse engineering.</p><p>But the results also expose a striking capability cliff. While AI excels at challenges in the sub-11-minute range, it hits a complete wall on harder problems. The most difficult task took human teams 24 hours and 54 minutes to solve - 136 times longer - and no AI model could make meaningful progress even with guidance. Similarly, security researcher Sean Heelan used o3 to <a href="https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/">discover</a> a real zero-day vulnerability in the Linux kernel, but the process revealed significant limitations: a 1:50 signal-to-noise ratio and the need for his expert evaluation to distinguish genuine vulnerabilities from false positives.</p><p>Clement, who works on mechanistic interpretability research, describes a similar dynamic. AI has become incredibly useful for implementing research ideas and handling routine coding tasks, but struggles with the core creative work because "the code doesn't exist in the wild." His domain requires developing novel approaches to understanding neural network internals - work that's fundamentally out of distribution from typical AI training data. AI amplifies his productivity on implementation, but his expertise remains essential for framing research questions and evaluating what constitutes meaningful progress.</p><p>These examples point toward a sustainable strategy for Path B: <strong>become expert enough in a specialized domain to effectively direct AI on routine work while focusing your cognitive energy on frontier problems</strong>.</p><p>My basic idea is that technical domains naturally generate an endless supply of increasingly difficult problems. As AI becomes capable of solving today's hard problems, researchers and practitioners respond by tackling even more ambitious challenges. When CTF competitions notice that AI can solve their 11-minute problems, they'll design harder challenges that push beyond current AI capabilities. When AI can optimize existing systems, or possibly one day understand neural networks like itself, we will move on to harder problems that it cannot solve and cannot be trained for. There is the <a href="https://ai-2027.com/">possibility</a> that we will have superhuman AI researchers, but in that case, the world will have much more serious problems than finding employment.</p><h1>General parting thoughts</h1><p>Both of the paths I outlined above explicitly demand a higher level of quality and competence from computer science students and younger developers. At a more meta level, nearly every interviewee said that younger people have to be more <a href="https://usefulfictions.substack.com/p/how-to-be-more-agentic">agentic</a>. 
It is not sufficient to wait for someone (an employer, or a university course) to give you a well-scoped task, complete it to their satisfaction and call it a day. Instead, you need to demonstrate that you can identify problems worth solving, build complete solutions from scratch, and take ownership of outcomes. Rishabh put it bluntly: "if someone hasn't actually built a complete project from scratch, it's a no go from the beginning." The cost of building has become so low that there's no excuse for not having substantial proof of work.</p><p>The bitter truth is that AI will lead to a two-tier world. Those who do grasp the opportunities, combine their competence with initiative and learn to direct AI effectively will become dramatically more productive than previous generations. Given the rapid pace of change, most specific <a href="https://x.com/tszzl/status/1871298513134608473">career advice</a>, including mine, is probably wrong. The only thing we know is that the bar is higher, and you have to move faster.</p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Speculative Decoding.]]></description><link>https://speculativedecoding.substack.com/p/coming-soon</link><guid isPermaLink="false">https://speculativedecoding.substack.com/p/coming-soon</guid><dc:creator><![CDATA[Pradyumna Prasad]]></dc:creator><pubDate>Mon, 26 May 2025 13:56:06 GMT</pubDate><content:encoded><![CDATA[<p>This is Speculative Decoding.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://speculativedecoding.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://speculativedecoding.substack.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>