Humanloop CEO Raza Habib shares 5 common mistakes teams make when building with LLMs

by Greg Kumparak, 6/4/2024

When Humanloop started out in 2020, they were working on a better way to train the state-of-the-art language models of that time. These models needed lots of manually annotated data to work best; Humanloop’s first product made it easier for anyone to do this annotation work, while drastically reducing the overall amount of manual work required.

But they sensed a shifting tide.

“About two years ago we were watching what was happening with large language models,” says Humanloop co-founder Raza Habib, “and we realized that the biggest risk to us as a business was that these large language models would get really good — the paradigm for how people build AI would change substantially, and you wouldn’t need this annotation any more.”

In what would prove to be a prescient move, they started exploring a pivot just months before ChatGPT would debut. Instead of helping people annotate their training data, Humanloop would give teams the tools to evaluate how well their LLM-based AI applications were working, and help team members — technical or not — collaborate on building them.

“Pivoting your company… it’s a scary thing to do,” says Raza. “So we gave ourselves two weeks; we’d make some mockups and go out to the people we know are building with these [large language models] and see if anyone would pay for this. If we could get ten paying customers in two weeks, that’d be a strong enough signal that it’s worth pivoting the company.”

“In the end, it took us two days,” he says. Today Humanloop counts companies like Gusto, Vanta, and Duolingo as customers — effectively serving as their collaborative LLM playground to find the best prompts, evaluate different models, and track changes over time.

This week Humanloop is launching a podcast series called High Agency (on Apple Podcasts, Spotify, and YouTube) where Raza will talk to others building at the forefront of AI to compare notes on what works and what doesn’t in this still-early field. The first episodes will feature interviews with the CTOs at Ironclad, Zapier, Sourcegraph and Hex; all companies that have built great things with LLMs in production — but as Raza puts it, "nobody is an expert yet, and everyone is learning by doing."

With that in mind, I asked Raza to break down some of the most common mistakes he sees teams making when building on top of LLMs. Here’s what he told me:

Not having consistent, systematic evaluation in place:

Figure out what “good” looks like for your AI product’s output, then figure out how to measure against that as you build.

“If teams don’t have a good way of measuring what ‘good’ looks like,” says Raza, “they’ll spin their wheels for a long time changing things and not really knowing if they’re making any progress.”

“Everyone wants things that are fast; everyone wants things that are cheap, and accurate. But you’re going to have [criteria] that are really use-case specific to you.”

If you’re building an AI chat bot for helping someone practice a new language, maybe that means checking the output to ensure it only uses words appropriate for the user’s skill level. If you’re building an AI coach, perhaps that means double checking that each of your user’s stated goals gets mentioned and addressed.

But there’s more to it than just running the prompt a few times and making sure it all looks reasonable; the systems have to be in place to check the output regularly, as prompts change and the underlying models evolve.

“For [traditional software development], you write a piece of code, and every time you run it, it does the same thing. Same inputs, same outputs. But with an LLM? Same input, multiple outputs — every time you run it, you’ll get something slightly different.”

“One of the biggest mistakes people make is just eyeballing [one-off] examples,” he notes. “It doesn’t give them a rigorous enough sense of whether or not they’re making things better.”
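
To make that concrete, here is a minimal sketch in Python of what a repeatable check might look like for the language-practice example above. It is an illustration under stated assumptions, not Humanloop's method: the word list, the `fake_model` stand-in, and the function names are all hypothetical, and a real setup would call an actual model and use criteria specific to your product. The idea is simply to sample the same prompt many times and report a pass rate instead of eyeballing one output.

```python
import random
import re
from typing import Callable

# Hypothetical vocabulary a beginner is expected to know (illustrative only).
BEGINNER_WORDS = {"the", "cat", "is", "on", "a", "mat", "dog", "big", "small"}


def uses_only_allowed_words(output: str, allowed: set) -> bool:
    """Use-case-specific check: every word in the output is in the allowed set."""
    words = re.findall(r"[a-z']+", output.lower())
    return all(word in allowed for word in words)


def pass_rate(generate: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 20) -> float:
    """Sample the same prompt `runs` times and return the fraction that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(generate(prompt)))
    return passed / runs


def fake_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a real LLM call (assumption): picks one of a few canned replies."""
    return rng.choice([
        "The cat is on the mat.",
        "The dog is big.",
        "The feline reclines on a mat.",  # uses words outside the beginner list
    ])


rng = random.Random(0)  # seeded so the sketch is repeatable
rate = pass_rate(
    lambda p: fake_model(p, rng),
    "Describe the picture.",
    lambda out: uses_only_allowed_words(out, BEGINNER_WORDS),
)
print(f"pass rate over 20 runs: {rate:.0%}")
```

Running the same function after every prompt or model change gives a number you can compare over time, which is the point Raza is making about rigor.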

Not paying attention to (sometimes silent) user feedback:

“What ‘good’ looks like is very subjective!” notes Raza. “What is a good summary for this call? What is a good sales email? There isn’t just one single correct answer.”

“What your customer says is good is the ultimate answer,” says Raza. But they don’t always say those things out loud.

“You want to be capturing different sources of end-user feedback,” he notes. “That can be explicit things, like votes — those little thumbs up/thumbs down buttons. But it’s also implicit things that users do within your application that correlate well with whether or not it’s working. If you helped them generate that sales email, did they actually send it?”

“Plan ahead, when you’re designing an application, to capture the user signals that tell you if it’s working; you want to design that in from the beginning.”
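
One way to design that in from the start, sketched in Python: record every signal, explicit or implicit, as an event tied to the generation it refers to, so the two kinds can be summarized side by side. The event fields and signal names here (`vote`, `email_sent`) are hypothetical, chosen to mirror the sales-email example; a real application would pick signals that correlate with success for its own users.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    generation_id: str  # which model output this signal refers to
    kind: str           # "explicit" (a vote) or "implicit" (something the user did)
    name: str           # hypothetical signal name, e.g. "vote" or "email_sent"
    value: float        # 1.0 = positive, 0.0 = negative


def summarize(events):
    """Return the average value for each signal name."""
    values_by_name = defaultdict(list)
    for event in events:
        values_by_name[event.name].append(event.value)
    return {name: sum(vals) / len(vals) for name, vals in values_by_name.items()}


events = [
    FeedbackEvent("gen-1", "explicit", "vote", 1.0),        # thumbs up
    FeedbackEvent("gen-2", "explicit", "vote", 0.0),        # thumbs down
    FeedbackEvent("gen-1", "implicit", "email_sent", 1.0),  # user sent the draft
    FeedbackEvent("gen-2", "implicit", "email_sent", 0.0),  # user discarded it
    FeedbackEvent("gen-3", "implicit", "email_sent", 1.0),  # sent without voting
]

summary = summarize(events)
for name, average in summary.items():
    print(f"{name}: {average:.2f}")
```

Note that `gen-3` never received a vote but still produced a usable signal, which is the "silent" feedback the section title refers to.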

Not closely tracking prompt history:

“Another error is not treating prompt management with the same rigor you treat code management,” says Raza.

The prompts you’re using will change over time; it’s key to track those changes and know why they were made.

“People start off doing this and they use [shared docs], they’re copying & pasting things in Slack, and they’re losing the history of their experimentation. New people join the team and it’s hard to know what was tried before. Something will be in production for months and you’ll make a change; is this better or worse than what we had before? They don’t know!”
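
As a rough illustration of treating prompts like code, the Python sketch below keeps an append-only history in which each version records who changed it and why. The class and field names are hypothetical, and this is not a description of Humanloop's product; many teams would get the same effect by keeping prompts in version control. What matters is that the "what was tried before, and why" survives.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PromptVersion:
    text: str
    author: str
    reason: str           # why the change was made
    created_at: datetime

    @property
    def version_id(self) -> str:
        """Short content hash, so identical prompt text always gets the same id."""
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:8]


@dataclass
class PromptHistory:
    name: str
    versions: list = field(default_factory=list)

    def commit(self, text: str, author: str, reason: str) -> PromptVersion:
        """Append a new version; skip the write if the text is unchanged."""
        if self.versions and self.versions[-1].text == text:
            return self.versions[-1]
        version = PromptVersion(text, author, reason, datetime.now(timezone.utc))
        self.versions.append(version)
        return version

    def log(self) -> list:
        """One line per version, oldest first."""
        return [f"{v.version_id} {v.author}: {v.reason}" for v in self.versions]


history = PromptHistory("sales-email")
first = history.commit("Write a short sales email.", "alice", "initial version")
second = history.commit("Write a short, friendly sales email.", "bob",
                        "users found the tone too formal")
history.commit("Write a short, friendly sales email.", "bob", "no-op resubmit")

for line in history.log():
    print(line)
```

Pairing each `version_id` with evaluation results and user feedback is what lets a team answer "is this better or worse than what we had before?"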

Not fine-tuning the model:

For most purposes, and while you’re still proving your idea works, you can probably get pretty far with the popular base models. But eventually, Raza suggests, you’ll want to fine-tune them for your needs. Good fine-tuning will give you better results, lower latency, and reduced costs in the long run.

“We recommend to everyone that prompt engineering is where they should start, because it’s the easiest, fastest, and most powerful thing,” says Raza, “but you can get order-of-magnitude cost savings if you fine-tune your models later.”

“The best way to think about fine-tuning is as an optimization,” he notes. “You want to avoid optimizing prematurely, but once you’ve validated that there’s demand for your product then it should become a focus.”

Not having domain experts write the prompts:

If you’re building LLM products for a specific vertical or industry, bring in people who really know the topic to help write the prompts and evaluate the output — don’t rely on engineers to do it alone. Large language models are, clearly, all about language. Language is nuanced, and the lexicons of different industries are deep.

“This is work that’s best done by domain experts,” says Raza. “It’s one of those things that’s obvious in retrospect, but wasn’t obvious at the start.”


If you’re building with LLMs, be sure to check out Humanloop here, and find Raza’s new podcast for AI builders, High Agency, on YouTube here.

Author

  • Greg Kumparak

    Greg oversees editorial content at Y Combinator. He was previously an editor at TechCrunch for nearly 15 years.