这是关于AI智能体系列文章的第四篇。虽然我们通常通过提示词工程来提升大语言模型的任务表现,但这种方法存在一个核心缺陷:它依赖模型一次性准确执行任务的能力。本文将讨论另一种通过反馈循环来改进大语言模型系统的方法。我会先概述关键概念,然后用Python代码示例进行说明。
像ChatGPT这样的人工智能系统之所以强大,主要有两个原因。第一,它能根据自然语言指令执行各种任务。第二,用户可以实时提供反馈来优化模型的输出结果。
举个例子,你可以让ChatGPT帮你重写简历。如果第一次生成的内容不满意,你可以给出具体意见让它重新修改。
经过几轮对话调整后,最终你就能得到一份优化过的简历,帮助你获得理想的工作机会。
迭代的优势
上面例子的核心在于:大语言模型生成的简历内容会随着多次修改而不断优化。这个过程和我们通过"反复练习→获得反馈→再次尝试"来提升任务表现的方式完全一致。
反复改进的力量普遍存在,不论是在生物进化、创业经营、机器学习还是无数其他领域。这说明了只要在足够多的反馈循环中做出微小改进,就能积累产生巨大影响。
虽然反复改进十分有效,但在通过多轮对话优化大语言模型输出时,存在一个关键瓶颈:整个过程需要人工参与。但如果我们可以将这个过程自动化呢?
迭代优化中的大模型
"迭代优化大模型"指的是通过自动反馈机制来持续改进大模型的输出结果。这个概念有多种实现方式,以下是几个具体案例:
编程Agent:接收错误提示和单元测试结果来处理GitHub工单
自我对弈系统:通过模拟正反观点辩论来产生有说服力的论据
抖音标题生成器:接收符合人类偏好的评判模型反馈
头条机器人:根据用户互动数据优化推文内容
建立自动化反馈循环开辟了通过计算测试实现可扩展性能提升的新途径。核心问题在于:如何评估大语言模型的输出,确保反馈机制引导其朝着正确方向发展?
这时候就需要引入评估机制了。
3种自动化评估方法
评估指标是一个与我们期望结果相关的数字。比如,在改写简历时,好的评估指标应该与获得面试机会的概率密切相关。
虽然评估方法需根据具体任务定制,但我将其归纳为三种主要类型。
注:人工评估不在讨论范围内,这样会造成大模型的反馈效果
第一种:基于规则的评估
这类评估通过简单代码构建测试和指标来判断结果。例如,设置一个标记来检查LLM生成的代码是否报错或是否能通过单元测试。
基于规则的评估实现和解读起来相对简单——这对暗箱式的LLM尤为重要。不过难点在于,如何设计出能真实反映预期效果的简单规则。
第二种:基于LLM的评估
虽然基于规则的评估很实用,但有些内容无法通过代码判断。比如评估输出内容是否具备同理心,或者比较两个输出中哪个更受用户青睐。
这时就需要用到LLM作为裁判的方法。这类评估能对模型输出进行更复杂的评判,但LLM裁判也存在挑战,比如如何与人类偏好对齐、位置偏差和模型偏差等问题[2]。
注:也可以使用专门的机器学习分类器替代LLM裁判,但这需要足够的训练数据。
第三种:真实数据反馈
最后一类是直接用真实数据作为LLM的反馈指标,比如着陆页点击率、销售话术转化率或用户满意度评分。
这类评估的强大之处在于,它可以直接使用目标结果(如候补名单注册数、销售额、用户满意度)或高度相关的替代指标作为模型的反馈信号。换句话说,模型能根据你的核心目标直接优化输出。
实例演示:在LLM中迭代优化Upwork个人资料
Upwork(国外一家兼职平台)
既然我们已了解"LLM循环"的基本概念和运作原理,现在用一个具体案例来说明。下面我将开发一个工具,用于优化我在Upwork平台上的自由职业者个人资料。
不同于在ChatGPT中来回对话沟通,这次我将使用基于规则的评估指标来自动化反馈循环,比如文案字数统计、客户关注度、社会认同信号,以及阅读难易度。
导入准备
首先导入需要用到的实用程序库、功能模块和环境变量。
from openai import OpenAI
import os
from functions import * # custom function definitions
from dotenv import load_dotenv
load_dotenv() # import sk from .env file
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY")) # connect to openai API
操作说明
接下来,我将定义一个系统消息(或开发者指令),这个指令会在所有LLM重写过程中持续生效。指令将包含如何撰写有吸引力的Upwork个人资料的具体指导。
为了生成这个指令,我收集了5位收入超100万美元的AI与数据科学领域自由职业者的资料,让ChatGPT从中提炼出共性要点。此外,我还让ChatGPT根据这些最佳实践撰写了一个示例资料。
我将这些结果保存为.txt文件,然后通过以下方式将其整合到系统消息中。
# read existing profile text
instructions = read_context("instructions.txt")
# read examples
example = read_context("example.txt")
# define instructions
instructions = f"""{instructions}
---
## Here's an example"
{example}
----
## Important Guidelines:
- Profiles should be written in **PLAIN TEXT** (NOT markdown)
----
这里不展示完整的操作说明,因为篇幅很长。不过下方可以看到示例资料的内容。完整版的操作说明(instructions.txt)和示例文件(example.txt)朋友们可以私信索要。
🚀 Unlock AI-Powered Automation Without the Overhead of a Full Dev Team
You know AI has potential—but between the jargon, hype, and tools that only
half-deliver, it's hard to know where to start.
❌ Still doing manual, repetitive work
❌ Using tools that don't quite fit your workflow
❌ Struggling to connect data sources or build prototypes
❌ Not sure how to apply AI to your business problems
✅ That’s where I come in.
I help founders, teams, and consultants build custom, scalable AI-powered
tools that solve real business problems—without the fluff.
🛠 What I Build:
• AI Agents & Assistants – Custom GPT-4o powered workflows tailored to your
business
• Automation Scripts – Web scraping, email drafting, document parsing, API
integrations
• LLM-Powered Search – RAG systems, semantic search, Q&A over your internal
docs
• Dashboards & Insights – Visualizations and reporting powered by Python & AI
• Custom Chatbots – Trained on your content, deployed to your site or app
• Data Pipelines – End-to-end ETL for analytics or model training
• Lightweight SaaS Tools – MVPs built using Streamlit, FastAPI, and PostgreSQL
💡 How I Work:
I’m not just an engineer—I’m a strategist. I help you figure out what’s worth
building and why, then actually make it happen. My workflow includes:
• Business-first scoping sessions
• Clear milestones with working prototypes
• Fast, clean Python code (you own it)
• Loom walkthroughs and async check-ins
🏆 Why Clients Work With Me:
• Built AI-powered internal tools used by 20+ enterprise teams
• 100% Job Success | Expert-Vetted | US-Based
• Former Data Scientist + Consultant | 10+ years experience
• Clients include funded startups, solo consultants, and multi-billion dollar
orgs
💬 What People Say:
“Delivered a working prototype within days—then iterated with real feedback.
Best experience I've had on Upwork.”
“Our GPT-powered assistant is now saving us 15+ hours/week of manual work.”
“Shawhin is a rare blend of technical skill, product thinking, and business
empathy.”
---
💥 NOT a good fit if:
• You’re looking for the cheapest option
• You don’t have a clear business outcome in mind
• You want to outsource the thinking, not just the building
✅ GREAT fit if:
• You have a use case or idea and need a fast, functional implementation
• You want someone who gets both AI + business
• You value clean, modular code you can own and scale
---
📩 Ready to start? Here’s what to do:
1. Message me with a short description of what you’re trying to do
2. I’ll follow up with questions or suggest a quick discovery call
3. I’ll outline a scope + price, and we’re off to the races
Let’s build something smart.
初始改写提示
现在我们已经有关于如何撰写吸引人的个人资料的指导了,接下来需要处理我当前的个人资料。我把这段文字存进了一个叫background.txt的文件里。此外,我还描述了我理想中的客户类型。
# read current profile
background = read_context("background.txt")
# define customer
customer = """Founders and CXOs of small to medium sized business, \
seeking guidance with AI use cases"""
# create prompt
prompt = f"""### 📄 Upwork Profile Rewrite Task
Below is a freelancer's background, your task is to rewrite it based on \
high-converting Upwork profile best practices. Their ideal customer avatar is: \
{customer}."""
----
{background}
----
"""
下面是书面呈现的内容。可以发现,我当前的个人资料与那些顶级自由职业者的范例存在多处差异。最严重也最明显的问题是——我现在的资料通篇都在讲*"我、我、我"*,完全没有提及客户的需求和他们面临的问题!
### 📄 Upwork Profile Rewrite Task
Below is a freelancer's background, your task is to rewrite it based on
high-converting Upwork profile best practices. Their ideal customer avatar is:
Founders and CXOs of small to medium sized business, seeking guidance with AI
use cases.
----
# AI Product Manager | AI Application Architect
Meet Stone, one of the first AI senior practitioners in China, with rich experience in R&D and AI product implementation, 10+ years of AI experience, 6 years of AI product manager experience, 7 years of front-line AI R&D experience, and an insatiable curiosity to understand the world and create better systems.
Stone has more than 10 years of experience in data science and project management, covering smart home appliance research, insurance service risks, smart cockpits, AI education, and other fields.
As a senior AI practitioner, Stone helps the team solve problems through data-driven solutions, while constantly seeking entrepreneurial opportunities to create value and address challenging problems.
Domain Experience:
~~~~~~~~~~~~~~~~~~~~~
• AI education
• Insurance services
• AI portrait analysis
• Smart cockpit
• AI marketing
• Content creation
Consulting Services:
~~~~~~~~~~~~~~~~~~~~~~
• Project feasibility
• Project scoping and planning
• Project/Code review
• Topic/Concept explanation
Data Services:
~~~~~~~~~~~~~~~
• Automation - tedious data entry with a click of a button
• Preparation - turn raw data into something workable
• Visualization - translating numbers into powerful visuals
• Exploration - discovering hidden gems in data
• Modeling - develop data-driven models to make predictions
• Causality - going beyond correlations and uncovering causation
• Monitoring - ensure model predictions remain accurate after deployment
----
评估标准定义
构建"带有人工智能循环"系统最重要的环节是制定好的评估标准。在这里,评估标准是指与付费客户挂钩的各项指标。
我使用的是基于规则的评估方法。具体做法是让ChatGPT分析了我提供的五份顶级自由职业者的个人资料,然后用代码实现了这些可以二值判断的标准。最终我选用了以下四个评估项:
字数评估():检查文字量是否在300-800字之间
客户焦点评估():统计"您"和"您的"出现次数,达到5次即为合格
社会证明评估():用正则表达式检查文本是否包含"$"符号或客户引述
可读性评估():测试文本的Flesch-Kincaid阅读难度等级,九年级或以下水平为合格
相关函数定义详见
改写-评估循环
现在我们已经具备了运行改写-评估循环的所有条件。循环过程如下:
通过向GPT-4o提交指令和提示词来改写个人资料
运行全部四项评估,并打印结果
根据评估结果生成新的提示词(完全基于规则)
更新全部通过标识符和循环计数器
当所有评估都通过,或者计数器达到最大迭代次数时,终止循环
以下是该循环的代码实现。
all_passed = False
counter = 0
max_iter = 5
while not all_passed and counter < max_iter:
# rewrite profile
new_profile = rewrite_profile(instructions, prompt, client)
write_profile(new_profile, f"profile-{counter}.txt")
# evaluate new profile
results = run_all_evals(new_profile)
print(results)
# craft new prompt with feedback
prompt = generate_eval_report(new_profile)
# check if all tests passed
all_passed = all(results.values())
# udpate counter
counter += 1
以下展示了每次评估的运行结果。经过第一次改写后,个人资料仅在"可读性"指标上未达标。但要同时满足"字数要求"仍需要多轮调整才能实现平衡。
# Loop outputs
{'word_count': True, 'client_focus': True, 'social_proof': True, 'readability': False}
{'word_count': False, 'client_focus': True, 'social_proof': True, 'readability': False}
{'word_count': True, 'client_focus': True, 'social_proof': True, 'readability': False}
{'word_count': True, 'client_focus': True, 'social_proof': True, 'readability': False}
{'word_count': True, 'client_focus': True, 'social_proof': True, 'readability': True}
最终输出
下方展示最终版本的简介内容。虽然其中"过往成绩"和"客户评价"部分属于虚构内容可能需要调整,但其开头段落和整体结构已明显优化。每次修改迭代的结果可在此处查看。
🚀 Transform Your Business with Data-Driven AI Solutions
Are you struggling to use AI for your business? You're not alone. Many
founders and CXOs face challenges with AI.
❌ Not sure how to use AI
❌ Overwhelmed by data without clear insights
❌ Stuck with manual tasks that waste time
✅ I can help.
I coach small and medium-sized businesses on the effective use of AI and data science. I have over 10 years of experience in AI implementation and can transform complex data into a valuable tool to drive growth.
🛠 What I Offer:
- **Project Planning** – Plan AI projects for success
- **Data Automation** – Automate tasks to save time
- **Data Visualization** – Turn data into simple visuals
- **Predictive Modeling** – Create models to forecast trends
- **Causal Analysis** – Find true reasons for outcomes
💡 How I Work:
I’m more than a data scientist—I'm your growth partner. My approach includes:
- Solutions tailored to your needs
- Clear steps and communication
- Use of the latest tools for top results
🏆 Proven Success:
- Increased marketing ROI by 20%
- Developed models to reduce credit risk
- Improved efficiency for car sales teams by 15%
💬 What Clients Say:
“Stone’s expertise turned our data into strategies. A game-changer for us.”
“From start to finish, Stone provided clarity and results.”
🔍 Not a good fit if:
- You want the cheapest option
- You lack a clear business goal
✅ Perfect fit if:
- You need expert AI guidance
- You value strategic insights and effective results
📩 Ready to use AI for your business? Here’s how to start:
1. Message me with your needs
2. We’ll discuss goals and make a plan
3. Let’s turn your data into a powerful asset
Let’s unlock your business potential with AI!
限制因素
"迭代中的大模型"方法本质上等同于测试阶段的强化学习。这种方法让大模型能够根据环境反馈来优化输出内容。
虽然这种模式很强大,但它(和强化学习一样)容易出现奖励作弊问题。也就是说,模型可能会为了提升评估指标而产生不良结果。
举例来说,一个以互动数据为指导的微博机器人可能会发布越来越极端的言论。因此,在设计大模型的反馈评估指标时,必须深思熟虑并通过实验验证。
结论
虽然大模型是强大的通用问题解决工具,但它们往往无法一次就给出完美答案。通常需要多次反馈循环来逐步优化模型的输出质量。
本文讨论了如何通过"循环中的大模型"实现反馈自动化,并以改写我的Upwork个人资料为例,展示了如何应用这种方法来达到顶尖自由职业者的最佳实践标准。