OpenAI launches SWE-bench Verified
DIYuan | 2024-08-15 17:12

On August 15, OpenAI introduced SWE-bench Verified, a more reliable benchmark for evaluating code generation. The key line in the company's blog post reads: "As our systems get closer to AGI, we need to evaluate them in increasingly challenging tasks." The benchmark is an improved subset of the existing SWE-bench, designed to assess more reliably how well AI models solve real-world software problems.
Source: DIYuan
