#NLP# We introduce a new model for code generation. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
[LREC-COLING'24] HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
Self-evaluating interview for AI coders
Evaluation results of code generation LLMs
A dataset of coverage annotations for the HumanEval dataset
A collection of practical code generation tasks and tests drawn from open-source projects. Complementary to OpenAI's HumanEval.
Benchmark results from code generation with LLMs
Evaluate LLM-synthesized @JuliaLang code.
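Most of the projects above score models the same way: sample one or more completions per HumanEval problem, then check functional correctness against each task's unit tests. A minimal sketch using OpenAI's human-eval harness, where `generate_completion` is a hypothetical stand-in for whatever model is under test:

```python
# Sketch of a typical HumanEval run, assuming OpenAI's human-eval package
# (pip install human-eval). generate_completion is a hypothetical placeholder
# for the model being benchmarked, not part of the harness.
from human_eval.data import read_problems, write_jsonl

def generate_completion(prompt: str) -> str:
    # Placeholder: call the model under test and return only the code
    # that continues the prompt (i.e., the function body).
    raise NotImplementedError

problems = read_problems()  # task_id -> {"prompt", "test", "entry_point", ...}

num_samples_per_task = 1  # 1 for pass@1; sample more to estimate pass@k, k > 1
samples = [
    dict(task_id=task_id,
         completion=generate_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# Scoring executes each completion against the task's unit tests:
#   $ evaluate_functional_correctness samples.jsonl
# and reports the unbiased pass@k estimate from the Codex paper.
```

The "test accuracy" figures quoted by entries like the first one are typically pass@1 scores from this kind of run.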