UnitGen


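UnitGen is a data framework for code fine-tuning. It generates fine-tuning data directly from your existing codebase: code completion, test generation, documentation generation, and more.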

Docs: https://gen.unitmesh.cc/

Thanks to OpenBayes for providing computing resources.

Finetune Model Examples:

| Name | Model download (HuggingFace) | Fine-tune notebook | Model download (OpenBayes) |
|---|---|---|---|
| DeepSeek 6.7B | unit-mesh/autodev-coder | finetune.ipynb | AutoDev Coder |

Language support by Chapi

  • supported:
    • Java
    • Kotlin
  • in progress:
    • TypeScript/JavaScript
    • Rust
  • planned:
    • Go
    • Python
    • C/C++
    • C#
    • Scala

Features:

Architecture

Layered Architecture

[Figure: Architecture]

Workflow

[Figure: UnitGen Workflow]

Design Philosophy

  • Unique prompt. One prompt is shared across tooling, fine-tuning, and evaluation.
  • Code quality pipeline. Code is checked for complexity, bad smells, test bad smells, and other rules.
  • Extensible, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.

Unique Prompt

Keep the same prompt: AutoDev <-> UnitGen <-> UnitEval

AutoDev prompt

AutoDev prompt template example:

    Write unit test for following code.
    ${context.coc}
    ${context.framework}
    ${context.related_model}
    ```${context.language}
    ${context.selection}
    ```

Unit Picker prompt

The Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Instruction(
        instruction = "Complete ${it.language} code, return rest code, no explaining",
        output = it.output,
        input = """
            |```${it.language}
            |${it.relatedCode}
            |```
            |
            |Code:
            |```${it.language}
            |${it.beforeCursor}
            |```""".trimMargin()
    )

UnitGen prompt

The UnitGen prompt should keep the same structure as the AutoDev prompt. Prompt example:

    Complete ${language} code, return rest code, no explaining
    ```${language}
    ${relatedCode}
    ```

    Code:
    ```${language}
    ${beforeCursor}
    ```
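As a rough illustration of how such a template could be filled in, here is a minimal Java sketch. The `PromptSketch` class and its `render` helper are hypothetical and not part of UnitGen, and the line breaks in the template are an assumption, since the example above does not show them explicitly.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PromptSketch {
    // Matches ${name} placeholders such as ${language} or ${context.selection}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([\\w.]+)\\}");

    // Assumed layout of the UnitGen completion prompt (line breaks are a guess).
    static final String TEMPLATE = String.join("\n",
        "Complete ${language} code, return rest code, no explaining",
        "```${language}",
        "${relatedCode}",
        "```",
        "",
        "Code:",
        "```${language}",
        "${beforeCursor}",
        "```");

    // Replaces every known placeholder; unknown placeholders are left untouched.
    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String prompt = render(TEMPLATE, Map.of(
            "language", "java",
            "relatedCode", "class User { String name; }",
            "beforeCursor", "public String greet(User u) {"));
        System.out.println(prompt);
    }
}
```

Rendering from one shared template is one way to keep the fine-tuning prompt and the tooling prompt from drifting apart.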

Code quality pipeline

[Figure: Code Quality Workflow]

Extensible, customizable quality thresholds

Optional quality type:

    enum class CodeQualityType {
        BadSmell,
        TestBadSmell,
        JavaController,
        JavaRepository,
        JavaService,
    }

Custom thresholds' config:

    data class BsThresholds(
        val bsLongParasLength: Int = 5,
        val bsIfSwitchLength: Int = 8,
        val bsLargeLength: Int = 20,
        val bsMethodLength: Int = 30,
        val bsIfLinesLength: Int = 3,
    )
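To make the idea of configurable thresholds concrete, the following Java sketch applies two of the defaults above to a method summary. It is only an illustration: `ThresholdSketch`, `MethodInfo`, the smell names, and the "strictly greater than" comparison are all assumptions, and the way UnitGen actually interprets each threshold may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdSketch {
    // Mirrors two of the BsThresholds defaults (5 parameters, 30 lines).
    record Thresholds(int bsLongParasLength, int bsMethodLength) {
        static Thresholds defaults() { return new Thresholds(5, 30); }
    }

    // Hypothetical summary of a method; not a UnitGen or Chapi type.
    record MethodInfo(String name, int parameterCount, int lineCount) {}

    // Assumption: a value strictly greater than the threshold counts as a smell.
    static List<String> check(MethodInfo m, Thresholds t) {
        List<String> smells = new ArrayList<>();
        if (m.parameterCount() > t.bsLongParasLength()) smells.add("LongParameterList");
        if (m.lineCount() > t.bsMethodLength()) smells.add("LongMethod");
        return smells;
    }

    public static void main(String[] args) {
        MethodInfo m = new MethodInfo("importOrders", 6, 42);
        // Default thresholds: both limits are exceeded.
        System.out.println(check(m, Thresholds.defaults()));
        // Relaxed thresholds: the same method passes.
        System.out.println(check(m, new Thresholds(8, 50)));
    }
}
```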

Custom rules:

    val apis = apiAnalyser.toContainerServices()
    val ruleset = RuleSet(
        RuleType.SQL_SMELL,
        "normal",
        UnknownColumnSizeRule(),
        LimitTableNameLengthRule()
        // more rules
    )

    val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
    // if issues are not empty, then the code has bad smell
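The "no issues means clean code" convention above can be used as a filter over candidate snippets. The Java sketch below shows that idea in isolation; the `Rule` interface and the toy `SELECT *` rule are hypothetical and do not correspond to the ArchGuard or UnitGen rule API.

```java
import java.util.List;
import java.util.function.Function;

class QualityGateSketch {
    // Hypothetical rule shape: takes source text, returns issue messages (empty means clean).
    interface Rule extends Function<String, List<String>> {}

    // Keeps only the snippets for which no rule reports an issue.
    static List<String> keepClean(List<String> snippets, List<Rule> rules) {
        return snippets.stream()
            .filter(code -> rules.stream().allMatch(r -> r.apply(code).isEmpty()))
            .toList();
    }

    public static void main(String[] args) {
        // Toy rule for illustration only: flag "SELECT *" in embedded SQL.
        Rule noSelectStar = code -> code.contains("SELECT *")
            ? List.of("avoid SELECT *")
            : List.<String>of();

        List<String> kept = keepClean(
            List.of("query(\"SELECT id FROM users\")", "query(\"SELECT * FROM users\")"),
            List.of(noSelectStar));
        System.out.println(kept);
    }
}
```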

Quick Start

For examples, see the examples folder.

Use CLI

See config-examples.

Download the latest version from GitHub Releases.

Generate Instructions

  1. Configure the project in processor.yml.
  2. Run the picker: java -jar unit-gen.jar

Use Java API

See config-example.

1. Add the dependencies

    dependencies {
        implementation("cc.unitmesh:unit-picker:0.1.5")
        implementation("cc.unitmesh:code-quality:0.1.5")
    }

2. Configure the unit-gen.yml and connection.yml files

3. Write the code

    import java.util.ArrayList;
    import java.util.List;
    // Imports for the UnitGen classes (PickerOption, SimpleCodePicker, and so on) are omitted here.

    public class App {
        public static void main(String[] args) {
            // Which kinds of instructions to generate
            List<InstructionType> builderTypes = new ArrayList<>();
            builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);

            // Which quality checks the picked code must pass
            List<CodeQualityType> codeQualityTypes = new ArrayList<>();
            codeQualityTypes.add(CodeQualityType.BadSmell);
            codeQualityTypes.add(CodeQualityType.JavaService);

            PickerOption pickerOption = new PickerOption(
                    "https://github.com/unit-mesh/unit-gen-testing", "master", "java",
                    ".", builderTypes, codeQualityTypes, new BuilderConfig()
            );

            SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);
            List<Instruction> output = simpleCodePicker.blockingExecute();

            // handle output in here
        }
    }
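One plausible way to handle the output is to write each instruction as one JSON object per line (JSONL), a format commonly used for fine-tuning datasets. The sketch below is an assumption about post-processing, not part of UnitGen: `Item` is a stand-in record with the three fields used in the prompt example (`instruction`, `input`, `output`), and the escaping is hand-written to keep the example dependency-free.

```java
import java.util.List;

class JsonlSketch {
    // Stand-in with the three fields from the prompt example; not the library's Instruction class.
    record Item(String instruction, String input, String output) {}

    // Minimal JSON string escaping for quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes one item as a single-line JSON object.
    static String toJsonLine(Item i) {
        return "{\"instruction\":" + quote(i.instruction())
            + ",\"input\":" + quote(i.input())
            + ",\"output\":" + quote(i.output()) + "}";
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item(
            "Complete java code, return rest code, no explaining",
            "Code:\nint add(int a, int b) {",
            "return a + b; }"));
        items.forEach(i -> System.out.println(toJsonLine(i)));
    }
}
```

In a real project a JSON library would usually replace the hand-written `quote` helper.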

Thanks to

  • Abstract syntax tree: Chapi. Used features: parsing multiple languages into the same data structure.
  • Legacy system analysis: Coca. Inspired: Bad Smell, Test Bad Smell.
  • Architecture governance tool: ArchGuard. Used features: Estimation, Rule Lint (API, SQL).
  • Code database: CodeDB. Used features: code analysis pipeline.

LICENSE

This code is distributed under the MPL 2.0 license. See LICENSE in this directory.
