matthewrenze/jhu-llm-temperature
The Effect of Sampling Temperature on Problem Solving in Large Language Models

Abstract

In this research study, we empirically investigate the effect of sampling temperature on the performance of Large Language Models (LLMs) on various problem-solving tasks.

We created a multiple-choice question-and-answer (MCQA) exam by randomly sampling problems from standard LLM benchmarks. Then, we used nine popular LLMs with five prompt-engineering techniques to solve the MCQA problems while increasing the sampling temperature from 0.0 to 1.6.

Despite anecdotal reports to the contrary, our empirical results indicate that changes in temperature from 0.0 to 1.0 do not have a statistically significant impact on LLM performance for problem-solving tasks. In addition, these results appear to generalize across LLMs, prompt-engineering techniques, and problem domains.
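To make the role of the temperature parameter concrete, here is a minimal, illustrative sketch (not code from this repository) of how sampling temperature rescales a model's logits before they are converted into token probabilities. The logit values and the `temperature_softmax` helper are hypothetical examples; the temperature values mirror the 0.0 to 1.6 range swept in the study.

```python
import math

def temperature_softmax(logits, temperature):
    """Convert raw logits to sampling probabilities at a given temperature.

    As temperature approaches 0, sampling approaches greedy decoding;
    higher temperatures flatten the distribution toward uniform.
    """
    if temperature <= 0:
        # Degenerate case: greedy decoding (all mass on the max logit).
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Example: the same (made-up) logits at several temperatures in 0.0-1.6
logits = [2.0, 1.0, 0.5]
for t in [0.0, 0.4, 0.8, 1.2, 1.6]:
    print(t, [round(p, 3) for p in temperature_softmax(logits, t)])
```

At temperature 0.0 the highest-logit token is always chosen; as the temperature rises, probability mass shifts toward lower-logit tokens, which is the mechanism whose effect on problem-solving accuracy the study measures.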

Documents

Code

  • Source - contains all source code
  • Models - contains the model-specific code
  • Prompts - contains LLM agent prompt code
  • Exams - contains the code to load exams

Data

  • Exams - contains the test dataset
  • Results - contains the high-level test results
  • Details - contains the low-level test results
  • Responses - contains the LLM response text
  • Logs - contains the experiment event logs

Analysis

  • Plots - contains all data visualizations

Notes

  • Source contains all scripts for experiments, processing, and analysis.
  • See Requirements.txt for a list of packages used in this experiment.
  • GitHub Copilot was used in the creation of this experiment.
