17
$\begingroup$

This is more of a community question than an answerable problem. If it's out of bounds I'll be happy to withdraw it. But, one year out, I'm curious as to other WL users' experiences with the Notebook Assistant.

My main use of WL is either to code economics models that I find in textbooks or to port code that solves such models from other languages. I often consult AI when doing this: I get good suggestions, a good understanding (usually better than mine) of the math and theory behind the models, intriguing (and sometimes excellent) high-level solution approaches and templates, and fairly buggy lower-level code. Will these tradeoffs change if I move to the Notebook Assistant?

$\endgroup$

1 Answer

14
$\begingroup$

I have evaluated ChatGPT, Perplexity, and Claude for Wolfram Language coding. Claude Code is the clear winner. I can rough out solutions in my domain (signal processing) within minutes. Using precise prompts, I receive WL code that runs successfully on the first pass 80-90% of the time. From this starting point, I iteratively add features and requirements, ultimately producing 1,000-3,000 lines of production-quality WL. I also use the Notebook analysis function as a linting tool to identify and avoid issues in the raw code provided by Claude.

Regarding your question, I have used Notebook Assistant for more complex language-domain problems, particularly for code optimization. Interestingly, I find Claude exceptionally adept at decoding the sometimes cryptic error messages from the Mathematica interface. Claude can ingest error messages and provide code fixes and simplifications. In my opinion, Notebook Assistant is less useful for addressing the detailed bugs that inevitably appear in code. There is a difference between repairing syntactic errors (where Notebook Assistant helps) and repairing structural errors (where LLMs like Claude can contribute). One area I am exploring is using Notebook Assistant to generate test code for Claude-produced implementations. This approach seems promising, particularly for solo WL users who lack in-house WL experts for code reviews. I qualify my characterization of the Notebook Assistant somewhat, because it remains a small part of my evolving workflow.

I plan to test Claude against Notebook Assistant in the future, likely using the same prompts in both to compare the results. I think the two models come from different places, so I am not sure that test will provide much insight into which is 'better'.

I believe the optimal solution may be a hybrid approach that leverages LLM providers alongside Notebook Assistant for deep, domain-specific training. That said, Claude, for example, can be given explicit domain knowledge through its Projects feature and tuned to produce better first-pass results.

Good luck developing your process!

$\endgroup$
  • $\begingroup$ Very instructive. What is your opinion on ChatGPT (I've never used Perplexity)? $\endgroup$ Commented Nov 11 at 18:32
  • $\begingroup$ Does Claude comment its code? $\endgroup$ Commented Nov 11 at 19:02
  • 1
    $\begingroup$ @eddy ardonne - At the time I used ChatGPT, the WL it produced was pretty bad, both syntactically and structurally, but that was already a couple of generations of ChatGPT ago. I have not gone back to see if it has improved. $\endgroup$ Commented Nov 11 at 19:55
  • $\begingroup$ @David Keith - Yes, it does an excellent job of commenting code. You can control the level of detail in the comments through your prompts. It can also structure files appropriately for .nb or .wl use. Additionally, I have it write version description notes with usage instructions and references as a separate document to include in my workflow. $\endgroup$ Commented Nov 11 at 19:59
  • 2
    $\begingroup$ @Lexington1776, I would guess ChatGPT has improved a lot since then. For me, GPT-5 has been really helpful with coding in Mathematica, and translating Mathematica to C++, but that's beside the OP. $\endgroup$ Commented Nov 11 at 21:16
