LLaDA di LLaDA do, tech progresses on, brah.

It's been a little while since I last hosted a meetup. I'm hosting one again. I thought I'd do a brief summary of the new things people should be aware of when it comes to AI. This isn't a comprehensive overview but mostly just a medley of what's worth keeping in mind.

Reasoning Models

Large Language Models that <think></think>.

Chain-of-Thought

Chain-of-Thought (CoT) is a way of prompting Large Language Models (LLMs) to break a problem down and "reason" about a user's request. The technique proved so useful that it became popular and was eventually built into the models themselves.

Where did Chain-of-Thought come from?

It most likely originated in this paper by AI researchers at Google:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.

arXiv.org - Jason Wei

Many people exploring the development of AI Agents were experimenting with different ways of prompting to get models to complete tasks on behalf of a user. I'd imagine this heavily influenced the creation of reasoning models.

An example taken from the Chain-of-Thought paper showing the basics behind the approach.
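Here is a rough text version of that idea: a minimal sketch of few-shot Chain-of-Thought prompting in Python. The function name `build_cot_prompt` and the `Q:` / `A:` layout are my own assumptions for illustration, and the exemplar is modeled on the tennis-ball example from the paper. The key point is that the worked answer spells out the intermediate steps before the final answer.

```python
from typing import List, Tuple


def build_cot_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Join worked (question, reasoning) pairs, then leave the new answer open."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# One exemplar whose answer shows its reasoning, not just the result.
EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11. The answer is 11.",
    )
]

prompt = build_cot_prompt(
    EXEMPLARS,
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?",
)
print(prompt)
```

The prompt ends with an open `A:` so the model continues in the same step-by-step style as the exemplar.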

How CoT was added to LLMs:

In DeepSeek's case, DeepSeek-R1-Zero was trained with Reinforcement Learning as follows:

Started with their base AI model
Set up a simple template that told the AI to first think through its reasoning in a section called <think>, then give its answer in an <answer> section
Gave the AI lots of problems to solve
Rewarded the AI when it got the answers right
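The steps above can be sketched as a template plus a rule-based reward. This is only an illustration under my own assumptions: the template wording is a paraphrase rather than DeepSeek's actual prompt, the exact-string comparison is a stand-in for real answer checking, and giving zero reward to a response that ignores the template is a choice made for this sketch.

```python
import re

# Assumed wording: a paraphrase of the kind of template described above,
# not the exact prompt DeepSeek used.
TEMPLATE = (
    "Think through the problem inside <think></think>, then give the final "
    "answer inside <answer></answer>.\nProblem: {problem}\n"
)

# A well-formed response is one <think> section followed by one <answer> section.
RESPONSE = re.compile(
    r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)


def reward(response: str, expected: str) -> float:
    """Return 1.0 for a well-formed response with the right answer, else 0.0."""
    match = RESPONSE.fullmatch(response)
    if match is None:
        return 0.0  # did not follow the <think>/<answer> template
    answer = match.group(2).strip()
    return 1.0 if answer == expected.strip() else 0.0


print(TEMPLATE.format(problem="What is 2 + 2?"))
print(reward("<think>2 + 2 = 4</think>\n<answer>4</answer>", "4"))
```

Notice that nothing in the reward looks at the contents of the `<think>` section. Only the final answer is scored, which is why the longer and more careful reasoning described next is something the model arrived at on its own.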

What happened next was fascinating. Without being shown examples of good reasoning, the AI naturally began to:

Write longer and more detailed thinking steps
Check its own work and fix mistakes
Try alternative approaches when stuck
Even have "aha moments" where it would realize it made a mistake and start over

While DeepSeek-R1-Zero showed promising results, it had some issues:

Its reasoning wasn't always easy for humans to read
It sometimes mixed different languages together

So they created an improved version called DeepSeek-R1:

First, they collected thousands of examples of good reasoning to give the AI a "cold start" (some initial guidance)
They fine-tuned their base AI on these examples
Then they applied reinforcement learning like before, rewarding correct answers
They collected the best outputs from this AI and used them as new training examples
Finally, they did another round of reinforcement learning for all types of questions

This multi-stage approach produced an AI that could reason through complex problems while keeping its explanations clear and readable for humans.

The "Aha Moment"

One of the coolest things they observed was that the AI sometimes had what they called an "aha moment" - where it would be working on a problem, realize it made a mistake, and then say something like "Wait, wait. Wait. That's an aha moment I can flag here," and start over with a better approach.

The AI wasn't specifically programmed to do this - it emerged naturally as the AI learned to solve problems more effectively through reinforcement learning.

By the end of training, their model could solve advanced math problems, code complex programs, and answer scientific questions at a level comparable to the best AI systems available, all by learning to think through problems step by step, just like humans do.

OpenAI o1 - ChatGPT

One of the first popular reasoning models. It appeared in September of 2024 as o1-preview.

I was most fascinated by its performance at writing code. https://openai.com/index/learning-to-reason-with-llms/

DeepSeek R1

Towards the end of this past January, DeepSeek, a research lab based in China, released its model DeepSeek R1 and demonstrated how a reasoning model could be trained with fewer resources. It showed that less compute on Nvidia GPUs in data centers was needed to get results comparable to OpenAI's o1 reasoning model. This shook the confidence of many investors in companies such as Nvidia:

At the time some investors were pointing to something called the Jevons Paradox, the idea that making a resource cheaper to use can increase the total demand for it:

Nvidia seems to have somewhat recovered in the past month:

Open R1 - HuggingFace

Open-R1: a fully open reproduction of DeepSeek-R1

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

The DeepSeek-R1 release leaves open several questions about:

Data collection: How were the reasoning-specific datasets curated?

Model training: No training code was released by DeepSeek, so it is unknown which hyperparameters work best and how they differ across different model families and scales.

Scaling laws: What are the compute and data trade-offs in training reasoning models?

These questions prompted us to launch the Open-R1 project, an initiative to systematically reconstruct DeepSeek-R1’s data and training pipeline, validate its claims, and push the boundaries of open reasoning models. By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open-source community, and create a foundation for future models to leverage these techniques.

Claude 3.7 - Anthropic

Anthropic released its reasoning model this month as part of the Claude 3.7 release.

Source: https://www.anthropic.com/news/claude-3-7-sonnet

AI Agents

Typing out in plain English what you want your computer to do.

I believe the success of reasoning models, which were themselves heavily influenced by what people were doing with AI Agents, led companies like OpenAI to explore building their own AI agent. They call it OpenAI Operator. I believe only Pro users have access, and it costs about $200 a month.

When others see this, they think AI Agents. When I see this, I think Selenium.

Selenium is a powerful automation framework built on a web standard supported by all major browsers. At its core, it allows developers to programmatically control web browsers—essentially letting code interact with websites just as a human would: clicking buttons, filling forms, and extracting information.

Selenium

Selenium automates browsers. That’s it! What you do with that power is entirely up to you. Primarily it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should) also be automated as well. Selenium WebDriver: If you want to create robust, browser-based regression automation suites and tests, scale and distribute scripts across many environments, then you want to use Selenium WebDriver, a collection of language specific bindings to drive a browser - the way it is meant to be driven.

Selenium

GitHub - SeleniumHQ/selenium: A browser automation framework and ecosystem.

GitHub - SeleniumHQ

WebDriver

WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.

W3C - Simon Stewart

If it's not using Selenium, then it's at the very least making use of the web standard that web browsers go by.
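To make that web standard a little more concrete, here is a minimal sketch of the kind of HTTP commands a WebDriver client sends to a browser driver. The endpoints and JSON bodies follow the W3C WebDriver specification. The `Command` tuple, the helper names, and the session and element ids are made up for illustration, and nothing here actually talks to a browser.

```python
from typing import Any, Dict, NamedTuple, Optional


class Command(NamedTuple):
    method: str                     # HTTP verb
    path: str                       # endpoint, relative to the driver's base URL
    body: Optional[Dict[str, Any]]  # JSON payload, or None when there is none


def new_session(browser_name: str) -> Command:
    caps = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return Command("POST", "/session", caps)


def navigate(session_id: str, url: str) -> Command:
    return Command("POST", f"/session/{session_id}/url", {"url": url})


def find_element(session_id: str, css_selector: str) -> Command:
    body = {"using": "css selector", "value": css_selector}
    return Command("POST", f"/session/{session_id}/element", body)


def send_keys(session_id: str, element_id: str, text: str) -> Command:
    path = f"/session/{session_id}/element/{element_id}/value"
    return Command("POST", path, {"text": text})


def click(session_id: str, element_id: str) -> Command:
    return Command("POST", f"/session/{session_id}/element/{element_id}/click", {})


def delete_session(session_id: str) -> Command:
    return Command("DELETE", f"/session/{session_id}", None)


# "abc123" and "el-1" are placeholder ids; a real driver returns its own.
for command in (
    new_session("firefox"),
    navigate("abc123", "https://example.com"),
    find_element("abc123", "input[name=q]"),
    send_keys("abc123", "el-1", "hello"),
    click("abc123", "el-1"),
    delete_session("abc123"),
):
    print(command)
```

Selenium's language bindings wrap calls like these, so any tool that can issue them, an AI agent included, can drive a browser.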

I did some interesting things with Selenium back when I was exploring software for resellers and second-hand goods.

From time to time, I use Selenium to grab data from the internet and I've become quite proficient at it. You can see an example of code I've written here:

GitHub - legut2/export_clubhouse: Export member data from Clubhouse social audio mobile app. This proof of concept shows how this could be done to any electron application.

GitHub - legut2

The above code takes advantage of the fact that desktop software built with Electron has a web browser inside it, which meant I could try controlling it with Selenium. Other Electron desktop software could be driven by Selenium in much the same way a web browser can. Many very popular applications are built with Electron and could potentially be controlled by Selenium and, by extension, by an AI agent.
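As a rough illustration of how that works in general (not necessarily how the repository above does it), ChromeDriver can be told to launch an Electron app instead of Chrome by pointing the `binary` field of `goog:chromeOptions` at the app's executable. The sketch below only builds the new-session payload. The function name and the path are placeholders, and a real run would also need a ChromeDriver build that matches the app's Chromium version.

```python
import json
from typing import Any, Dict


def electron_session_payload(app_binary: str) -> Dict[str, Any]:
    """Build a WebDriver new-session body that launches an Electron app."""
    return {
        "capabilities": {
            "alwaysMatch": {
                "browserName": "chrome",
                # ChromeDriver starts this executable instead of Chrome itself.
                "goog:chromeOptions": {"binary": app_binary},
            }
        }
    }


# Placeholder path: substitute the executable of the app you want to drive.
payload = electron_session_payload("/path/to/SomeElectronApp")
print(json.dumps(payload, indent=2))
```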

However, in the above example, I was just helping some people I know find a path to move their group off of Clubhouse. Platform risk is a real danger and it happens. When you're on the bleeding edge sometimes you get cut. Who remembers Vine? Now it's all about TikTok, or wait? Is it RedNote now? Sometimes people put enormous effort into one platform and don't have a path to get off of it when the platform gets into trouble.

AI Agents in Spreadsheets

A Personal Opinion

=GRAB("URL", "Instructions in English for what data to grab from web page.")
=ACT(A2, "URL", "Instructions for what action to take at web page.")

I think people should be able to define and create their AI agents within spreadsheets. It would take only a handful of functions to do so.
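To show how little machinery this would take, here is a toy sketch of how a spreadsheet engine might dispatch those two functions. Everything in it is hypothetical: `GRAB` and `ACT` do not exist in any spreadsheet today, the `grab` and `act` back ends are stubs that only describe what a real agent would do, and the parser handles just quoted strings and simple cell references like `A2`.

```python
import re
from typing import Callable, Dict, List


# Hypothetical agent back ends. A real version would load the page and hand
# the instructions to a model; these stubs only report what they were asked.
def grab(url: str, instructions: str) -> str:
    return f"[would grab from {url}: {instructions}]"


def act(value: str, url: str, instructions: str) -> str:
    return f"[would act at {url} with {value!r}: {instructions}]"


FUNCTIONS: Dict[str, Callable[..., str]] = {"GRAB": grab, "ACT": act}

FORMULA = re.compile(r"^=([A-Z]+)\((.*)\)$")
# An argument is either a double-quoted string or a cell reference like A2.
ARGUMENT = re.compile(r'"([^"]*)"|([A-Z]+[0-9]+)')


def evaluate(formula: str, cells: Dict[str, str]) -> str:
    """Parse one formula and call the matching agent function."""
    match = FORMULA.match(formula.strip())
    if match is None:
        raise ValueError(f"not a formula: {formula!r}")
    name, raw_args = match.groups()
    if name not in FUNCTIONS:
        raise ValueError(f"unknown function: {name}")
    args: List[str] = []
    for quoted, cell_ref in ARGUMENT.findall(raw_args):
        # Cell references are replaced by the referenced cell's value.
        args.append(cells[cell_ref] if cell_ref else quoted)
    return FUNCTIONS[name](*args)


cells = {"A2": "hello"}
print(evaluate('=GRAB("https://example.com", "Get the page title.")', cells))
print(evaluate('=ACT(A2, "https://example.com", "Submit it.")', cells))
```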

Gatebreaker Spreadsheets

Gatebreaker Spreadsheet Software By Daniel Legut Spreadsheet software is the minecraft of the business world. It allows people to imagine and plan for the future. It allows one to build up mental models to make decisions. However, spreadsheets break Bushnell’s law: “All the best games are easy…

Google Docs

If you don't want to read the above document, you can watch a video I put together showing what's missing in spreadsheets.

Code editors with built-in support for LLMs and Reasoning Models are all the rage now, but what about helping people who do their work within spreadsheets?

AI Code Editors

Writing software is easier than it used to be.

Cursor

Cursor - The AI Code Editor

Built to make you extraordinarily productive, Cursor is the best way to code with AI.

Cursor

Windsurf

Windsurf Editor by Codeium

Tomorrow’s editor, today. Windsurf Editor is the first AI agent-powered IDE that keeps developers in the flow. Available today on Mac, Windows, and Linux.

Products

PearAI

PearAI evidently forked an OSS code editor, then later got funding from Y Combinator for it. The lack of attribution to the original authors got them in trouble.

PearAI - The Open Source AI Code Editor

PearAI is an open source AI-powered code editor with powerful features like AI chat, PearAI Creator, and AI debugging to help you make what excites.

The Open Source AI Code Editor - Nang

It's commonplace to rely on open source code, but PearAI got in trouble for not giving credit where credit is due.

Y Combinator is being criticized after it backed an AI startup that admits it basically cloned another AI startup | TechCrunch

A Y Combinator startup named PearAI launched with a tweet thread and YouTube video on Saturday and caused an immediate backlash.

TechCrunch - Julie Bort

It's fairly normal to use other people's software within your own but they crossed a line with how they did it.

PearAI forked the OSS code editor Continue, which was itself funded by YC, and got YC funding for it. Also the editor they forked is itself a fork of VS Code, but is not to be confused with Void Editor, which is a third YC funded VS Code fork with AI features. It's YC funded VS Code forks with AI all the way down. - https://news.ycombinator.com/item?id=41701709

Continue.dev

Continue

Amplified developers, AI-enhanced development · The leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside the IDE

Void Editor

Void

Void is an open source Cursor alternative. Full privacy. Fully-featured.

Void Editor

For reference, VS Code is an editor made by Microsoft that is pretty slick and I use it. It's just a regular code editor with a plug-in ecosystem where each plug-in is called an extension, which raises the question: why didn't they just make an extension? Maybe you can't make money off of extensions?

Cline

Cline - Autonomous Coding Agent for VSCode

Cline is an AI-powered coding assistant for Visual Studio Code.

Cline

Ollama

Run the models on your own hardware.

Ollama

Get up and running with large language models.
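As a small sketch of what running a model on your own hardware looks like from code, Ollama serves a local HTTP API (by default on port 11434) with an `/api/generate` endpoint. The model name `llama3.2` below is an assumption, so use whichever model you have pulled, and the helper names are my own. `build_generate_request` only constructs the request, while `generate` needs a running Ollama server.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3.2",  # assumed model name; any locally pulled model works
    host: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("Why is the sky blue?")
print(request.get_method(), request.full_url)
```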

Aider

Get help with writing code with AI within your terminal.

aider is AI pair programming in your terminal

aider

Many of these I have not tried out myself. I mostly do quite a bit of copying and pasting of snippets directly from chat interfaces. I plan on trying out aider because it seems to be the most promising. I mostly don't like being too far away from what the model underneath is doing, and an editor kind of obscures that. So I believe something lightweight is important for me.

Large Language Diffusion Models

LLaDA di LLaDA do, tech progresses on, brah.

LLaDA is a new type of model that combines techniques from diffusion models, which are normally used to generate images, with large language models. These LLaDA models seem to be faster at smaller scales. This is something to keep an eye on.

GitHub.gg - Repository Analysis

GitHub - ML-GSAI/LLaDA: Official PyTorch implementation for “Large Language Diffusion Models”

GitHub - ML-GSAI

Large Language Diffusion Models

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings establish diffusion models as a viable and promising alternative to ARMs, challenging the assumption that key LLM capabilities discussed above are inherently tied to ARMs. Project page and codes: https://ml-gsai.github.io/LLaDA-demo/.

arXiv.org - Shen Nie
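The forward masking process and the reverse process from the abstract can be sketched with a toy example. This is not the official implementation: a stub that always predicts one fixed sentence stands in for the Transformer, the linear schedule and the random remasking are simplifying assumptions of mine, and all names are made up for illustration.

```python
import random
from typing import Callable, List

MASK = "[MASK]"
Predictor = Callable[[List[str]], List[str]]


def forward_mask(tokens: List[str], t: float, rng: random.Random) -> List[str]:
    """Forward process: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_sample(
    predict: Predictor, length: int, steps: int, rng: random.Random
) -> List[str]:
    """Reverse process: start fully masked, then alternate predict and remask."""
    seq = [MASK] * length
    for step in range(steps, 0, -1):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        guesses = predict(seq)
        for i in masked:
            seq[i] = guesses[i]  # fill every masked position at once
        # Remask a shrinking share of the just-filled positions so that
        # later steps get a chance to revise them.
        keep_masked = round(length * (step - 1) / steps)
        for i in rng.sample(masked, min(keep_masked, len(masked))):
            seq[i] = MASK
    return seq


# Toy stand-in for the Transformer: it always "predicts" one fixed sentence.
TARGET = "tech progresses on".split()


def toy_predict(seq: List[str]) -> List[str]:
    return list(TARGET)


rng = random.Random(0)
print(forward_mask(TARGET, 0.5, rng))
print(reverse_sample(toy_predict, len(TARGET), steps=4, rng=rng))
```

Unlike an autoregressive model, which produces one token at a time from left to right, each step here fills in all masked positions in parallel.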

LLaDA - a Hugging Face Space by multimodalart

Large Language Diffusion Models