[AINews] small little news items


Updated on January 15, 2025


AI Twitter Recap

Model Releases and Updates

  • Ollama Model Enhancements: @ollama announced support for Cohere's Command R7B, part of the Command R series and optimized for RAG and tool-use tasks. Additionally, @ollama released Ollama v0.5.5, featuring multiple quality-of-life updates and a transition to a new engine. @ollama also highlighted the upcoming 2025 Ollama meetup in San Francisco, which attracted significant interest with 31,592 impressions.

  • Together AI and OpenBMB Models: @togethercompute introduced Llama 3.3 70B, available for free on Together AI and boasting improved reasoning and math capabilities. Concurrently, @OpenBMB released MiniCPM-o 2.6, an 8B-parameter multimodal model that outperforms GPT-4V on visual tasks.

  • Process Reward Models and Qwen Developments: @_philschmid shared insights into Process Reward Models (PRMs), emphasizing their role in enhancing LLM reasoning by scoring intermediate steps rather than only final answers (a minimal step-scoring sketch follows this list). The Qwen team also unveiled their Qwen2.5-Math-PRM models, demonstrating superior performance in mathematical reasoning.

  • LangChain and Codestral Updates: @LangChainAI announced a beta release of tasks that lets ChatGPT handle scheduled work such as reminders and summaries. Codestral 25.01 from @dchaplot reached joint #1 on the LMSys Copilot Arena, a significant performance improvement over previous versions.
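
Loosely speaking, a process reward model scores each intermediate step of a solution rather than only the final answer, and those step scores are then aggregated, for example when reranking best-of-N candidates. The sketch below only illustrates that idea: the names and scores are hypothetical, and min-aggregation is just one common convention, not a documented Qwen recipe.

```python
from typing import List

def aggregate_prm(step_scores: List[float]) -> float:
    """Collapse per-step PRM scores into one solution-level score.

    Taking the minimum is one common convention (an assumption here): a single
    weak intermediate step sinks the whole chain, which is exactly the failure
    mode that outcome-only reward models tend to miss.
    """
    return min(step_scores)

# Hypothetical per-step scores for two candidate solutions to the same problem.
candidates = {
    "solution_a": [0.95, 0.93, 0.41, 0.90],  # step 3 looks like the first error
    "solution_b": [0.88, 0.86, 0.84, 0.85],  # uniformly solid steps
}
best = max(candidates, key=lambda name: aggregate_prm(candidates[name]))
print(best)  # "solution_b" wins the rerank because no single step falls apart
```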

AI Features and Tools

  • OpenAI Tasks Rollout: @OpenAI announced the rollout of Tasks, a feature enabling users to schedule actions for ChatGPT, such as weekly news briefings and personalized workouts. The feature is currently in beta for Plus, Pro, and Team users and will eventually be available to all ChatGPT accounts.

  • Ambient Agents and Email Assistants: @LangChainAI introduced an open-source email assistant agent as part of their new "ambient agents" paradigm. These agents stay active in the background, handling tasks like email triage and drafting responses without requiring a traditional chat interface.

  • AI Software Engineering Advancements: @bindureddy discussed the rapid maturation of AI software engineers, highlighting their capabilities in codebase analysis, test case generation, and security infrastructure, predicting that AI will match SWE capabilities within the next 18 months.

AI Research and Papers

  • LLM Scaling Laws: @cwolferesearch delved into LLM scaling laws, explaining the power-law relationships between test loss and compute, model size, and dataset size (the standard form is sketched after this list). The thread emphasizes that while test loss decreases with scale, the improvements taper off, challenging the notion of exponential AI advancements.

  • GANs Revival: @TheTuringPost reported on the revival of GANs through the paper "The GAN Is Dead; Long Live the GAN! A Modern GAN Baseline," highlighting the R3GAN architecture and its superior performance over some diffusion models on benchmarks like FFHQ and CIFAR-10.

  • Multimodal RAG and VideoRAG: @TheTuringPost introduced VideoRAG, an extension of multimodal RAG that retrieves videos in real-time, utilizing both visual and textual data to enhance response accuracy.

  • Tensor Product Attention: @iScienceLuvr presented the "Tensor Product Attention (TPA)" mechanism, which shrinks the inference-time KV cache by roughly 10x while outperforming prior attention variants such as MHA and GQA on benchmarks (a toy illustration of the factored cache also follows this list).
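
For reference, the scaling laws in that thread are usually written as power laws in each resource. The constants and exponents below are the approximate values reported by Kaplan et al. (2020), quoted here as background rather than taken from the thread itself:

```latex
% Test loss as a power law in parameters N, dataset size D, and compute C
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \quad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \quad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C},
\qquad \alpha_N \approx 0.076,\; \alpha_D \approx 0.095,\; \alpha_C \approx 0.05
```

Because the exponents are small, each constant-factor drop in loss requires a multiplicative increase in resources, which is why gains taper off rather than compounding exponentially.

The cache-size claim for TPA comes from factorizing cached activations into low-rank tensor products. The toy sketch below illustrates only that general idea under assumed shapes; it is not the paper's exact formulation.

```python
import numpy as np

# Toy factored KV cache: instead of storing full per-head keys K of shape
# (T, H, d), store rank-R factors A (T, R, H) and B (T, R, d) and rebuild K
# on the fly. For small R this is much smaller than the full cache.
T, H, d, R = 128, 16, 64, 2
A = np.random.randn(T, R, H)
B = np.random.randn(T, R, d)
K = np.einsum("trh,trd->thd", A, B) / R   # reconstructed keys, shape (T, H, d)

full_cache = T * H * d                    # 131,072 floats
factored_cache = T * R * (H + d)          # 20,480 floats
print(f"cache shrink: {full_cache / factored_cache:.1f}x")  # 6.4x in this toy setup
```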

AI Community and Events

  • Ollama Meetup and Community Engagement: @ollama promoted the 2025 Ollama meetup in San Francisco, fostering community engagement.

AI Discord Recap

Theme 1. New AI Models: Codestral, MiniMax-01, and DeepSeek V3

  • Codestral Model Debuts with 256k Context: Mistral released the new Codestral model on its API with a massive 256k context window, described by users as 'stupid fast and good' for speeding up code generation tasks.
  • MiniMax-01 Launches Open-Source Models with 4M Tokens: MiniMax released the open-source MiniMax-01 models, which handle context windows of up to 4 million tokens, reportedly 20–32 times longer than existing models, alongside improved performance.
  • DeepSeek V3 Outperforms Claude in Coding Tasks: DeepSeek V3 was praised for surpassing Claude in code generation and reasoning, though it requires substantial resources to run locally.

Theme 2. AI Tools and IDEs: Performance Hiccups and User Innovations

  • Cursor IDE Faces Slowdowns and User Workarounds: Users encountered slow requests with Cursor IDE, leading to creative solutions like using Beyond Compare for managing code snapshots.
  • Codeium's Windsurf Woes and the Quest for Clarity: Issues with AI-generated code in Windsurf prompted users to seek improved structuring with .windsurfrules files.
  • LM Studio Users Compare Qwen 2.5 and QwQ Models: Comparison between Qwen 2.5 and QwQ models revealed Qwen's superiority in code generation efficiency.

Theme 3. Advancements in AI Features: From Task Scheduling to Ambient Agents

  • ChatGPT Introduces Task Scheduling Feature: ChatGPT rolled out a Task Scheduling feature for Plus, Team, and Pro users to set reminders, repositioning ChatGPT as a proactive AI agent.
  • Ambient Agents Automate Email Management: An AI email assistant autonomously triages and drafts emails to reduce inbox overload.
  • Hyper-Connections Proposed to Improve Neural Networks: A proposed Hyper-Connections technique aims to enhance neural networks by addressing challenges such as vanishing gradients.

Theme 4. AI Infrastructure: GPU Access and Support Challenges

  • Thunder Compute Offers Affordable Cloud GPUs: Thunder Compute launched with A100 instances at $0.92/hr, simplifying GPU workflows for high-performance computing.
  • Unsloth AI Limited to NVIDIA GPUs, AMD Users Left Waiting: Users expressed frustrations over Unsloth's lack of AMD support, seeking broader GPU compatibility.
  • OpenRouter Users Face Rate-Limiting Issues with Models: Performance bottlenecks were noted in models like DeepSeek V3 due to high demand, resulting in rate-limiting hurdles.

Theme 5. AI in Code Development: Practices and Philosophies

  • Developers Debate Testing Tensions: Discussions ranged from minimal testing practices to the importance of rigorous testing to avoid deployment risks.
  • Community Emphasizes Clear Guidelines for AI Code Collaboration: Detailed guideline files like .windsurfrules were recommended for reducing ambiguous AI responses and fostering better interactions (a hypothetical example follows this list).
  • Interest in AI for Real-Time Bug Fixing in Game Development: Speculation arose on AI's potential to fix bugs in real-time in video games, enhancing the gaming experience.
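
As a purely hypothetical example of the guidance file mentioned above: a .windsurfrules file is free-form instructions that the IDE feeds to its assistant, so the entries below are illustrative rather than any required schema.

```text
# .windsurfrules (hypothetical example)
- State the full file path above every code block you propose.
- Prefer small, reviewable diffs over whole-file rewrites.
- Never add a new dependency without asking first.
- When generating tests, match the project's existing framework and layout.
```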

Interconnects (Nathan Lambert)

Qwen's PRM Gains Ground on Process Supervision:

The new Qwen2.5-Math-PRM aced intermediate error detection in math tasks on ProcessBench, referencing a 72B model on Hugging Face that uses human-annotated data for stronger reasoning. Developers cautioned that Monte Carlo synthetic approaches lag behind human methods, highlighting the need for careful evaluation.

Claude Sonnet & MiniCPM-o Make Waves:

Claude Sonnet 3.5 hit 62.2% on SWE-Bench Verified, trailing OpenAI's o3 at 71.7%, startling many who saw it as a previous-gen coding contender. Meanwhile, MiniCPM-o 2.6 from OpenBMB, boasting an 8B-size Omni design, impressed with real-time bilingual audio capability, as shown on GitHub and Hugging Face.

Higher Ed Chatbots & Stripe's Tax Trick:

A talk for higher-ed CIOs spotlighted U-M GPT and Maizey, with the University of Michigan championing tailored AI offerings for diverse campus needs. On the tax front, members praised Stripe's Non-Union One Stop Shop, letting outside businesses handle EU VAT in one swoop.

Synthetic CoT & O1 Drama Bawl:

Members found synthetic chain-of-thought training underwhelming, especially when it was just supervised fine-tuning with no RL. They doubted the chances of O1 models, hinting that Big Molmo or Tulu-V might do a better job for vision tasks.

Policy Punch: AI Blueprint & Datacenter Boom:

An Economic Blueprint proposes harnessing AI for national security and growth, echoing repeated policy suggestions from OpenAI. President Biden's executive order unlocks federal land for gigawatt-scale datacenters, mandating on-site clean energy to match capacity.

Codeium (Windsurf) Discussion

Discussions in the Codeium (Windsurf) channels covered unresolved support tickets and delayed responses, connectivity problems with abnormal server connections, requests for technical assistance (with thanks for help received), a side debate on freedom of speech, and telemetry errors in the VS Code extension that persisted even with telemetry enabled. A GitHub link related to the telemetry issue was also shared.

AI Tool Performance and User Experiences

This section covers user experiences, challenges, and feedback for tools like Windsurf and Unsloth. Windsurf users describe struggles with AI-generated code and stress the need for clear guidelines to improve collaboration, and feedback on the Windsurf apps prompts discussion of AI suggestion behavior. Unsloth users, meanwhile, run into GPU support limitations and uncertainty around Mistral's Codestral model release. The section also touches on dynamic filters in LLAMA, language-learning preferences, and video resources shared in the off-topic channel.

Interconnects (Nathan Lambert) - ML Drama

420gunna shared a link summarizing the drama; details and context are in the linked post.

Interconnects (Nathan Lambert)

This section covers AI for CIO functions, university chatbot initiatives, LLM limitations, and tax registration. Members share insights on AI talks for CIOs, the University of Michigan's chatbot tools, the challenges CIOs face with current LLM capabilities, and Stripe's tax registration system in Europe. Conversations also touch on model naming conventions and the potential of combining Molmo with reinforcement learning. Links related to these topics are included.

NotebookLM, Perplexity AI, Nous Research AI, and Aider Discussions

NotebookLM:

  • Public sharing not available in paid NotebookLM version, internal sharing within organization possible.

  • Feedback on audio summary generation bugs: Functionality broken, team working on restoring it.

  • Exploration of NoCode RAG solutions: Complexity acknowledged, integrating features with NotebookLM challenging.

  • User experiences with NotebookLM: Positive feedback for research tasks but improvements needed in citation exporting and fixing bugs.

Perplexity AI:

  • Mixed reception of Perplexity Pro, dissatisfaction with coding assistance, struggles with redeeming reward codes.

  • UI changes and user control: Complaints about unwanted ads and platform changes affecting usability.

  • Frustrations with coding assistant: AI repeatedly requesting confirmation for providing complete code.

  • Request for API access and features: Frustration over Pro search not supported via API.

  • Concerns over content visibility: Users worry about unpublished pages not being indexed by search engines.

Nous Research AI:

  • Claude's distinct personality noted, performance compared to DeepSeek v3.

  • Limitations of open-source models discussed relative to models backed by proprietary data.

  • Defensive practices in AI training scrutinized, concerns about AI development monopolies.

  • Evaluation of training data quality and impact on model performance.

  • Interest in human-like interactions in AI models highlighted, user admiration for Claude's engaging design.

Aider:

  • Discussion on including .aider.conf.yml in .gitignore, with debate over team-wide decisions vs. individual setups (a hypothetical sketch follows this list).

  • Running DeepSeek v3 requires significant resources, alternatives explored due to hardware limitations.

  • API configuration for custom endpoints shared, with commands exchanged to successfully test interactions.

  • Recommendations for open-source models, balancing performance and hardware capabilities.

  • Performance inquiry for Gemini models, positive feedback on effectiveness for specific tasks.

  • Error handling with LLM edit formats in Aider, toggling edit formats, and limitations in the copy context command discussed.
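
As a hypothetical sketch of the two configuration threads above, a per-user setup might keep aider's files untracked and point the tool at a custom OpenAI-compatible endpoint roughly as follows. The YAML keys mirror aider's commonly documented options, but the values are placeholders and everything should be checked against the aider docs for your version.

```yaml
# .gitignore (one option in the team-vs-individual debate): keep per-user
# aider files out of the shared repo by adding a line such as
#   .aider*
#
# .aider.conf.yml: per-user config pointing aider at a custom endpoint.
model: openai/my-local-model                      # hypothetical model name
openai-api-base: https://llm.example.internal/v1  # hypothetical endpoint URL
openai-api-key: sk-placeholder                    # placeholder, not a real key
```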


Rerank Fine-Tuning Pricing Confusion

A member inquired about the pricing for fine-tuning rerank models, mentioning that it wasn't listed on the Cohere pricing page. Another member provided documentation links related to Rerank FT and FAQs for further assistance.

Discussion on Bot Interaction, Alice in Wonderland Quip, and Cohere Documentation

  • Bot Interaction on Language: A user asked the Cmd R Bot whether there is any similarity between corvo ("raven") and escrivaninha ("writing desk"), then challenged the initial response with a reference to Alice in Wonderland and the Mad Hatter's riddle "Why is a raven like a writing desk?". This prompted a playful, quirky connection between the words, showing how cultural references can lead to deeper language discussions.
  • Alice in Wonderland Quip: The conversation involving Alice in Wonderland highlighted the potential for cultural references to spark discussions about language nuances. It reflected on the limitations of the bot's data access in addressing literary and cultural queries.
  • Cohere Documentation Ineffectiveness: Despite attempting to search the Cohere documentation for information on corvo and escrivaninha, the Cmd R Bot found no relevant details. This demonstrated the challenges the bot faced in accessing data for cultural or literary queries.

Epilogue and Subscription

The epilogue section encourages readers to stay updated by subscribing to AI News. A subscription form is provided for users to enter their email address and subscribe. Additionally, links to the AI News Twitter account and newsletter are included for readers to follow. The footer section also highlights finding AI News on other platforms and acknowledges Buttondown as the platform used to start and grow the newsletter.


FAQ

Q: What are some recent model releases and updates mentioned in the AI Twitter recap?

A: Recent model releases and updates include Ollama adding support for Cohere's Command R7B, Together AI's introduction of Llama 3.3 70B, OpenBMB's release of MiniCPM-o 2.6, LangChainAI's beta tasks release, and Codestral 25.01 reaching joint #1 on the LMSys Copilot Arena.

Q: What AI tools and features were highlighted in the AI Twitter recap?

A: Highlighted AI tools and features include the introduction of Tasks by OpenAI for ChatGPT scheduling, LangChainAI's ambient agents for email management, and advancements in AI software engineering capabilities discussed by bindureddy.

Q: What AI research papers were featured in the AI Twitter recap?

A: Featured AI research papers include discussions on LLM scaling laws by cwolferesearch, the revival of GANs with the R3GAN architecture by TheTuringPost, the introduction of Tensor Product Attention by iScienceLuvr, and the introduction of VideoRAG by TheTuringPost.

Q: What were the key themes in the AI Discord recap?

A: Key themes include new AI models like Codestral and MiniMax-01, advancements in AI tools and IDEs, enhancements in AI features from task scheduling to ambient agents, developments in AI infrastructure such as GPU access challenges, and AI's role in code development practices and philosophies.

Q: What recent advancements were mentioned in AI community events in the AI Twitter recap?

A: Recent advancements in AI community events include the Ollama meetup attracting significant interest, the release of Qwen2.5-Math-PRM models showcasing superior performance in mathematical reasoning, and the success of MiniCPM-o 2.6 from OpenBMB in visual tasks.
