ESM3, a General Theory of Neural Networks, HAWK clinical trial, ChemCrow, Chemical reservoir reaction network, Lifelike quadrupedal robots, Mobility VLA, scHolography, MobileLLM, LightRAG, LLaVA-NeXT
+ PaliGemma, WildGaussians, CURLoRA, LETS-C, Toto, Thrive, AIMO, 3DGRT, A “Turing Test” case study, The Illustrated AlphaFold, FoUBARthes, Odyssey, Strawberry, dravid, etc.
July 8-15th: ESM3: Simulating 500 million years of protein evolution with a language model, General Theory of Neural Networks (Rob Leclerc), Optical coherence tomography-derived texture-based radiomics features identify eyes with intraocular inflammation in the HAWK clinical trial (Heliyon), ChemCrow: Augmenting large language models with chemistry tools (Nature Machine Intelligence, May 2024), Chemical reservoir computation in a self-organizing reaction network (Nature), Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models (Nature Machine Intelligence), Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs (DeepMind), scHolography: a computational method for single-cell spatial neighborhood reconstruction and analysis (Genome Biology), MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (Facebook), LightRAG: The "PyTorch" library for LLM applications, Training of Physical Neural Networks, LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models (ByteDance), PaliGemma: A versatile 3B VLM for transfer, WildGaussians: 3D Gaussian Splatting in the Wild, VLMs are Blind, CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, LETS-C: Leveraging Language Embedding for Time Series Classification, Toto: Time Series Optimized Transformer for Observability, Chinese AI firms woo OpenAI users as US company plans API restrictions (Reuters), A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study (PLOS ONE), OpenAI and Thrive create an AI health coaching company, AIMO Prize (AI Mathematical Olympiad = ‘AIMO’), Facial Recognition at Checkout: Convenient or Creepy? 
(The Walrus), Open challenges for AI engineering (Simon Willison), BOFH: It's not generative AI at all, it's degenerate AI (The Register), Selfie-based ID raises eyebrows among infosec experts, Someone is wrong on the internet (AGI Doom edition), OpenAI whistleblowers ask SEC to investigate alleged restrictive non-disclosure agreements (Reuters), New Fiber Optics Tech Smashes Data Rate Record, Taiwan’s Forerunner 1 supercomputer to go online later in July, Neural MMO 2.0, ZoeDepth, Topaz Labs releases Video AI Pro, Stable Assistant, Physics-based Deep Learning, Machine Learning Systems with TinyML (Book | Harvard), Dravid (drd) cli is an AI powered cli coding framework, So you want to rent an NVIDIA H100 cluster? 2024 consumer guide (Photoroom), Dropbase helps developers build and prototype web apps faster with AI, 3DGRT (3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes), The Illustrated AlphaFold, The GraphRAG Manifesto: Adding Knowledge to GenAI, OpenAI working on new reasoning technology under code name ‘Strawberry’ (Reuters), OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving (Bloomberg), Real-time video of extremely precise deposition of 1 nanoliter to 1 microliter droplets, Klarna using GenAI to cut marketing costs by $10 mln annually (Reuters), FoUBARthes: Death of the Author (Dayna McLeod), Odyssey, Gen-3 Alpha Transitions, World Skins (Kyle Goodrich), Prepare for AI Hackers (Harvard Magazine | Feb 2023)
TL;DR: OpenAI blocks users from China; a rigorous, blind study injected 100% AI-written submissions into the examinations system at a UK university, where 94% of the AI submissions went undetected and the grades awarded to AI were on average higher than those achieved by real students; ESM3, a frontier multimodal model, generated a fluorescent protein at a far distance (58% identity) from known fluorescent proteins, hypothetically separated by over five hundred million years of evolution; Sam Altman and Arianna Huffington have written a post in Time as OpenAI and Thrive create an AI health coaching company; Facebook released MobileLLM, optimizing sub-billion parameter language models for on-device use cases, with SoTA results for MobileLLM-600M/1B/1.5B; chemical reservoir computation research continues to evolve, as does scHolography, a robust computational framework for elucidating 3D tissue organization and analyzing spatial dynamics at the cellular level; a hierarchical model for lifelike agility and play in quadrupedal robots is proposed; Rob Leclerc (a VC with a PhD in Theoretical and Computational Biology) released an intriguing General Theory of Neural Networks, linking gene regulatory networks and artificial neural networks to Universal Activation Networks (UANs); The Illustrated AlphaFold is worth skimming for ML folks; an exemplary AI-merged-with-AR demo, World Skins by Kyle Goodrich, might interest interactivity designers; and for irreverent fun, watch media performance artist Dayna McLeod, who asked ChatGPT to write an increasingly snarky and heated dialogue between Roland Barthes and Michel Foucault about The Death of the Author
🏓 Observations: Chinese AI firms woo OpenAI users as US company plans API restrictions (Reuters), A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study (PLOS ONE), OpenAI and Thrive create an AI health coaching company, AIMO Prize (AI Mathematical Olympiad = ‘AIMO’), Facial Recognition at Checkout: Convenient or Creepy? (The Walrus), Open challenges for AI engineering (Simon Willison), BOFH: It's not generative AI at all, it's degenerate AI (The Register), Selfie-based ID raises eyebrows among infosec experts, Someone is wrong on the internet (AGI Doom edition), OpenAI whistleblowers ask SEC to investigate alleged restrictive non-disclosure agreements (Reuters)
✭Chinese AI firms woo OpenAI users as US company plans API restrictions | Reuters “Chinese artificial intelligence (AI) companies are moving swiftly to attract users of OpenAI's technology, following reports the U.S. firm plans to restrict access in China and other countries to its application programming interface (API), a platform that allows developers of other products to integrate its AI models. ~ChatGPT maker OpenAI is planning to block access to technology used to build AI products for entities in China and some other countries, Chinese state-owned newspaper Securities Times reported on Tuesday. ChatGPT is not available in mainland China but many Chinese startups have been able to access OpenAI's API platform and use it to build their own applications, the Securities Times said. "We are taking additional steps to block API traffic from regions where we do not support access to OpenAI's services," an OpenAI spokesperson said in a statement to Reuters.”
✭ A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study | PLOS ONE “The recent rise in artificial intelligence systems, such as ChatGPT, poses a fundamental problem for the educational sector. In universities and schools, many forms of assessment, such as coursework, are completed without invigilation. Therefore, students could hand in work as their own which is in fact completed by AI. Since the COVID pandemic, the sector has additionally accelerated its reliance on unsupervised ‘take home exams’. If students cheat using AI and this is undetected, the integrity of the way in which students are assessed is threatened. We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.”
✭ OpenAI and Thrive create an AI health coaching company. “Sam Altman and Arianna Huffington have written a post in Time. They’re teaming up to build an AI health coach that could transform how we approach chronic diseases and daily wellness.” ✭ AI-Driven Behavior Change Could Transform Health Care | TIME “AI is already greatly accelerating the rate of scientific progress in medicine—offering breakthroughs in drug development, diagnoses, and increasing the rate of scientific progress around diseases like cancer. In fact, OpenAI is partnering with Color Health on an AI copilot to assist doctors in cancer screening and in creating treatment plans after a doctor has made a diagnosis. ~ But humans are more than medical profiles. Every aspect of our health is deeply influenced by the five foundational daily behaviors of sleep, food, movement, stress management, and social connection. And AI, by using the power of hyper-personalization, can significantly improve these behaviors. ~ These are the ideas behind Thrive AI Health, the company the OpenAI Startup Fund and Thrive Global are jointly funding to build a customized, hyper-personalized AI health coach that will be available as a mobile app and also within Thrive Global’s enterprise products. It will be trained on the best peer-reviewed science as well as Thrive’s behavior change methodology—including Microsteps, which are tiny daily acts that cumulatively lead to healthier habits. And it will also be trained on the personal biometric, lab, and other medical data you’ve chosen to share with it. It will learn your preferences and patterns across the five behaviors: what conditions allow you to get quality sleep; which foods you love and don’t love; how and when you’re most likely to walk, move, and stretch; and the most effective ways you can reduce stress. 
Combine that with a superhuman long-term memory, and you have a fully integrated personal AI coach that offers real-time nudges and recommendations unique to you that allows you to take action on your daily behaviors to improve your health.”
✭ AIMO Prize “The $10mn AIMO Prize was launched in November 2023 to help spur the open development of AI models that can reason mathematically, leading to the creation of a publicly shared AI model capable of winning a gold medal in the International Mathematical Olympiad (‘IMO’). ~ The Grand Prize of $5mn will be awarded to the first AI model to enter an AIMO Prize approved competition and perform at a standard equivalent to a gold medal in the IMO. The First Progress Prize is designed to incentivise the achievement of key milestones towards the grand prize, and opened in April 2024.” → ✭ Thomas Wolf on X: "There was a super impressive AI competition that happened last week that many people missed in the noise of the AI world. I happen to know several participants, so let me tell you a bit of this story over Sunday morning coffee. “Final push: The results were really amazing and the model climbed to 1st place. And even more, while tying for first place on the public validation leaderboard (28 solved challenges versus 27 for second place), it really shined when tested on the private test leaderboard, where it won by a wide margin, solving 29 challenges versus 22 for the second team. As Terence Tao himself put it, this is "higher than expected"”
✭ Facial Recognition at Checkout: Convenient or Creepy? (The Walrus) “There’s something disconcerting about a sophisticated piece of surveillance technology deployed for something as banal as selling candy. Invenda, the Swiss company behind the machines, issued a statement to the CBC that no cameras were inside the machines and the software was designed for facial analysis, not recognition—it was there to “determine if an anonymous individual faces the device, for what duration, and approximates basic demographic attributes unidentifiably.” (The CBC report pointed out that Invenda’s CEO had used the term “facial recognition” in a previous promo video.) The Waterloo Reddit thread opened up a can of other questions. Why would a company collect this information to begin with? How many of the mundane transactions that make up our daily lives are being mined for intimate biometric data without our knowledge or approval? And how far will this technology go? ~ Police forces and border security commonly use facial recognition software to identify people deemed security threats. New surveillance tech is coming online amid increasing concerns about retail theft. Critics argue stats about retail crime are murky—some companies amp up anti-theft measures without releasing info on how much they’re actually losing. And sometimes the data is flawed: last December, the National Retail Federation, a US lobby group, retracted its claim that organized retail crime was responsible for half of the $94.5 billion (US) in inventory losses.”
✭Open challenges for AI engineering (Simon Willison) “Plus a flurry of tiny tools built using Claude 3.5 Sonnet”
✭BOFH: It's not generative AI at all, it's degenerate AI (The Register) “It's training day at HQ which means pub time is getting closer.”
✭ Selfie-based ID raises eyebrows among infosec experts “Vietnam now requires it for some purchases. It may be a fraud risk in Singapore. Or ML could be making it safe”
✭Someone is wrong on the internet (AGI Doom edition) “The last few years have seen a wave of hysteria about LLMs becoming conscious and then suddenly attempting to kill humanity. This hysteria, often expressed in scientific-sounding pseudo-bayesian language typical of the „lesswrong“ forums, has seeped into the media and from there into politics, where it has influenced legislation. ~ This hysteria arises from the claim that there is an existential risk to humanity posed by the sudden emergence of an AGI that then proceeds to wipe out humanity through a rapid series of steps that cannot be prevented. ~ Much of it is entirely wrong, and I will try to collect my views on the topic in this article - focusing on the „fast takeoff scenario“.”
✭ OpenAI whistleblowers ask SEC to investigate alleged restrictive non-disclosure agreements | Reuters “OpenAI whistleblowers have filed a complaint with the U.S. Securities and Exchange Commission, calling for an investigation over the artificial intelligence company's allegedly restrictive non-disclosure agreements, according to a letter seen by Reuters. "Given the well-documented potential risks posed by the irresponsible deployment of AI, we urge the Commissioners to immediately approve an investigation into OpenAI’s prior NDAs, and to review current efforts apparently being undertaken by the company to ensure full compliance with SEC rules," according to the letter, which was provided to Reuters by the office of Sen. Chuck Grassley. The AI company allegedly made employees sign agreements that required them to waive their federal rights to whistleblower compensation, according to the letter. The whistleblowers requested the SEC to fine OpenAI for each improper agreement made to the extent the agency deemed appropriate.”
⛲Foundational Revelations: ESM3: Simulating 500 million years of protein evolution with a language model, General Theory of Neural Networks (Rob Leclerc)
✭ ESM3: Simulating 500 million years of evolution with a language model “More than three billion years of evolution have produced an image of biology encoded into the space of natural proteins. Here we show that language models trained on tokens generated by evolution can act as evolutionary simulators to generate functional proteins that are far away from known proteins. We present ESM3, a frontier multimodal generative language model that reasons over the sequence, structure, and function of proteins. ESM3 can follow complex prompts combining its modalities and is highly responsive to biological alignment. We have prompted ESM3 to generate fluorescent proteins with a chain of thought. Among the generations that we synthesized, we found a bright fluorescent protein at far distance (58% identity) from known fluorescent proteins. Similarly distant natural fluorescent proteins are separated by over five hundred million years of evolution.”
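The headline figure here is sequence identity: the generated fluorescent protein sits at 58% identity from its nearest known counterpart. For readers unfamiliar with the metric, a minimal sketch on toy pre-aligned sequences (gap handling omitted; real comparisons first run a pairwise alignment):

```python
# Percent identity between two aligned protein sequences — the metric used
# above to describe how far ESM3's generated protein sits from known
# fluorescent proteins (58% identity). The sequences below are toy examples.
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the aligned length."""
    assert len(a) == len(b), "sequences must already be aligned"
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)

print(percent_identity("MKTAYIAKQR", "MKSAYIVKQR"))  # 80.0
```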
✭General Theory of Neural Networks (Rob Leclerc) “From gene regulatory networks to artificial neural networks”
🛠️ Tech: New Fiber Optics Tech Smashes Data Rate Record, Taiwan’s Forerunner 1 supercomputer to go online later in July, Neural MMO 2.0, ZoeDepth, Topaz Labs releases Video AI Pro, Stable Assistant, Physics-based Deep Learning, Machine Learning Systems with TinyML (Book | Harvard), Dravid (drd) cli is an AI powered cli coding framework, So you want to rent an NVIDIA H100 cluster? 2024 consumer guide (Photoroom), Dropbase helps developers build and prototype web apps faster with AI, 3DGRT (3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes), The Illustrated AlphaFold, The GraphRAG Manifesto: Adding Knowledge to GenAI, OpenAI working on new reasoning technology under code name ‘Strawberry’ (Reuters), OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving (Bloomberg), Real-time video of extremely precise deposition of 1 nanoliter to 1 microliter droplets, Klarna using GenAI to cut marketing costs by $10 mln annually (Reuters)
✭ New Fiber Optics Tech Smashes Data Rate Record “Expanded bandwidth yields a transmission rate of 402 terabits per second. ~ According to Polina Bayvel, professor of optical communications and networks at University College London, those same transceivers that Puttnam referenced are a next-stage challenge for the field. “Transceivers need to be intelligent—akin to self-driving cars, able to sense and adapt to their environment, delivering capacity when and where it’s needed,” says Bayvel, who has collaborated with members of the team before but was unaffiliated with the present research. To that end, AI and machine learning (ML) techniques can help next-generation efforts to squeeze still more bits through fiber optic lines, she says. “AI/ML techniques may help detect and undo distortions and need to be developed in combination with high-capacity capabilities,” Bayvel adds. “We need to understand that optical fiber systems and networks are not just high-capacity plumbing. Optical fiber networks must be intelligent as well as secure and resilient.””
✭ Taiwan’s Forerunner 1 supercomputer to go online later in July | Taiwan News | Jul. 8, 2024
✭Neural MMO 2.0 documentation “NMMO is an open-source research platform that simulates populations of agents in virtual worlds. We challenge you to train agents that generalize to tasks, opponents, and maps never seen during training. Our objective is to spur research on increasingly general and cognitively realistic environments. Your Agents must collect food and water to survive. Each Agent has 8 individual professions to help them collect resources. Agents can level up their skills in each profession. Resources can be used to create consumable items that restore food, water and health as well as to create ammunition that increases damage in combat. Higher level resources create better consumables and ammunition. Agents can also trade items on a global market. Agents may acquire armor to protect themselves in combat and weapons to increase their damage output. Agents can attack each other using one of three styles: Melee, Range, and Magic. The world is populated by NPCs that can be defeated to obtain items and increase power.”
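As a flavor of the survival rules described above, here is a toy tick loop in which agents must keep food and water topped up or lose health. All numbers and the foraging rule are illustrative, not NMMO's actual mechanics or API:

```python
# Toy sketch of the survival mechanic: food and water decay each tick,
# foraging replenishes them, and an agent whose supplies hit zero starts
# losing health. Values are invented for illustration only.
import random

random.seed(0)

class Agent:
    def __init__(self):
        self.food, self.water, self.health = 10, 10, 10

    def tick(self, foraged: bool) -> bool:
        gain = 3 if foraged else 0
        self.food = min(10, self.food + gain) - 1
        self.water = min(10, self.water + gain) - 1
        if self.food <= 0 or self.water <= 0:
            self.health -= 1          # starvation / dehydration damage
        return self.health > 0        # still alive?

agents = [Agent() for _ in range(4)]
for step in range(30):
    agents = [a for a in agents if a.tick(foraged=random.random() < 0.3)]
print(f"{len(agents)} agents survived 30 ticks")
```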
✭Topaz Labs | Video AI Pro™ | Studio-grade video enhancement. For creative professionals. “Local processing for security. Multi-GPU rendering for speed.”
✭ Stability AI Releases Stable Assistant Features — Stability AI “Today we’re announcing new features to Stable Assistant, our user-friendly chatbot that leverages Stable Image Ultra, our most advanced image generation technology based on Stable Diffusion 3.”
✭Physics-based Deep Learning “TL;DR: This document contains a practical and comprehensive introduction of everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we’ll look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, training algorithms tailored to physics problems, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have a huge potential to fundamentally change what computer simulations can achieve.”
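The "physical loss constraints" idea is easy to show in miniature: alongside a data-fitting term, penalize the residual of a governing equation. A toy sketch (not from the book) using the ODE du/dt = -u and a polynomial stand-in for a network:

```python
# A candidate solution u(t) is scored on two losses: mismatch against data
# samples, and violation of the physics du/dt = -u (residual via finite
# differences). The polynomial model is a stand-in for a neural network.
import numpy as np

t = np.linspace(0.0, 2.0, 50)
u_data = np.exp(-t)                        # samples of the true ODE solution

def losses(coeffs):
    u = np.polyval(coeffs, t)              # candidate solution u(t)
    dudt = np.gradient(u, t)               # finite-difference derivative
    data_loss = np.mean((u - u_data) ** 2)
    physics_loss = np.mean((dudt + u) ** 2)    # residual of du/dt = -u
    return data_loss, physics_loss

good = losses(np.polyfit(t, u_data, 4))    # close to the true solution
bad = losses([0.0, 0.0, 0.0, 0.0, 1.0])    # constant u(t)=1 violates the ODE
print(good, bad)
```

In a physics-informed training loop, the two terms are summed (with a weighting factor) and minimized together.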
✭ Machine Learning Systems with TinyML (Book | Harvard) “Machine Learning Systems with TinyML offers readers an entry point to understand machine learning (ML) systems by grounding concepts in applied ML. As the demand for efficient and scalable ML solutions grows, the ability to construct robust ML pipelines becomes increasingly crucial. This book aims to demystify the process of developing complete ML systems suitable for deployment, spanning key phases like data collection, model design, optimization, acceleration, security hardening, and integration, all from a systems perspective. The text covers a wide range of concepts relevant to general ML engineering across industries and applications, using TinyML as a pedagogical tool due to its global accessibility. Readers will learn basic principles around designing ML model architectures, hardware-aware training strategies, performant inference optimization, and benchmarking methodologies. The book also explores crucial systems considerations in areas like reliability, privacy, responsible AI, and solution validation. Enjoy reading it!”
✭GitHub - vysakh0/dravid: dravid (drd) cli is an AI powered cli coding framework
✭So you want to rent an NVIDIA H100 cluster? 2024 consumer guide (Photoroom) “Tips and technical analysis on how to pick an H100 cluster (interconnect, reliability and CO2 emissions)”
✭3DGRT 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes “Particle-based representations of radiance fields such as 3D Gaussian Splatting, have found great success for reconstructing and re-rendering of complex scenes. Most existing methods render particles via rasterization, projecting them to screen space tiles for processing in a sorted order. This work instead considers ray tracing the particles, building a bounding volume hierarchy and casting a ray for each pixel using high-performance GPU ray tracing hardware. To efficiently handle large numbers of semi-transparent particles, we describe a specialized rendering algorithm which encapsulates particles with bounding meshes to leverage fast ray-triangle intersections, and shades batches of intersections in depth-order. The benefits of ray tracing are well-known in computer graphics: processing incoherent rays for secondary lighting effects such as shadows and reflections, rendering from highly-distorted cameras common in robotics, stochastically sampling rays, and more. With our renderer, this flexibility comes at little cost compared to rasterization. Experiments demonstrate the speed and accuracy of our approach, as well as several applications in computer graphics and vision. We further propose related improvements to basic Gaussian representation, including a simple use of generalized kernel functions which significantly reduces particle hit counts.”
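The "shades batches of intersections in depth-order" step boils down to front-to-back alpha compositing with early termination once the ray is nearly opaque. A scalar-color sketch of that per-ray logic (illustrative; the paper's renderer shades batches on GPU ray-tracing hardware):

```python
# Composite the semi-transparent particle intersections along one ray,
# nearest first, accumulating color weighted by the remaining transmittance
# and stopping early once the ray has saturated.
def composite_ray(hits, early_stop=0.999):
    """hits: list of (depth, alpha, color) tuples, possibly unsorted."""
    color, transmittance = 0.0, 1.0
    for depth, alpha, c in sorted(hits):          # depth-order: nearest hit first
        color += transmittance * alpha * c
        transmittance *= (1.0 - alpha)
        if 1.0 - transmittance > early_stop:      # ray nearly opaque: stop
            break
    return color

print(composite_ray([(2.0, 0.5, 1.0), (1.0, 0.5, 0.0)]))  # 0.25
```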
✭ The Illustrated AlphaFold “A visual walkthrough of the AlphaFold3 architecture, with more details and diagrams than you were probably looking for. ~ Do you want to understand exactly how AlphaFold3 works? The architecture is quite complicated and the description in the paper can be overwhelming, so we made a much more friendly (but just as detailed!) visual walkthrough. ~ This is mostly written for an ML audience and multiple points assume familiarity with the steps of attention. If you’re rusty, see Jay Alammar’s The Illustrated Transformer for a thorough visual explanation. That post is one of the best explanations of a model architecture at the level of individual matrix operations and also the inspiration for the diagrams and naming. There are already many great explanations of the motivation for protein structure prediction, the CASP competition, model failure modes, debates about evaluations, implications for biotech, etc. so we don’t focus on any of that. Instead we explore the how.”
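Since the walkthrough assumes familiarity with the steps of attention, here is the core scaled dot-product computation as a quick refresher (single head, no masking):

```python
# Scaled dot-product attention: score queries against keys, softmax over
# keys, then take the weighted mix of values.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # query-key similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over keys
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```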
✭ The GraphRAG Manifesto: Adding Knowledge to GenAI - Graph Database & Analytics “Discover why GraphRAG will subsume vector-only RAG and emerge as the default RAG architecture for most use cases.”
✭ Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’ (Reuters) “ChatGPT maker OpenAI is working on a novel approach to its artificial intelligence models in a project code-named “Strawberry,” according to a person familiar with the matter and internal documentation reviewed by Reuters.” → ✭ OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving (Bloomberg) “The company believes its technology is approaching the second level of five on the path to artificial general intelligence. The ChatGPT maker, seen by many as a leader in the race to build more powerful AI systems, shared the new classification system with employees on Tuesday during an all-hands meeting, an OpenAI spokesperson said. The tiers, which OpenAI plans to share with investors and others outside the company, range from the kind of AI available today that can interact in conversational language with people (Level 1) to AI that can do the work of an organization (Level 5). ~ OpenAI executives told employees that the company believes it is currently on the first level, according to the spokesperson, but on the cusp of reaching the second, which it calls “Reasoners.” This refers to systems that can do basic problem-solving tasks as well as a human with a doctorate-level education who doesn’t have access to any tools.”
✭ Klarna using GenAI to cut marketing costs by $10 mln annually | Reuters “Fintech firm Klarna, one of the early adopters of generative AI (GenAI), said on Tuesday it is using AI for purposes such as running marketing campaigns and generating images, saving about $10 million in costs annually. The company has cut its sales and marketing budget by 11% in the first quarter, with AI responsible for 37% of the cost savings, while increasing the number of campaigns, the company said. Using GenAI tools like Midjourney, DALL-E, and Firefly for image generation, Klarna said it has reduced image production costs by $6 million. ~It is using GenAI to update images on its app and website on a weekly basis, reflecting key retail events such as Valentine's Day, Mother's Day and summer sales. "Traditionally, it would have been very costly to cater to these occasions with bespoke imagery, but with AI that is no longer an issue," Klarna CMO David Sandström said in a statement. "Essentially, we have removed the need for stock imagery." It has generated more than 1,000 images in the first three months of 2024 using GenAI, reducing the image development cycle from six weeks to seven days. A further $4 million in savings come from cutting spending on external marketing suppliers for translation, production, and social agencies.”
👁️🗨️ Research into AI: MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (Facebook), LightRAG: The "PyTorch" library for LLM applications, Training of Physical Neural Networks, LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models (ByteDance), PaliGemma: A versatile 3B VLM for transfer, WildGaussians: 3D Gaussian Splatting in the Wild, VLMs are Blind, CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, LETS-C: Leveraging Language Embedding for Time Series Classification, Toto: Time Series Optimized Transformer for Observability
✭GitHub - facebookresearch/MobileLLM: MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. “This repository contains the training code of MobileLLM introduced in our work: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases", published in ICML 2024. In this work, we comprehensively consider multiple design factors to obtain high-quality LLMs with fewer than a billion parameters. We integrated (1) SwiGLU activation function, (2) deep and thin architectures, (3) embedding sharing, (4) grouped-query attention to build MobileLLM. MobileLLM-125M/350M attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M SoTA models on zero-shot commonsense reasoning tasks. In our updated version, we further demonstrate that our design philosophy scales effectively to larger models, with SoTA results for MobileLLM-600M/1B/1.5B.”
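Of the four design ingredients listed, SwiGLU is the simplest to isolate: a gated feed-forward block. A NumPy sketch with illustrative dimensions (the paper uses it inside transformer layers, not standalone):

```python
# SwiGLU feed-forward block: one projection is passed through SiLU and
# used to gate a second projection, before projecting back to model width.
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))      # equivalent to x * sigmoid(x)

def swiglu_ffn(x, W_gate, W_up, W_down):
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down   # gate, then project back

rng = np.random.default_rng(0)
d_model, d_hidden = 16, 48             # toy widths; "deep and thin" in spirit
x = rng.normal(size=(2, d_model))
W_gate = rng.normal(size=(d_model, d_hidden))
W_up = rng.normal(size=(d_model, d_hidden))
W_down = rng.normal(size=(d_hidden, d_model))
print(swiglu_ffn(x, W_gate, W_up, W_down).shape)  # (2, 16)
```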
✭ GitHub - SylphAI-Inc/LightRAG: The "PyTorch" library for LLM applications.
✭Training of Physical Neural Networks “Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated important opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. To do this will however require rethinking both how AI models work, and how they are trained - primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods including backpropagation-based and backpropagation-free approaches are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be utilized to create both more efficient realizations of current-scale AI models, and to enable unprecedented-scale models.”
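One family of backpropagation-free methods treats the hardware as a black-box forward pass and estimates gradients by perturbing parameters and measuring the change in loss. A toy SPSA-style sketch (not from the paper; the "physical system" here is just a quadratic loss):

```python
# Simultaneous-perturbation gradient estimate: probe the forward pass at
# w + eps*delta and w - eps*delta, and descend along the probe direction.
# Only forward evaluations are needed — no backpropagation through the system.
import numpy as np

rng = np.random.default_rng(0)

def forward_loss(w):                       # stand-in for a physical forward pass
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((w - target) ** 2))

w = np.zeros(3)
eps, lr = 1e-3, 0.05
for _ in range(200):
    delta = rng.choice([-1.0, 1.0], size=w.shape)           # random +/-1 probe
    g = (forward_loss(w + eps * delta) - forward_loss(w - eps * delta)) / (2 * eps)
    w -= lr * g * delta                                      # descend along probe
print(np.round(w, 2))  # ≈ [ 1.  -2.   0.5]
```

In expectation the probe update equals true gradient descent, which is why such schemes can train systems whose internals are physically inaccessible, at the cost of noisier, slower convergence.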
✭LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models (ByteDance) “Recent advancements in Large Multimodal Models (LMMs) have showcased impressive capabilities in multimodal understanding and reasoning. However, most existing open-source LMMs such as LLaVA-NeXT have primarily focused on pushing the performance limit of single-image, leaving the potential of multi-image scenarios largely unexplored. Considering the diverse range of computer vision scenarios and data formats—including single and multi-image inputs, videos, and 3D data—there is a pressing need to develop methodologies for open LMMs that can operate effectively across these varied contexts. We observe that the image-text interleaved format can naturally serve as a general data template to unify different scenarios, e.g., single-image or multi-image as special cases, video as multi-frames, and 3D as multi-views. Therefore, we present LLaVA-NeXT-Interleave, an all-around LMM that extends the model capabilities to new real-world settings: Multi-image, Multi-frame (videos), Multi-view (3D) and maintains the performance of the Multi-patch (single-image) scenarios. We denote the four settings as M4.”
✭PaliGemma: A versatile 3B VLM for transfer “PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.”
✭ WildGaussians: 3D Gaussian Splatting in the Wild “While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.”
✭ VLMs are Blind “Research showing that vision language models (VLMs) fail on simple visual tasks that are easy for humans.”
✭CURLoRA: Leveraging CUR Matrix Decomposition for Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation “This paper introduces CURLoRA, a novel approach to fine-tuning large language models (LLMs) that leverages CUR matrix decomposition in the context of Low-Rank Adaptation (LoRA). Our method addresses two critical challenges in LLM fine-tuning: mitigating catastrophic forgetting during continual learning and reducing the number of trainable parameters. We propose a unique modification to the CUR decomposition process, utilizing inverted probabilities for column and row selection, which acts as an implicit regularization, and initializing the U matrix as a zero matrix and fine-tuning only that matrix. Through experiments on multiple datasets, we demonstrate that CURLoRA outperforms standard LoRA in mitigating catastrophic forgetting, maintaining model stability and performance across tasks while significantly reducing the number of trainable parameters. Our results show that CURLoRA achieves superior accuracy and perplexity scores compared to LoRA, particularly in scenarios with limited fine-tuning data.”
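The abstract's recipe can be sketched in a few lines: sample columns and rows of the frozen weight with inverted probabilities, freeze the resulting C and R factors, and train only a zero-initialized U. Everything below (dimensions, the toy least-squares objective, the learning rate) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))            # frozen pretrained weight

# CUR selection with *inverted* probabilities: prefer low-norm columns/rows,
# which the paper argues acts as an implicit regularization.
def inverted_probs(norms_sq):
    p = norms_sq / norms_sq.sum()
    q = 1.0 / p
    return q / q.sum()

cols = rng.choice(d, size=r, replace=False, p=inverted_probs((W ** 2).sum(axis=0)))
rows = rng.choice(d, size=r, replace=False, p=inverted_probs((W ** 2).sum(axis=1)))
C, R = W[:, cols], W[rows, :]          # frozen factors taken from W itself
U = np.zeros((r, r))                   # the ONLY trainable matrix, zero-initialized

# Toy fine-tuning objective: least squares on the adapted weight W + C @ U @ R.
X = rng.normal(size=(64, d))
Y = rng.normal(size=(64, d))

def mse(U):
    return np.mean((X @ (W + C @ U @ R) - Y) ** 2)

losses, lr = [mse(U)], 5e-4
for _ in range(200):
    resid = X @ (W + C @ U @ R) - Y
    grad_U = 2 * C.T @ X.T @ resid @ R.T / X.shape[0]  # chain rule through C U R
    U -= lr * grad_U
    losses.append(mse(U))

print(f"trainable params: {U.size}, loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Zero-initializing U means the adapted model starts exactly at the pretrained weights, mirroring LoRA's zero-initialized B matrix, while the trainable parameter count is only r × r.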
✭[2407.06533] LETS-C: Leveraging Language Embedding for Time Series Classification “Recent advancements in language modeling have shown promising results when applied to time series data. In particular, fine-tuning pre-trained large language models (LLMs) for time series classification tasks has achieved state-of-the-art (SOTA) performance on standard benchmarks. However, these LLM-based models have a significant drawback due to the large model size, with the number of trainable parameters in the millions. In this paper, we propose an alternative approach to leveraging the success of language modeling in the time series domain. Instead of fine-tuning LLMs, we utilize a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and multilayer perceptron (MLP). We conducted extensive experiments on well-established time series classification benchmark datasets. We demonstrated LETS-C not only outperforms the current SOTA in classification accuracy but also offers a lightweight solution, using only 14.5% of the trainable parameters on average compared to the SOTA model. Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification while maintaining a lightweight model architecture.”
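The pipeline described in the abstract, a frozen embedder feeding a small CNN-plus-MLP head, can be sketched forward-only. The fixed random projection below is only a placeholder for the language embedding model, and all dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_emb, n_classes = 100, 16, 3       # series length, embedding dim, classes

# Placeholder for the frozen language embedding model (LETS-C embeds the
# serialized series; a fixed random projection stands in for that here).
P = rng.normal(size=(1, d_emb)) / np.sqrt(d_emb)
def embed(series):                     # (T,) -> (T, d_emb), frozen
    return series[:, None] @ P

# Lightweight trainable head: 1D conv + MLP (the only trainable parameters).
k, n_filters, hidden = 5, 8, 32
conv_w = rng.normal(size=(n_filters, k)) * 0.1
mlp_w1 = rng.normal(size=(n_filters, hidden)) * 0.1
mlp_w2 = rng.normal(size=(hidden, n_classes)) * 0.1

def forward(series):
    E = embed(series)                                         # frozen, (T, d_emb)
    s = E.mean(axis=1)                                        # one channel for the sketch
    windows = np.lib.stride_tricks.sliding_window_view(s, k)  # (T-k+1, k)
    feats = np.maximum(windows @ conv_w.T, 0).mean(axis=0)    # conv + ReLU + avg pool
    return np.maximum(feats @ mlp_w1, 0) @ mlp_w2             # MLP logits

logits = forward(rng.normal(size=T))
n_trainable = conv_w.size + mlp_w1.size + mlp_w2.size
print(logits.shape, n_trainable)
```

The head here has only a few hundred parameters, which is the structural point of the paper: the heavy lifting is done by a frozen embedder, so only a tiny classifier needs training.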
✭[2407.07874] Toto: Time Series Optimized Transformer for Observability “This technical report describes the Time Series Optimized Transformer for Observability (Toto), a new state of the art foundation model for time series forecasting developed by Datadog. In addition to advancing the state of the art on generalized time series benchmarks in domains such as electricity and weather, this model is the first general-purpose time series forecasting foundation model to be specifically tuned for observability metrics.
Toto was trained on a dataset of one trillion time series data points, the largest among all currently published time series foundation models. Alongside publicly available time series datasets, 75% of the data used to train Toto consists of fully anonymous numerical metric data points from the Datadog platform. In our experiments, Toto outperforms existing time series foundation models on observability data. It does this while also excelling at general-purpose forecasting tasks, achieving state-of-the-art zero-shot performance on multiple open benchmark datasets.”
🔎 Applied Research: Optical coherence tomography-derived texture-based radiomics features identify eyes with intraocular inflammation in the HAWK clinical trial (Heliyon), ChemCrow: Augmenting large language models with chemistry tools (Nature Machine Intelligence, May 2024), Chemical reservoir computation in a self-organizing reaction network (Nature), Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models (Nature Machine Intelligence), Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs (DeepMind), scHolography: a computational method for single-cell spatial neighborhood reconstruction and analysis (Genome Biology)
✭Optical coherence tomography-derived texture-based radiomics features identify eyes with intraocular inflammation in the HAWK clinical trial: Heliyon “Features selected via Wilcoxon Rank Sum were evaluated using a Random Forest (RF) classifier on the training set (S_tr, N = 47) to differentiate between the two patient groups. Classifier performance was subsequently validated on the independent test set (S_t, N = 20). Additionally, the classifier performance in discriminating the Control and Safety group was also validated on S_t at the IOI event timepoint. ~ Results ~ The RF classifier yielded area under the Receiver Operating Characteristics curve (AUC) of 0.76 and 0.81 on S_t using texture-based radiomics features at pre-IOI and event time-point, respectively. ~ Conclusions ~ In this analysis, the presence of a pre-IOI safety signal was detected in the form of textural heterogeneity within the vitreous compartment even prior to the actual event being identified by the investigator. This finding may help clinicians assess for underlying posterior inflammation.” ✭ AI technology advances early detection of severe eye inflammation, new research shows “Age-related macular degeneration (AMD) is a leading cause of vision loss in the U.S., affecting 11 million people, particularly older adults. The more severe form, neovascular age-related macular degeneration (nAMD), is characterized by abnormal blood vessel growth under the retina. These vessels leak fluid or blood, leading to vision loss. Besides age, smoking, poor diet, and lack of physical activity also contribute to the risk. ~ The primary treatment for nAMD is anti-VEGF drugs. This treatment involves injecting a drug into the eye that blocks a protein called vascular endothelial growth factor (VEGF), which is responsible for the growth of abnormal blood vessels in the retina; however, it can cause eye inflammation as a serious side effect. 
~ A team of researchers from Emory AI.Health and Cleveland Clinic aimed to predict which patients might develop this inflammatory response. By combining routine optical coherence tomography (OCT) scans with machine learning and precision medicine, they sought to identify patterns in eye scan images that could appear before or during inflammation caused by anti-VEGF drugs.”
✭ChemCrow: Augmenting large language models with chemistry tools (Nature Machine Intelligence, May 2024) “Large language models (LLMs) have shown strong performance in tasks across domains but struggle with chemistry-related problems. These models also lack access to external knowledge sources, limiting their usefulness in scientific applications. We introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery and materials design. By integrating 18 expert-designed tools and using GPT-4 as the LLM, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow’s effectiveness in automating a diverse set of chemical tasks. Our work not only aids expert chemists and lowers barriers for non-experts but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.”
✭Chemical reservoir computation in a self-organizing reaction network (Nature) “Chemical reaction networks, such as those found in metabolism and signalling pathways, enable cells to process information from their environment1,2. Current approaches to molecular information processing and computation typically pursue digital computation models and require extensive molecular-level engineering3. Despite considerable advances, these approaches have not reached the level of information processing capabilities seen in living systems. Here we report on the discovery and implementation of a chemical reservoir computer based on the formose reaction4. We demonstrate how this complex, self-organizing chemical reaction network can perform several nonlinear classification tasks in parallel, predict the dynamics of other complex systems and achieve time-series forecasting. This in chemico information processing system provides proof of principle for the emergent computational capabilities of complex chemical reaction networks, paving the way for a new class of biomimetic information processing systems.” ✭ Scientists demonstrate chemical reservoir computation using the formose reaction (Phys.org) “Researchers from the Institute for Molecules and Materials at Radboud University, Netherlands, have demonstrated that a complex self-organizing chemical reaction network can perform various computational tasks, such as nonlinear classification and complex dynamics prediction. The field of molecular computing interests researchers who wish to harness the computational power of chemical and biological systems. In these systems, the chemical reactions or molecular processes act as the reservoir computer, transforming inputs into high-dimensional outputs. The researchers used the reservoir computer to do several tasks. The first was doing nonlinear classification tasks. 
The reservoir computer could emulate all Boolean logic gates and even tackle more complex classifications like XOR, checkers, circles, and sine functions. The team also showed that it could predict the behavior of a complex metabolic network model of E. coli, accurately capturing both linear and nonlinear responses to fluctuating inputs across various concentration ranges. Furthermore, the system demonstrated the ability to forecast future states of a chaotic system (the Lorenz attractor), accurately predicting two out of three input dimensions several hours into the future. The research team also found that some chemical species in the system exhibit short-term memory, retaining information about past inputs.”
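The reservoir-computing principle at work here, a fixed nonlinear system whose high-dimensional state is read out by a single trained linear layer, can be sketched with a random tanh map standing in for the chemical network (an illustrative substitute, not the formose chemistry), solving the XOR task mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, untrained "reservoir": a random nonlinear map standing in for the
# self-organizing chemical network (only its outputs are observed).
n_species = 50
W_in = rng.normal(size=(n_species, 2))
b = rng.normal(size=n_species)

def reservoir(u):
    return np.tanh(W_in @ u + b)       # high-dimensional nonlinear response

# XOR: not linearly separable in the 2D input space, but linearly readable
# from the reservoir's high-dimensional state.
U = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

H = np.array([reservoir(u) for u in U])       # (4, n_species) state matrix
w, *_ = np.linalg.lstsq(H, y, rcond=None)     # train ONLY the linear readout
pred = (H @ w > 0.5).astype(float)
print(pred)
```

This is the appeal of the approach reported in the paper: the "hard" nonlinear part is never engineered or trained, only the cheap linear readout is fitted to the observed states.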
✭Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models (Nature Machine Intelligence) “Knowledge from animals and humans inspires robotic innovations. Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches. These methods usually rely on physical models or handcrafted rewards to accurately describe the specific system, rather than on a generalized understanding like animals do. Here we propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that are all pre-trainable, reusable and enrichable for legged robots. The primitive module summarizes knowledge from animal motion data, where, inspired by large pre-trained models in language and image understanding, we introduce deep generative models to produce motor control signals stimulating legged robots to act like real animals. Then, we shape various traversing capabilities at a higher level to align with the environment by reusing the primitive module. Finally, a strategic module is trained focusing on complex downstream tasks by reusing the knowledge from previous levels. We apply the trained hierarchical controllers to the MAX robot, a quadrupedal robot developed in-house, to mimic animals, traverse complex obstacles and play in a designed challenging multi-agent chase tag game, where lifelike agility and strategy emerge in the robots.” ✭ New framework enables animal-like agile movements in four-legged robots “Researchers at Tencent Robotics X in China recently introduced a new hierarchical framework that could facilitate the execution of animal-like agile movements in four-legged robots. This framework, introduced in a paper published in Nature Machine Intelligence, was initially applied to a quadrupedal robot called MAX, yielding highly promising results. 
"Numerous efforts have been made to achieve agile locomotion in quadrupedal robots through classical controllers or reinforcement learning approaches," Lei Han, Qingxu Zhu and their colleagues wrote in their paper. "These methods usually rely on physical models or handcrafted rewards to accurately describe the specific system, rather than on a generalized understanding like animals do. We propose a hierarchical framework to construct primitive-, environmental- and strategic-level knowledge that is all pre-trainable, reusable and enrichable for legged robots."”
✭[2407.07775] Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs “An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recorded demonstration video. Recent advances in Vision Language Models (VLMs) have shown a promising path in achieving this goal as it demonstrates capabilities in perceiving and reasoning about multimodal inputs. However, VLMs are typically trained to predict textual output and it is an open research question about how to best utilize them in navigation. To solve MINT, we present Mobility VLA, a hierarchical Vision-Language-Action (VLA) navigation policy that combines the environment understanding and common sense reasoning power of long-context VLMs and a robust low-level navigation policy based on topological graphs. The high-level policy consists of a long-context VLM that takes the demonstration tour video and the multimodal user instruction as input to find the goal frame in the tour video. Next, a low-level policy uses the goal frame and an offline constructed topological graph to generate robot actions at every timestep. We evaluated Mobility VLA in an 836 m^2 real world environment and show that Mobility VLA achieves high end-to-end success rates on previously unsolved multimodal instructions such as "Where should I return this?" while holding a plastic bin.” ✭ DeepMind demonstrates a robot capable of giving context-based guided tours of an office building “A team of roboticists and AI specialists at Google's DeepMind have demonstrated a robot capable of giving context-based guided tours around its offices. 
They have posted a paper describing their work, along with demonstration videos, on the arXiv preprint server. … In this new effort, the research team gave RT-2 robots AI capabilities via Gemini 1.5 Pro and used it to allow the robot to perform sophisticated activities. The robot can listen to a person it is guiding, parse a request and translate it into behavior. As an example, one researcher asked the robot to take it to a place in the office where writing or drawing could be done. The robot thought about the request for approximately 30 seconds and then guided the person to a place where a whiteboard had been attached to the wall in one of the offices.”
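The two-level design described above can be sketched as follows: a high-level policy picks a goal frame (in the real system, a long-context VLM reasons over the tour video and the instruction; here a keyword lookup stands in), and a low-level policy plans over the topological graph. The office layout and the keyword matching are hypothetical:

```python
from collections import deque

# Topological graph over demonstration-tour locations (hypothetical layout).
graph = {
    "lobby": ["hallway"],
    "hallway": ["lobby", "kitchen", "whiteboard_room"],
    "kitchen": ["hallway"],
    "whiteboard_room": ["hallway", "desk_area"],
    "desk_area": ["whiteboard_room"],
}

def pick_goal_frame(instruction):
    """High-level policy stub: the real system feeds the tour video plus the
    multimodal instruction to a long-context VLM; a keyword lookup stands in."""
    if "draw" in instruction or "writ" in instruction:
        return "whiteboard_room"
    return "lobby"

def plan(graph, start, goal):
    """Low-level policy sketch: shortest path over the topological graph (BFS)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

goal = pick_goal_frame("Take me somewhere I can write or draw something")
route = plan(graph, "kitchen", goal)
print(route)
```

The split mirrors the whiteboard anecdote above: the expensive VLM call happens once to resolve the instruction to a goal, while per-timestep control reduces to cheap graph traversal.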
✭ scHolography: a computational method for single-cell spatial neighborhood reconstruction and analysis | Genome Biology | Full Text “Spatial transcriptomics has transformed our ability to study tissue complexity. However, it remains challenging to accurately dissect tissue organization at single-cell resolution. Here we introduce scHolography, a machine learning-based method designed to reconstruct single-cell spatial neighborhoods and facilitate 3D tissue visualization using spatial and single-cell RNA sequencing data. scHolography employs a high-dimensional transcriptome-to-space projection that infers spatial relationships among cells, defining spatial neighborhoods and enhancing analyses of cell–cell communication. When applied to both human and mouse datasets, scHolography enables quantitative assessments of spatial cell neighborhoods, cell–cell interactions, and tumor-immune microenvironment. Together, scHolography offers a robust computational framework for elucidating 3D tissue organization and analyzing spatial dynamics at the cellular level.”
👀Watching: Ilya Sutskever: “AI will have a human brain that can think for itself”, The "Modern Day Slaves" Of The AI Tech World
Ilya Sutskever | "AI will have a human brain that can think for itself" | AI security must be taken seriously
✭ The "Modern Day Slaves" Of The AI Tech World (YouTube) “In 2027, virtual assistants like Sarah seamlessly handle our daily tasks, making life appear effortless. But behind this technological marvel lies a hidden reality. This documentary unveils the untold story of the ghost workers who power our digital world, performing the tedious and often traumatic tasks that machines can't handle. ~ Meet the invisible workforce behind tech giants like Google, Facebook, Amazon, and Uber. These underpaid and disposable workers label images, moderate content, and train AI systems, often earning less than minimum wage. Their work is essential yet remains in the shadows, unacknowledged by the companies that depend on them.”
🖲️AI Art-Research: FoUBARthes: Death of the Author (Dayna McLeod), Odyssey, Gen-3 Alpha Transitions, World Skins (Kyle Goodrich)
✭ FoUBARthes: Death of the Author (Dayna McLeod) “Media performance artist Dayna McLeod asked ChatGPT to write an increasingly snarky and heated dialogue between Roland Barthes and Michel Foucault about The Death of the Author, inspired by Barthes.”
✭ Odyssey “We need to decide how AI tells stories. We need to hold AI to higher standards. A short trip around the web will reveal that we’re inundated with low-quality AI-generated content. Content farms, spam bots, and even well-intentioned companies are using AI to churn out text and imagery, with the goal of gamifying algorithms and capturing your attention. If not done right, AI video generation could head in a similar direction, where we are inundated with random videos that have no spark or story. On a long-enough timespan, perhaps we become addicted to these junk-food videos, forgetting what high-quality human storytelling looks like. Perhaps humans are relegated to storywatchers—not storytellers. ~ At Odyssey, we reject this future. Humans telling stories is too important to our way of life, and professional storytellers have proven that they have so much to offer. Similarly, so does powerful AI, if we build it right. Instead of replacing humans with algorithms that optimize for clicks, we believe a new visual AI should be placed in the hands of professional storytellers. This visual AI should enable them to not only generate stunning video, but also to precisely direct it, and to tell the epic tales stuck in their head.”
✭Runway on X: "Gen-3 Alpha can create surreal and novel transitions between all kinds of objects, animals and characters. To learn more, read the prompt guide: http://bit.ly/3Lgmgmj" → ✭ Proper on X: "Gen-3 excels at handling impossible transitions of objects and scenes. 10 wild examples"
⚔️War (wAIr) 📚Retroactive Readings: Prepare for AI Hackers (Harvard Magazine | Feb 2023)
✭ Prepare for AI Hackers (Harvard Magazine | Feb 2023) “Human systems of all kinds may soon be vulnerable to subversion by artificial intelligence. ~ In partnership with the Defense Advanced Research Projects Agency (DARPA), a branch of the U.S. Department of Defense that funds research on breakthrough technologies, DEF CON hosted this first—and to date only—Cyber Grand Challenge, a hacking competition where artificial-intelligence systems competed to autonomously hack computer programs defended by other AIs. The competition was structured as a game: the AI that was best able to find and exploit vulnerabilities in other systems while protecting its own system from being hacked during the 10-hour competition would earn its creators a $2-million prize.”
✭2019: Jessica Flack on X: "The worrying singularity is not AGI but a full-fledged adaptive engineering of society thru control of individual + collective behavior using AI + big data ~ @nybooks Data Leviathan & the Surveillance State http://tinyurl.com/y3zeu8l7 ~ @LRB Document Number Nine http://tinyurl.com/y2vojnke"