🔋 Sept 12-18th: Open Interpreter, Stable Audio, Davin Heckman’s Re-Riposte, DoLa (Decoding by Contrasting Layers), Considering Biased Data as Informative Artifacts, POM (Principal Odor Map), Claude Pro, AI Essay Mills, AI design, MediaPipe FaceStylizer, Descript, AGI, NExT-GPT, visuotactile neuron, Generative Image Dynamics, AI Audio & Animation Tools, The Jagged Frontier, SDXL experiments, Rabbots
Single-Prompt experiment: “intriguing parameters”
🏓 Observations: Davin Heckman’s Re-Riposte, What Ilya Sutskever Really Wants
✭ Davin Heckman’s Re-Riposte › electronic book review “Over 50 years ago, Lyotard’s concern was that the computer age would mean the end of the University as a cultural institution. And even as the University is in crisis, we still can’t seem to see why. If it takes more than a half a century to arrive at a partial understanding of Lyotard, how can we even begin to hope to process the impact of the machine-driven acceleration we are undergoing right now? ~ If I suspend the customary techno-bravado of being a digital media scholar, I am not even remotely confident that I will be able to tell which papers are real or fake next semester. I can kind of sniff things out right now, but I still get help from AI to do it. Will I be able to tell the difference tomorrow? How about a month from now? Does it even matter anymore? Maybe Lyotard was right, what if my job is just to manage the collapse?”
✭ What Ilya Sutskever Really Wants - by Nirit Weiss-Blatt “Back in May 2023, before Ilya Sutskever started to speak at the event, I sat next to him and told him, “Ilya, I listened to all of your podcast interviews. And unlike Sam Altman, who spread the AI panic all over the place, you sound much more calm, rational, and nuanced. I think you do a really good service to your work, to what you develop, to OpenAI.” He blushed a bit, and said, “Oh, thank you. I appreciate the compliment.” ~ An hour and a half later, when we finished this talk, I looked at my friend and told her, “I’m taking back every single word that I said to Ilya.” ~ He freaked the hell out of people there. And we’re talking about AI professionals who work in the biggest AI labs in the Bay area. They were leaving the room, saying, “Holy shit.””
🔎 Research: DoLa (Decoding by Contrasting Layers), Biased Data as Informative Artifacts in AI-Assisted Health Care, POM (Principal Odor Map), NExT-GPT, Bio-Inspired Visuotactile Neuron For Multisensory Integration, Generative Image Dynamics, MediaPipe FaceStylizer, Field Experimental Evidence of the Effects of AI
✭ [2309.03883] DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models “Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLM has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple-choice tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts.”
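To make the mechanism concrete, here is a minimal PyTorch sketch of layer-contrastive decoding using a Hugging Face GPT-2 as a stand-in. It is not the authors’ code: DoLa proper selects the “premature” layer dynamically per token via Jensen-Shannon divergence, while this sketch fixes one early layer and applies the paper’s adaptive plausibility cutoff.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def dola_next_token_logits(input_ids, premature_layer=4, alpha=0.1):
    out = model(input_ids, output_hidden_states=True)
    # hidden_states[0] is the embedding output; [-1] is the final layer
    # (already passed through the final layer norm in GPT-2).
    mature_h = out.hidden_states[-1][:, -1]
    # "Early-exit" the premature layer: apply the final norm, then the LM head.
    premature_h = model.transformer.ln_f(out.hidden_states[premature_layer][:, -1])
    log_mature = F.log_softmax(model.lm_head(mature_h), dim=-1)
    log_premature = F.log_softmax(model.lm_head(premature_h), dim=-1)
    # Contrast the two distributions: boost tokens whose probability grows
    # as information flows to later (more factual) layers.
    contrast = log_mature - log_premature
    # Adaptive plausibility constraint: only keep tokens the mature
    # distribution already rates within a factor alpha of its top token.
    cutoff = log_mature.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    return contrast.masked_fill(log_mature < cutoff, float("-inf"))

ids = tok("The capital of France is", return_tensors="pt").input_ids
print(tok.decode(dola_next_token_logits(ids).argmax(dim=-1)))
```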
✭ Considering Biased Data as Informative Artifacts in AI-Assisted Health Care | NEJM “Technical solutions such as attempting to fix biased clinical data used for AI training are well intentioned, but what undergirds all these initiatives is the notion that skewed clinical data are “garbage,” as in the computer science adage “garbage in, garbage out.” Instead, we propose thinking of clinical data as artifacts that, when examined, can be informative of societies and institutions in which they are found. ~ Viewing biased clinical data as artifacts can identify values, practices, and patterns of inequity in medicine and health care. Examining clinical data as artifacts can also provide alternatives to current methods of medical AI development. Moreover, this framing of data as artifacts expands the approach to fixing biased AI from a narrowly technical view to a sociotechnical perspective that considers historical and current social contexts as key factors in addressing bias. This broader approach contributes to the public health goal of understanding population inequities and also provides novel ways to use AI as a means of detecting patterns of racial and ethnic correction, missing data, and population inequities that are relevant to health equity.” ✭ NIH launches Bridge2AI program to expand the use of artificial intelligence in biomedical and behavioral research | National Institutes of Health (NIH) “Generating high-quality ethically sourced data sets is crucial for enabling the use of next-generation AI technologies that transform how we do research.” ✭ How an archaeological approach can help leverage biased data in AI to improve medicine
✭ A principal odor map unifies diverse tasks in olfactory perception | Science “Editor’s summary: For vision and hearing, there are well-developed maps that relate physical properties such as frequency and wavelength to perceptual properties such as pitch and color. The sense of olfaction does not yet have such a map. Using a graph neural network, Lee et al. developed a principal odor map (POM) that faithfully represents known perceptual hierarchies and distances. This map outperforms previously published models to the point that replacing a trained human’s responses with the model output would improve overall panel description. The POM coordinates were able to predict odor intensity and perceptual similarity, even though these perceptual features were not explicitly part of the model training. These results were used to build a variety of olfactory predictions that outperformed previous feature sets even without fine-tuning.”
✭ NExT-GPT “an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small amount of parameter (1%) of certain projection layers, which not only benefits low-cost training and also facilitates convenient expansion to more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community.”
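A toy PyTorch sketch of that training recipe, with scaled-down stand-in modules (all names and sizes here are illustrative, not NExT-GPT’s real components): freeze the pretrained encoder and LLM, train only the projection adaptors between them, and the trainable fraction lands at a few percent.

```python
import torch.nn as nn

class AnyToAnyStub(nn.Module):
    """Scaled-down stand-in for an encoder -> LLM -> decoder pipeline."""
    def __init__(self, enc_dim=256, llm_dim=1024):
        super().__init__()
        self.image_encoder = nn.Linear(32 * 32 * 3, enc_dim)  # frozen "encoder"
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True),
            num_layers=2)                                     # frozen "LLM"
        self.in_proj = nn.Linear(enc_dim, llm_dim)    # trainable input adaptor
        self.out_proj = nn.Linear(llm_dim, enc_dim)   # trainable output adaptor

model = AnyToAnyStub()
# Freeze everything, then re-enable gradients only for the projections.
for p in model.parameters():
    p.requires_grad = False
for adaptor in (model.in_proj, model.out_proj):
    for p in adaptor.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # a few percent
```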
✭ A bio-inspired visuotactile neuron for multisensory integration | Nature Communications “Multisensory integration is a salient feature of the brain which enables better and faster responses in comparison to unisensory integration, especially when the unisensory cues are weak. Specialized neurons that receive convergent input from two or more sensory modalities are responsible for such multisensory integration. Solid-state devices that can emulate the response of these multisensory neurons can advance neuromorphic computing and bridge the gap between artificial and natural intelligence. Here, we introduce an artificial visuotactile neuron based on the integration of a photosensitive monolayer MoS2 memtransistor and a triboelectric tactile sensor which minutely captures the three essential features of multisensory integration, namely, super-additive response, inverse effectiveness effect, and temporal congruency. We have also realized a circuit which can encode visuotactile information into digital spiking events, with probability of spiking determined by the strength of the visual and tactile cues. We believe that our comprehensive demonstration of bio-inspired and multisensory visuotactile neuron and spike encoding circuitry will advance the field of neuromorphic computing, which has thus far primarily focused on unisensory intelligence and information processing” ✭ Making AI smarter with an artificial, multisensory integrated neuron “The researchers fabricated the multisensory neuron by connecting a tactile sensor to a phototransistor based on a monolayer of molybdenum disulfide, a compound that exhibits unique electrical and optical characteristics useful for detecting light and supporting transistors. The sensor generates electrical spikes in a manner reminiscent of neurons processing information, allowing it to integrate both visual and tactile cues.”
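A toy numerical illustration of two of those hallmarks, using a made-up sigmoidal response function rather than anything from the paper’s device physics: when weak cues converge before a shared thresholding nonlinearity, the combined response far exceeds the sum of the unisensory responses (super-additivity), and the enhancement shrinks as the cues get stronger (inverse effectiveness).

```python
import numpy as np

def response(stimulus, k=20.0, threshold=0.5):
    """Sigmoidal response to one cue in [0, 1]; read it as spike probability."""
    return 1.0 / (1.0 + np.exp(-k * (stimulus - threshold)))

def multisensory(visual, tactile, **kw):
    # Cues converge *before* the shared nonlinearity, as in a neuron
    # receiving convergent input from two modalities.
    return response(visual + tactile, **kw)

for v, t in [(0.2, 0.2), (0.5, 0.5)]:
    uni_sum = response(v) + response(t)
    multi = multisensory(v, t)
    print(f"cues ({v}, {t}): unisensory sum={uni_sum:.4f}, "
          f"multisensory={multi:.4f}, ratio={multi / uni_sum:.1f}x")
# Weak cues: the combined response is ~24x the unisensory sum
# (super-additive); strong cues: the ratio falls toward 1x
# (inverse effectiveness).
```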
✭ Generative Image Dynamics | Google Research “We present an approach to modeling an image-space prior on scene dynamics. Our prior is learned from a collection of motion trajectories extracted from real video sequences containing natural, oscillating motion such as trees, flowers, candles, and clothes blowing in the wind. Given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a per-pixel long-term motion representation in the Fourier domain, which we call a neural stochastic motion texture. This representation can be converted into dense motion trajectories that span an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping dynamic videos, or allowing users to realistically interact with objects in real pictures.”
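A toy NumPy sketch of the core representational move, with made-up shapes and random data standing in for the model’s output: per-pixel Fourier coefficients for a few low temporal frequencies are zero-padded and inverse-FFT’d into dense per-pixel motion trajectories spanning a whole video.

```python
import numpy as np

H, W, K, T = 64, 64, 16, 150  # image size, kept frequencies, video frames

# Stand-in for the diffusion model's output: complex spectra for (dx, dy)
# at each pixel. Natural oscillations are low-frequency dominated, so
# damp the higher frequency bins.
rng = np.random.default_rng(0)
spectrum = rng.normal(size=(H, W, 2, K)) + 1j * rng.normal(size=(H, W, 2, K))
spectrum *= 1.0 / (1.0 + np.arange(K))

# Zero-pad the K kept frequencies into a full half-spectrum, then invert
# to the time domain to get real-valued displacement trajectories.
half = np.zeros((H, W, 2, T // 2 + 1), dtype=complex)
half[..., :K] = spectrum
trajectories = np.fft.irfft(half, n=T, axis=-1)  # (H, W, 2, T)

# trajectories[y, x, :, t] is pixel (x, y)'s displacement at frame t;
# an image-based renderer would warp the still image along these paths.
print(trajectories.shape, trajectories[32, 32, :, 0])
```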
✭ MediaPipe FaceStylizer: On-device real-time few-shot face stylization – Google Research Blog “we introduce MediaPipe FaceStylizer, an efficient design for few-shot face stylization that addresses the aforementioned model complexity and data efficiency challenges while being guided by Google’s responsible AI Principles. The model consists of a face generator and a face encoder used as GAN inversion to map the image into latent code for the generator. We introduce a mobile-friendly synthesis network for the face generator with an auxiliary head that converts features to RGB at each level of the generator to generate high quality images from coarse to fine granularities. We also carefully designed the loss functions for the aforementioned auxiliary heads and combined them with the common GAN loss functions to distill the student generator from the teacher StyleGAN model, resulting in a lightweight model that maintains high generation quality. The proposed solution is available in open source through MediaPipe. Users can fine-tune the generator to learn a style from one or a few images using MediaPipe Model Maker, and deploy to on-device face stylization applications with the customized model using MediaPipe FaceStylizer.”
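For the deployment side, a hedged sketch of on-device inference via the MediaPipe Tasks Python API: the class and option names follow the pattern of MediaPipe’s other vision tasks and should be verified against the current docs, and the .task bundle stands for a model customized with MediaPipe Model Maker.

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# Assumed API shape, mirroring other MediaPipe tasks (FaceDetector etc.);
# "face_stylizer.task" is the customized model exported from Model Maker.
options = vision.FaceStylizerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_stylizer.task")
)
with vision.FaceStylizer.create_from_options(options) as stylizer:
    image = mp.Image.create_from_file("portrait.jpg")  # input face photo
    result = stylizer.stylize(image)                   # stylized output image
    print(result.width, result.height)
```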
✭ [2309.08586] Replacing softmax with ReLU in Vision Transformers “Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-attention in terms of scaling behavior as a function of compute.”
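The change is small enough to show in full. A minimal PyTorch sketch (shapes illustrative, not from the paper’s code): replace the row-wise softmax with a pointwise ReLU and divide by sequence length.

```python
import torch
import torch.nn.functional as F

def relu_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    seq_len, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5   # usual scaled dot product
    weights = F.relu(scores) / seq_len          # ReLU + 1/L replaces softmax
    return weights @ v

q = k = v = torch.randn(2, 8, 197, 64)          # e.g. a ViT with 197 tokens
print(relu_attention(q, k, v).shape)            # torch.Size([2, 8, 197, 64])
```

Because each row of ReLU weights no longer sums to one, the 1/L factor keeps the output magnitude roughly independent of sequence length, which is the paper’s stated fix for the accuracy degradation.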
✭ Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality by Fabrizio Dell'Acqua, Edward McFowland, Ethan R. Mollick, Hila Lifshitz-Assaf, Katherine Kellogg, Saran Rajendran “The public release of Large Language Models (LLMs) has sparked tremendous interest in how humans will use Artificial Intelligence (AI) to accomplish a variety of tasks. In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI. For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration. One set of consultants acted as “Centaurs,” like the mythical half-horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology.”
🛠️ Tech: Open Interpreter, Stable Audio, Descript, Guidance Language for LLMs, Claude Pro, AI Essay Mills, AI to design buildings with less concrete
✭ The Open Interpreter Project “A new way to use computers. Open Interpreter lets LLMs run code on your computer to complete tasks” ✭ KillianLucas/open-interpreter: OpenAI's Code Interpreter in your terminal, running locally “Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing. ~ This provides a natural-language interface to your computer's general-purpose capabilities: Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research. Plot, clean, and analyze large datasets...etc. ⚠️ Note: You'll be asked to approve code before it's run.”
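A usage sketch along the lines of the project README at the time (the API may have changed since); install with pip install open-interpreter first.

```python
import interpreter

# Optional: run generated code without the per-block confirmation prompt.
# interpreter.auto_run = True  # use with care: the code executes locally

# One-shot task: the LLM writes and runs code on your machine.
interpreter.chat("Plot AAPL and META's normalized stock prices")

# Or start an interactive, ChatGPT-like session in the terminal.
interpreter.chat()
```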
✭ Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion — Stability AI “Stable Audio, a latent diffusion model architecture for audio conditioned on text metadata as well as audio file duration and start time, allowing for control over the content and length of the generated audio. This additional timing conditioning allows us to generate audio of a specified length up to the training window size. ~ Working with a heavily downsampled latent representation of audio allows for much faster inference times compared to raw audio. Using the latest advancements in diffusion sampling techniques, our flagship Stable Audio model is able to render 95 seconds of stereo audio at a 44.1 kHz sample rate in less than one second on an NVIDIA A100 GPU.” ✭ Stable Audio - Generative AI for music & sound fx “Start generating music for free. No credit card needed.”
✭ GitHub - guidance-ai/guidance: A guidance language for controlling large language models.
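A hedged sketch of guidance’s handlebars-style templating as of 2023 (the library’s API has since been redesigned, and the model name is only illustrative): templates interleave literal text with generation and constrained-choice slots that the LLM fills in.

```python
import guidance

# Illustrative model choice; any supported LLM backend works similarly.
guidance.llm = guidance.llms.OpenAI("text-davinci-003")

# {{review}} is interpolated; {{#select}} constrains the model's output
# to exactly one of the listed options.
program = guidance("""Is the following review positive or negative?
Review: {{review}}
Answer: {{#select 'sentiment'}}positive{{or}}negative{{/select}}""")

result = program(review="The soundtrack was hauntingly beautiful.")
print(result["sentiment"])  # guaranteed to be "positive" or "negative"
```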
✭ Descript | All-in-one video & podcast editing, easy as a doc. Delete words by editing the transcript, or overdub corrections with a synthetic clone of your voice.
✭ Anthropic: Introducing Claude Pro “a paid plan … currently available in the US and UK.”
✭ Nasdaq on X: "⚡ Bringing AI to the capital markets: Today, @Nasdaq announced it has received @SECGov approval to launch Dynamic Midpoint Extended Life Order (M-ELO), the first artificial intelligence (AI) powered order type." ✭ Nasdaq receives SEC approval for AI-based trade orders “By adjusting the holding periods for orders in real time, as opposed to the traditional system that simply applies static timeouts to orders, fill rates should increase without a significant increase in market impact.”
✭ Companies that use AI to pen essays are advertising on TikTok and Meta “As a result of chatbots’ unreliability, many essay mills, which produce content for a fee, are touting that they combine both AI and human labor to create an end product that is undetectable by software designed to catch cheating. And, according to a new analysis published in open-access repository arXiv, such mills are soliciting clients on TikTok and Meta platforms—despite the fact that the practice is illegal in a number of countries, including England, Wales, Australia, and New Zealand.” ✭ SocArXiv Papers | AI Providers as Criminal Essay Mills? Large Language Models meet Contract Cheating Law “there is already a significant market of AI-enhanced essay mills, many of which are developing features directly designed to frustrate education providers’ current attempts to detect and mitigate the academic integrity implications of AI generated work. Secondly, some jurisdictions have scoped their laws so widely, that it is hard to see how ‘general purpose’ large language models such as Open AI’s GPT-4 or Google’s Bard would not fall into their provisions, and thus be committing a criminal offence through their provision. This is particularly the case in England and Wales and in Australia. Thirdly, the boundaries between assistance and cheating are being directly blurred by essay mills utilizing AI tools. Most enforcement, given the nature of the academic cheating regimes, we suspect will result from private enforcement, rather than prosecutions. These regimes interact in important and until now unexplored ways with other legal regimes, such as the EU’s Digital Services Act, the UK’s proposed Online Safety Bill, and contractual governance mechanisms such as the terms of service of AI API providers, and the licensing terms of open source models.”
✭ MIT student uses AI to design buildings with less concrete “Concrete is responsible for 8 percent of the world's carbon emissions.”
🍉 Watching/Listening: AGI (Sutskever), Stanford CS330: Deep Multi-Task and Meta Learning, Rabbots
Ilya Sutskever - Opening Remarks: Confronting the Possibility of AGI “Every single stunning example of creativity in AI comes from reinforcement learning… AGI and superintelligence are definitely possible, extremely likely in our lifetime, perhaps a lot sooner.”
10 Free AI Animation Tools: Bring Images to Life
LeiaPix: https://convert.leiapix.com/
CapCut: https://www.capcut.com (3D zoom only works on mobile)
Pika Labs: https://www.pika.art discord: https://discord.gg/pika
InstaVerse: https://ilumineai.github.io/instaverse/
Animated Drawings: https://sketch.metademolab.com/
GenMo: https://www.genmo.ai/
D-ID: https://www.d-id.com/
HeyGen: https://bit.ly/heygenavatar
SadTalker: https://huggingface.co/spaces/vinthon...
Kaiber: https://kaiber.ai/?via=obscurious
AI Audio Generation Tools
Stable Audio (grainy but intriguing: released Sept 12th by Stability AI, creators of Stable Diffusion)
Riffusion (Stable Diffusion fine-tuned on audio spectrogram images; converts prompts to soundtracks; see the spectrogram-to-audio sketch after this list)
AI Music Generator - SOUNDRAW (good, but expensive)
AIVA - The AI composing emotional soundtrack music (kitschy: 3 free downloads per month)
Boomy (lacks fine-grained control, costs)
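The Riffusion sketch promised above: the diffusion model emits a spectrogram image, so the missing half of the pipeline is inverting magnitudes back into a waveform. A hedged torchaudio sketch using Griffin-Lim phase reconstruction, with random noise standing in for a generated spectrogram (the real Riffusion pipeline uses mel spectrograms plus extra post-processing):

```python
import torch
import torchaudio

n_fft, sr = 1024, 44100
# Stand-in for a generated spectrogram image: magnitudes in [0, 1],
# shaped (channels, freq_bins, time_frames).
spec_image = torch.rand(1, n_fft // 2 + 1, 512)

# Griffin-Lim iteratively estimates the phase the image discards.
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=n_fft, n_iter=64)
waveform = griffin_lim(spec_image)  # (1, num_samples)

torchaudio.save("out.wav", waveform, sr)
print(waveform.shape)
```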
ONGOING (to be skim-watched) ✭ Stanford CS330: Deep Multi-Task and Meta Learning I Autumn 2022 - YouTube “While deep learning has achieved remarkable success in many problems such as image classification, natural language processing, and speech recognition, these models are, to a large degree, specialized for the single task they are trained for. This course will cover the setting where there are multiple tasks to be solved, and study how the structure arising from multiple tasks can be leveraged to learn more efficiently or effectively. ~ This includes: self-supervised pre-training for downstream few-shot learning and transfer learning; meta-learning methods that aim to learn efficient learning algorithms that can learn new tasks quickly; and curriculum and lifelong learning, where the problem requires learning a sequence of tasks, leveraging their shared structure to enable knowledge transfer. ~ This is a graduate-level course. By the end of the course, students will be able to understand and implement the state-of-the-art multi-task learning and meta-learning algorithms and be ready to conduct research on these topics.”
🎈Demos / Playing
Intrinsic Interconnectedness (Clipdrop SDXL & HeyGen)
SDXL single-prompt experiments: “intriguing parameters” (html | vimeo) & “rabbit with robot eyes” (html | vimeo) & “esoteric” (html | vimeo) & “voynich algorithm” (html | vimeo) & “profound” (html | vimeo) & “intrinsic” (html | vimeo) & “abnormal” (html | vimeo) & “normal” (html | vimeo)
Audio prompt: “Soundtrack for AI Spring” (generated by Stable Audio)

