TextGenEd, Global Challenges (Anima Anandkumar), Books Train AI. Good. (Bogost), Full Stack Search, LLMs for Health, Nvidia Illumina Healthcare, Meta Connect, Meta-DisConnect, LLM: ‘Local’ LMs
🥭 Sept 26th - Oct 2nd: Video Outpainting, GPT-4V(ision), Mistral 7B, Instant evolution: AI design of robots, Mandala with Eyeball, Feral Humans
🏓 Observations: TextGenEd, Anima Anandkumar on Using Generative AI to Tackle Global Challenges, My Books Were Used to Train Meta’s Generative AI. Good. (Ian Bogost)
✭TextGenEd - The WAC Clearinghouse “At the cusp of this moment defined by generative AI, TextGenEd collects early experiments in pedagogy with generative text technology, including but not limited to AI. The resources in this collection will help writing teachers to integrate computational writing technologies into their assignments. Many of the assignments ask teachers and students to critically probe the affordances and limits of computational writing tools. Some assignments ask students to generate Markov chains (statistically sequenced language blocks) or design simple neural networks and others ask students to use AI platforms in order to critique or gain fluency with them. A few assignments require teachers to have significant familiarity with text generation technologies in order to lead students, but most are set up to allow teachers and students to explore together.”
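For readers who haven't met them, the Markov-chain assignment TextGenEd describes fits in a dozen lines. A minimal word-level sketch, with a toy corpus and order-2 prefixes of my own choosing:

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each `order`-word prefix to the words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=12):
    """Walk from a random prefix, sampling a successor at each step."""
    prefix = random.choice(list(chain))
    out = list(prefix)
    for _ in range(length):
        successors = chain.get(tuple(out[-len(prefix):]))
        if not successors:  # dead end: this prefix never continues in the corpus
            break
        out.append(random.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug and the cat slept"
print(generate(build_chain(corpus)))
```

Swapping in a larger corpus (a novel, a student's own essays) makes the "statistically sequenced language blocks" idea immediately tangible.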
✭Anima Anandkumar on Using Generative AI to Tackle Global Challenges - Ep. 203 by The AI Podcast “Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and Technology. At the talk, Anandkumar says that generative AI was described as “an inflection point in our lives,” with discussions swirling around how to “harness it to benefit society and humanity through scientific applications.” On the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Anandkumar on generative AI’s potential to make splashes in the scientific community. It can, for example, be fed DNA, RNA, viral and bacterial data to craft a model that understands the language of genomes. That model can help predict dangerous coronavirus variants to accelerate drug and vaccine research. Generative AI can also predict extreme weather events like hurricanes or heat waves. Even with an AI boost, trying to predict natural events is challenging because of the sheer number of variables and unknowns. However, Anandkumar explains that it’s not just a matter of upsizing language models or adding compute power — it’s also about fine-tuning and setting the right parameters.”
✭ My Books Were Used to Train Meta’s Generative AI. Good. - Ian Bogost | The Atlantic “Whether or not Meta’s behavior amounts to infringement is a matter for the courts to decide. Permission is a different matter. One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways. The philosopher Jacques Derrida liked to talk about “dissemination,” which I take to mean that, like a plant releasing its seed, an author separates from their published work. Their readers (or viewers, or listeners) not only can but must make sense of that work in different contexts. A retiree cracks a Haruki Murakami novel recommended by a grandchild. A high-school kid skims Shakespeare for a class. My mother’s tree trimmer reads my book on play at her suggestion. A lack of permission underlies all of these uses, as it underlies influence in general: When successful, art exceeds its creator’s plans.”
🛠️ Tech: Large Language Models for Healthcare (Part 2), Nvidia Illumina Acquisition: The AI Foundry for Healthcare, Full Stack Search, Meta Connect: Quest 3 and Ray-Ban smart glasses (Zuckerberg), Meta-DisConnect, LLM: ‘Local’ Language Model, GPT-4V(ision), Mistral 7B
✭ Large Language Models for Healthcare (Part 2) “2 trillion gigabytes of healthcare data are generated every year — the majority of which are unstructured. Structured data are usable. You can store it in a database, load it into Excel, analyze it, generate charts, and share it easily. Unstructured data — diagnostic reports, physicians’ notes, voice memos, and transcripts — are not so easy. Historically, the best way to get information out of documents was to pay people to organize it (read: copy and paste it into Excel). Old-school NLP systems could help somewhat. ... In the new age of language modeling, healthcare records and documentation are vastly more usable and useful.”
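As a hedged sketch of what that new usability looks like in practice: prompt an LLM to turn a free-text note into a fixed JSON schema. The note, the field names, and the choice of the OpenAI chat API below are my own illustrative assumptions, not from the article:

```python
import json
import openai  # pip install "openai<1" (0.x-era SDK call style used below)

NOTE = """Pt is a 67yo male w/ hx of T2DM and HTN.
BP 152/94 today. Started lisinopril 10mg daily."""

# Ask for a fixed JSON schema; the field names are illustrative.
prompt = (
    "Extract these fields from the clinical note and reply with JSON only: "
    "age, sex, conditions (list), blood_pressure, medications (list).\n\n"
    f"Note:\n{NOTE}"
)

resp = openai.ChatCompletion.create(  # newer SDKs: OpenAI().chat.completions.create
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep extraction output stable
)
record = json.loads(resp["choices"][0]["message"]["content"])  # assumes bare JSON back
print(record["conditions"])  # e.g. ["type 2 diabetes", "hypertension"]
```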
✭ Nvidia Illumina Acquisition: The AI Foundry for Healthcare – The Hardware of Life “With many hospitals using different electronic health record (EHR) implementations, the inability to easily access EHRs for a patient from a different healthcare provider or run data analysis across a huge number of EHR records robs the healthcare industry of access to game changing data to drive drug development and learnings for better clinical treatment. ~ Using Large Language Models (LLMs) to ingest the huge volumes of unstructured data within electronic health records could enable the use of this information in diagnosis or treatment, particularly in urgent and critical care, and provide a wealth of insights in research settings. LLMs are also well suited towards categorizing data and streamlining data portability issues, potentially addressing the current situation where individual patients’ EHRs at different hospitals or clinics effectively live on their own island.”
✭ Full Stack Search “Search for an author (is it in the OpenAI GPT-4 training set?)”
✭ Meta Connect | Keynote Zuckerberg | Sept 27th 2023 “Join Mark Zuckerberg and special guests as they unveil the new Meta Quest 3 and reveal how Meta is bringing mixed reality to life. You’ll hear how AI can help people connect and express themselves in new ways and get a first look at the latest products and updates that will help developers build the future of social technology. ‘... focused on building the future of human connection.’”
✭ Meet the A.I. Jane Austen: Meta Weaves A.I. Throughout Its Apps - The New York Times “...[virtual AI] characters were part of a suite of products that Meta introduced on Wednesday — all powered by artificial intelligence — and that will soon be found throughout its products, including Instagram, Messenger, and virtual- and augmented-reality devices like the Quest 3 headset and Ray-Ban Stories smart glasses. The rollout also includes a chatbot that will be powered partly by Microsoft’s Bing search engine, as well as A.I.-assisted image-editing tools to use on Instagram.” [My opinion, in-process draft: Meta-DisConnect: there is nothing meta about Meta (except the math in AnyMAL)]
✭ LLM: A CLI utility and Python library for interacting with Large Language Models | Simon Willison “An open-source CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine. Run prompts from the command-line, store the results in SQLite, generate embeddings and more. ~ … Accessing Llama 2 from the command-line with the llm-replicate plugin. Run Llama 2 on your own Mac using LLM and Homebrew.”
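For a taste of the Python side, the library's documented pattern is get_model() followed by prompt(); the model name below assumes an OpenAI key has already been configured (e.g. via `llm keys set openai`):

```python
import llm  # pip install llm

# get_model() and prompt() are the library's documented Python entry points.
model = llm.get_model("gpt-3.5-turbo")
response = model.prompt("Summarize sliding window attention in one sentence.")
print(response.text())
```

The same call is logged to SQLite when run through the CLI, which is what makes the tool handy for keeping a searchable history of prompts.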
✭ GPT-4V(ision) system card | OpenAI “GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4 and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.”
✭Mistral 7B | Mistral AI | Open source models “Mistral 7B, the most powerful language model for its size to date. Mistral 7B is a 7.3B parameter model that: Outperforms Llama 2 13B on all benchmarks, Outperforms Llama 1 34B on many benchmarks, Approaches CodeLlama 7B performance on code, while remaining good at English tasks, Uses Grouped-query attention (GQA) for faster inference, Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost.”
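To make the sliding-window idea concrete, here is a minimal numpy sketch of the attention mask it implies. The window size is illustrative (Mistral's is much larger); the point is that each token attends to a fixed-width causal band, so cost grows linearly with sequence length rather than quadratically:

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: position i may attend to positions
    max(0, i - window + 1) .. i, a fixed-width causal band."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

print(sliding_window_mask(6, 3).astype(int))
# Each row has at most `window` ones, so attention work per token
# is O(window) instead of O(seq_len).
```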
🔎 Research: AnyMAL: Any-Modality LLM, Video Outpainting, Instant evolution: AI design of robots
✭ AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model | Meta “We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.” Released on arXiv the same day as the Meta Connect keynote.
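A toy stand-in for the aligner idea, not Meta's code: frozen modality-encoder features are projected into a handful of "soft tokens" in the LLM's embedding space and prepended to the text sequence. All dimensions below are illustrative:

```python
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    """Sketch of an AnyMAL-style aligner: project frozen modality-encoder
    features into the LLM's token-embedding space. The single-linear
    design and all sizes here are illustrative assumptions."""
    def __init__(self, encoder_dim=1024, llm_dim=4096, num_tokens=32):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, llm_dim * num_tokens)
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim

    def forward(self, features):            # (batch, encoder_dim)
        out = self.proj(features)           # (batch, llm_dim * num_tokens)
        return out.view(-1, self.num_tokens, self.llm_dim)

# The projected "soft tokens" would be prepended to the text embeddings
# before the (frozen or lightly tuned) language model runs as usual.
aligner = ModalityAligner()
image_features = torch.randn(2, 1024)
print(aligner(image_features).shape)  # torch.Size([2, 32, 4096])
```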
✭Hierarchical Masked 3D Diffusion Model for Video Outpainting “Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it presents an additional challenge as the model should maintain the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We use the technique of mask modeling to train the 3D diffusion model. This allows us to use multiple guide frames to connect the results of multiple video clip inferences, thus ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract the global frames of the video as prompts and guide the model to obtain information other than the current video clip using cross-attention. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline only uses the infilling strategy, which brings degradation because the time interval of the sparse frames is too large. Our pipeline benefits from bidirectional learning of the mask modeling and thus can employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results in video outpainting tasks.”
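As a toy illustration of the mask modeling the paper trains on (nothing here is from the authors' code; real training varies mask shapes and which frames they cover), a centered keep-region leaves the frame margins for the model to complete:

```python
import numpy as np

def outpainting_mask(frames, keep=0.6):
    """Keep a centered region of each frame and zero the margins,
    a toy version of the masks used to teach edge completion."""
    t, h, w = frames.shape  # grayscale video: (frames, height, width)
    kh, kw = int(h * keep), int(w * keep)
    top, left = (h - kh) // 2, (w - kw) // 2
    mask = np.zeros_like(frames)
    mask[:, top:top + kh, left:left + kw] = 1
    return frames * mask, mask

video = np.random.rand(8, 64, 64)
masked, mask = outpainting_mask(video)
print(masked.shape, round(mask.mean(), 2))  # (8, 64, 64), roughly keep**2
```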
✭ Instant evolution: AI designs new robot from scratch in seconds “The computer started with a block about the size of a bar of soap. It could jiggle but definitely not walk. Knowing that it had not yet achieved its goal, AI quickly iterated on the design. With each iteration, the AI assessed its design, identified flaws and whittled away at the simulated block to update its structure. Eventually, the simulated robot could bounce in place, then hop forward and then shuffle. Finally, after just nine tries, it generated a robot that could walk half its body length per second—about half the speed of an average human stride. ~ The entire design process—from a shapeless block with zero movement to a full-on walking robot—took just 26 seconds on a laptop.”
✭ [2306.03263] Efficient automatic design of robots “Robots are notoriously difficult to design because of complex interdependencies between their physical structure, sensory and motor layouts, and behavior. Despite this, almost every detail of every robot built to date has been manually determined by a human designer after several months or years of iterative ideation, prototyping, and testing. Inspired by evolutionary design in nature, the automated design of robots using evolutionary algorithms has been attempted for two decades, but it too remains inefficient: days of supercomputing are required to design robots in simulation that, when manufactured, exhibit desired behavior. Here we show for the first time de-novo optimization of a robot's structure to exhibit a desired behavior, within seconds on a single consumer-grade computer, and the manufactured robot's retention of that behavior. Unlike other gradient-based robot design methods, this algorithm does not presuppose any particular anatomical form; starting instead from a randomly-generated apodous body plan, it consistently discovers legged locomotion, the most efficient known form of terrestrial movement. If combined with automated fabrication and scaled up to more challenging tasks, this advance promises near instantaneous design, manufacture, and deployment of unique and useful machines for medical, environmental, vehicular, and space-based tasks.”
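The paper optimizes the body plan with gradients through a simulator; the sketch below swaps in plain hill climbing just to show the iterate-evaluate-update loop both pieces describe. The fitness function and 4-parameter "design" are made-up stand-ins, not the authors' setup:

```python
import random

def fitness(design):
    """Made-up stand-in for the paper's simulator: score a 4-parameter
    body plan by closeness to a hidden 'walking' optimum."""
    target = [0.8, 0.2, 0.5, 0.9]
    return -sum((d - t) ** 2 for d, t in zip(design, target))

design = [random.random() for _ in range(4)]  # random initial body plan
for _ in range(9):  # the article's robot walked after nine design passes
    candidate = [d + random.gauss(0, 0.1) for d in design]  # mutate
    if fitness(candidate) > fitness(design):  # keep only improvements
        design = candidate

print([round(d, 2) for d in design], round(fitness(design), 3))
```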
🍉 Watching/Listening: Mustafa Suleyman & Yuval Noah Harari | What does the AI revolution mean for our future?, Mark Zuckerberg: First Interview in the Metaverse (Lex Fridman Podcast)
Random Pika user: Néreus - AI Video
As often: An Actually Big Week in AI: AutoGen, The A-Phone, Mistral 7B, GPT-Fathom and Meta Hunts CharacterAI
Mustafa Suleyman & Yuval Noah Harari | FULL DEBATE | What does the AI revolution mean for our future? | The Economist
✭ #398 – Mark Zuckerberg: First Interview in the Metaverse | Lex Fridman Podcast
🎈 Demos / Playing: Mandala with Eyeball, Feral Humans Spotted in Arcana National Park for First Time
Mandala with eye in centre (another AI-film experiment), 2m3s, entire sequence. Audio: Hildegard von Bingen, Canticles of Ecstasy
3D hyper-real eyeball with intricate mandala iris (SDXL, Pika, Riffusion, DaVinci, Topaz AI) (1m3s with Riffusion audio)
TEST : eye-mandala-sept30-speed555-noadditivecrossdissolve (23s with Riffusion audio)
TEST : mandala with multi-coloured eye (14s with Riffusion audio)
Synaptic Microbiome Recursion Attunement : rapid metamorphic transformations (SDXL & Pika Sept 25, 2023) 4K (39s with Riffusion audio)
Synaptic Microbiome Recursion (SDXL Sept 24, 2023) 4K (1m52s with Riffusion audio)
Feral Humans Spotted in Arcana National Park for First Time (October 3rd, 2023. GPT-4 reporting for The Vigil)