Dec 4-11th 🕸️: GNoME, A-Lab, AlphaCode2, Gemini, MMMU, Pika, EU AI Act, fUS BMI, Looking Glass Go, SplaTAM, Loudly, Imagine, Gaussian Codec Avatars, Jailbreak GPT-4, Exploiting Criticality, The Wizard of AI, Fireship, Jacobi, Anduril Lattice, Cartagena, ChatGPT: who is using it, how and why? (nature), Haerlin
🕸️: Graph Networks for Materials Exploration (GNoME), A-Lab (autonomous laboratory), AlphaCode2, Gemini (Google DeepMind), MMMU, ChatGPT: who is using it, how and why? (nature), The impact of AI on photography (Alejandro Cartagena), Pika (Forbes), BigTech’s Efforts to Derail the EU AI Act, Functional ultrasound (fUS) BMI, Looking Glass Go: Holograms X Radiance Fields, SplaTAM: Map 3D Gaussians for Dense RGB-D SLAM, Loudly's Vision For Democratizing Creation, Imagine (Meta AI), Relightable Gaussian Codec Avatars, Low-Resource Languages Jailbreak GPT-4, Exploiting Criticality in Spiking Neural Networks, Are you Ready (Frans Jacobi), The Wizard of AI (Alan Warburton), Fireship, Anduril Lattice (“accelerates complex kill chains”), Haerlin
🏓 Observations: ChatGPT: who is using it, how and why? (nature), The impact of AI on photography (Alejandro Cartagena), BigTech’s Efforts to Derail the EU AI Act, Working with AI: Two paths to prompting (Ethan Mollick)
✭ ChatGPT one year on: who is using it, how and why? | nature “On 30 November 2022, the technology company OpenAI released ChatGPT — a chatbot built to respond to prompts in a human-like manner. It has taken the scientific community and the public by storm, attracting one million users in the first 5 days alone; that number now totals more than 180 million. Seven researchers told Nature how it has changed their approach.”
✭ alejandro cartagena on X: "The impact of AI: framing the future ~ For the past 180 years, photography has been our window into the world, leaving an indelible mark on every aspect of our existence. It has transformed medicine, sports, and scientific exploration – allowing us to see microscopic details of bodies for the first time, freeze movement, and analyze the aerodynamics of animal and human movement. It has given us a window into outer space, capturing the moon and galaxies far, far away. ~ Moreover, it has revolutionized storytelling, marketing, news, surveillance, ideas about identity, and art. Photography reigns supreme in our collective consciousness – it defines the era we live in – the Photographic Era. ~ Photography's influence has been profound, but now AI is challenging this, or more so, offering a second way. AI is being trained on centuries of images created for art and illustration. Many of those images are photographs or digitized versions of other art forms. Once the balance of ‘real or human-made’ images is lesser than that of AI-generated images, something might change forever. ~ How we see or image our world might become based on a set of parameters and algorithmic understandings of past representations – and less on what we actually observe in front of us. The Photographic Era will be over, and we could possibly see the rise of standardized and generic visual representations of our world based on algorithms and visual learning models. ~ I don't know if this is good or bad, but we created this. Humans created this. Putting cameras in cell phones created this. The ease of representation led to this. What did we think was going to happen?"
✭How Nations Are Losing a Global Race to Tackle A.I.’s Harms - The New York Times “The talks, in the end, produced a deal to keep talking.”
✭ BigTech’s Efforts to Derail the EU AI Act – Verfassungsblog “BigTech companies from the US, such as Meta and Alphabet, and from Europe, such as Aleph Alpha and Mistral – companies that hold the necessary infrastructure – are so determined to water down the current paragraph on foundation models in the AI Act. It is precisely the transferability and adaptability of these models to perform tasks in all social domains – ranging from healthcare (visual cancer detection), education (large-language-model writing tools) to social scoring (risk assessment for credit or social aid) – that makes their strict regulation so pivotal. Without thorough assessment, potential security gaps, performance issues or discriminatory biases can be disseminated and scaled widely into other AI applications. The German, French and Italian proposal shifts all compliance and liability costs from the shoulders of the biggest AI companies to the thousands of downstream users, public agencies and small-and-medium enterprises that adapt and deploy them.”
✭ Working with AI: Two paths to prompting - by Ethan Mollick “Lots of folks have read our paper showing that using AI boosted the quality of the work done by consultants at the top-tier Boston Consulting Group by 40%, but there is a key factor in that paper that most people are missing. The consultants were not given some special version of AI, trained on proprietary data and with a customized interface. Nope, they were just given GPT-4 with minimal training and examples. The plain old GPT-4 from back in April, before all of its new capabilities were added. The same GPT-4 that everyone in 169 countries can access for free, via Microsoft Bing in creative mode. And while some of the consultants received a small amount of training (which didn’t help much), most of them just started using the AI without any instructions. ~ And they still saw massive performance increases. ~ The lesson is that just using AI will teach you how to use AI. You can become a world expert in the application of AI to your domain by just using AI a lot until you figure out what it is good and bad at. This is one of two reasons that I dislike the emphasis on prompting that pervades much of the discussions of AI: it makes using AI systems seem much harder and more mysterious than it is. Just use it and see where that takes you.”
🛠️ Tech: Graph Networks for Materials Exploration (GNoME), A-Lab (autonomous laboratory), Gemini (Google DeepMind), MMMU, Pika (Forbes), Looking Glass Go: Holograms X Radiance Fields, SplaTAM: Map 3D Gaussians for Dense RGB-D SLAM, Loudly's Vision For Democratizing Creation, Imagine (Meta AI), AlphaCode2
✭ Millions of new materials discovered with deep learning - Google DeepMind “AI tool GNoME finds 2.2 million new crystals, including 380,000 stable materials that could power future technologies. ~ Modern technologies from computer chips and batteries to solar panels rely on inorganic crystals. To enable new technologies, crystals must be stable otherwise they can decompose, and behind each new, stable crystal can be months of painstaking experimentation. Today, in a paper published in Nature, we share the discovery of 2.2 million new crystals – equivalent to nearly 800 years’ worth of knowledge. We introduce Graph Networks for Materials Exploration (GNoME), our new deep learning tool that dramatically increases the speed and efficiency of discovery by predicting the stability of new materials. With GNoME, we’ve multiplied the number of technologically viable materials known to humanity. Of its 2.2 million predictions, 380,000 are the most stable, making them promising candidates for experimental synthesis. Among these candidates are materials that have the potential to develop future transformative technologies ranging from superconductors, powering supercomputers, and next-generation batteries to boost the efficiency of electric vehicles. … 52,000 new layered compounds similar to graphene that have the potential to revolutionize electronics with the development of superconductors. Previously, about 1,000 such materials had been identified. We also found 528 potential lithium ion conductors, 25 times more than a previous study, which could be used to improve the performance of rechargeable batteries.”
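The pipeline the blog post describes reduces to a two-stage loop: a graph network reads a candidate crystal as a graph of atoms and bonds, predicts a formation energy, and candidates whose predicted energy sits above the convex hull of competing phases are filtered out. A minimal sketch of that idea, assuming a single toy message-passing layer and an invented hull energy (none of this is DeepMind's code):

```python
# Minimal sketch (not DeepMind's code): one message-passing step over a
# crystal graph, followed by the stability filter GNoME's pipeline implies.
# All names, weights, features, and the toy hull energy are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def message_passing_step(node_feats, edges, w_msg, w_upd):
    """One GNN layer: aggregate neighbor messages, then update node states."""
    msgs = np.zeros_like(node_feats)
    for src, dst in edges:                      # edges of the crystal graph
        msgs[dst] += node_feats[src] @ w_msg    # linear message from neighbor
    return np.tanh(node_feats @ w_upd + msgs)   # updated atom embeddings

def predict_formation_energy(node_feats, edges, params):
    h = node_feats
    for w_msg, w_upd in params["layers"]:
        h = message_passing_step(h, edges, w_msg, w_upd)
    return float(h.sum(axis=0) @ params["readout"])  # graph-level readout

# Toy 3-atom "crystal": random features, atoms bonded in a triangle.
feats = rng.normal(size=(3, 8))
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0)]
params = {
    "layers": [(rng.normal(size=(8, 8)) * 0.1, rng.normal(size=(8, 8)) * 0.1)],
    "readout": rng.normal(size=8) * 0.1,
}

e_pred = predict_formation_energy(feats, edges, params)
e_hull = 0.0  # energy of competing phases on the convex hull (assumed)
print(f"predicted E_f = {e_pred:.3f} eV/atom; "
      f"{'stable candidate' if e_pred - e_hull < 0.0 else 'filtered out'}")
```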
✭ An autonomous laboratory for the accelerated synthesis of novel materials | Nature “To close the gap between the rates of computational screening and experimental realization of novel materials, we introduce the A-Lab, an autonomous laboratory for the solid-state synthesis of inorganic powders. This platform uses computations, historical data from the literature, machine learning (ML) and active learning to plan and interpret the outcomes of experiments performed using robotics. Over 17 days of continuous operation, the A-Lab realized 41 novel compounds from a set of 58 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind. Synthesis recipes were proposed by natural-language models trained on the literature and optimized using an active-learning approach grounded in thermodynamics. Analysis of the failed syntheses provides direct and actionable suggestions to improve current techniques for materials screening and synthesis design. The high success rate demonstrates the effectiveness of artificial-intelligence-driven platforms for autonomous materials discovery and motivates further integration of computations, historical knowledge and robotics.”
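The loop the abstract describes is a classic active-learning cycle: propose a synthesis recipe, run it robotically, score the outcome, and let the result steer the next proposal. A toy sketch under stated assumptions: `robot_synthesize_and_score` stands in for robotic synthesis plus phase analysis, and the crude greedy rule stands in for the paper's thermodynamics-grounded acquisition strategy:

```python
# Hedged sketch of an A-Lab-style active-learning loop. The recipe space,
# yield model, and "robot" below are invented stand-ins for illustration.
import random

random.seed(1)
recipe_space = [{"temp_C": t, "hours": h}
                for t in (600, 800, 1000) for h in (2, 6, 12)]

def robot_synthesize_and_score(recipe):
    """Stand-in for robotic synthesis + XRD phase analysis (returns yield)."""
    best = {"temp_C": 800, "hours": 6}  # pretend optimum, unknown to the loop
    return max(0.0, 1.0 - 0.002 * abs(recipe["temp_C"] - best["temp_C"])
                        - 0.05 * abs(recipe["hours"] - best["hours"]))

observed = {}
for trial in range(5):
    untried = [r for r in recipe_space if tuple(r.values()) not in observed]
    if observed:
        # Crude exploitation rule: try the untried recipe nearest the best
        # observation so far (standing in for a real acquisition function).
        best_seen = max(observed, key=observed.get)
        untried.sort(key=lambda r: abs(r["temp_C"] - best_seen[0])
                                   + abs(r["hours"] - best_seen[1]))
    recipe = untried[0] if observed else random.choice(untried)
    observed[tuple(recipe.values())] = robot_synthesize_and_score(recipe)
    print(trial, recipe, f"yield={observed[tuple(recipe.values())]:.2f}")
```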
✭ Gemini - Google DeepMind “Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem solving abilities of AI models.” ✭ Gemini: A Family of Highly Capable Multimodal Models | Report: Gemini Team, Google “This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks — notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.” Gemini was trained on TPU v4 hardware: ✭ [2304.01433] TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings “a twisted 3D torus topology … For similar sized systems, it is ~4.3x-4.5x faster than the Graphcore IPU Bow and is 1.2x-1.7x faster and uses 1.3x-1.9x less power than the Nvidia A100. TPU v4s inside the energy-optimized warehouse scale computers of Google Cloud use ~3x less energy and produce ~20x less CO2e than contemporary DSAs in a typical on-premise data center.” https://arxiv.org/abs/2304.01433
✭ MMMU “🔥[2023-12-04]: Our evaluation server for the test set is now available on EvalAI. We welcome all submissions and look forward to your participation! 😆 ~ Introduction ~ We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. Our evaluation of 14 open-source LMMs and the proprietary GPT-4V(ision) highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V only achieves a 56% accuracy, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence.” ✭ Yu Su @NeurIPS on X: "Hi @emilymbender, I'm one of the lead authors of MMMU. I can certify that 1) Google didn't fund this work, and 2) Google didn't have early access."
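Mechanically, scoring a submission against a benchmark like this comes down to exact-match accuracy over multiple-choice answers. A minimal illustration, with an invented `model_answer` stub in place of a real multimodal model query (this is not MMMU's actual harness):

```python
# Toy scoring sketch: exact-match accuracy over multiple-choice items.
# The sample questions and the model_answer() stub are invented.
questions = [
    {"id": "art_001", "answer": "B"},
    {"id": "chem_042", "answer": "D"},
    {"id": "med_117", "answer": "A"},
]

def model_answer(qid: str) -> str:
    """Stand-in for querying a multimodal model with the question + images."""
    return {"art_001": "B", "chem_042": "C", "med_117": "A"}.get(qid, "A")

correct = sum(model_answer(q["id"]) == q["answer"] for q in questions)
print(f"accuracy: {correct / len(questions):.1%}")  # 66.7% on this toy set
```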
✭ Pika Launches AI Video Editing App And Announces $55 Million In Funding | Forbes “the company is hard at work tinkering with algorithms to further improve the model, and also developing ones for filtering copyrighted materials that have tripped up rivals and dragged them into costly IP litigation. “Right now that’s still very exploratory,” Guo said. ~ With new funding in hand, Guo says she plans to expand Pika’s team to about 20 people next year, most of them engineers and researchers. Monetizing the product, which is currently free, isn’t a key priority yet, though she says the company may eventually introduce a tiered subscription model (pay more for access to more features) for consumers.”
2023/12/05 ✭ Looking Glass Go: Holograms X Radiance Fields | Radiance Fields “In an era where technology constantly reshapes our perception of reality, a groundbreaking innovation is set to transform the way we interact with digital content. On December 5th, 2023, Looking Glass is unveiling its latest creation – Looking Glass Go, a portable holographic display (USB-C) that promises to bring the long-awaited dream of accessible holograms to life. ~ Looking Glass Go is the culmination of five years of rigorous prototyping and innovation. Designed to be portable and user-friendly, it allows users to experience holograms without the need for bulky VR or AR headsets. At a launch price of $199 (for 48 hours only), significantly discounted from its $300 list price, the Go is set to make waves in the tech world. The Go will be offered in two styles— white (standard) and clear (+$49).”
2023/12/05 ✭ Nikhil Varma Keetha on X: "SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM https://t.co/NRxmwdK39s We extend Gaussian Splatting to solve SLAM, i.e., automatically calculate the camera poses when fitting the Gaussian scene from RGB-D" http://spla-tam.github.io “SplaTAM enables precise camera tracking and high-fidelity reconstruction in challenging real-world scenarios. ~ Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by a 3D Gaussian Splatting radiance field can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. In particular, we employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments on simulated and real-world data show that SplaTAM achieves up to 2× state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches.”
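The core alternation: for each incoming frame, first optimize the camera pose against the frozen Gaussian map via a rendering loss, then densify the map where the rendered silhouette says nothing has been mapped yet. A 1-D toy sketch of the tracking half (the real system optimizes 6-DoF poses over 3D Gaussians with differentiable rendering; `render` and `track` here are illustrative stand-ins):

```python
# Toy sketch of SplaTAM's alternation (not the authors' code): per frame,
# optimize camera pose against the fixed Gaussian map, then add Gaussians
# where the silhouette is empty. "Rendering" is a 1-D image of blobs.
import numpy as np

def render(gaussians, pose, xs=np.linspace(0, 10, 200)):
    """Render 1-D Gaussians shifted by camera pose; also return silhouette."""
    img = np.zeros_like(xs)
    for mu, sigma, amp in gaussians:
        img += amp * np.exp(-0.5 * ((xs - (mu - pose)) / sigma) ** 2)
    return img, img > 0.05  # (color, silhouette/opacity mask)

def track(gaussians, observed, pose, lr=0.5, iters=50):
    """Pose refinement by finite-difference gradient descent (toy tracker)."""
    eps = 1e-3
    for _ in range(iters):
        e0 = np.mean((render(gaussians, pose)[0] - observed) ** 2)
        e1 = np.mean((render(gaussians, pose + eps)[0] - observed) ** 2)
        pose -= lr * (e1 - e0) / eps
    return pose

gaussians = [(3.0, 0.3, 1.0), (6.0, 0.4, 0.8)]   # the current map
true_pose = 0.7
observed, _ = render(gaussians, true_pose)        # incoming frame (toy)
est = track(gaussians, observed, pose=0.0)
print(f"estimated pose {est:.3f} vs true {true_pose}")
# Mapping step (not shown): where the silhouette mask is empty but the
# frame has content, new Gaussians are added and optimized the same way.
```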
✭ Generative AI Is Revolutionizing Music: Loudly's Vision For Democratizing Creation | Bernard Marr “Loudly is an AI music creation platform trained on 10 million songs, built from a bank of 200,000 sounds. Users can generate their own royalty-free music for their projects by using simple natural-language prompts. For example, ask it to create a soundtrack for your product launch video, photographic slideshow or just family videos, and that’s what you’ll get. You can choose the style of music, the tempo, and the mood, or select individual instruments. ~ All of the sounds are based on human-generated recordings rather than being synthesized. Any of the songs can be customized to fit individual projects or new songs can be created from scratch. ~ The concept is “music as code”. This means that rather than existing as a linear soundwave, it can be interacted with at a granular level to create potentially billions of different sounds. ~ Importantly, Loudly itself owns the copyright to all of the music that the system has been trained on, meaning there’s no danger that musicians will feel they have been ripped off by having their existing work fed into the AI and then used to generate derivative works.”
✭ Imagine with Meta AI “Describe an image for Meta AI to generate.” ~ ✭ Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos | Ars Technica “'Imagine with Meta AI' turns prompts into images, trained using public Facebook data. ~ On Wednesday, Meta released a free standalone AI image-generator website, 'Imagine with Meta AI,' based on its Emu image-synthesis model. Meta used 1.1 billion publicly visible Facebook and Instagram images to train the AI model, which can render a novel image from a written prompt. Previously, Meta's version of this technology—using the same data—was only available in messaging and social networking apps such as Instagram. ~ If you're on Facebook or Instagram, it's quite possible a picture of you (or that you took) helped train Emu. In a way, the old saying, "If you're not paying for it, you are the product" has taken on a whole new meaning. Although, as of 2016, Instagram users uploaded over 95 million photos a day, so the dataset Meta used to train its AI model was a small subset of its overall photo library.”
2023/12/06 ✭ AlphaCode2 Tech Report (PDF) “AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.”
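The report's recipe is essentially sample-filter-rerank: draw many candidate programs from the model, execute them against the problem's public tests, cluster the survivors by behavior, and pick from the best cluster. A minimal sketch with hand-written stand-in candidates (AlphaCode 2 samples from Gemini and uses a learned scoring model for the final reranking, neither of which is modeled here):

```python
# Hedged sketch of a sample-filter-rerank loop. Candidates, tests, and the
# cluster-size heuristic are illustrative stand-ins for AlphaCode 2's
# Gemini sampling and learned reranker. Real systems sandbox execution.
def run_candidate(src: str, x: int) -> int:
    env: dict = {}
    exec(src, env)          # each candidate defines solve(x)
    return env["solve"](x)

candidates = [
    "def solve(x): return x * 2",       # wrong
    "def solve(x): return x ** 2",      # right
    "def solve(x): return x ** 2",      # right (same behavioral cluster)
    "def solve(x): raise ValueError",   # broken
]
public_tests = [(2, 4), (3, 9)]

def passes(src):
    try:
        return all(run_candidate(src, x) == y for x, y in public_tests)
    except Exception:
        return False

survivors = [c for c in candidates if passes(c)]
# Rerank: AlphaCode 2 clusters behaviorally identical programs and scores
# them with a model; here we simply prefer the largest cluster.
best = max(set(survivors), key=survivors.count)
print(best)
```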
🔎 Research: Functional ultrasound (fUS) BMI, Low-Resource Languages Jailbreak GPT-4, Exploiting Criticality in Spiking Neural Networks
✭ Decoding motor plans using a closed-loop ultrasonic brain–machine interface | Nature Neuroscience “Brain–machine interfaces (BMIs) enable people living with chronic paralysis to control computers, robots and more with nothing but thought. Existing BMIs have trade-offs across invasiveness, performance, spatial coverage and spatiotemporal resolution. Functional ultrasound (fUS) neuroimaging is an emerging technology that balances these attributes and may complement existing BMI recording technologies. In this study, we use fUS to demonstrate a successful implementation of a closed-loop ultrasonic BMI. We streamed fUS data from the posterior parietal cortex of two rhesus macaque monkeys while they performed eye and hand movements. After training, the monkeys controlled up to eight movement directions using the BMI. We also developed a method for pretraining the BMI using data from previous sessions. This enabled immediate control on subsequent days, even those that occurred months apart, without requiring extensive recalibration. These findings establish the feasibility of ultrasonic BMIs, paving the way for a new class of less-invasive (epidural) interfaces that generalize across extended time periods and promise to restore function to people with neurological impairments.”
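Stripped to its core, the decoding task is a classification problem: map a vector of fUS activity to one of eight movement directions, where "pretraining" means a decoder fit on an earlier session still works on a later one. A toy sketch on synthetic data, using a nearest-centroid decoder as a stand-in for the study's actual classifier:

```python
# Toy sketch of the decoding problem (not the authors' pipeline): classify
# intended movement direction from synthetic "fUS activity" vectors, and
# reuse a decoder pretrained on a previous session without recalibration.
import numpy as np

rng = np.random.default_rng(42)
n_dirs, n_voxels = 8, 50
prototypes = rng.normal(size=(n_dirs, n_voxels))   # per-direction patterns

def session(n_trials_per_dir=20, noise=0.8):
    """Simulate one recording session (stable patterns + trial noise)."""
    X, y = [], []
    for d in range(n_dirs):
        X.append(prototypes[d] + noise * rng.normal(size=(n_trials_per_dir, n_voxels)))
        y += [d] * n_trials_per_dir
    return np.vstack(X), np.array(y)

# "Pretrain" a nearest-centroid decoder on day 1 ...
X_day1, y_day1 = session()
centroids = np.stack([X_day1[y_day1 == d].mean(axis=0) for d in range(n_dirs)])

# ... then decode day-2 trials with no recalibration.
X_day2, y_day2 = session()
pred = np.argmin(((X_day2[:, None, :] - centroids) ** 2).sum(-1), axis=1)
print(f"day-2 accuracy without recalibration: {(pred == y_day2).mean():.1%}")
```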
✭ [2310.02446] Low-Resource Languages Jailbreak GPT-4 “AI safety training and red-teaming of large language models (LLMs) are measures to mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages. On the AdvBenchmark, GPT-4 engages with the unsafe translated inputs and provides actionable items that can get the users towards their harmful goals 79% of the time, which is on par with or even surpassing state-of-the-art jailbreaking attacks. Other high-/mid-resource languages have significantly lower attack success rates, which suggests that the cross-lingual vulnerability mainly applies to low-resource languages. Previously, limited training on low-resource languages primarily affected speakers of those languages, causing technological disparities. However, our work highlights a crucial shift: this deficiency now poses a risk to all LLM users. Publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities. Therefore, our work calls for more holistic red-teaming efforts to develop robust multilingual safeguards with wide language coverage.”
2023/12/06 ✭ Relightable Gaussian Codec Avatars “In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. ~ Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit in real-time under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. ~ Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.” ✭ Linus (●ᴗ●) on X: "Relightable Real-Time Avatars Meta Codec Avatars 2.0 gets an update, building on 3D Gaussian Splatting from Meta. Accuracy is down to the human hair strand level 🔬 🧵 A thread https://t.co/F093A1vcp7"
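The diffuse half of the appearance model is classic radiance-transfer shading: a surface point's diffuse response is a dot product between the lighting's spherical-harmonic coefficients and an SH basis derived from local geometry. A minimal band-2 illustration (the example light is invented, and the paper's learnable transfer and spherical-Gaussian speculars are not modeled here):

```python
# Minimal SH-diffuse sketch: shade a point from spherical-harmonic lighting
# coefficients. The light below is an invented example; this is only the
# diffuse half of the appearance model described above.
import numpy as np

def sh_basis_l2(n):
    """Real SH basis up to l=2, evaluated at unit normal n = (x, y, z)."""
    x, y, z = n
    return np.array([
        0.282095,                                   # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # l=1
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),  # l=2
    ])

light_coeffs = np.zeros(9)
light_coeffs[0] = 1.0    # ambient term
light_coeffs[2] = 0.8    # light arriving mostly from +z

for normal in [(0, 0, 1), (0, 0, -1), (1, 0, 0)]:
    n = np.array(normal, dtype=float)
    shade = max(0.0, float(light_coeffs @ sh_basis_l2(n / np.linalg.norm(n))))
    print(normal, f"diffuse shade = {shade:.3f}")
```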
✭ Brain-Inspired Efficient Pruning: Exploiting Criticality in Spiking Neural Networks “Spiking Neural Networks (SNNs) have been an attractive option for deployment on devices with limited computing resources and lower power consumption because of their event-driven computing characteristic. As such devices have limited computing and storage resources, pruning for SNNs has recently attracted wide attention. However, the binary and non-differentiable property of spike signals makes pruning deep SNNs challenging, so existing methods require high time overhead to make pruning decisions. In this paper, inspired by the critical brain hypothesis in neuroscience, we design a regeneration mechanism based on criticality to efficiently obtain critical pruned networks. Firstly, we propose a low-cost metric for the criticality of pruning structures. Then we re-rank the pruned structures after pruning and regenerate those with higher criticality. We evaluate our method using VGG-16 and ResNet-19 for both unstructured pruning and structured pruning. Our method achieves higher performance compared to the current state-of-the-art (SOTA) method with the same time overhead. We also achieve comparable performance (even better on VGG-16) compared to the SOTA method with 11.3x and 15.5x acceleration. Moreover, we investigate the underlying mechanism of our method and find that it efficiently selects potential structures, learns consistent feature representations and reduces overfitting during the recovery phase.”
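The method reduces to a prune / re-rank / regenerate loop: prune as usual, score the pruned connections with a cheap criticality metric, and restore the most critical ones for retraining. A hedged sketch in which the metric itself (recorded activity of the target unit) is an invented stand-in for the paper's low-cost measure:

```python
# Hedged sketch of a prune / re-rank / regenerate loop. The criticality
# metric below (recorded activity of the unit a weight feeds) is an
# invented stand-in for the paper's actual low-cost metric.
import numpy as np

rng = np.random.default_rng(7)
W = rng.normal(size=(16, 16))                 # one layer's weights
activity = rng.random(16)                     # recorded unit activity (toy)

# 1) Magnitude pruning: drop the smallest 50% of weights.
threshold = np.quantile(np.abs(W), 0.5)
mask = np.abs(W) >= threshold

# 2) Score every *pruned* weight by the criticality of its target unit.
criticality = activity / activity.max()      # per-unit score in [0, 1]
pruned = np.argwhere(~mask)                  # rows of (dst, src) indices
scores = np.array([criticality[dst] for dst, _ in pruned])

# 3) Regenerate the top-k most critical pruned connections.
k = 10
for dst, src in pruned[np.argsort(scores)[-k:]]:
    mask[dst, src] = True                    # restore the connection
    W[dst, src] = 0.0                        # re-initialize for retraining

print(f"kept {mask.sum()} of {mask.size} weights after regeneration")
```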
🦾AI Art-Research: Are you Ready (Frans Jacobi), The Wizard of AI (Alan Warburton)
✭ The Current Situation | fransjacobi “Works on paper by Frans Jacobi, A4 (watercolor, pencil, AI-realism, etc), 2022 -> ongoing. The Current Situation is a series of ‘drawings’ begun in October 2022 as an ongoing project. The drawings are made in in-between-moments, like a kind of diary; small spontaneous reflections on the mundane daily happenings, news, subtracts from reading, quotes from other artists, political comments and whatever comes to mind.” ✭ Are You Ready? “a climate futuristic fable in 7 episodes”
✭ The Wizard of AI on Vimeo | Alan Warburton for Data as Culture at the ODI “Alan Warburton was commissioned by the ODI's Data as Culture programme to bring us 'The Wizard of AI,' a 20-minute video essay about the cultural impacts of generative AI. It was produced over three weeks at the end of October 2023, one year after the release of the infamous Midjourney v4, which the artist treats as a "gamechanger" for visual cultures and creative economies. According to the artist, the video itself is "99% AI" and was produced using generative AI tools like Midjourney, Stable Diffusion, Runway and Pika. Yet the artist is careful to temper the hype of these new tools, or as he says, to give in to the "wonder-panic" brought about by generative AI.”
⚔️War (wAIr): Anduril Lattice (“accelerates complex kill chains”)
✭ Anduril - Command & Control “Lattice accelerates complex kill chains by orchestrating machine-to-machine tasks at scales and speeds beyond human capacity. ~ Lattice uses technologies like sensor fusion, computer vision, edge computing, and machine learning and artificial intelligence to detect, track, and classify every object of interest in an operator's vicinity.” ✭ Anduril - Anduril’s Lattice: a trusted dual use — commercial and military — platform for public safety, security, and defense “Anduril’s Lattice is an open software platform capable of being used for a variety of missions and industries — including public safety, security, and defense. Designed to be sensor, network, and system agnostic, Lattice takes data from disparate and distributed sensors, feeds, and systems and moves this data into a single integration layer. In this integration layer, AI, machine learning, and sensor/data processing techniques are used to filter high-value information to users.” ✭ Can Palmer Luckey Reinvent the U.S. Defense Industry? | WSJ - YouTube “Military tech startup Anduril Industries is shaking up the U.S. defense industry as it is one of the few privately held technology companies finding success as a Defense Department contractor. But what makes the company’s software so unique that it is being used across multiple branches of the U.S. military and in both the Russia-Ukraine War and Israel-Hamas War?”
🔭Watching: Fireship, Martin Haerlin
✭ Google's Gemini just made GPT-4 look like a baby’s toy - Fireship on YouTube “Take a first look at Gemini - Google's latest AI model and GPT-4 competitor. Learn how the multimodal Gemini can handle generative AI tasks, solve programming challenges with AlphaCode 2, and how it was trained using TPU superpods.”
✭ The truth about the OpenAI drama | Fireship - YouTube “What is going on at OpenAI and why was Sam Altman fired? Let's look at the timeline of events and possible theories about the leadership change.”