Multimodal AI Upgrades Humanoid Robots

Robots are getting smarter, quickly

Cutting-Edge AI Technology and Multimodal Models Transform Robotics, Enabling Advanced Problem-Solving and Interaction

Recent advancements in artificial intelligence have significantly enhanced the capabilities of robots, allowing them to understand and respond to their environments more effectively. Key developments include the use of large language models (LLMs) and vision-language-action models (VLAMs), which enable robots to process multimodal data and perform tasks with a high degree of autonomy and common sense. Companies like Figure, Covariant, and Wayve are leading this transformation, applying AI to various robotic applications, from warehouse automation to autonomous vehicles, showcasing a shift from AI in software to embodied AI in physical machines.

  • Extracting Concepts from GPT-4 - Researchers have developed scalable methods to interpret GPT-4's neural activity, discovering 16 million often interpretable features. These methods, using sparse autoencoders, identify patterns that correspond to human-understandable concepts by training on multimodal data. This progress aims to enhance AI interpretability, enabling better monitoring and steering of AI behaviors. While challenges remain in fully understanding neural networks, the shared paper and tools are intended to foster further exploration and improve AI model safety and robustness.

  • Apple Made Once-Unlikely Deal With Sam Altman to Catch Up in AI - Apple has partnered with OpenAI, led by Sam Altman, to integrate ChatGPT into the iPhone’s operating system, a significant move to enhance its AI capabilities. This partnership will be a highlight at Apple’s Worldwide Developers Conference, illustrating the company's efforts to remain competitive in AI technology. The agreement allows Apple to leverage OpenAI's advanced chatbot technology while it continues to develop its own AI solutions, including enhancements to Siri. This collaboration underscores a shift in Silicon Valley’s dynamics, with Apple seeking external expertise to bridge its AI development gap.

  • Amazon’s Project PI AI looks for product defects before they ship - Amazon's Project PI employs generative AI and computer vision within a scanning tunnel to inspect outgoing products for damage, incorrect color, or size discrepancies. The system identifies defective items, isolates them, and correlates issues to pinpoint the root cause, facilitating proactive quality control in several North American warehouses. Amazon integrates human oversight for final assessments, offering flagged items for resale at a discount or for donation. Additionally, Amazon is developing AI that analyzes customer feedback and visual data to diagnose sources of dissatisfaction, potentially aiding sellers in preventing mislabeling and optimizing the customer experience. The initiative also aims to make returns less cumbersome and reduce carbon emissions.

  • Sam Altman's Secretive Investment Portfolio Worth at Least $2.8 Billion, Report Says - Sam Altman, CEO of OpenAI, holds a substantial personal investment portfolio valued at over $2.8 billion, according to a Wall Street Journal report. His investments span numerous startups, including early stakes in Reddit, Airbnb, and Stripe. Concerns about potential conflicts of interest have arisen, especially since some companies in his portfolio have business deals with OpenAI, such as Reddit's recent licensing agreement. Despite these concerns, OpenAI's board chairman, Bret Taylor, asserts that Altman has been transparent about his investments and that any potential conflicts are managed by an independent audit committee.

  • What I learned from the UN’s “AI for Good” summit - The conference showcased global AI efforts including a focus on inclusivity, with speakers like Pelonomi Moiloa highlighting AI development for African languages. Despite the rhetoric on AI for good, there was skepticism about AI's role in advancing UN goals. Speakers noted AI’s potential to exacerbate environmental damage and suggested it does not sufficiently address gender biases. Critics called for accountability and sustainable AI technology, pointing out concerns over its energy consumption and exploitative labor practices. OpenAI CEO Sam Altman's talk lacked depth on safety measures, following critiques about the company's governance. Discussions around AI’s impact on productivity were optimistic, but some remain unconvinced of its overall benefit.

  • Elon Musk diverting Tesla GPUs to his other companies - Tesla CEO Elon Musk has reportedly redirected a significant order of Nvidia's H100 GPU clusters, intended for Tesla's AI development, to his social media company X. This shift indicates Tesla's evolving focus from strictly automaking to AI and robotics. While Tesla’s Dojo supercomputer is in development to propel its Full Self-Driving (FSD) software, reliance on Nvidia's GPUs remains critical. Musk's reallocation of resources has raised concerns about his commitment to Tesla's interests, as he has been previously scrutinized for purportedly leveraging Tesla assets for other ventures.

  • A Right to Warn about Advanced Artificial Intelligence - A group of current and former employees from frontier AI companies are advocating for the mitigation of serious risks associated with AI technology, such as exacerbating inequalities or causing human extinction. They recognize the tension between corporate interests and effective oversight and highlight the industry's deficiencies in providing transparent information on AI's capabilities and risks. These advocates call for greater accountability, urging AI companies to commit to principles that enable open criticism, anonymous risk reporting, and protection for whistleblowers. They push for cultural change within AI companies to allow public discussion of potential AI hazards without fear of negative repercussions. Notably, prominent AI experts, including Yoshua Bengio, Geoffrey Hinton, and Stuart Russell, endorse their standpoint.

  • OpenAI Employees Want Protections to Speak Out on ‘Serious Risks’ of AI - Current and former employees of OpenAI and Google DeepMind are advocating for protections against retaliation when voicing concerns about the risks associated with AI technologies. A public letter signed by 13 employees highlights that broad confidentiality agreements prevent them from disclosing these risks to the public, calling for measures such as the elimination of non-disparagement agreements for risk-related issues and the establishment of anonymous reporting processes. This comes amid recent controversies at OpenAI, including the dissolution of a key safety team and allegations that employees were pressured to sign restrictive agreements. OpenAI has acknowledged the need for debate and stated its commitment to safety and engagement with stakeholders.

  • Computex 2024 Day Two Wrap-Up: Intel Xeon 6, CAMM2 memory, and wild cases from InWin and Phanteks - Day two of Computex 2024 brought several notable announcements. Intel revealed its Xeon 6 processors, targeting AMD with two families: "Sierra Forest" with up to 144 efficiency-optimized E-Cores and "Granite Rapids" with up to 86 high-performance P-cores. In power supplies, Seasonic and Noctua presented the Prime TX-1600 with a special fan, while Super Flower launched a hefty 2800W model. The CAMM2 memory standard gains momentum with Kingston and MSI's support, promising space-saving benefits.

  • Cisco Launches $1B Global AI Investment Fund - Cisco has announced the launch of a $1 billion global investment fund aimed at fostering secure and reliable AI solutions. This initiative includes strategic investments in startups like Cohere, Mistral AI, and Scale AI to enhance Cisco's AI innovation strategy and support the broader AI ecosystem. Nearly $200 million of the fund has already been committed to these ventures. The fund aligns with Cisco’s long-standing strategy to drive innovation through investments, acquisitions, and strategic partnerships, ensuring their leadership in the evolving AI landscape and supporting the development of AI technologies across various industries.

  • Ex-OpenAI Researcher Leopold Aschenbrenner Starts AGI-focused Investment Firm - Leopold Aschenbrenner, a former superalignment researcher at OpenAI who was dismissed for allegedly leaking information, has founded an investment firm dedicated to backing startups in the artificial general intelligence (AGI) sector. The firm is supported by notable figures including former GitHub CEO Nat Friedman, investor Daniel Gross, and Stripe’s CEO and president, Patrick and John Collison. Aschenbrenner describes the firm as a hybrid between a hedge fund and a think tank, aiming to make significant financial bets on AGI and superintelligence development within the decade. In a podcast, he defended his actions at OpenAI, explaining that he shared redacted documents with external researchers for feedback on AGI preparedness and safety.

  • ‘Most exciting moment’ since birth of WiFi: chipmakers hail arrival of AI PCs - At the annual Computex conference in Taiwan, top chipmakers like Intel, AMD, Qualcomm, and Nvidia showcased their advancements in AI-enabled PCs, signaling a major evolution in the PC market. These AI PCs, embedded with specialized silicon, aim to revolutionize user interactions by enabling on-device AI applications, potentially reducing reliance on cloud services. Microsoft has spurred this development by unveiling AI-enabled PCs equipped with its Copilot assistant. Despite the excitement, analysts remain cautious about whether consumer demand will justify the higher costs of these advanced devices.

  • AI’s Insatiable Data-Center Demand Makes Crypto Miners Targets - The AI boom has created a significant shortage in data-center space and GPU chips, leading AI companies to seek capacity from crypto miners, who possess the necessary infrastructure. Core Scientific's partnership with AI startup CoreWeave and a subsequent takeover bid highlight the growing value of these assets. Crypto mining firms, such as Northern Data, are transitioning their facilities and investing in high-demand Nvidia chips to cater to AI needs, transforming a previously niche industry into a pivotal player in the tech landscape.

  • A new AI tool to help monitor coral reef health - Coral reefs, which are home to 25% of marine species but cover only 0.1% of the ocean's floor, are threatened by activities like overfishing and climate change. The "Calling in our Corals" project, a collaboration between Google Arts & Culture and research organizations, harnesses public involvement to compile a bioacoustic data library to understand reef health. The innovative AI tool, SurfPerch, developed with Google Research and DeepMind, automates the analysis of vast quantities of underwater audio, aiding in monitoring reef health, biodiversity, and restoration efforts. Citizens contributed to this project by listening to over 400 hours of reef audio, assisting in the AI tool's refinement, which now can identify reef sounds effectively, offering a leap forward in reef conservation techniques.

Awesome Research Papers

  • MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark - The paper discusses MMLU-Pro, an improvement on the Massive Multitask Language Understanding (MMLU) benchmark, devised to better evaluate cutting-edge AI language models. MMLU-Pro raises the bar by introducing more complex reasoning questions and increasing the choice set to ten options from four, effectively reducing model performance by 16-33% relative to the original MMLU. The new benchmark also shows more consistent results across 24 different prompt styles, with reduced variation in model scores, and encourages the use of Chain of Thought reasoning, which has proven more effective than direct answering on this more challenging dataset. This establishes MMLU-Pro as a more stringent and discriminating tool for tracking AI progress in language comprehension and reasoning.

  • To Believe or Not to Believe Your LLM - The paper explores the challenge of measuring uncertainty in large language models (LLMs). It introduces a novel information-theoretic metric designed to assess when there's significant epistemic uncertainty, indicating unreliable outputs from LLMs. This metric is particularly useful for detecting "hallucinations" or instances of high epistemic uncertainty, which traditional methods fail to identify in cases with multiple possible answers. Experiments show the benefit of this new approach and provide insights into how iterative prompting can enhance the probabilities an LLM assigns to specific outputs, presenting an area of potential independent interest.

  • Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity - The "Future You" intervention is an interactive digital chat platform designed to enhance an individual's connection with their future self, a link known to benefit mental health and wellbeing. The system, aimed at users aged 18-30, creates a virtual future self based on the users' personal goals and characteristics, complete with an age-progressed image and a backstory. A study found that interacting with this AI-powered future persona reduced anxiety and increased future self-continuity, marking the first successful demonstration of personalized AI characters used for improving wellbeing.

  • Introducing Aurora: The first large-scale AI foundation model of the atmosphere - Storm Ciarán revealed the limitations of current weather-prediction models, prompting researchers to enhance forecasting techniques. Microsoft's Aurora, a state-of-the-art AI model, offers a solution. With 1.3 billion parameters, the Aurora system applies a unique architecture and training approach to optimize predictions from vast and varying atmospheric data. In terms of computational speed, it has been shown to run 5,000 times faster than the current leading numerical forecasting system. Additionally, Aurora demonstrates proficiency in a wide spectrum of forecasting tasks with high accuracy, even in areas with minimal data. Its ability to accurately predict extreme weather scenarios and air pollution levels indicates its potential to revolutionize weather forecasting and climate analysis.
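A quick aside on the MMLU-Pro item above: part of the benchmark's added difficulty comes from moving from four answer options to ten, which lowers the score a model can get by pure guessing. The sketch below is only an illustration of that arithmetic, not code from the paper. The function names are hypothetical, and it assumes every question has exactly one correct option chosen uniformly at random.

```python
import random


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing with one correct option."""
    return 1.0 / num_options


def simulate_guessing(num_options: int, num_questions: int = 100_000, seed: int = 0) -> float:
    """Estimate guessing accuracy by simulation.

    Assumption: option 0 is treated as the correct answer for every question,
    which is equivalent to any fixed answer key under uniform guessing.
    """
    rng = random.Random(seed)
    correct = sum(rng.randrange(num_options) == 0 for _ in range(num_questions))
    return correct / num_questions


if __name__ == "__main__":
    for k in (4, 10):
        print(f"{k} options: chance = {chance_accuracy(k):.2%}, "
              f"simulated = {simulate_guessing(k):.2%}")
```

With four options the guessing floor is 25%, and with ten it is 10%, so a given score on the ten-option format reflects less luck.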

Awesome New Launches

Introducing Stable Audio Open - An Open Source Model for Audio Samples and Sound Design - Stable Audio Open is an open-source model designed to generate high-quality audio samples up to 47 seconds long through text prompts. Aimed at enhancing the toolkit of sound designers and musicians, it specializes in producing various audio elements like drum beats, instrument riffs, and ambient sounds. Users can also personalize the output by fine-tuning the model with their own audio data, enabling unique sound creation tailored to their projects. This release signifies an advancement in the accessibility of generative audio technology to the creative community.

Kling Video - Kuaishou Big Model Team announced the Kling video generator, allowing users to generate videos up to 2 minutes long.

Announcing Mozilla Builders - The 2024 Accelerator Theme "Local AI" emphasizes artificial intelligence that operates on personal devices, promoting privacy, user control, and open-source innovation. The initiative aims to decentralize AI, making it accessible to individuals and small communities, challenging the cloud-centric model dominated by large companies. The Mozilla Builders Accelerator facilitates exploration into these areas, seeking contributors to help create a decentralized, user-centric AI future. Applications are open for developers, researchers, and technologists interested in contributing to this vision.

Asana Unveils AI Teammates to Tackle Complex Workflows and Elevate Teamwork - Asana has introduced AI teammates, adaptable AI collaborators designed to enhance team productivity by advising on priorities, managing workflows, and executing tasks. Built on Asana’s Work Graph® data model, these AI teammates offer customization and integration into various workflows while maintaining human oversight for transparency and control. Currently in beta, the AI teammates help streamline processes such as creative production and marketing campaign management, demonstrating significant productivity gains for users. This launch aligns with Asana's broader strategy to merge human and AI efforts to drive innovation and efficiency.

Plot AI Video Social Listening - Plot AI Video Social Listening leverages AI to analyze video content for insightful brand and product mentions, trend detection, and audience engagement. Key features include voice mention tracking without the need for manual tagging, AI object detection to observe product usage, AI-driven video summaries with reasoning and highlights, and in-depth analysis for understanding brand sentiment. Additionally, it enables prompt AI-generated responses for more efficient community interaction.

Google for Startups AI Academy: Supporting US Infrastructure - The Google for Startups AI Academy: American Infrastructure is a three-month virtual initiative with one in-person event aimed at startups using AI to enhance US public infrastructure. Google's PAIR team delivers a tailored curriculum, focusing on AI applications and go-to-market strategies, supplemented by advanced sales training. Participants benefit from Google expert mentorship, cutting-edge AI tools, and strategic networking opportunities. The program emphasizes responsible scalability and market penetration for startups aiming to revolutionize America's foundational systems through AI integration.

Text to Image AI Model & Provider Leaderboard - Artificial Analysis conducted a comparative review of various text-to-image generation models and API providers, evaluating them based on quality (using Elo scores from their Image Arena platform), generation time (measured in seconds), and cost efficiency (USD per 1,000 image generations). The lineup includes models like Playground v2.5, several iterations of Stable Diffusion, Amazon Titan G1, DALL-E variants, and Midjourney v6. Midjourney lacks a native API and is assessed through ImagineAPI, which interfaces with the Midjourney Discord. Quality assessments are crowdsourced with over 40,000 user responses. The methodology page details their pricing calculation and percentile-based generation time metrics, along with a note on incorporating image download times.
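For readers unfamiliar with Elo scoring, here is a minimal sketch of how pairwise votes can be turned into ratings. It uses the textbook Elo update, not Artificial Analysis's actual implementation, which is not described here. The K-factor of 32, the 400-point scale, the starting rating of 1000, and all function names are assumptions taken from common chess conventions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one head-to-head vote.

    k is an assumed K-factor; a larger k makes ratings move faster.
    """
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level; model A wins one user vote.
    a, b = update_elo(1000.0, 1000.0, a_won=True)
    print(a, b)
```

Because each update depends on the rating gap, beating a much higher-rated model moves a rating more than beating an equal one, which is why a leaderboard of this kind can rank models from many separate pairwise comparisons.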
