Claude 3.5 Is The New KING + Ex-OpenAI Founder New Company

There's a new LLM king in town

AI Race Heats Up: Anthropic's Claude 3.5 Sonnet Claims Top Spot

Ilya Sutskever, co-founder of OpenAI, has launched Safe Superintelligence Inc., a new research lab focused solely on developing a safe and powerful artificial intelligence system. This venture aims to avoid commercial distractions, emphasizing AI safety through engineering breakthroughs rather than reactive measures. Sutskever, alongside co-founders Daniel Gross and Daniel Levy, intends to create a general-purpose AI surpassing current capabilities. The lab will operate from Palo Alto and Tel Aviv, leveraging Sutskever's extensive expertise and reputation in the AI community.

  • Nvidia’s Jensen Huang Is on Top of the World. So Why Is He Worried? - Despite Nvidia's success, CEO Jensen Huang is concerned about potential slowdowns in AI chip demand and data center capacity. To mitigate risks, Nvidia is diversifying into software and cloud services, including the DGX Cloud, which competes with major customers like AWS and Microsoft. Huang's strategic maneuvers include designing specific server racks and influencing data center setups to ensure optimal performance and customer retention. Nvidia aims to avoid the pitfalls of past tech giants by maintaining innovation and leveraging software to complement its hardware dominance.

  • Perplexity Is a Bullshit Machine - A Wired investigation has uncovered dubious practices by AI search startup Perplexity, revealing it scrapes content from websites against protocols and creates inaccurate summaries. Despite backing from high-profile investors and claims of being an "answer engine," Perplexity has been found to ignore the Robots Exclusion Protocol to illicitly gather data from restricted areas of websites, including Wired.com. Further, it occasionally "hallucinates" information, fabricating facts not present in the original content. Perplexity's CEO, Aravind Srinivas, has denied any unethical behavior, stating their engine doesn't train on others' content. However, analyses raise concerns about how Perplexity accesses and generates summaries, suggesting reliance on web scraping rather than legitimate direct access to articles. Despite the controversy, the startup aims to create revenue-sharing deals with publishers. The reporting has prompted discussions on the legality and ethics of such AI-driven content aggregation practices.

  • Meta forms new Wearables group and lays off some employees - Meta's CTO, Andrew Bosworth, has announced a significant reorganization of the company's hardware division, Reality Labs. The division is consolidating into two main groups: one focused on the Metaverse, incorporating the Quest headset series, and a new "Wearables" group, responsible for other hardware projects including the Ray-Ban smart glasses collaboration. The restructuring has resulted in layoffs within Reality Labs, reportedly affecting a modest number of positions, particularly where it left leadership roles duplicated. Meta has not disclosed the exact number of layoffs and declined to comment on specifics.

  • China’s Top AI Startups Enter U.S., Defying Political Tensions - Amid heightened scrutiny, Chinese AI startups like Moonshot AI and MiniMax are expanding into the U.S. due to intense competition and price wars in their domestic market. Moonshot AI has launched the Ohai role-play chat app and Noisee music video generator in the U.S., while developing an international version of its Kimi chatbot. Despite regulatory challenges and potential scrutiny similar to that faced by TikTok, these startups seek growth opportunities in the American market, driven by the competitive pressures at home.

  • Why does AI hallucinate? - SARAH, a multilingual AI health assistant powered by GPT-3.5, provides around-the-clock health tips but has faced criticism for distributing misinformation, such as fake clinic details. This incident is among other AI mishaps, including Meta's Galactica 'inventing' academic content and an Air Canada bot creating a non-existent refund policy. These errors, termed 'hallucinations,' are a prevalent hurdle to chatbot reliability. Large language models (LLMs), which underlie chatbots, don't retrieve information from a database but generate responses based on immense data patterns. LLMs produce believable text by sequentially predicting words, yet this process can inadvertently fabricate believable but false information. This inherent design to 'dream up' internet-style documents contributes to the challenge of ensuring chatbot accuracy.

  • Nvidia’s AI Development Service Offers Tech From Chinese Startup - Nvidia has incorporated Yi-Large, a large language model developed by Chinese startup 01.AI, into its NIM platform, enhancing its AI offerings for enterprise customers. This marks 01.AI's first venture into markets outside China, aiming to attract U.S. customers amid intense local competition. The move highlights a trend among Chinese AI startups, like Moonshot AI, seeking overseas opportunities due to limited profitability in their domestic market.

  • I Am Laura Kipnis-Bot, and I Will Make Reading Sexy and Tragic Again - Rebind is a digital venture that turns distinguished authors into AI reading companions. It invites prominent literary figures to become "Rebinders" who create interactive, insightful commentary on classic literature. Readers can then hold dynamic conversations with AI versions of these authors and receive personalized responses as they read. The approach aims to democratize the in-depth learning usually reserved for one-on-one tutorials by building it into the reading experience. Despite potential downsides and the unpredictability of AI, Rebind promises a new way to connect readers with books and authors in an intimate and modern manner.

  • Musk’s xAI Supercomputer Will Get Server Racks from Dell and Super Micro - Elon Musk announced that Dell Technologies and Super Micro Computer will provide server racks for the supercomputer being developed by his AI startup, xAI. Dell is assembling half of these racks, while Super Micro, known for its collaboration with Nvidia and advanced cooling technology, will supply the rest. The supercomputer aims to power the next iteration of xAI's chatbot Grok, with the training requiring a vast number of Nvidia GPUs. Musk plans to have the supercomputer operational by fall 2025.

  • For Apple’s AI Push, China Is a Missing Piece - Apple's efforts to introduce AI services in China face significant challenges due to the unavailability of Western AI models like ChatGPT and strict regulatory approval processes. To remain competitive in its second-largest market, Apple is seeking local partners such as Baidu and Alibaba for AI integration. Despite Apple's robust presence in China, it lags behind local smartphone manufacturers who have already incorporated AI features. The company’s reliance on China for a substantial portion of its revenue underscores the importance of securing local AI partnerships amid rising Chinese patriotism and competitive pressures.

Awesome Research Papers

  • Detecting hallucinations in large language models using semantic entropy - This paper, recently published in Nature, addresses the tendency of large language models (LLMs) to produce false information, or "hallucinations," by focusing on a subset of the problem known as confabulation. It introduces a method for assessing the reliability of LLM outputs by measuring "semantic entropy," the entropy within the meaning-space of model responses. The method involves generating multiple responses, evaluating the semantic consistency between them, and calculating uncertainty based on shared meanings rather than varied wording. The study offers a significant advance in detecting and mitigating LLM-generated misinformation through semantic analysis, but it also highlights limitations and the need for context-specific adaptations.

  • From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries - This paper critically analyzes Retrieval Augmented Generation (RAG), which combines external context with language models to enhance their reasoning and response capabilities. Prominent in search, Q&A, and chatbots, RAG's internal mechanics are not well understood. The study reveals a propensity of language models to prefer using context, rather than their own learned parameters, to generate answers. The authors use Causal Mediation Analysis to demonstrate the limited role of parametric memory in executing tasks and Attention Contributions and Knockouts to show a neglect of the question's subject token in favor of context tokens. These tendencies are consistent across LLaMa and Phi model families.

  • Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning - The paper introduces the Multi-Image Relational Benchmark (MIRB), an evaluation framework for visual language models (VLMs) focusing on multi-image understanding across four categories: perception, visual world knowledge, reasoning, and multi-hop reasoning. The study evaluates various models, noting that while open-source VLMs approach the single-image task performance of GPT-4V, a gap exists in multi-image reasoning. The research shows that even advanced models like GPT-4V face challenges with MIRB, suggesting the need for more development in multi-modal AI and positioning MIRB as a potential milestone for future improvements.
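The semantic entropy idea from the first paper above (generate several responses, group them by meaning, measure entropy over the groups) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the greedy clustering is a simplification, and `toy_same_meaning` is a crude stand-in for the model-based judgment a real system would need to decide whether two answers mean the same thing.

```python
import math
import re
from typing import Callable, List


def cluster_by_meaning(
    responses: List[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[str]]:
    """Greedily group responses: each one joins the first cluster
    whose first member is judged to share its meaning."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if same_meaning(cluster[0], response):
                cluster.append(response)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([response])
    return clusters


def semantic_entropy(
    responses: List[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Entropy (in nats) over meaning clusters, using each cluster's
    share of the sampled responses as its probability."""
    clusters = cluster_by_meaning(responses, same_meaning)
    total = len(responses)
    return sum(
        (len(c) / total) * math.log(total / len(c)) for c in clusters
    )


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical placeholder: treats two answers as equivalent when
    they end in the same word. A real check would use a model."""
    last_a = re.findall(r"[a-z0-9]+", a.lower())[-1:]
    last_b = re.findall(r"[a-z0-9]+", b.lower())[-1:]
    return last_a == last_b


if __name__ == "__main__":
    consistent = ["Paris", "It is Paris.", "paris", "Paris!"]
    scattered = ["Paris", "Lyon", "Marseille", "paris"]
    print(semantic_entropy(consistent, toy_same_meaning))
    print(semantic_entropy(scattered, toy_same_meaning))
```

Responses that differ only in wording fall into one cluster and yield zero entropy, while answers that disagree in meaning spread across clusters and push the score up, which is the signal the summary describes as a warning sign for confabulation.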

Introducing Claude 3.5 Sonnet - Anthropic has launched Claude 3.5 Sonnet, the first model in the Claude 3.5 series, setting new benchmarks for graduate-level reasoning, undergraduate-level knowledge, and coding proficiency. It runs at twice the speed of Claude 3 Opus, performs better at nuanced language understanding, and has strong coding and visual reasoning capabilities. Available across multiple platforms, it also introduces Artifacts, a feature for interacting with AI-generated content. Future updates will expand capabilities for business integration and user personalization, with further Claude 3.5 releases planned for the year.

Sharing New Research, Models, and Datasets from Meta FAIR - Meta's Fundamental AI Research (FAIR) team has released several new research artifacts, including models for image-to-text and text-to-music generation, a multi-token prediction model, and an audio watermarking technique called AudioSeal. The releases aim to promote innovation, collaboration, and responsible AI use within the research community. The team also introduced the PRISM dataset, highlighting diverse preferences in AI development, and provided tools to measure and improve geographical disparities in text-to-image models. These initiatives reflect Meta's commitment to openness and advancing AI responsibly.

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks - Florence-2 is introduced as a pioneering vision foundation model aimed at unifying computer vision and vision-language tasks through prompt-based representation. Unlike previous models, it's designed to perform a wide range of tasks directed by simple text prompts, handling varying levels of complexity. Florence-2's efficacy relies heavily on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, enhanced via a cyclic process of auto-annotation and model enhancement. By leveraging a sequence-to-sequence structure, Florence-2 is trained for multitasking, exhibiting exceptional zero-shot learning and fine-tuning abilities, as confirmed by extensive evaluations.

Awesome New Launches

Safe Superintelligence Inc. Aims to Build the World’s First Safe Superintelligence - Safe Superintelligence Inc. (SSI) is pioneering the development of safe superintelligence, addressing the most critical technical challenge of our era. With a singular focus on creating a secure superintelligence, SSI aligns its team, investors, and business model towards this goal, emphasizing simultaneous advancements in safety and capabilities. The company operates from Palo Alto and Tel Aviv, attracting top-tier technical talent. By eliminating management distractions and insulating from commercial pressures, SSI aims to rapidly progress while maintaining safety at the forefront. The company invites leading engineers and researchers to join their mission in achieving this groundbreaking milestone.

Former Snap engineer launches Butterflies, a social network where AIs and humans coexist - Vu Tran, a former Snap engineering manager, has launched Butterflies, a social network where humans and AI personas, called Butterflies, interact through posts, comments, and DMs. The app allows users to create AI personas with backstories and emotions, which then autonomously engage on the platform. After a successful beta phase, the app is now available on iOS and Android. Butterflies aims to provide a more creative and substantial AI interaction compared to existing AI chatbots, fostering unique connections between users and AI. The startup has raised $4.8 million in seed funding and may introduce a subscription model in the future.

Eleven Labs Launches Text to Sound Effects API - The Sound Effects API lets anyone build with fully custom AI sound effects. Usage is charged at 100 characters per generation when using auto-generation, or at 25 characters per second of generated audio when a duration is set.
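To make the pricing above concrete, here is a small arithmetic sketch. The two rates come from the description; the function name is hypothetical, and the sketch assumes no rounding and no minimum or maximum duration, since none are stated.

```python
from typing import Optional

# Rates taken from the description above.
CHARS_PER_AUTO_GENERATION = 100
CHARS_PER_SECOND = 25


def sound_effect_cost(duration_seconds: Optional[float] = None) -> float:
    """Characters charged for one sound effect generation.

    With no duration, the flat auto-generation rate applies.
    With a duration, the charge scales per second of audio.
    Assumption: no rounding or duration limits are applied.
    """
    if duration_seconds is None:
        return float(CHARS_PER_AUTO_GENERATION)
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return CHARS_PER_SECOND * duration_seconds


if __name__ == "__main__":
    print(sound_effect_cost())    # auto-generation
    print(sound_effect_cost(4))   # 4 seconds of audio
    print(sound_effect_cost(10))  # 10 seconds of audio
```

Under these rates, a fixed duration of 4 seconds costs the same as one auto-generation; shorter clips are cheaper with a set duration and longer ones cost more.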

Eleven Labs Launches Voiceover Studio - Create video voiceovers and podcasts with multiple speakers and sound effects in a single workflow.
