Anthropic's Major LLM Breakthrough

Seeing Inside the LLM Black Box

Anthropic Achieves Breakthrough in Mapping Millions of Concepts Inside Large Language Models, and Gets Stuck on the Golden Gate Bridge

Researchers have made a breakthrough in AI interpretability by mapping out the internal representations of concepts within Claude Sonnet, a sophisticated large language model. This exploration has uncovered millions of features, revealing a complex network of neurons that collectively encode a wide array of entities such as cities, people, and scientific fields, along with more abstract concepts like gender bias and code bugs. By identifying how these features correlate and influence the model's behavior, the team has demonstrated the potential to manipulate the AI's responses by amplifying or suppressing certain features. This insight could enhance AI safety measures by allowing the monitoring of dangerous behaviors and improving the model’s alignment with ethical guidelines. The research underscores the complexities involved in achieving full transparency within AI systems and highlights ongoing efforts to make AI safer and more reliable.
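To make "amplifying or suppressing certain features" concrete, here is a minimal toy sketch. It is not Anthropic's code or method: the three-dimensional activation space, the hand-written `FEATURES` dictionary, and the function names are all hypothetical, and real feature directions are learned from the model rather than written by hand. The sketch assumes a feature can be treated as a direction in activation space, so reading a feature is a projection and steering is adding a scaled copy of that direction.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical feature dictionary: each feature is a direction in a toy
# 3-dimensional activation space. Real dictionaries are learned, not hand-written.
FEATURES: Dict[str, Vector] = {
    "golden_gate_bridge": [1.0, 0.0, 0.0],
    "code_bug": [0.0, 1.0, 0.0],
}


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_activations(activation: Vector) -> Dict[str, float]:
    """Read each feature by projecting onto its direction; negatives clamp to zero."""
    return {name: max(0.0, dot(activation, d)) for name, d in FEATURES.items()}


def steer(activation: Vector, feature: str, strength: float) -> Vector:
    """Add strength * direction. Positive strength amplifies, negative suppresses."""
    direction = FEATURES[feature]
    return [a + strength * x for a, x in zip(activation, direction)]


if __name__ == "__main__":
    base = [0.2, 0.5, 0.1]
    print(feature_activations(base))
    print(feature_activations(steer(base, "golden_gate_bridge", 5.0)))
    print(feature_activations(steer(base, "golden_gate_bridge", -5.0)))
```

In this toy setup, a large positive strength makes the chosen feature dominate the readout, which is the same idea behind the model that "gets stuck" on the Golden Gate Bridge, while a negative strength drives the feature's activation to zero.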



  • AI Is a Black Box. Anthropic Figured Out a Way to Look Inside - Researchers at Anthropic, co-founded by AI researcher Chris Olah, are making strides in understanding the inner workings of artificial neural networks, specifically large language models (LLMs). LLMs can impress, and they can provoke with biased or inaccurate output, yet their internal processes remain elusive. Anthropic's mechanistic interpretability team is using "dictionary learning" to decode and manipulate these models, relating specific groups of artificial neurons to significant outputs, or "features." Through extensive experimentation, they've identified millions of features, including safety-related ones, and can alter LLM behavior by enhancing or diminishing these features. This process, likened to AI brain surgery, holds promise for safer and better-tuned LLM applications. Anthropic's findings, however, are only a beginning and not a comprehensive solution to LLMs' opacity, and the techniques may not apply broadly across different LLMs.

  • Scale AI secures $1B funding at $14B valuation as its CEO predicts big revenue growth and profitability by year-end - Scale AI, a startup focused on AI data labeling and model training services, secured $1 billion in a Series F funding round, pushing its valuation to $14 billion. Backed by investors like Accel and participants such as Amazon and Intel Capital, Scale AI is capitalizing on the surging demand for generative AI. CEO Alexandr Wang revealed that annual recurring revenue tripled in 2023 and is projected to hit $1.4 billion by 2024, with profitability expected by year's end. The company, which began at Y Combinator in 2016, initially provided crucial data services for autonomous driving firms and now serves a broader AI market, including government and military clients. Despite controversy over its data annotator practices, Scale AI is enhancing its work with specialized experts and focusing on model evaluation and risk management, aiming to stay at the forefront of AI development and infrastructure support.

  • Google Taps AI to Show Shoppers How Clothes Fit Different Bodies - Google unveiled an AI-powered online ad feature that addresses the common challenge of discerning how clothing items will fit various body types while shopping online. The new ads permit brands to display their products on a diverse range of models without needing additional photos, using AI to integrate product images with model shots. This development aims to enhance the online shopping experience by providing a more realistic preview of how garments might look on different individuals. Google has also introduced other generative AI tools at its Marketing Live event to strengthen its advertising offerings, including AI-generated short-form video summaries and new product images. These advancements aim to help businesses create engaging visuals with less reliance on traditional creative resources, potentially changing the advertising and small business landscape amid competition from platforms like Amazon and TikTok.

  • Nvidia’s rivals take aim at its software dominance - Nvidia's dominance in the AI chip market, primarily due to its CUDA software platform, is being challenged by an OpenAI-led initiative and major tech companies like Meta, Microsoft, and Google. These companies are developing Triton, open-source software that aims to make AI applications compatible with a wider range of chips. Despite Nvidia's stronghold due to its comprehensive software ecosystem, the high costs and supply issues of its hardware are driving customers to seek alternatives. Rival chipmakers such as Intel, AMD, and Qualcomm are also supporting Triton to reduce Nvidia's market control.

  • China’s latest answer to OpenAI is ‘Chat Xi PT’ - China's latest AI initiative, "Chat Xi PT," is a large language model trained primarily on the political philosophy of President Xi Jinping, known as "Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era." This model, developed by the Cyberspace Administration of China, aims to integrate Xi’s ideas into its responses and promote socialist values. Initially used at a research center, it is expected to expand for wider use. This development reflects China's effort to balance strict controls on free speech with fostering AI advancements to rival Western models like OpenAI's ChatGPT.

  • Amazon plans to give Alexa an AI overhaul — and a monthly subscription price - Amazon is set to enhance Alexa with generative AI, potentially introducing a subscription fee to support this upgrade. This move aims to rejuvenate the voice assistant and make it more competitive against recent AI advancements from companies like OpenAI and Google. The subscription will be separate from Amazon Prime, with pricing still undecided. Internal pressures have grown since the company's focus on profitability intensified under CEO Andy Jassy, with large team reorganizations geared toward making Alexa a leading AI contender. Amazon is facing challenges in staying at the forefront of AI innovation and talent acquisition. Despite these hurdles, the extensive existing user base of Alexa offers a prime opportunity for the company.

  • Meta’s Zuckerberg Creates Council to Advise on AI Products - Mark Zuckerberg, CEO of Meta Platforms Inc., has established a new advisory council to guide the company's AI and technological advancements. The Meta Advisory Group includes Patrick Collison (Stripe CEO), Nat Friedman (former GitHub CEO), Tobi Lütke (Shopify CEO), and Charlie Songhurst (investor and former Microsoft executive). Unlike Meta's board of directors, the council will not be elected by shareholders nor have fiduciary duties, and its members will serve without compensation. This initiative aims to bolster Meta’s AI-driven products and strategic growth, with a focus on both hardware (e.g., Quest VR headsets, Ray-Ban smartglasses) and software (e.g., AI assistant in apps).

  • A landmark multi-year global partnership with News Corp - OpenAI and News Corp have entered a multi-year global partnership to bring News Corp’s premium journalism content to OpenAI’s platform. This collaboration allows OpenAI to display current and archived content from major News Corp publications, including The Wall Street Journal, The Times, and The Australian, in response to user queries. The agreement aims to enhance OpenAI’s products by providing reliable information and supporting high journalistic standards. Both organizations express enthusiasm about the partnership’s potential to set new standards for journalism in the digital age.

  • Microsoft's new AI recall feature is being investigated by European regulators - Microsoft revealed a new 'Recall' feature at the Microsoft Build developer conference which acts as a "photographic memory" for users' computer activities by constantly taking screenshots. Although framed as an enhancement, the capability sparked privacy concerns from various quarters, including Tesla CEO Elon Musk. The UK's Information Commissioner's Office has expressed interest in examining Recall's implications for privacy. Microsoft insists the feature is secure, with snapshots encrypted and stored locally, accessible only to the user signed in. Users have options to manage, filter, or delete these snapshots, and IT administrators can control the feature's deployment through Microsoft Intune.



  • Nvidia, Dell Are Building Their Own AI 'Factories' - Nvidia and Dell will collaborate to build AI 'factories,' which are specialized data centers designed to support large-scale AI workloads. These facilities aim to streamline AI development by providing the necessary infrastructure and tools for training, deploying, and managing AI models. The partnership highlights the growing demand for dedicated AI resources in enterprise environments, focusing on enhancing efficiency and scalability for businesses adopting AI technologies.

  • Nvidia will now make new AI chips every year - Nvidia has reported a $14 billion profit in one quarter, largely owing to its AI chip sales. In a strategic shift, CEO Jensen Huang has announced an accelerated design cycle, with Nvidia now set to release new chip architectures annually, a significant increase from its prior biennial pace. Following the release of the Blackwell architecture in 2024, the next-generation 'Rubin' is anticipated for 2025, indicating the R100 AI GPU could launch next year. This faster cadence will also apply to CPUs, GPUs, NICs, and switches. Nvidia's AI GPUs are designed for easy transition in data centers, maintaining backward compatibility. Demand for these powerful AI processors is high, owing to their cost-saving and revenue-generating potential. Nvidia is also seeing growth in its automotive sector, and Tesla’s significant purchase of H100 GPUs and Meta's heavy investment highlight the booming interest in Nvidia's technology.

  • Microsoft’s Inflection AI grab likely cost more than $1 billion, says an insider - Microsoft's recent acquisition of Inflection AI's founders and employees, along with the rights to sell access to its model, is revealed to have cost over $1 billion. This deal includes $620 million for model access rights and $33 million for hiring waivers. Additionally, part of the $653 million total may be used to buy back equity from existing Inflection shareholders, who are guaranteed $1.50 for every dollar of equity they own, totaling an anticipated $380 million payout. Despite the high cost, many Inflection investors will see only nominal gains compared to the company's previous $4 billion valuation.

  • Meta AI chief says large language models will not reach human intelligence - Yann LeCun, Meta's chief AI scientist, argues that large language models (LLMs) like ChatGPT are inherently limited and will not achieve human-level intelligence. He critiques LLMs for their lack of understanding of logic, the physical world, and their inability to reason or plan. Instead, LeCun advocates for a new approach called "world modelling" to develop superintelligent AI, which he believes could take a decade to achieve. This approach aims to teach AI systems common sense and the ability to learn from their environment, similar to how humans do. Meta has invested heavily in AI research, but LeCun's vision faces skepticism and is seen as a high-risk strategy.

  • Amazon is pivoting on its order of Nvidia's latest 'superchip' - Months after Nvidia launched its Hopper superchip, Amazon Web Services (AWS) has shifted its focus to Nvidia’s latest offering, the Blackwell processor, forgoing the former. AWS, the leading cloud service provider globally, cited the brief interval between the two chips as the reason for transitioning to Blackwell. Nvidia CEO Jensen Huang lauded Hopper as the best current GPU while also hinting at the need for more powerful GPUs. AWS did not comment on Blackwell’s pricing, but clarified that orders were not halted but rather rerouted in collaboration with Nvidia.

  • AI-intensive sectors are showing a productivity surge, PwC says - Businesses utilizing artificial intelligence (AI) are experiencing productivity growth nearly five times faster than those that are not, according to PwC. Professional, financial services, and IT sectors saw a 4.3% productivity increase from 2018 to 2022, compared to 0.9% in construction, manufacturing, retail, food, and transport. The rise of AI could potentially end the period of low productivity growth, leading to economic improvements, higher wages, and better living standards. Jobs requiring AI skills offer higher wages, with a 25% premium in the U.S. and 14% in Britain, indicating the value and impact of AI on the workforce.

Awesome Research Papers

Observational Scaling Laws and the Predictability of Language Model Performance - This paper presents an observational approach to develop scaling laws for language models by analyzing ~80 publicly available models, negating the need for varied-scale model training. The study challenges the commonality across different model families, suggesting a generalized scaling law where performance is determined by a capability space and efficiency in converting compute to capabilities. Surprising findings include the prediction of complex scaling phenomena, emergent behaviors, agent performance, and the impact of post-training interventions from smaller models' behaviors, implying a systematic predictability in language model advancement.

Towards Modular LLMs by Building and Reusing a Library of LoRAs - The study examines the potential of reusing trained adapters for large language models (LLMs) on new tasks. It introduces techniques for creating a multi-task adapter library and explores model-based clustering (MBC) to group tasks by adapter parameter similarity, enhancing task transfer ability. Additionally, the research presents Arrow, a zero-shot routing method enabling the dynamic selection of appropriate adapters for novel inputs without retraining. Experiments with LLMs like Phi-2 and Mistral on varied tasks show that MBC adapters and Arrow routing significantly improve task generalization, moving towards more adaptable and modular LLMs that can compete with traditional joint training methods.

The Foundation Model Transparency Index - The May 2024 Foundation Model Transparency Index reveals progress in the transparency of foundation model developers, with the average score improving by 21 points from October 2023, reaching 58 out of 100, and the top score rising to 85. This biannual assessment, unlike its predecessor, required developers to proactively submit reports detailing their adherence to 100 transparency indicators. Out of 19 approached developers, 14 participated, each disclosing new information, averaging revelations on 16.6 previously undisclosed indicators. Despite this progress, areas such as data rights, effectiveness of model guardrails, and the societal impact of models remain less transparent. Recommendations for regular transparency reporting align with voluntary codes from influential bodies like the White House and the G7. The report serves as a centralized resource for researchers and policymakers to identify transparency gaps, thereby guiding potential policy interventions.

Mistral-7B-Instruct-v0.3 - Mistral-7B-Instruct-v0.3 is an upgraded language model building upon its predecessor, Mistral-7B-v0.2, with an extended vocabulary of 32,768 tokens along with support for the v3 Tokenizer and function calling capabilities. The model can be run through the 'mistral_inference' library, complemented by simple CLI commands for interaction. Function calls enable the model to perform specific tasks, such as providing current weather updates. Hugging Face transformers support further extends its application for text generation tasks. However, it lacks moderation features, which is an aspect the Mistral AI team aims to address in collaboration with the community. The team comprises numerous experts across various domains contributing to the development of the model.

New models added to the Phi-3 family - Microsoft has launched Phi-3, a family of small language models (SLMs) designed for efficiency and performance. The first model, Phi-3-mini with 3.8 billion parameters, is now available and optimized for deployment on resource-constrained devices like smartphones. Despite its smaller size, Phi-3 achieves competitive results compared to larger models through techniques like high-quality curated data and reinforcement learning from human feedback. Additional models in the Phi-3 family, including Phi-3-small (7B) and Phi-3-medium (14B), will be released in the coming weeks, offering more options across the quality-cost curve, as well as Phi-3 vision.

Cohere For AI Launches Aya 23, 8 and 35 Billion Parameter Open Weights Release - Cohere For AI has announced Aya 23, a new family of multilingual, generative large language research models, covering 23 languages with 8-billion and 35-billion parameter versions available as open weights. Aya 23 builds on the Aya project, which involved 3,000 collaborators and created the largest multilingual instruction fine-tuning dataset. The Aya 23 models are designed to expand state-of-the-art language modeling capabilities and support underrepresented languages, addressing the limitations of existing models. Aya 23 demonstrates superior performance in natural language understanding, summarization, and translation, and aims to democratize access to advanced AI technology for researchers globally.

Awesome New Launches

Khan Academy and Microsoft partner to expand access to AI tools that personalize teaching and help make learning fun - Khanmigo, an AI-powered teaching assistant, has been introduced into classrooms to support both students and teachers by encouraging problem-solving rather than giving direct answers. This tool prompts students to engage with the material creatively and analytically. Teachers appreciate how Khanmigo asks questions to foster a deeper understanding of subjects like science and chemistry without doing the work for students. Designed with accountability features, the AI guards against cheating by making student chats visible to parents and educators and flagging concerning interactions. Partnering with Microsoft has allowed Khan Academy's founder, Sal Khan, to position AI as a 21st-century rendition of personalized 1:1 tutoring, envisioning a future where every student is guided towards knowledge.

Cover-Agent: We created the first open-source implementation of Meta’s TestGen-LLM - Meta researchers developed TestGen-LLM, a tool to enhance software test coverage, which caught the attention of developers. While Meta did not release the code, an implementation has been introduced as part of the open-source Cover-Agent. Cover-Agent iteratively generates, validates, and adds tests to an existing test suite, aiming to increase code coverage without manual intervention until pre-set targets are met. The Cover-Agent approach circumvents some limitations of the original tool by generating multiple tests prior to developer review, but its authors acknowledge that the technology is assistant-level rather than fully autonomous.
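The generate, validate, and add loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Cover-Agent's or TestGen-LLM's actual code: `CandidateTest`, `grow_suite`, and the `generate` callback are hypothetical names, the LLM call is replaced by any function that returns candidates, and coverage is modeled simply as a set of covered line numbers.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical stand-in for a generated test and its measured effect."""
    name: str
    passes: bool            # does the test build and pass?
    covers: FrozenSet[int]  # line numbers the test exercises


def grow_suite(
    generate: Callable[[Set[int]], List[CandidateTest]],
    total_lines: int,
    target: float,
    max_rounds: int = 5,
) -> Tuple[List[CandidateTest], float]:
    """Keep generating candidates until the coverage target or round limit is hit."""
    suite: List[CandidateTest] = []
    covered: Set[int] = set()
    for _ in range(max_rounds):
        if len(covered) / total_lines >= target:
            break
        for cand in generate(covered):
            if not cand.passes:
                continue  # validation step 1: discard failing tests
            if not (cand.covers - covered):
                continue  # validation step 2: discard tests that add no coverage
            suite.append(cand)
            covered |= cand.covers
    return suite, len(covered) / total_lines


def fake_generator(covered: Set[int]) -> List[CandidateTest]:
    """Stands in for the LLM; returns the same fixed candidates every round."""
    return [
        CandidateTest("t1", True, frozenset({1, 2})),
        CandidateTest("t2", False, frozenset({3, 4})),
        CandidateTest("t3", True, frozenset({2})),
        CandidateTest("t4", True, frozenset({3})),
    ]


if __name__ == "__main__":
    suite, coverage = grow_suite(fake_generator, total_lines=4, target=0.75)
    print([t.name for t in suite], coverage)
```

The two filters mirror the validation idea in the summary: a candidate is kept only if it passes and raises coverage, so the suite never grows with failing or redundant tests.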

Introducing Tako, a new way to reference real knowledge, and our first integration, Perplexity - Tako introduces a new AI search engine designed for visualizing and sharing knowledge in an engaging, shareable format. Unlike traditional AI, which often provides unreliable or verbose responses, Tako sources real-time data from authoritative providers. This allows for precise, visual answers to complex queries, such as comparing financial and political metrics over time. Tako's first integration with Perplexity enhances its answer engine, enabling users to perform unique, visually-rich research. This partnership aims to revolutionize how knowledge is accessed and shared in AI-native applications.

Cool New Tools

Say What You See - Google Arts & Culture - Learn the art of image prompting with the help of Google AI.
