A Week of Stumbles for AI


Following Leadership Shakeup and Document Leaks, OpenAI Focuses on Fortifying Safety Measures

OpenAI has established a committee to provide recommendations on the safety and security of its projects and operations. The company also said it has begun training a new frontier model that it expects to move it closer to artificial general intelligence (AGI). The committee, which includes CEO Sam Altman and other board members, has 90 days to evaluate and refine the company's processes and safeguards. Recently, OpenAI introduced GPT-4o, a model that enables conversations with ChatGPT on smartphones and can analyze multimedia input and recognize emotions.


Vultr is empowering the next generation of generative AI startups with access to the latest NVIDIA GPUs.

Try it yourself when you visit getvultr.com/forwardfutureai and use promo code "BERMAN300" for $300 off your first 30 days.

  • Google’s A.I. Search Errors Cause a Furor Online - Google recently rolled out new AI features in its search engine, aiming to stay competitive with Microsoft and OpenAI. The updates quickly misfired as users encountered inaccurate and bizarre answers, such as suggestions to add glue to pizza and to eat rocks for nutrition. The errors have dented Google's reputation for reliability, since more than two billion people rely on its search results. Earlier AI launches, including the Bard and Gemini chatbots, ran into similar problems on release, reflecting a pattern of difficulties with new AI features. Technology experts have criticized the missteps, but financial analysts argue that Google has to move quickly in the fierce competition with its rivals, even at the risk of temporary setbacks.

  • The Great AI Challenge: We Test Five Top Bots on Useful, Everyday Skills - In a comprehensive evaluation of AI chatbots, the Wall Street Journal tested five leading models—OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, Anthropic’s Claude, and Perplexity—across various real-life tasks. Perplexity emerged as the overall winner, excelling in areas like summarization and current events, while ChatGPT led in health advice and speed. Each bot displayed unique strengths: Claude was best for work-related writing, and Copilot excelled in creative writing. Despite recent upgrades, ChatGPT did not dominate the rankings, highlighting the diverse capabilities and evolving nature of these AI tools.

  • With AI Anyone Can Be A Coder Now - In a recent TED Talk, Thomas Dohmke, CEO of GitHub, discusses the transformative potential of artificial intelligence in democratizing coding. He emphasizes that AI-powered tools can enable individuals without traditional coding backgrounds to create software, thereby broadening access to technological innovation. Dohmke highlights the implications of this shift, suggesting it will foster a more inclusive tech industry and accelerate problem-solving across various sectors. He envisions a future where AI and human collaboration significantly enhance creativity and efficiency in software development.

  • Why AI art will always kind of suck - Artificial intelligence (AI) is changing the landscape of creativity by providing tools like DALL-E to convert text prompts into images, sparking debate about the future of art and creative professions. As generative AI models mimic human creativity, there are concerns about the quality, originality, and meaningfulness of the content they produce. Critics argue that AI-generated art lacks the depth and context that human artists bring, leading to a deluge of mediocre content. AI in film has stirred ethical debates, while the impact on creative jobs is a major concern, with artists feeling threatened by AI's use of their work for training models. Despite advancements, there's skepticism about AI fully capturing the essence of art that resonates with human emotions and experiences.

  • OpenAI Releases Former Staffers From Non-Disparagement Clauses - OpenAI announced it will release most former employees from the nondisparagement agreements tied to their exit contracts and equity. The decision follows concerns raised by staff that criticizing the company could cost them their vested equity. OpenAI confirmed it will not cancel any vested units for those who speak out and has removed the controversial language from its exit paperwork going forward. Chief Strategy Officer Jason Kwon apologized for the distress caused and committed to improving company policies.

  • Meta Is Working on a Paid Version of Its AI Assistant - Meta is developing a paid version of its AI assistant, Meta AI, according to an internal post. This premium tier aims to offer advanced features and computing power, similar to subscription services by Google and Microsoft. Meta is also creating AI agents for tasks like coding and advertising, alongside a broader reorganization of its generative AI group. These moves come as Meta plans significant AI-related capital expenditures and explores monetizing its AI services. The company has historically avoided charging users but may pivot towards premium AI offerings in the future.

  • AI tutors are quietly changing how kids in the US study, and the leading apps are from China - AI-powered educational apps, leveraging large language models, are gaining popularity, posing a threat to established tutoring franchises like Kumon. While AI tutors offer equitable and personalized learning, they are not without limitations, sometimes providing incorrect answers. Despite these issues, students benefit from improved grades and reduced tutoring costs. As AI evolves, the debate continues on its role in education, with educators and parents grappling with its integration and potential pitfalls.


    Careerist’s Software QA Engineering course can be completed in 15 weeks, with personalized guidance from experienced coaches.

    Take the first step towards a successful tech career today by following this link or with promo code MATTHEW BERMAN to receive a $600 discount on the course PLUS a money-back guarantee.

  • Macron: French AI Boom Could Help EU Close Tech Gap with U.S. and China - French President Emmanuel Macron highlighted the potential for a French-led artificial intelligence (AI) boom to help the European Union (EU) close its technological gap with the U.S. and China. Speaking about the EU's need to bolster its tech industry, Macron emphasized the importance of AI innovation and reducing dependency on external powers. He argued that fostering AI growth could drive economic competitiveness and strategic autonomy for Europe. Macron also reiterated the urgency of supporting European industries through strategic investments and collaborations within the EU to maintain global influence and economic strength.

  • Musk Plans xAI Supercomputer, Dubbed ‘Gigafactory of Compute’ - Elon Musk's AI startup, xAI, aims to build a massive supercomputer, termed a "gigafactory of compute," by linking 100,000 Nvidia H100 GPUs to improve its conversational AI, Grok. The project, slated for completion by fall 2025, would create a computing cluster four times the size of the largest existing clusters built by tech giants like Meta. It will require significant investment and power, and xAI may partner with Oracle for cloud services. Although xAI trails rivals OpenAI and Microsoft, the plan underscores Musk's commitment to advancing AI capabilities and closing the gap in AI infrastructure.

  • Musk’s A.I. Firm Raises $6 Billion in Race With Rivals - Elon Musk's AI venture, xAI, has raised $6 billion in a funding round that values the company at $18 billion before the new capital is counted (pre-money). Notable investors include Andreessen Horowitz, Sequoia Capital, and Prince Alwaleed bin Talal. The funds will go toward bringing xAI's first products to market, building infrastructure, and accelerating research and development. The raise puts xAI alongside major players like OpenAI and Anthropic as tech firms pour money into AI across a widening range of applications. Musk, who has criticized AI's current trajectory and has sued OpenAI, aims to make xAI competitive possibly by the end of the year.

  • George Lucas Says AI in Film is 'Inevitable' Amid Creative Concerns - George Lucas, who pioneered digital technology in filmmaking through his company Industrial Light and Magic (ILM), discussed the use of AI in cinema at the Cannes Film Festival. He called AI's integration into film "inevitable," likening resistance to it to preferring horses over cars. His comments come amid unease in the creative industry, where AI remains contentious, as recent strikes by actors' and writers' unions demanding protections against AI made clear. Despite these concerns, both the film and video game industries are exploring AI's potential while recognizing the need for ethical guidelines and regulation.

  • World’s only Starbucks where 100 service robots fulfill orders - Naver 1784 Tower, located in Seongnam, South Korea, operates as the world's largest robotics testbed and headquarters for tech firm Naver. This futuristic 36-storey facility is a test ground for robotics, AI, and cloud services, with a fleet of approximately 100 autonomous service robots named Rookie handling tasks like delivering food and packages. These robots operate on the ARC system, which functions on Naver Cloud and a 5G network, enabling indoor navigation and task management. The ARC system reduces costs and energy consumption by moving processing power to the cloud, highlighting Naver's push for popularizing service robots and improving human-robot interaction.

  • Apple Bets That Its Giant User Base Will Help It Win in AI - Apple plans to leverage its extensive user base to gain an edge in AI, despite lagging behind competitors like Microsoft and Google. At the upcoming Worldwide Developers Conference, Apple will introduce AI features integrated into core apps and systems, relying on both on-device processing and cloud support. Highlights include Project Greymatter, enhanced Siri interactions, and AI-driven tools in iOS 18 and macOS 15. Apple also aims to partner with OpenAI for advanced chatbot capabilities, acknowledging the need to catch up in the AI race.

Awesome Research Papers

Reducing hallucination in structured outputs via Retrieval-Augmented Generation - The paper tackles a major challenge for generative AI: its tendency to fabricate information, which can impede real-world deployment. The authors work with large language models (LLMs) in an enterprise application that generates workflows from natural language requests. They use a Retrieval-Augmented Generation (RAG) approach that substantially improves the quality and accuracy of the LLM's outputs and reduces fabricated content. Their method also makes it possible to use a smaller LLM by pairing it with a small, well-trained retriever encoder, cutting resource consumption without compromising performance.
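
To make the retrieval step concrete, here is a minimal sketch in Python. It assumes a hypothetical catalog of workflow steps and swaps the paper's trained retriever encoder for a toy bag-of-words cosine similarity; the names `CATALOG`, `embed`, `retrieve`, and `build_prompt` are illustrative and do not come from the paper. The point is the general shape of RAG: find the closest known entries, then put them in the prompt so the LLM picks real step names instead of inventing them.

```python
import math
from collections import Counter

# Hypothetical catalog of workflow steps the generator is allowed to use.
CATALOG = [
    "send_email: send an email notification to a user",
    "create_ticket: open a new incident ticket",
    "approve_request: route a request to a manager for approval",
    "update_record: update a field on a database record",
]


def embed(text: str) -> Counter:
    """Toy stand-in for a trained retriever encoder: a bag-of-words vector."""
    return Counter(text.lower().replace(":", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k catalog entries most similar to the query."""
    q = embed(query)
    return sorted(CATALOG, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Ground the LLM in retrieved entries so it does not invent step names."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, k))
    return (
        "Use only these workflow steps:\n"
        f"{context}\n\n"
        f"Request: {query}\nWorkflow:"
    )


print(build_prompt("email the user when a ticket is opened"))
```

Replacing `embed` with a learned encoder is where a small, well-trained retriever like the one the paper describes would fit; the rest of the flow stays the same.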

Lessons from the Trenches on Reproducible Evaluation of Language Models - The paper addresses the complexities of evaluating language models within Natural Language Processing (NLP). It acknowledges challenges like the sensitivity of results to the evaluation setup and the need for transparent, reproducible comparisons across different methods. Drawing on three years of experience, the authors recommend best practices to mitigate these challenges and introduce an open-source tool, the Language Model Evaluation Harness (lm-eval). The tool is designed to facilitate independent, reproducible, and extensible evaluations, and the paper illustrates its features and uses through case studies.

SimPO: Simple Preference Optimization with a Reference-Free Reward - SimPO is a proposed algorithm that improves on Direct Preference Optimization (DPO), a popular method for aligning language models with human preferences without a separate reinforcement learning step. It uses the average log probability of a sequence as an implicit reward, which removes the need for a reference model and makes training more compute- and memory-efficient. By adding a target reward margin to the Bradley-Terry objective, SimPO encourages a larger reward gap between winning and losing responses. The algorithm outperformed existing methods such as DPO across benchmarks, with significant improvements on AlpacaEval 2 and Arena-Hard, using base and instruction-tuned models such as Mistral and Llama3. The best model, built on Llama3-8B-Instruct, led its size class on the available leaderboards.
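
The SimPO objective can be sketched for a single preference pair. Each response's implicit reward is beta times its average per-token log probability under the policy, and the loss is -log sigmoid(r_win - r_lose - gamma). The default values of `beta` and `gamma` below are placeholders, not the paper's tuned hyperparameters, and in practice the summed log probabilities would come from the model being trained.

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss(logp_win: float, len_win: int,
               logp_lose: float, len_lose: int,
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """SimPO loss for one (winning, losing) response pair.

    logp_* are summed token log-probabilities of each response under the
    policy itself; no reference model appears anywhere.
    """
    # Length-normalized implicit rewards: beta * average log-prob per token.
    r_win = beta * logp_win / len_win
    r_lose = beta * logp_lose / len_lose
    # Bradley-Terry objective with a target reward margin gamma.
    margin = r_win - r_lose - gamma
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


# Winner averages -2 nats/token over 10 tokens, loser -3 nats/token over 20.
print(round(simpo_loss(-20.0, 10, -60.0, 20), 4))  # 0.3133
```

Dividing by length is what makes the reward reference-free and keeps longer responses from being favored just for having more tokens; the margin `gamma` pushes the winning reward to exceed the losing one by a fixed amount rather than merely being larger.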

Transformers Can Do Arithmetic with the Right Embeddings - This paper addresses why transformers struggle with arithmetic: they cannot keep track of each digit's exact position within long sequences of digits. The authors propose an embedding that encodes each digit's position relative to the start of its number. This improves performance on its own and also enables further architectural improvements, such as input injection and recurrent layers. With these enhancements, transformers trained on 20-digit numbers for a single day reach up to 99% accuracy on 100-digit addition problems. The gains in numeracy also carry over to other multi-step reasoning tasks, such as sorting and multiplication.
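
A minimal sketch of the positional idea: every digit gets an index counting from the start of its number, and that index selects a learned vector added to the token embedding. We assume numbers are written least-significant digit first, so that digits with the same place value receive the same index. The function names, the `offset` parameter, and the tiny embedding table are hypothetical illustrations, not the authors' code.

```python
def abacus_positions(tokens: list[str], offset: int = 0) -> list[int]:
    """Give each digit its 1-based index within the number it belongs to.

    Non-digit tokens (operators, '=') get 0. `offset` shifts every digit
    index and is included only as an illustrative knob.
    """
    positions = []
    run = 0  # digits of the current number seen so far
    for tok in tokens:
        if tok.isdigit():
            run += 1
            positions.append(run + offset)
        else:
            run = 0  # a non-digit ends the current number
            positions.append(0)
    return positions


def add_position_embeddings(token_vecs: list[list[float]], positions: list[int],
                            table: list[list[float]]) -> list[list[float]]:
    """Add the (hypothetical, learned) vector table[p] to each token vector."""
    return [[t + e for t, e in zip(vec, table[p])] for vec, p in zip(token_vecs, positions)]


# "321+54=" is 123 + 45 written least-significant digit first.
print(abacus_positions(list("321+54=")))  # [1, 2, 3, 0, 1, 2, 0]
```

Both units digits ("3" and "5") get index 1, so the model sees aligned place values regardless of where each number sits in the sequence.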

Financial Statement Analysis with Large Language Models - The study examines whether large language models (LLMs) can perform financial statement analysis as well as professional human analysts. The researchers gave GPT-4 standardized, anonymized financial statements and instructed it to analyze them and predict the direction of future earnings. The LLM outperforms financial analysts at predicting earnings changes, particularly in situations where human analysts tend to struggle, and its accuracy is comparable to that of a specialized machine learning model. The study finds that the LLM's predictions do not stem from memorized training data but from its ability to generate useful narrative insights about a company's future performance. Trading strategies based on the LLM's predictions yield higher Sharpe ratios and alphas than strategies based on other models, suggesting that LLMs may play a central role in decision-making.
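
For readers unfamiliar with the Sharpe ratio cited above, here is a small sketch of the standard calculation: mean excess return divided by the standard deviation of excess returns, annualized by sqrt(12) on the assumption that returns are monthly. The return series and risk-free rate are hypothetical, and this does not reproduce the study's portfolio construction.

```python
import math
import statistics


def annualized_sharpe(monthly_returns: list[float], monthly_risk_free: float = 0.0) -> float:
    """Sharpe ratio of a monthly return series, annualized by sqrt(12)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    # Mean excess return per unit of volatility (sample standard deviation).
    return math.sqrt(12) * statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical monthly returns of a strategy.
print(round(annualized_sharpe([0.01, 0.02, 0.03]), 3))  # 6.928
```

A higher Sharpe ratio means more return per unit of risk, which is why the study reports it alongside alphas rather than raw returns alone.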

Golden Gate Claude - New research from Anthropic shows that individual concepts, or "features," can be identified inside the neural network of the AI model Claude 3 Sonnet and then adjusted. Researchers illustrated this with a feature linked to the Golden Gate Bridge: when they amplified it, the model referenced the landmark constantly, whether or not it was relevant to the query. The behavior was showcased in an interactive online model named "Golden Gate Claude," with a caution that it could produce unexpected responses. The findings mark a significant step toward understanding and modifying the inner workings of large language models, which could improve not only interpretability but also safety, for example by adjusting features tied to harmful behaviors.
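
The steering idea can be sketched as an edit to a single activation vector. This assumes a sparse-autoencoder-style feature, in the spirit of the dictionary-learning approach Anthropic describes: the feature's value is ReLU(w . x + b), its contribution to the activation is that value times a decoder direction, and "clamping" sets the value to a chosen target. The function names, weights, and vectors are made up and tiny; real activations have far more dimensions.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return max(x, 0.0)


def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def clamp_feature(activation: list[float], enc_weights: list[float], enc_bias: float,
                  decoder_dir: list[float], target: float) -> list[float]:
    """Return the activation with one feature's value set to `target`."""
    # Current strength of the feature, read with a hypothetical encoder row.
    current = relu(dot(enc_weights, activation) + enc_bias)
    # Shift along the decoder direction so the feature contributes
    # target * decoder_dir instead of current * decoder_dir.
    delta = target - current
    return [x + delta * d for x, d in zip(activation, decoder_dir)]


# Toy 2-D example: the feature reads and writes along the first axis.
steered = clamp_feature([0.5, 2.0], enc_weights=[1.0, 0.0], enc_bias=0.0,
                        decoder_dir=[1.0, 0.0], target=5.0)
print(steered)  # [5.0, 2.0]
```

Setting `target` far above the feature's usual range corresponds to the "intensified" Golden Gate feature; setting it to 0 would instead suppress a feature, which is the kind of adjustment the safety argument has in mind.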

Awesome New Launches

Canva Create 2024: Introducing a whole new Canva - Canva has introduced significant updates to enhance workplace productivity and collaboration. The new features include a redesigned editor, Canva Enterprise for large organizations, and Canva Work Kits tailored for various departments like HR, sales, marketing, and creative teams. New AI-powered tools, such as Magic Design, Text to Graphic, and personalized tone of voice, streamline the design process. Additional enhancements like Data Autofill, Recordings, and integration with popular workplace apps further improve efficiency and collaboration across teams.

Cool New Tools

Arc Search app gets silly 'phone call' search gesture - The Browser Company's iOS app, Arc Search, has introduced a feature called "Call Arc" that lets users search the web by voice by raising their iPhone to their ear, as if making a phone call. As covered by MacRumors, the app answers verbally while showing an animated smiley face. Arc Search, launched in January 2024, uses AI from OpenAI, possibly in combination with other models, to present search results, and includes a "browse for me" function that summarizes web pages, plus a built-in ad blocker for cleaner results.
