OpenAI Dominates With GPT-4o Launch (aka "HER")

The first real personal assistant AI

The latest iteration boasts improved capabilities across text, vision, and audio

OpenAI has introduced GPT-4o, an enhanced version of the GPT-4 model with increased speed and expanded capabilities in processing text, vision, and audio. The update, announced by OpenAI's CTO Mira Murati, will be available to all users, with paid subscribers receiving higher capacity limits. GPT-4o will roll out in ChatGPT gradually, beginning with text and image features and then incorporating the multimodal functions that handle voice, text, and images seamlessly. OpenAI CEO Sam Altman highlighted that developers can access the API at a reduced cost and with improved performance. The launch strategically precedes Google I/O, countering speculation about an AI search engine or a new GPT-5 model.



  • AI expert's large model beats OpenAI's GPT-4 - Kai-Fu Lee, a distinguished AI expert and founder of 01.AI, has announced that their latest AI model, Yi-Large, has outperformed OpenAI's GPT-4 in reasoning and other key aspects by using over 100 billion tokens. Lee revealed that this advancement is part of their goal to become a globally recognized AI company. Within just nine months, their large model products have captivated nearly 10 million users and one product has generated significant revenue. Moreover, Lee hinted at the development of an even more advanced AI model that is showing promising capabilities.

  • Microsoft announces the largest investment to date in France to accelerate the adoption of AI, skilling and innovation - Microsoft has announced a €4 billion investment in France, its largest to date, to accelerate the adoption of artificial intelligence (AI), skilling, and innovation. The investment will focus on expanding cloud and AI infrastructure, training 1 million people in AI skills by 2027, and supporting the French startup ecosystem. This commitment aims to increase France's competitiveness, foster digital technology, and create long-term benefits for the economy and job market.

  • OpenAI, Mass Scraper of Copyrighted Work, Claims Copyright Over Subreddit's Logo - OpenAI has filed a copyright complaint against the ChatGPT subreddit for using their logo without authorization, as seen in a message received by subreddit moderators from Reddit. The complaint highlights user confusion due to the unauthorized use of OpenAI's logo on the subreddit's profile image, requiring its removal by May 16. Ironically, this move comes shortly after OpenAI defended its web scraping policies, despite ongoing controversies over its use of copyrighted materials for AI training, as exemplified by a lawsuit from the New York Times and criticisms of its handling of copyrighted content. Amidst the enforcement of this copyright claim, the subreddit has complied by removing the OpenAI logo, while the community has reacted with irony and criticism towards OpenAI's approach, given its history of extensive data scraping and content usage.

  • SoundHound AI and Perplexity Partner to Bring Online LLMs to Next Gen Voice Assistants Across Cars and IoT Devices - SoundHound AI, a leading voice AI company, has partnered with Perplexity, a conversational AI-powered answer engine, to enhance its voice assistant capabilities. This partnership will integrate Perplexity's online large language model (LLM) capabilities into SoundHound Chat AI, enabling the assistant to provide accurate and up-to-date responses to web-based queries that static LLMs cannot currently answer. This integration will expand the type and complexity of questions the assistant can handle, offering users a more comprehensive and dynamic conversational experience.

  • Did Stanford just prototype the future of AR glasses? - The Stanford Computational Imaging Lab is developing an AI-assisted holographic imaging system for augmented reality (AR) applications, boasting a design promising to be slimmer and of higher quality than existing technologies. Although the prototype currently has a limited field of view of 11.7 degrees, the technology employs a novel "nanophotonic metasurface waveguide," eliminating the need for bulky optics, and utilizes AI algorithms for enhanced image quality, automatically calibrated with camera feedback. The research describes the ambition to outperform current AR offerings through a more compact solution, posing potential competition to heavyweights like Apple and Meta, who are investing heavily in the AR space with aspirations to create the ideal product that resembles everyday eyewear.

  • Leaked Deck Reveals OpenAI's Pitch on Publisher Partnerships - OpenAI, known for generative AI technologies like ChatGPT, is engaging news publishers for potential partnerships under the Preferred Publishers Program (PPP). OpenAI has secured agreements with various high-profile publishers, granting them enhanced content visibility and financial remuneration within its AI services, while controversies persist due to legal disputes over the use of copyrighted material. Despite OpenAI's refutation of some reported details, the initiative reflects ongoing dialogues between AI firms and digital publishers to define the evolving dynamics of content usage and compensation in the AI space.

  • Apple Will Revamp Siri to Catch Up to Its Chatbot Competitors - Apple's top software executives were prompted by OpenAI's ChatGPT to overhaul Siri, Apple's virtual assistant, deemed out-of-date in its capabilities. Introduced in 2011, Siri struggled with multi-turn conversations and often bungled questions. In stark contrast, ChatGPT could handle complex inquiries and maintain context, like tracking weather requests for different cities seamlessly. Prompted by this technological disparity, Apple initiated a major reorganization, focusing heavily on generative artificial intelligence as a key initiative. An upgraded, conversational Siri with a generative AI backbone is expected to be unveiled at Apple's developers conference on June 10. This ambitious update is part of a larger strategy to integrate generative AI throughout Apple's offerings. Additionally, the new iPhones will have expanded memory to accommodate the improved Siri. Apple is considering licensing AI models from various companies to support these enhancements.

  • 6 incredible images of the human brain built with the help of Google's AI - A collaboration between Google researchers and Harvard neuroscientists has led to a detailed 3D model of a small portion of the human brain, using advanced AI image analysis. This model represents a brain region one-millionth the size of the full organ and contains around 50,000 cells and 150 million synapses within a tissue sample the size of half a grain of rice. The reconstructed dataset is the largest of its kind at 1.4 petabytes and has revealed numerous insights, including the existence of neurons connected by up to 50 synapses and the discovery of rare "axon whorls." This open-access data aims to aid further scientific exploration into neurological functions and disorders.

  • Bumble’s Whitney Wolfe Herd says your dating ‘AI concierge’ will soon date hundreds of other people’s ‘concierges’ for you - Whitney Wolfe Herd, the founder and executive chair of Bumble, envisions a future where AI does the preliminary "dating" for individuals by handling initial interactions and recommending potential matches. Wolfe Herd suggested that an AI dating concierge could potentially date on behalf of users, sifting through individuals to identify the most compatible few. Bumble, while once known for prompting women to make the first move, has shifted its algorithm but maintains its core mission of fostering a safer, more respectful online dating environment. As AI continues to advance, Wolfe Herd also sees potential for these technologies to offer personal coaching and dating advice. Wolfe Herd, however, remains focused on using AI to enhance healthy and equitable relationships rather than creating AI companions.

  • U.K. agency releases tools to test AI model safety - The U.K. AI Safety Institute has launched an open-source AI safety toolset named Inspect, designed to evaluate AI models' knowledge and reasoning abilities and to generate performance scores. It is the first universally accessible AI safety testing platform created by a state-backed entity. Inspect has three main components: data sets that provide sample tests, solvers that execute those tests, and scorers that aggregate the results into metrics. All three can be extended with third-party Python packages. The release of Inspect aligns with broader efforts, such as the U.S.-U.K. partnership for advanced AI model testing and the U.S. plan to establish its own AI safety institute to manage AI-related risks.

  • SoftBank's Arm plans to launch AI chips in 2025 - SoftBank Group, led by CEO Masayoshi Son, is embarking on a substantial investment initiative, putting 10 trillion yen (approximately $64 billion) into sectors including AI chips, robotics, and data centers. As part of this strategy, the group's subsidiary Arm is set to enter the artificial intelligence chip market, with plans to unveil its initial offerings in 2025. This move signals SoftBank's commitment to establishing itself as a dominant entity in the AI industry.
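For readers curious how the three Inspect components from the U.K. item above fit together (data sets, solvers, scorers), here is a minimal sketch in plain Python. Every name in it (Sample, toy_model, solver, scorer) is hypothetical and chosen for illustration; it mirrors the flow described in the summary, not the actual Inspect package API.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout: this mirrors the data set -> solver -> scorer
# flow described in the summary, not the real Inspect API.


@dataclass
class Sample:
    prompt: str  # input shown to the model
    target: str  # expected answer


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: a tiny lookup table)."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


def solver(model: Callable[[str], str], sample: Sample) -> str:
    """Execute one test by sending the sample's prompt to the model."""
    return model(sample.prompt)


def scorer(outputs: List[str], samples: List[Sample]) -> float:
    """Aggregate results into one metric: exact-match accuracy."""
    correct = sum(out.strip() == s.target for out, s in zip(outputs, samples))
    return correct / len(samples)


# Data set: a handful of sample tests.
dataset = [
    Sample("2 + 2 = ?", "4"),
    Sample("Capital of France?", "Paris"),
    Sample("Largest prime below 10?", "7"),
]

outputs = [solver(toy_model, s) for s in dataset]
accuracy = scorer(outputs, dataset)
print(f"accuracy = {accuracy:.2f}")
```

Swapping toy_model for a real model call, or scorer for a different metric, leaves the other pieces untouched, which is the kind of extensibility the summary attributes to the toolset.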

Awesome Research Papers

What’s up with Llama 3? Arena data analysis - This post details a comprehensive analysis of Meta's Llama 3-70B, a new large language model that has outperformed other leading models in certain English Chatbot Arena categories. The analysis focuses on prompt types, difficulty levels, the impact of duplication, and qualitative differences. Llama 3 excels at creative, open-ended prompts but falters with more challenging, specific tasks, especially in mathematics and coding. The model's win rate declines as the prompt's difficulty increases. Duplication and a small user sample do not significantly affect win rates. Llama 3 distinguishes itself by producing output that is friendlier and more conversational. However, while these traits are more prevalent in winning battles, their exact influence on victory is ambiguous and necessitates further study.

Energy Star Ratings for AI Models - The Energy Star AI Project, an initiative inspired by the EPA's traditional Energy Star ratings, proposes to rate AI models based on their energy efficiency, addressing environmental concerns related to energy consumption and GHG emissions from AI use. It involves testing different AI models across 10 varied tasks, from language processing to computer vision, to establish a system that assists in selecting energy-efficient models. The ultimate goal is to create a 'Green AI Leaderboard,' driving the AI community towards more sustainable practices by offering insights into energy-efficient implementations and strategies.

Announcing Refuel LLM-2 - RefuelLLM-2 and RefuelLLM-2-small represent the latest advancements in large language models designed specifically for data labeling, enrichment, and cleaning. RefuelLLM-2 achieves a performance of 83.82%, outpacing other models like GPT-4-Turbo and Claude-3-Opus on a variety of data labeling benchmarks. Developed from a Mixtral-8x7B base model, it has been trained with over 2750 datasets and excels in tasks requiring long input contexts (32K tokens). The smaller variant, RefuelLLM-2-small, also surpasses its counterparts in its category, leveraging a Llama3-8B base. Both models distinguish themselves with their ability to generalize, evidenced by improved performance on non-public datasets as well as specific domain datasets. Additionally, they offer more reliable confidence scores, proven through AUROC analysis.

Gemma-2B-10M - The Gemma 2B model offers an advanced approach to language modeling by enabling an impressive sequence length of up to 10 million tokens, while requiring less than 32GB of memory. This efficiency is achieved through the use of recurrent local attention and native CUDA optimization, maintaining linear memory complexity. It is a preliminary version, having undergone only 200 steps of training, but further training is planned. Users can quickly start using the model by installing it from Hugging Face and adjusting the inference code to generate text based on custom prompts. The memory efficiency breakthrough is attributed to a strategy that splits attention into local blocks, influenced by InfiniAttention and the Transformer-XL paper, and was developed by Mustafa Aljadery, Siddharth Sharma, and Aksh Garg.
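To see why splitting attention into local blocks matters at this scale, here is a rough back-of-the-envelope sketch. It only counts attention-score entries and ignores everything else (weights, recurrent state, activations). The block size of 2048 is an assumption picked for illustration, and none of this is the project's actual code.

```python
def full_attention_scores(seq_len: int) -> int:
    """Score entries when every token attends to every token (quadratic)."""
    return seq_len * seq_len


def local_attention_scores(seq_len: int, block_size: int) -> int:
    """Score entries when tokens attend only within their own block.

    Each full block contributes block_size * block_size entries, so the
    total grows linearly with seq_len for a fixed block size.
    """
    full_blocks, remainder = divmod(seq_len, block_size)
    return full_blocks * block_size**2 + remainder**2


SEQ_LEN = 10_000_000  # sequence length quoted in the summary
BLOCK_SIZE = 2048     # hypothetical block size, not taken from the project

full = full_attention_scores(SEQ_LEN)
local = local_attention_scores(SEQ_LEN, BLOCK_SIZE)
print(f"full attention scores:  {full:.3e}")
print(f"local attention scores: {local:.3e}")
print(f"reduction factor:       {full / local:,.0f}x")
```

Doubling the sequence length doubles the local count but quadruples the full count, which is the gap the linear-memory claim is pointing at.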

Alibaba’s Qwen2.5 - Alibaba Cloud has rolled out its latest large language model, Qwen2.5, boasting improved reasoning, code comprehension, and textual understanding over its predecessor. This new version has seen widespread adoption with over 90,000 deployments by various companies and has been utilized creatively across industries such as consumer electronics and gaming. The Qwen model outperforms OpenAI’s GPT-4 in certain aspects but lags in others, per an analysis by OpenCompass. Alibaba introduced its Tongyi Qianwen model in April 2023 and has since released updates and new models to the open-source community, also enhancing its generative AI platform, Model Studio. Over 2.2 million corporate users have accessed Qwen-powered AI services. This technology is contributing to the development of humanoid robots in China for practical applications in manufacturing and labor.

SambaNova Announces that the Fugaku-LLM is now a part of Samba-1 - SambaNova Systems has announced that Fugaku-LLM, a Japanese Large Language Model (LLM), is now part of Samba-1, their one trillion parameter generative AI model designed for enterprise use. This integration enhances the capabilities of Samba-1, which is built on a Composition of Experts (CoE) architecture that combines multiple specialist models for improved performance and accuracy.

Falcon 2 - Falcon 2 has been introduced as an open-source, multilingual, and multimodal AI model, boasting vision-to-language capabilities—a first in its class. It includes two versions: Falcon 2 11B, with 11 billion parameters, and Falcon 2 11B VLM, which features image-to-text conversion. In performance comparisons, Falcon 2 11B surpasses Meta's Llama 3 and matches the leading Google Gemma 7B, as confirmed by the Hugging Face Leaderboard. The models support various languages and are efficient enough to run on a single GPU. Upcoming enhancements will incorporate 'Mixture of Experts' to refine the models' capabilities further.

Awesome New Launches

Stability AI launches Stable Artisan - Stable Artisan is a user-friendly tool on Discord for creating digital images and videos, utilizing the advanced Stable Diffusion 3 model. With simple commands, users can generate, edit, and enhance images with features such as Search and Replace for object substitution without a mask, Remove Background for isolating subjects, Creative Upscale for converting low-res images to 4K, Outpaint for expanding image boundaries, Control Sketch for refining sketches into quality images, Control Structure for preserving input image structure, and Video for creating short clips from stills.

ElevenLabs is launching a new AI music generator - ElevenLabs is poised to disrupt the music industry with an advanced artificial intelligence music generator capable of producing complete tracks with natural-sounding vocals. The company, known for realistic AI voices, is expanding to include a suite of AI-generated sound effects and music. Although the full range of services is currently limited to internal preview, shared samples exhibit high quality across diverse genres, indicating a significant step beyond competitors like Udio and Suno. As the AI music revolution unfolds, it carries both copyright concerns and potential as a novel tool for artistic exploration.

Krea AI introduces Krea Video - Krea Video uses both keyframes and text prompts. Both can be added within a timeline, and together they define the final video.

Prompt Engineering in the Claude Console - Anthropic releases a tool to help you craft more effective prompts directly in the console.

