Google Cloud Next 2024 Shows Google Is Still Relevant

The Chip Wars Are Heating Up with New Entrants from Meta, Intel, and Apple

Google Cloud Next 2024 Keynote Unveils Grounded Responses, Parallel Processing, and Diverse Applications

Google Cloud Next 2024 highlighted the company's AI investments, focusing on its superior infrastructure and the introduction of Gemini 1.5. The keynote emphasized grounding large language models using Vertex AI, integrating models with enterprise data and Google Search for enhanced trustworthiness. Gemini 1.5 showcases exceptional performance in long context understanding and multi-modal capabilities, supported by Google's extensive infrastructure. The keynote demonstrated practical applications of Gemini 1.5, underlining Google's robust infrastructure as its core advantage in the AI domain.

  • Meta Debuts New AI Chip, Aiming to Decrease Reliance on Nvidia - Meta Platforms Inc. has announced the deployment of a new version of its in-house chip, Meta Training and Inference Accelerator (MTIA), to power its artificial intelligence (AI) services. The aim is to reduce the company's dependence on semiconductors from external providers like Nvidia Corp. Meta's shift towards AI services has led to an increased demand for computing power, and the company is investing heavily in AI infrastructure, including data centers and hardware. However, a significant portion of the spending is expected to still flow to Nvidia, which manufactures popular H100 graphics cards used in AI models. Meta is joining other tech giants, such as Amazon's AWS, Microsoft, and Alphabet's Google, in developing chips in-house to reduce their reliance on external providers. However, this transition will not be a quick fix, and Nvidia's AI accelerators remain in high demand.

  • Intel's latest AI chip is a direct shot at Nvidia's moneymaker - Intel has introduced its new AI chip, the Gaudi 3, claiming it outperforms Nvidia's H100 AI processor in power efficiency, inferencing speed, and training time. According to Intel, the Gaudi 3 is up to 40% more power efficient and trains large language models 1.7 times faster than the H100, giving a new option to major tech companies and hyperscalers that currently rely on Nvidia's chips. Amid its own financial challenges, Intel is also expanding its chip manufacturing, including building chips for third-party customers like Microsoft. Alongside the Gaudi chips and new Xeon CPUs, Intel is promoting AI PCs with integrated neural processing units. Intel is also introducing a new form of Ethernet connectivity for Gaudi 3 nodes and providing reference designs for partners to build server cabinets, directly competing with Nvidia's connectivity solutions and server offerings.

  • Statement from President Joe Biden on CHIPS and Science Act Preliminary Agreement with TSMC - The statement focuses on recent U.S. measures to revitalize the domestic semiconductor industry. It highlights a decline from producing 40% of the world's chips to just over 10%, with no fabrication of the most advanced chips, which raises economic and national security concerns. The White House, expressing a determination to reverse this trend, references the CHIPS and Science Act as an integral part of its Investing in America agenda, which has spurred semiconductor manufacturing and job growth in the U.S. The statement showcases a new deal with the Taiwan Semiconductor Manufacturing Company (TSMC) to establish leading-edge semiconductor production facilities in Phoenix, Arizona, with a projected $65 billion investment and the creation of over 25,000 jobs. The facilities aim to boost U.S. production of the most advanced chips to 20% of the world's supply by 2030, with additional CHIPS funding for local workforce training. The broader narrative is one of renewed commitment to "Made in America" semiconductor manufacturing with the support of top U.S. tech firms.

  • AI race heats up as OpenAI, Google and Mistral release new models - OpenAI, Google, and French AI startup Mistral all unveiled new AI models, intensifying competition in the industry. In rapid sequence, Google released Gemini 1.5 Pro with multimodal capabilities, followed closely by OpenAI's GPT-4 Turbo, also multimodal, and Mistral's open-source Mixtral 8x22B. Meanwhile, Meta plans to launch its Llama 3 AI model, initially in less powerful versions. Amid the rush to market the most advanced AI, concerns have been raised over the potential dangers of open-source models and their ethical implications. Experts, including Meta's Yann LeCun, question the current trajectory of AI development, advocating for a shift towards "objective-driven" AI that reasons and plans, rather than just processing language.

  • OpenAI and Meta ready new AI models capable of ‘reasoning’ - Meta and OpenAI are preparing to release new AI models capable of reasoning and planning, which is a significant step towards achieving superhuman cognition in machines. Meta's Llama 3 and OpenAI's GPT-5 are expected to be released soon, with the latter making significant strides in solving "hard problems" of reasoning. These new models will enable AI to perform complex, multi-step tasks, and plan sequences of actions, making them crucial milestones towards achieving AGI (Artificial General Intelligence). The race for AI models with reasoning capabilities is heating up, with Google, Anthropic, and Cohere also working on their own upgraded large language models. However, the development of AI models with reasoning capabilities also raises concerns about bias, transparency, and ethical use.

  • Apple plans to overhaul entire Mac line with AI-focused M4 Chips - Apple plans to overhaul its Mac line with new M4 chips, which will highlight AI capabilities and bring memory improvements. The revamp aims to boost sluggish computer sales. The new computers, including iMacs, MacBook Pros, and Mac minis, are expected to be released beginning late this year, with new AI features and support for more memory. The move is part of a broader push to weave AI capabilities into all Apple products.

  • New bill would force AI companies to reveal use of copyrighted art - The Generative AI Copyright Disclosure Act, spearheaded by Congressman Adam Schiff, proposes to make AI firms disclose copyrighted material used in training their generative models. This would mandate pre-release disclosure to the Copyright Office and carries financial penalties for non-compliance. The bill aims to address the legal and ethical use of copyrighted content amidst an increase in litigation and government scrutiny on AI companies. It has gained support among entertainment industry entities and unions. This comes against a backdrop of contention, with companies like OpenAI facing lawsuits for their training practices, which they defend as fair use. The outcome could significantly impact both artist rights and AI industry practices.

  • Adobe Is Buying Videos From Creators To Build A Rival For OpenAI's Sora - Adobe is actively developing an AI-driven text-to-video generator to challenge OpenAI's Sora. To ethically train its AI, Adobe is purchasing video clips from creators, focusing on actions and emotions, for a training dataset. Adobe's existing AI features, which allow the creation of visuals from text prompts, have seen significant usage. OpenAI faces scrutiny over claims of using YouTube content inappropriately for training its Sora model, which could breach YouTube's terms and has sparked widespread debate over AI training practices. YouTube CEO Neal Mohan emphasizes the protection of creators' rights, while OpenAI, heavily invested in by Microsoft, has yet to publicly address these allegations.

  • AI 50: Companies of the Future - The AI 50 list of 2023 highlighted the growing importance of generative AI, with U.S. venture funding focusing on AI infrastructure and large language models. Enterprises are increasingly integrating AI into their processes, resulting in significant performance improvements and cost reductions. The future of AI points towards transformative user experiences and interfaces, as well as advancements in creative software and industrial applications. AI is expected to drive a productivity revolution, requiring sophisticated infrastructure and tailored enterprise products to address challenges in knowledge management, content generation, trust, safety, and authentication. The AI 50 list of 2024 reflects AI's expanding influence, emphasizing the need for responsible implementation to minimize job loss and promote job creation as AI transforms cost structures and productivity across various sectors.

  • Gaming finds a new role as AI’s open ethics lab - Songyee Yoon, President and Chief Strategy Officer at NCSOFT, discusses the role of gaming as a testbed for AI technologies, emphasizing the importance of responsible AI integration and diverse perspectives in crafting AI rules and mechanics. Yoon's new book, "Push Play: Gaming For a Better World," explores gaming's potential to drive positive change and the need for inclusive, safe environments. NCSOFT, a leading game developer, has recognized AI's potential in game production, developing large language models for digital humans, generative AI platforms, and conversational language models. Yoon's background in synthetic characters at MIT equips her to lead discussions on responsible AI integration in gaming and beyond.

  • The Worst Part of a Wall Street Career May Be Coming to an End - The traditional tasks of investment banking analysts, notorious for their grueling hours spent on presentations, calculations, and complex documents, are now threatened by generative AI. This technology is poised to revolutionize Wall Street by performing tasks quickly and efficiently, potentially reducing the need for human analysts. Julia Dhar of BCG's Behavioral Science Lab notes that as AI takes on work once done exclusively by entry-level finance professionals, banks face a critical question: will they need fewer analysts? The industry, resistant to change for decades, finds itself at a turning point as AI capabilities expand into areas previously dominated by human expertise.

  • South Korea invests $7BN in AI in bid to retain edge in chips - South Korea is investing $7 billion in AI to maintain its competitive edge in the semiconductor industry, with a focus on AI research, infrastructure development, and talent cultivation. The country aims to position itself as a global leader in AI, with a thriving startup scene and support from IT corporations like Samsung and LG. The government is also investing in AI-related education and has a strategy that includes a focus on AI ethics, data, and AI convergence with other technologies.

Awesome Research Papers

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs - The paper proposes a new multimodal large language model (MLLM) for mobile UI understanding, addressing the limitations of existing MLLMs in this area. Ferret-UI features referring, grounding, and reasoning abilities, with an "any resolution" feature for detail enhancement. It's trained on a curated dataset for advanced UI tasks. The model outperforms most open-source UI MLLMs and even GPT-4V on elementary UI tasks, showcasing its superior understanding of mobile UI screens.

CodeGemma: Open Code Models Based on Gemma - CodeGemma is a collection of specialized open code models developed by Google LLC, based on the Gemma models, which are further trained on over 500 billion tokens of primarily code. The models are designed for a variety of code and natural language generation tasks and are released in three checkpoints, including 7B pretrained and instruction-tuned variants and a 2B model for fast code infilling and open-ended generation in latency-sensitive settings. According to the paper, CodeGemma models excel in mathematical reasoning, match the code capabilities of other open models, and achieve state-of-the-art code performance in both completion and generation tasks, while maintaining strong understanding and reasoning skills at scale.

Speech to Text: Leaderboard & Comparison - Artificial Analysis analyzes and compares speech-to-text transcription models and API providers. The leaderboard includes various models such as Whisper (L, v2), OpenAI, Universal-1, AssemblyAI, Speechmatics Standard, Azure, Azure Speech Service, Incredibly Fast Whisper, Replicate, Whisper (L, v3), WhisperX, Whisper (M), Whisper (S), Amazon Transcribe, Rev AI, and Chirp, Google. The analysis is based on characteristics like word error rate (lower is better), speed, and price.
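
For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is commonly defined: the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The function name and the plain whitespace tokenization are assumptions made for illustration; the leaderboard's own text normalization and scoring pipeline are not described in the summary above and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Illustrative only: splits on whitespace and does no normalization
    (case, punctuation), which real evaluations usually apply first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

A perfect transcript scores 0.0, and because insertions count as errors the value can exceed 1.0 for very noisy output.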

RULER: What's the Real Context Size of Your Long-Context Language Models? - The RULER benchmark is introduced as a sophisticated tool for evaluating long-context language models (LMs), addressing the limitations of the needle-in-a-haystack (NIAH) test which only assesses basic information retrieval abilities. RULER offers customizable sequence lengths and task complexities, with additional task types like multi-hop tracing and aggregation that go beyond mere context searching. Upon testing ten LMs, including high-performing models like GPT-4 and Yi-34B, a significant performance decline is observed as context lengthens, despite these models' claims to handle contexts over 32K tokens. RULER, now open sourced, challenges LMs with inputs up to 200K tokens, highlighting substantial potential for improvement.

Patchscopes: A unifying framework for inspecting hidden representations of language models - The paper discusses the increasing necessity to understand the inner workings of large language models (LLMs), emphasizing the value of examining their hidden representations. Addressing issues like accuracy and transparency, it introduces Patchscopes—a novel framework that uses LLMs to explain their own internal processes in natural language. Patchscopes aims to deepen this understanding, representing an evolution of existing interpretability methods. It facilitates analysis of how LLMs comprehend input and can aid in correcting reasoning errors. While focusing on natural language and autoregressive Transformer models, the framework also has potential for broader applications, including addressing model hallucinations and exploring multimodal representations.

GPT-4 Turbo - GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling. 

Mixtral 8x22B - Mistral launched a large language model (LLM) called Mixtral 8x22B, a 281GB model designed to rival offerings from OpenAI, Meta, and Google. The new model comes with a 65,000-token context window, enabling it to recall precise information from large documents. Mistral AI claims that Mixtral 8x22B ranks second after GPT-4 based on several benchmarks. The company also offers a chat assistant called Le Chat, which is currently available in beta and supports English, French, Spanish, German, and Italian.

Awesome New Launches

Udio - Udio is a new AI music generator offering free access during its open beta phase, founded by former Google DeepMind employees and backed by tech and music industry leaders. The tool produces crisp audio with customization options, and the company is working on improvements such as longer samples, better sound quality, and more control options.

Morphic - An AI-powered answer engine with a generative UI.

Build generative AI experiences with Vertex AI Agent Builder - Google's Vertex AI Agent Builder is a cutting-edge development platform enabling the creation and deployment of generative AI (gen AI) agents for both developers and organizations. This tool integrates Vertex AI Search and Conversational products with additional developer resources, geared towards facilitating the rapid development of AI solutions while addressing cost, governance, and scalability challenges. Vertex AI Agent Builder is designed to cater to various developer skill levels, offering a no-code console for novices and support for sophisticated open-source frameworks for experienced builders.

Archetype AI introduces foundation model to pioneer physical AI - Archetype AI, a physical AI company, has emerged from stealth with its first customers and backing from Venrock, Amazon Industrial Innovation Fund, and Hitachi Ventures. The company has introduced Newton, a first-of-its-kind foundation model that understands the physical world by fusing multimodal temporal data with natural language. This allows users to ask open-ended questions about their surroundings and take informed actions based on the insights provided. The model is designed to scale across any sensor data, targeting the "trillion sensor economy" and delivering solutions across various industries, including automotive, consumer electronics, construction, logistics, and retail. Archetype AI's mission is to harness the power of AI to solve real-world problems by decoding the hidden patterns in the physical world that are too complex or fast-moving for humans to perceive.

Leonardo.Ai introduces Leonardo for Teams - Leonardo.Ai has introduced "Leonardo for Teams," a beta-release AI suite designed to enhance collaborative creativity in enterprises by allowing team members to work simultaneously on visual assets. The platform facilitates real-time asset review and supports private sharing of customizable models that can be collectively trained and used. These models are based on fully licensed data, respecting rights-holders, and include copyright protection. Leonardo.Ai has a user base exceeding 15 million and targets a wide range of industries; the launch follows a significant funding round. COO Jachin Bhasme highlighted the shift of generative AI into a commercial space, emphasizing the tool's potential to revolutionize creative processes in businesses.

Google Vids - Google has introduced Vids, an application designed to make it easy to create shareable, collaborative videos. Vids is intended for professional rather than aesthetic purposes, focusing on tasks such as pitches, team updates, and the communication of complex ideas. Vids transforms the Google Slides experience into a video format, allowing users to compile assets into a video timeline and add voiceovers and self-filmed content without requiring video production skills. Users can collaborate on video creation and editing, mirroring the functionality of Google's other productivity tools.
