Apple AI Finally Shows Its Face

Smarter Siri, ReALM, and more

Expect a smarter, context-aware, AI-powered Siri at WWDC 2024

Apple claims its new on-device AI system, ReALM, substantially outperforms OpenAI's GPT-4 on reference resolution tasks for conversational assistants like Siri. ReALM takes a novel approach by converting on-screen content, conversational context, and background processes into text that a large language model can process. The key advantage is that ReALM can run entirely on-device without relying on cloud processing, improving privacy and efficiency while still achieving high performance. Apple is expected to unveil AI enhancements leveraging ReALM, including improvements to Siri's capabilities, at its Worldwide Developers Conference in June 2024.

  • Anthropic researchers wear down AI ethics with repeated questions - Anthropic researchers have discovered a new "jailbreak" technique for large language models (LLMs), known as "many-shot jailbreaking." The method primes a model with a large context window—capable of retaining thousands of words—with numerous less harmful questions, after which it can be manipulated into providing inappropriate responses, such as instructions to build a bomb. The phenomenon stems from "in-context learning": the model gets progressively better at answering questions similar to those already in its context. The researchers don't fully understand the mechanism yet but have shared their findings with peers to encourage openness and collaborative mitigation. Shrinking the context window would counteract the attack but also degrades performance, so as a countermeasure Anthropic is instead looking into classifying and contextualizing queries before they reach the model, acknowledging that AI security is a constantly moving target.

  • US and UK sign landmark agreement on testing safety of AI - The US and UK have established a significant agreement on artificial intelligence, marking the first formal bilateral cooperation aimed at evaluating and mitigating risks from advanced AI technologies. This partnership seeks to exchange knowledge and research personnel between the UK's AI Safety Institute and a similar yet-to-be-established institution in the US, enhancing their ability to assess private AI models from major tech companies. The collaboration, reflective of an existing intelligence-sharing model between GCHQ and NSA, underlines the commitment of both nations to confront potential threats to national security and society posed by AI, without broad regulations in the immediate future. The UK's AI Safety Institute, key to Prime Minister Sunak's strategy for the UK's leadership in AI safety, has begun testing AI models for risks, supported by industry commitments to transparency, focusing on issues like cybersecurity and election integrity.

  • I tried the new Google. Its answers are worse - The new "experimental" Google, known as the Search Generative Experience (SGE), integrates AI capabilities similar to a chatbot into the traditional search engine. This AI-powered Google provides direct answers combined with links but is criticized for sometimes being inaccurate, verbose, reliant on outdated or low-quality sources, and even fabricating information. Analysis from SEO firms reveals SGE often prioritizes different sources than conventional search results, which could affect web traffic and revenue for many sites. Google acknowledges SGE's issues, including "hallucinations" or made-up facts, and is working to address them. The company has yet to specify when SGE will become widely available, emphasizing the importance of balancing speed, accuracy, helpfulness, and maintaining trust in Google's brand.

  • Opinion | How Should I Be Using A.I. Right Now? - Ezra Klein grapples with the paradox of recognizing AI's transformative potential versus his difficulty in utilizing it in his everyday workflow. Seeking insights, he turns to Wharton School professor Ethan Mollick, who has extensive experience with AI and shares his knowledge in both a newsletter and a forthcoming book, "Co-Intelligence: Living and Working With A.I." Mollick suggests approaching AI as a co-creative relationship rather than a mere tool and underscores the uncharted nature of AI applications, even among top AI firms like OpenAI and Anthropic. The episode delves into practical advice on choosing and interacting with chatbots, as well as the broader and sometimes surprising implications of AI on work and creativity. The podcast episode is available across major platforms and is part of “The Ezra Klein Show,” featuring a production team led by Kristin Lin and supported by various editors and strategists.

  • Microsoft is working on an Xbox AI chatbot - Microsoft is piloting an AI-driven Xbox chatbot for customer support, designed to answer queries and process refunds using natural language. The bot, part of a broader push within Microsoft Gaming to integrate AI, fields support questions on the Xbox network and ecosystem, with current internal testing including Minecraft Realms service queries. Additionally, Microsoft is exploring AI in game development, operations, content moderation, and potentially in-game assistance. While the company cautiously approaches the public reveal of these AI endeavors, the initiative aligns with CEO Satya Nadella's drive to infuse AI across Microsoft's products and services, and with Xbox's vision of 'AI innovation' as part of its future hardware advancements.

  • Microsoft, Quantinuum claim breakthrough in quantum computing - Microsoft and Quantinuum announced a breakthrough in quantum computing by demonstrating the most reliable logical qubits ever, running over 14,000 experiments without errors by applying Microsoft's error correction system to Quantinuum's ion-trap hardware. This moves quantum computing beyond the noisy intermediate-scale era to "Level 2 Resilient" capabilities, a crucial milestone toward building hybrid supercomputers that could solve scientific and commercial problems intractable for classical computers. With 100 reliable logical qubits, organizations could see scientific advantages, while scaling to 1,000 would unlock commercial advantages, potentially lopping years off current timelines.

  • Why YouTube could give Google an edge in AI - Google and OpenAI are leveraging YouTube's vast video data to train their latest large language models like Google's upcoming Gemini and OpenAI's GPT-4. As YouTube's owner, Google has unparalleled access to this multimodal dataset of videos, transcripts, and metadata - potentially giving it an edge over rivals in developing more advanced AI capabilities like video understanding and generation. However, the full potential of models trained on such video data remains uncertain and in "uncharted territory" according to experts.

  • How Adobe’s bet on non-exploitative AI is paying off - Adobe's AI team, interviewed by MIT Technology Review, emphasizes the importance of considering the ethical implications of creating new technology. They introduce Firefly, an AI model tailored for creator fairness and legal certainty, rejecting the widespread practice of scraping web data for AI training. This decision responds to the creative community's concerns over AI-generated content and copyright infringement. Adobe's approach involves using licensed content from its stock library for Firefly's training and providing additional compensation to creators. This strategy acknowledges the labor behind AI advancements and contrasts with the industry's current contentious data practices, which often involve unvetted datasets containing copyrighted and sensitive material.

  • YouTube says OpenAI training Sora with its videos would break rules - YouTube CEO Neal Mohan stated that using YouTube videos to train OpenAI's text-to-video generator Sora would violate the platform's terms of service. Mohan said he had no direct knowledge if OpenAI used YouTube videos for Sora, but if they did, it would be a "clear violation" of YouTube's rules against downloading video content. He emphasized that creators have expectations that their work on YouTube will be protected under the terms of service they agreed to.
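
The many-shot jailbreaking item above comes down to prompt assembly: hundreds of faux dialogue turns are concatenated before the real question so that in-context learning steers the final answer. Here is a minimal, benign sketch of that structure; the turn format and trivia content are hypothetical, not Anthropic's actual prompts.

```python
# Sketch of how a "many-shot" prompt is assembled: a long-context model is
# primed with many fabricated Q&A turns so that in-context learning pushes it
# to answer the final question in kind. All dialogue here is benign and made up.

def build_many_shot_prompt(qa_pairs, final_question):
    """Concatenate faux dialogue turns followed by the real question."""
    turns = []
    for question, answer in qa_pairs:
        turns.append(f"User: {question}")
        turns.append(f"Assistant: {answer}")
    turns.append(f"User: {final_question}")
    return "\n".join(turns)

# 256 priming shots, then the question the attacker actually cares about.
shots = [(f"Trivia question {i}?", f"Answer {i}.") for i in range(256)]
prompt = build_many_shot_prompt(shots, "One more question?")
print(prompt.count("User:"))  # 257 user turns: 256 priming shots plus the final one
```

The countermeasure the summary mentions—classifying and contextualizing queries before model processing—would inspect a prompt like this before it ever reaches the model.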

TensorFlow Probability Spins Off AutoBNN - AutoBNN is a TensorFlow Probability spinoff library that provides high-level APIs for automatically constructing Bayesian neural network models tailored for time series forecasting tasks. It includes implementations of different probabilistic model architectures that can learn temporal patterns and seasonality in data, while capturing model uncertainty through Bayesian inference. The library aims to simplify the application of Bayesian neural networks, which can provide more robust predictions than traditional neural network approaches, to time series forecasting problems.
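
This is not AutoBNN's API, but the core idea the summary describes—a Bayesian model yields a distribution over forecasts rather than a single point estimate—can be illustrated with a toy ensemble, where each member stands in for a posterior sample:

```python
# Toy illustration (NOT AutoBNN code): treat a handful of (slope, intercept)
# pairs as posterior samples of a linear forecaster, then report the mean
# forecast and its spread, which is the uncertainty a Bayesian model exposes.
import statistics

posterior_samples = [(0.9, 1.0), (1.0, 0.8), (1.1, 1.2), (1.05, 0.9)]

def predict(t):
    """Return mean forecast and standard deviation across posterior samples."""
    forecasts = [slope * t + intercept for slope, intercept in posterior_samples]
    return statistics.mean(forecasts), statistics.stdev(forecasts)

mean, std = predict(10.0)
print(f"forecast at t=10: {mean:.2f} +/- {std:.2f}")
```

A point-estimate network would return only the mean; the standard deviation is what makes the prediction "more robust" in the sense the summary describes, since downstream decisions can account for it.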

SWE-agent - SWE-agent is an open-source AI agent developed by Princeton NLP researchers that turns language models into autonomous software engineering agents capable of fixing bugs and resolving issues in real GitHub repositories. It achieves accuracy comparable to proprietary systems like Cognition's Devin on the SWE-bench benchmark, while being significantly faster. Being open-source, SWE-agent lets developers access, customize, and contribute to its codebase, fostering collaboration and further improvements in AI-assisted software development.

Awesome Research Papers

ReALM: Reference Resolution as Language Modeling - The paper proposes using large language models for reference resolution by encoding the user's screen contents and potential referenced entities as natural text. This textual representation captures the spatial relationships between on-screen elements. Experiments show this approach allows language models to effectively identify which entities are being referred to, outperforming traditional reference resolution systems. Potential applications include improving digital assistants and multimodal interfaces by enabling language models to better understand references in contexts involving both text and visual elements.
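
The paper's central trick, as summarized above, is serializing on-screen entities into plain text so a text-only LLM can resolve references like "call the one at the bottom." A rough sketch of such an encoding follows; the field names and the top-to-bottom layout heuristic are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: convert screen entities (with positions) into a numbered text list
# that preserves rough spatial order, so an LLM can reason about "the one at
# the bottom" without seeing pixels. Field names are hypothetical.

def encode_screen(entities):
    """Sort entities top-to-bottom, left-to-right and emit one text line each."""
    ordered = sorted(entities, key=lambda e: (e["y"], e["x"]))
    lines = []
    for i, ent in enumerate(ordered, start=1):
        lines.append(f"[{i}] {ent['type']}: {ent['text']}")
    return "\n".join(lines)

screen = [
    {"type": "button", "text": "Call 555-0199", "x": 10, "y": 200},
    {"type": "label", "text": "Pizza Palace", "x": 10, "y": 40},
    {"type": "label", "text": "Open until 10pm", "x": 120, "y": 40},
]
encoded = encode_screen(screen)
print(encoded)
```

Given this encoding, "the one at the bottom" maps to entry [3], which is exactly the kind of resolution the paper evaluates the language model on.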

ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature - This study examines the presence of Large Language Model (LLM)-generated text, such as ChatGPT, in academic papers published in 2023. By tracking keywords frequently used by LLMs, the research suggests a noticeable rise in their usage. The findings estimate that over 1% of scholarly articles, amounting to approximately 60,000 papers, may have been composed with LLM assistance. The study implies that this number could increase upon further examination by identifying new indicative keywords or additional paper characteristics hinting at LLM involvement.
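
The study's counting approach can be sketched in a few lines: flag papers whose text contains keywords that LLMs use at unusually high rates, then estimate prevalence as the flagged fraction. The keyword list below is a hypothetical stand-in, not the study's actual indicator set.

```python
# Sketch of keyword-based prevalence estimation: count the fraction of texts
# containing at least one LLM-indicator keyword. Keywords here are assumed
# examples for illustration only.

INDICATOR_KEYWORDS = {"delve", "intricate", "tapestry"}

def estimate_prevalence(abstracts):
    """Fraction of abstracts containing at least one indicator keyword."""
    flagged = sum(
        1 for text in abstracts
        if INDICATOR_KEYWORDS & set(text.lower().split())
    )
    return flagged / len(abstracts)

corpus = [
    "We delve into transformer scaling laws.",
    "A measurement of neutrino oscillation parameters.",
    "Results on graph colorings of planar graphs.",
]
prevalence = estimate_prevalence(corpus)
print(prevalence)
```

As the study notes, this kind of estimate is a lower bound: adding new indicative keywords or other paper characteristics can only raise the flagged fraction.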

A Builder's Guide to Task-Specific Evals for LLM Applications - This extensive guide provides practical evaluation metrics and methods for builders utilizing Large Language Models (LLMs) across various applications like classification, summarization, and translation, going beyond standard metrics to measure recall, precision, consistency, relevance, and more. It addresses evaluating risks such as copyright regurgitation and toxicity, emphasizing the importance of human evaluation and calibrating the benchmark to balance risk and benefits. The author advocates for starting with a 'minimum lovable product' and iteratively improving, challenging an overemphasis on zero risk, and aims to make the resource accessible for those without deep expertise in data science or machine learning.
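
For the classification case the guide covers, the recall and precision metrics it recommends reduce to a short computation over predictions and gold labels. This is a generic illustration of those metrics, not code from the guide itself.

```python
# Precision and recall for one target class, given model predictions and
# gold labels. The "spam" labels are an arbitrary illustrative task.

def precision_recall(predictions, labels, positive="spam"):
    tp = sum(1 for p, y in zip(predictions, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(predictions, labels) if p == positive and y != positive)
    fn = sum(1 for p, y in zip(predictions, labels) if p != positive and y == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

preds = ["spam", "spam", "ham", "ham", "spam"]
gold = ["spam", "ham", "ham", "spam", "spam"]
p, r = precision_recall(preds, gold)
print(p, r)
```

The guide's point about calibration applies here: which of the two numbers you optimize depends on whether false positives or false negatives carry more risk for the application.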

The 2024 Machine Learning, AI & Data Landscape - The website analyzes the 2024 landscape in data, machine learning, and AI, highlighting the coalescence of trends enabling powerful AI models and the mainstream adoption of the MAD ecosystem with societal implications. It features 2,011 companies, examines structural shifts across the MAD landscape reflecting evolving tech and market demands, and presents 24 themes, industry trends, data pipeline distinctions, and financial activities. The analysis questions the Modern Data Stack's obsolescence, explores AI-enabled applications' relevance, and discusses prominent players and market dynamics in the progressing AI industry.

Introducing Command R+: A Scalable LLM Built for Business - Command R+ is an advanced large language model tailored for enterprise applications, offering enhanced efficiency and precision in business contexts. Key features include a 128k-token context window, strong performance in Advanced Retrieval-Augmented Generation (RAG) that reduces "hallucinations," and multilingual support across 10 major languages. The model is designed to automate intricate business processes, with notable gains in RAG use cases, citation accuracy, and multi-step tool use. Cohere has partnered with Microsoft Azure to fast-track enterprise AI adoption: Command R+ is accessible first on Azure, with other cloud services and Cohere's API to follow, and it ships with commitments to data privacy and security.

Awesome New Launches

Introducing Stable Audio 2.0 - Stable Audio 2.0 introduces an advanced AI music generation tool capable of creating full-length tracks up to three minutes with coherent structures from text prompts. The updated model now features audio-to-audio capabilities, allowing users to upload and transform audio samples based on natural language instructions. This expands its utility in generating diverse sounds and style transfers. Building on the earlier 1.0 version, known for its high-quality 44.1kHz music production, Stable Audio 2.0 enhances artistic flexibility and is available for free on their website, with API access forthcoming.

GroqCloud - The Groq API supports tool use, letting models invoke external functions through structured JSON. Three models support tool use (llama2-70b, mixtral-8x7b, and gemma-7b-it), enabling applications such as converting natural language into API calls, automating external API calls for data gathering and alerts, and parsing resumes to extract and format data. The documentation's example walks through initializing the Groq client, setting up the conversation and tool definitions, defining a dummy function for sports scores, processing the model's tool-call requests, and returning function responses to the model.
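
The client side of that tool-use loop boils down to: the model returns a JSON tool call, the client dispatches it to a local function, and the result goes back as a message. The sketch below simulates just the dispatch step offline; the sports-score function mirrors the shape of the docs' dummy example, and the `tool_call` dict is a hand-written stand-in for a real model response, not Groq API output.

```python
# Simulated tool-call dispatch (no network, no Groq client): look up the
# function the "model" asked for, call it with the JSON-decoded arguments,
# and package the result as a tool message.
import json

def get_game_score(team_name):
    """Dummy tool: return a canned score for a team (illustrative data)."""
    scores = {"Warriors": {"score": 112, "opponent_score": 97}}
    return json.dumps(scores.get(team_name, {"error": "unknown team"}))

TOOLS = {"get_game_score": get_game_score}

# Hand-written stand-in for a tool call as it might appear in a response.
tool_call = {"name": "get_game_score", "arguments": json.dumps({"team_name": "Warriors"})}

result = TOOLS[tool_call["name"]](**json.loads(tool_call["arguments"]))
message = {"role": "tool", "name": tool_call["name"], "content": result}
print(message["content"])
```

In the real flow, `message` would be appended to the conversation and sent back so the model can compose its final natural-language answer.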

Agent API now available: bringing actions to your devices and applications — MultiOn AI - The Agent API, now available to developers, allows for the integration and embedding of AI agents into devices and applications. These AI agents can autonomously perform tasks and workflows on the web, either as background assistants or user-facing interfaces. They are versatile in function, catering to implementations in smart devices, websites, enterprise SaaS, and more. The API facilitates task automation and customer assistance across various sectors like smart device integration, shopping and travel services, financial services, and government platforms.

Replit Code Repair - Replit is integrating AI tools into their IDE to create models optimized for coding tasks like automated code repair by fine-tuning large language models on data from their platform. Their first effort utilizes the Language Server Protocol diagnostics to train a 7B parameter model that performs competitively on code repair benchmarks compared to larger models. With plans to scale up the data and model size, as well as expand to more languages, Replit aims to provide practical AI-powered developer experiences natively integrated into their environment.
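
A hedged sketch of the pipeline described above: a Language Server Protocol diagnostic plus the offending source line is turned into a repair instruction for a model. The prompt template is an assumption; only the zero-based `range.start.line` field follows the LSP specification.

```python
# Format an LSP-style diagnostic as a code-repair prompt. The diagnostic
# structure here is simplified; the template is hypothetical.

def diagnostic_to_prompt(source_lines, diagnostic):
    """Combine a diagnostic message with the line it points at."""
    line_no = diagnostic["range"]["start"]["line"]  # zero-based, per LSP
    bad_line = source_lines[line_no]
    return (
        f"Fix this error: {diagnostic['message']}\n"
        f"Offending line {line_no + 1}: {bad_line}"
    )

code = ["def add(a, b):", "    return a + c"]
diag = {"message": "Undefined variable 'c'", "range": {"start": {"line": 1}}}
repair_prompt = diagnostic_to_prompt(code, diag)
print(repair_prompt)
```

Pairing such prompts with the diffs developers actually applied is, per the summary, how Replit builds fine-tuning data from its platform.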

AssemblyAI Launches Universal-1 - AssemblyAI has launched Universal-1, a speech-to-text AI model achieving state-of-the-art accuracy in multiple languages including English, Spanish, French, and German. Trained on over 12.5 million hours of multilingual audio and fine-tuned with 1.62 million hours of human- and pseudo-labeled data, the model outperforms competitors such as OpenAI's Whisper and NVIDIA's Canary; AssemblyAI claims it is 13.5% more accurate and 30% less error-prone than its closest rivals, setting a new benchmark in automatic speech recognition. Pricing is $0.37 per hour of audio processed.
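
Accuracy claims like those above are conventionally measured as word error rate (WER): the word-level edit distance between the transcript and a reference, divided by the reference length. This is the standard textbook computation, not AssemblyAI's benchmarking code.

```python
# Word error rate via dynamic-programming edit distance over words
# (insertions, deletions, substitutions all cost 1).

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # substitution
    return dist[-1][-1] / len(ref)

# One substitution ("the" -> "a") over a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(wer)
```

A "13.5% more accurate" claim in this framing means a proportionally lower WER on the same test set, which is why the test-set composition matters as much as the headline number.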
