
Voice Cloning, Public Domain Art, and Chatbot Innovations

Exploring the Latest Breakthroughs in AI from Voice Technology to Historical Art Revivals

TL;DR 📌:

  1. OpenVoice's Global Tech Breakthrough: A new open-source voice cloning algorithm revolutionizes voice technology, offering unprecedented precision and emotional tone control.

  2. Mickey Mouse in the Public Domain: Early Mickey Mouse animations inspire a surge in AI-generated art, blending history with modern technology.

  3. Microsoft's Copilot AI on iOS: Microsoft introduces its AI chatbot on iOS, competing with ChatGPT and offering unique image generation capabilities.

  4. Midjourney's Multi-Prompting Guide: Discover advanced techniques in AI image generation with a technical guide on multi-prompting in Midjourney.

  5. Prompt Engineering in AI: Essential knowledge for enhancing AI interactions, focusing on the art of asking the right questions.

  6. AI's Impact on Influencers: AI influencers are challenging the traditional influencer market, striking deals with luxury brands.

  7. Text Embeddings with Large Language Models: Microsoft's research introduces an efficient method for creating text embeddings, a cornerstone of NLP tasks.

  8. AI's Exponential Growth in 2024: A video analysis discussing why AI's growth is skyrocketing in various fields.

  9. Innovative AI Tools: Highlighting AI tools that transform user engagement, goal achievement, and time management.

  10. Comparing AI Solutions: A direct response copywriting exercise to differentiate a product from its competitors in the AI space.

OpenVoice: Pioneering Open-Source AI Voice Cloning Unveiled by Global Tech Collaboration

The AI world is buzzing with the latest breakthrough in voice cloning technology, and it's all thanks to a collaboration that spans continents. Imagine being able to mimic any voice, from the comfort of your own laptop, and with just a small audio clip. That's the magic of OpenVoice, a brainchild of MIT, Tsinghua University, and the Canadian AI startup MyShell. This open-source platform is changing the game, offering near-instantaneous voice cloning with a level of precision and control that's just not seen in other tools.

Here's the kicker: OpenVoice is incredibly user-friendly. In my own little experiment on Hugging Face, I was blown away by how quickly I could create a clone of my voice - it was a bit robotic, sure, but impressively accurate and fast. The real fun began when I played around with the emotional tones. Want your clone to sound cheerful? Sad? Angry? Just a click away. It's like having a mood ring for your voice.

Diving into the nuts and bolts of OpenVoice, it's a marvel of AI engineering. The team, comprising talents like Zengyi Qin from MIT and MyShell, Wenliang Zhao and Xumin Yu of Tsinghua University, and Xin Sun of MyShell, developed two AI models for this feat. One handles the speech nuances like intonation and rhythm, trained on thousands of sentences in various languages and emotions. The other is a tone converter, fed with a massive dataset of over 300,000 audio samples. Together, they can not only replicate a voice but also infuse it with different emotional flavors.
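If you're curious what driving that two-model pipeline looks like in practice, here's a minimal sketch modeled on the demo in the public OpenVoice repository. The checkpoint paths, style names, and exact method signatures are my assumptions based on that demo and may differ in the current release, so treat this as an outline rather than copy-paste-ready code.

```python
# Sketch of OpenVoice's two-stage pipeline: a base speaker TTS model generates
# speech in a chosen emotional style, then a tone-color converter maps it onto
# the cloned target voice. Paths and method names follow the repo's demo
# notebook as I remember it and are assumptions, not a verified API reference.
import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Base TTS: handles intonation, rhythm, and emotion of the generated speech.
base_tts = BaseSpeakerTTS("checkpoints/base_speakers/EN/config.json", device=device)
base_tts.load_ckpt("checkpoints/base_speakers/EN/checkpoint.pth")

# 2) Tone-color converter: transfers the timbre of the reference speaker.
converter = ToneColorConverter("checkpoints/converter/config.json", device=device)
converter.load_ckpt("checkpoints/converter/checkpoint.pth")

# Extract a speaker embedding from a short reference clip of the target voice.
target_se, _ = se_extractor.get_se("my_voice_sample.mp3", converter, vad=True)
source_se = torch.load("checkpoints/base_speakers/EN/en_default_se.pth").to(device)

# Generate speech in an emotional style, then convert it to the cloned voice.
base_tts.tts("Hello from my cloned voice!", "tmp.wav",
             speaker="cheerful", language="English", speed=1.0)
converter.convert(audio_src_path="tmp.wav", src_se=source_se,
                  tgt_se=target_se, output_path="cloned.wav")
```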

But here's the million-dollar question: How does MyShell plan to profit from this open-source wonder? The answer lies in their business model, which is as innovative as their technology. They're banking on a subscription model for their web app, targeting both regular users and third-party bot creators. And let's not forget the charges for AI training data. It's a clever play in the open-source arena, balancing accessibility with commercial viability.

OpenVoice is a testament to the power of global collaboration and open-source innovation. It's reshaping the landscape of voice cloning, making it more accessible, versatile, and, frankly, a whole lot of fun. Whether you're a tech enthusiast, an AI researcher, or just someone who loves to play around with new gadgets, OpenVoice is something you've got to check out.

Early Mickey Mouse Enters Public Domain, Inspiring New Wave of AI-Generated Art

The entry of early Mickey Mouse cartoons into the public domain has sparked a new wave of AI experimentation. As of January 1, three iconic 1928 animations featuring Mickey Mouse are no longer under copyright restrictions in the US. Pierre-Carl Langlais, a digital humanities researcher, quickly harnessed this opportunity by releasing an AI model on Hugging Face, trained on these newly public domain cartoons. This model allows users to generate new images based on 1928-era Mickey, Minnie Mouse, and Peg Leg Pete, marking a significant moment in the blending of AI and creative expression.

Langlais's model was built by fine-tuning Stable Diffusion XL on a subset of stills from "Steamboat Willie," "Plane Crazy," and "The Gallopin' Gaucho," so that generated images adhere to the 1928 designs. While aiming to ensure that generated images stay within the public domain, Langlais acknowledges the project is still evolving. The quality of the images and their fidelity to the 1928 originals are areas of active development, reflecting both the challenges and the potential of AI in replicating historical art styles.
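For readers who want to experiment with a fine-tune like this themselves, here's a rough sketch using the Hugging Face diffusers library. The model identifier is a placeholder rather than Langlais's actual repository name, and I'm assuming the fine-tune is distributed as LoRA weights, which may not match how his model is packaged.

```python
# Minimal sketch: load a Stable Diffusion XL pipeline and apply a fine-tune
# (assumed here to be LoRA weights) trained on 1928-era public domain stills.
# "your-username/mickey-1928-lora" is a placeholder repo id, not the real name.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Assumption: the fine-tune is published as LoRA weights on the Hugging Face Hub.
pipe.load_lora_weights("your-username/mickey-1928-lora")

image = pipe(
    prompt="a 1928-style black-and-white cartoon mouse piloting a steamboat",
    num_inference_steps=30,
).images[0]
image.save("steamboat_scene.png")
```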

The response on social media platforms like Bluesky has been a mix of creativity and controversy. Users are experimenting with the AI model to produce images of Mickey Mouse in various unexpected and sometimes provocative scenarios. These user-generated images range from humorous parodies to more controversial depictions, highlighting the complex interplay between AI-generated content and the boundaries of fair use and copyright law.

This development in AI and copyright highlights the nuances and legal uncertainties surrounding AI-generated content. While the use of 1928 Mickey Mouse cartoons is now legally permissible for AI training, the broader implications of incorporating copyrighted materials into AI models remain unresolved. This situation exemplifies the ongoing challenges in defining the legal and ethical boundaries of AI in creative domains, especially as technologies and their applications continue to evolve rapidly.

Microsoft Launches Copilot AI on iOS

Microsoft's latest foray into AI chatbots, Microsoft Copilot, is now gracing the screens of iOS and iPadOS users. Available for free on the App Store, Copilot emerges as a direct competitor to the popular ChatGPT, offering similar functionalities in generating text and images. One of its notable features is the integration with DALL-E 3, enabling it to generate images from text prompts, a capability that adds a creative edge to the user experience.

Diving into the technicalities, Copilot runs on GPT-4, a more advanced model than the GPT-3.5 powering the free version of ChatGPT, which should in theory let it produce more sophisticated, human-like responses. However, it's not all smooth sailing for Copilot. The app caps each conversation thread at roughly thirty responses and requires an account to access image generation. In terms of user interaction, Copilot tends to churn out lengthier responses than ChatGPT, while ChatGPT's answers often read as more natural and human-like, hinting at more refined retrieval behind the scenes.

In day-to-day use, response speeds are similar, with ChatGPT holding a slight edge. A practical test of composing an email to a distant friend revealed distinct approaches: Copilot offered a template with various personalization options, while ChatGPT produced a more polished, though less customizable, message.

At this stage, both Microsoft Copilot and ChatGPT have their unique strengths and limitations on Apple’s iOS platform. The major drawback for Copilot is its response limit, which could be a deciding factor for users accustomed to more extensive interactions. As the AI chatbot space continues to evolve, it remains to be seen whether future updates to Copilot will address these limitations and potentially shift the balance in its favor.

Mastering Multi-Prompting in Midjourney: A Technical Guide

Multi-prompting in Midjourney, as detailed in a Twitter thread by user Nick St. Pierre, is a powerful tool for AI image generation, enabling users to intricately combine different concepts within a single prompt. This technique, though not new, is invaluable for those looking to push the boundaries of digital creativity. By using a double colon '::', users can separate ideas within their prompts, allowing Midjourney to interpret and blend them individually before creating a unified image. This method offers a unique approach to AI image generation, where distinct elements like 'Eggplant' versus 'Egg::plant', or 'Headlight' versus 'Head::light' can lead to vastly different visual results.

One of the most critical aspects of multi-prompting is the use of 'weight'. This feature lets users control the influence each part of the prompt has on the final image. By appending a numerical weight right after the double colon, the relative importance of each concept can be adjusted. For example, changing the prompt from 'egg:: plant' to 'egg::2 plant' alters the generated image by giving more prominence to 'egg' than 'plant'. This flexibility allows for precise control over the image's composition, making the tool incredibly versatile for various creative applications.

Understanding weight normalization in Midjourney is crucial for effective multi-prompting. Regardless of the numbers used, the AI system normalizes the weights, ensuring a balanced distribution of influence across the prompt components. It means that 'egg:: plant' is effectively the same as 'egg::1 plant' or 'egg::100 plant::100'. This feature simplifies the process, making it more accessible and user-friendly, as users don't need to worry about complex calculations to maintain balance in their prompts.
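To make the normalization concrete, here's a tiny illustrative calculation in Python. Midjourney doesn't expose anything like this as code; it's just the arithmetic behind the behavior described above.

```python
# Illustrative only: weight normalization is simply dividing each weight by the total.
def normalize(weights: dict[str, float]) -> dict[str, float]:
    total = sum(weights.values())
    return {concept: w / total for concept, w in weights.items()}

print(normalize({"egg": 2, "plant": 1}))      # egg ~0.67, plant ~0.33, as in 'egg::2 plant'
print(normalize({"egg": 1, "plant": 1}))      # even split, same as 'egg:: plant'
print(normalize({"egg": 100, "plant": 100}))  # also even, identical to the line above
```

The last two calls produce the same result, which is exactly why 'egg:: plant', 'egg::1 plant', and 'egg::100 plant::100' all behave identically.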

In conclusion, multi-prompting in Midjourney is a sophisticated feature that opens up new avenues for artistic expression in AI image generation. Its ability to separate and blend concepts, coupled with the control offered by weighting, provides users with unprecedented creative freedom. Whether for professional design work or personal projects, mastering multi-prompting can lead to more nuanced and impactful visual creations, making it a valuable technique in the toolkit of any digital artist or AI enthusiast.

Trick ChatGPT to Say its Secret Prompts

Title:

Improving Text Embeddings with Large Language Models

Authors: 

Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei (Microsoft Corporation)

Executive Summary: 

This paper introduces an innovative and streamlined approach for creating high-quality text embeddings using synthetic data generated by Large Language Models (LLMs) like GPT-4. This method stands out for its simplicity and efficiency, requiring less than 1,000 training steps and no reliance on complex training pipelines or manually collected datasets. The authors use synthetic data for a wide range of text embedding tasks across nearly 100 languages. They fine-tune open-source decoder-only LLMs with this synthetic data, employing standard contrastive loss. The results show strong performance on competitive text embedding benchmarks such as BEIR and MTEB, even without using any labeled data. When combined with labeled data, the model sets new state-of-the-art results.
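To give a flavor of the training objective mentioned above, here's a minimal, generic sketch of a standard in-batch contrastive (InfoNCE-style) loss over query-document embedding pairs. This is not the authors' code and leaves out details like hard negatives, instruction templates, and the synthetic data pipeline.

```python
# Generic in-batch contrastive loss for text embeddings (InfoNCE-style).
# Not the paper's implementation, just the standard objective it builds on.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     doc_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim) tensors where row i of each is a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # scaled cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Example with random vectors standing in for LLM-produced embeddings:
loss = contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```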

Pros: 

1. Simplicity and Efficiency: Streamlined process requiring fewer training steps.

2. Independent of Labeled Data: Capable of achieving competitive results without labeled datasets.

3. Diversity and Multilinguality: Covers a wide range of tasks and nearly 100 languages.

4. State-of-the-Art Performance: Achieves leading results on major text embedding benchmarks.

Limitations: 

1. Dependency on Proprietary LLMs: Relies on proprietary models like GPT-4 for generating synthetic data.

2. Potential Overfitting to Synthetic Data: The approach might overfit to the characteristics of synthetic data, affecting generalization to real-world data.

Use Cases: 

- Creating robust text embeddings for various NLP tasks such as information retrieval, question answering, and semantic textual similarity.

- Embedding and language modeling tasks across a wide range of domains and languages.

Why You Should Care: 

This research is significant for the NLP community, offering a more efficient method for developing text embeddings, a fundamental component in many NLP applications. By leveraging the power of LLMs to generate synthetic data, this approach reduces the dependency on large, labeled datasets, which are often resource-intensive to collect and may lack diversity. The ability to cover a wide range of languages also makes it a promising solution for multilingual applications. The method's capacity to achieve state-of-the-art results further underscores its potential impact in advancing NLP research and applications.

Behavly - Maximize your website's potential with targeted tweaks. Leverage subtle, science-backed adjustments for enhanced user engagement and higher conversion rates

Socra - Helping you crush your goals with the power of AI. Start your journey today.

Heydai - Do you know where your time goes? AI-powered daily planner & time tracker app

Construct - Construct uses AI to transform your words and ideas into functional apps you can share with your friends.

Scriptit - No-code platform that lets you easily build complex AI workflows for your business.

Momask - Generate text-to-motion 3D human animations

Dismissing Other Solutions:

Today, I need your assistance as a master direct response copywriter. Our goal is to demonstrate to our target audience why 10 popular and widely-sold alternative solutions they may consider buying to address their problem are not as effective as our product.


To achieve this, please follow these steps:


1) Create a list of 10 competitors' products that claim to solve the target audience problem.

2) Craft a 300-word argument for each alternative solution, explaining the downsides of the product. Each argument should be structured as follows:


a) Introduce the alternative solution, its claim and how it works.

b) Write a complete and detailed 150-word argument highlighting the drawbacks or limitations of the alternative solution. Elaborate on multiple aspects to ensure the word count is met. Use real-life examples, statistics, and relatable scenarios to strengthen your points. Address the key pain points and challenges that users may face with the alternative solution.

c) Write a complete and detailed 150-word argument emphasizing the benefits of our product over the alternative solution. Elaborate on multiple aspects to ensure the word count is met. Highlight how your product effectively addresses the pain points and offers a superior solution. Showcase the unique features, advantages, and success stories related to your product to create a compelling closing argument.


Here are the details about our target audience, our product, and the problem our product addresses.


Details:


"""

[Target Audience]= Insert your target audience

[Product]= Here you need to insert a detailed description of your product (the better you can explain it, the better the output will be).

[Problem]= Here you need to insert the pain point/goals of your target audience

"""


Output Guidelines:


"""

-Embrace a conversational and engaging tone that reflects the way characters speak in popular Netflix shows.

-Utilize witty remarks to highlight the huge effort required by the alternative solutions while emphasizing their minimal results.

-Mirror the fast-paced nature of popular Netflix shows by keeping your counterarguments concise and to the point. Use short sentences and punchy phrases to maintain the reader's interest.

-Address the reader directly, using phrases like "That's because…", "You see…", "So", "So what I'm saying is…", "If that sounds familiar to you…", "And so…", "Which means…", "Okay, so…", "Now", "Well, let me tell you something", "Makes sense, right?", "Alright, let's move on...", "The next thing you need to know is...", "I get it, we've all been there", "You deserve better than that", "Now, I'm not saying", "I mean, come on", "Now, don't get me wrong...", and "Let's be real for a second" to establish a connection and make the counterargument feel personal and relatable.

-Address the reader as "you" and engage them directly. Use phrases like "So, you thought..." to establish a friendly and inclusive tone.

-Write a comprehensive and detailed argument that highlights multiple limitations and drawbacks of the alternative solution. Address various aspects such as effectiveness, convenience, safety, long-term results, cost, and time commitment.

-Call each alternative solution by its generic category name rather than its specific brand name.