Windows Copilot Needs to Break Free from the Shackles of a Chatbox
With the release of OpenAI’s ChatGPT, chatbots have become the face of Artificial Intelligence (AI) in today’s world. It appears that talking to an AI chatbot is the only way to interact with AI models and intelligent systems. Although I agree that a chatbot offers a simple, user-friendly interface for most users to interact with an AI model, that doesn’t mean all your dreams of interacting with an intelligent system must die within the four corners of a text chatbox.
In this regard, Microsoft has been caught up in the frenzy of integrating AI chatbots into many of its products. Most notably, it has integrated Windows Copilot, an AI chatbot powered by OpenAI’s models, into Windows 11 with much hype and pomp. Not to mention, Microsoft replaced Cortana with Windows Copilot on Windows 11, and the tech giant has also brought Windows Copilot to Windows 10, replacing Cortana there as well.
Surely, Microsoft believes that AI chatbots are the future. But is that really the vision of intelligent computing powered by AI? Or is Microsoft just pandering to the AI hype and integrating AI chatbots to show investors that it has skin in the game? Whatever the answer, AI chatbots in their current form have limited applications, and getting any meaningful help from them feels restrictive, especially at the OS level.
Windows Copilot: A Downgrade Over Cortana?
Microsoft decided to wind down Cortana — a 9-year-old product — in favor of Windows Copilot, but is that a suitable replacement, especially when Windows Copilot is still in preview?
Either way, let’s go through the comparison point by point. First off, Cortana was primarily a voice assistant, whereas Windows Copilot is a text-based AI chatbot. It does support voice input, just not by default.
Simply put, Windows Copilot is not designed for a voice-first user experience, so it feels disjointed, unlike Cortana, which felt more personal. When it comes to UI approachability, I think many people prefer voice input over text input for its sheer ease of use and intuitiveness. So Windows Copilot fails a vital user experience test at the very outset.
Now, coming to features: Cortana had matured into a fleshed-out product over the years, and it could perform a lot of system-level actions. It could start a timer, set an alarm, add reminders, compose an email, find definitions, open apps, and much more. In essence, Cortana was deeply integrated into the Windows OS and understood the system very well.
In comparison, Copilot is powered by general-purpose large language models (LLMs) that are not well-tuned for performing local actions on Windows. When I ask Windows Copilot to set a timer, it tells me to use an online service instead. It can’t even set an alarm or play music; it simply opens the Spotify app for me. I can’t find any AI magic here.
Of course, Windows Copilot is still in preview, and these features will likely be added in the future (some are already being tested in Insider builds), but why the tearing hurry to replace Cortana with a barely working AI chatbot?
It appears to me that Microsoft is rushing to board the AI hype train. It missed the smartphone race, which it now regrets, and doesn’t want to repeat the same mistake.
What irks me is that Microsoft doesn’t seem to have given much thought to Windows Copilot. It has simply integrated a chatbot and called it a day, at least for now. The tech giant hasn’t even tried to achieve feature parity between Copilot and Cortana before replacing a nearly decade-old product.
It’s especially disappointing because Microsoft is adding a Copilot key to the Windows keyboard, which the company calls a “significant change to the Windows PC keyboard in nearly three decades”, yet so little thought has gone into Copilot itself.
Where is the AI Magic in Windows Copilot?
Now, let’s come to what Windows Copilot can do. You can ask questions about any topic and get answers right away. You can also switch to Creative mode to talk to the powerful GPT-4 model.
It can summarize a webpage, find key insights, plan an itinerary, and more. Microsoft has also added a screenshot tool to Copilot that uses the GPT-4V model for visual analysis. You can use it to perform OCR or find information about an image.
As for Windows-specific features, you can say “I am having issues with audio”, and Copilot can open the audio troubleshooter for you. It works for troubleshooting other Windows issues as well. Besides that, you can turn dark mode on or off, take a screenshot, and snap windows through Copilot.
While these features are decent for a preview, most of them, apart from the Windows-specific ones, work in Edge Copilot as well. Moreover, Windows Copilot can’t access webpages open in Chrome or other browsers. Since Windows Copilot runs on Edge’s engine, it can’t access content from other windows, be it a browser, Notepad, or Office apps.
This is another major gap in Windows Copilot’s implementation. It’s not developed using the WinUI 3 framework, which would deliver a native experience; instead, Copilot runs as an extension of the Edge browser. As a result, Windows Copilot isn’t deeply integrated into key elements of the OS.
For example, you can’t right-click on a file in Windows Explorer and ask Windows Copilot to explain it, convert the file format, or perform any action you want. It would have been so cool if you could throw an Excel file at Copilot from the context menu and it could perform data analysis right there. Currently, except for images, there is simply no way to interact with files using Windows Copilot on Windows 11.
Windows Copilot: A Case of Overpromising and Under-delivering
Of late, Microsoft has been very good at announcing and marketing new features, but when it comes to actually using them, you can’t seem to find them. When Windows Copilot was announced three months back, it promised several new features; however, they are not available yet or don’t function as marketed.
For example, when you ask Windows Copilot to snap your windows, it asks for your permission and then snaps just one window, leaving you to do the rest. Similarly, it doesn’t play mood-specific music when you ask it to play something while you work; Copilot simply throws links from YouTube and other sources at you. That’s not what you’d expect from an intelligent, AI-powered Copilot, is it?
Next, the much-anticipated contextual menu for Copilot has not arrived yet. Rewrite, Explain, and Summarize are not available for any active window, and Draft with Copilot is nowhere to be found even three months after release. Not to forget, you can’t remove the background of images using Copilot, and extension support has not been added yet.
So the marketed and hyped-up features are simply not there. It’s a clear case of Microsoft overpromising and under-delivering, as it has with many of its products.
What Could Be the Vision for Windows Copilot?
Now, let’s consider what Windows Copilot could do. If we look at what the open-source community is doing, there’s an interesting tool called Open Interpreter that can interact with your local files, convert them to other formats, process various file formats, create charts, and much more. It can also interact with various system settings and tools and perform actions on Windows.
“hi! Open Interpreter 0.2.0, The New Computer Update, is out today. everything’s new.
– OS Mode lets vision models operate your computer
– We included a new model for precise GUI control
– We’re launching a Computer API for LLMs”
(killian, @hellokillian, January 5, 2024)
Just recently, a new version of Open Interpreter (0.2.0) was released with a fascinating OS mode that lets you operate your computer with simple natural-language prompts. Open Interpreter uses vision models like GPT-4V to understand the GUI environment and perform actions on your computer.
For example, you can ask it to turn on dark mode, and it opens the appropriate Settings page and flips the toggle using the vision model.
“Look ma, no hands! This is @OpenInterpreter using my mouse and keyboard to send an email. Imagine what else is possible.”
(Ty, @FieroTy, January 6, 2024)
Ask it to play some lo-fi music, and it opens the browser, goes to YouTube, finds a great lo-fi playlist, and plays it for you. These are just basic examples of what vision models are capable of, yet Windows Copilot is stuck throwing text at you in a chatbox.
A truly intelligent Copilot should be able to send an email, tweak Windows settings, interact with the OS at the system level, and do so much more. The use cases are limitless, and it could be especially useful for improving accessibility on Windows 11 24H2.
Of course, calling the GPT-4V API would cost Microsoft a lot of money, but it could build a small vision model specifically for Windows, much like CogVLM. That would reduce latency and let everything run locally, even when your PC is offline.
With the upcoming Intel and Snapdragon X Elite chipsets featuring dedicated NPUs, running smaller models on-device will be possible. Even if Microsoft runs its in-house vision model in the cloud, it would cost much less.
“Introducing r1. Watch the keynote. Order now: https://t.co/R3sOtVWoJ5 #CES2024”
(rabbit inc., @rabbit_hmi, January 9, 2024)
To give another example, we have just seen the demo of the Rabbit R1, an AI-first hardware device that can perform actions for you. It’s powered by what the company calls an LAM (Large Action Model). From ordering pizza to sending emails and booking flights, it can handle tasks for you with just voice input.
If a small startup like Rabbit can pull it off, so can a tech giant like Microsoft with humongous resources on its side. So far, we have only seen Microsoft build its own Phi-2 model, a small LLM, for research purposes. If Microsoft really wants us to experience AI PCs in 2024, it needs to build Windows-specific vision models for running agents locally with near-zero latency. Microsoft needs to come up with something like an LAM that is designed to perform actions, not just for chatting with a chatbot.
Windows Copilot Needs a Fresh Approach
To conclude, Windows Copilot, in its current chatbot form, has extremely limited use cases, and they are already covered by countless browser extensions and Edge Copilot. Microsoft needs a fresh approach to make AI PCs a reality.
Microsoft’s fiercest competitor, Apple, is known for building a product thoroughly and releasing it to the public only when it’s ready for use. In contrast, Microsoft does the opposite: it rushes products out before they’re ready, without functional and meaningful features at launch.
It’s symbolic of how Microsoft is approaching AI without much thought. The company has already started calling Edge an AI browser simply because it integrates a chatbot. It’s also working to add AI features to Notepad and continues to bring AI-powered features to MS Paint, Snipping Tool, Office apps, and other first-party apps.
While these in-app AI features can help some users, to make Windows an intelligent OS powered by AI, Microsoft needs to get over its obsession with integrating a chatbot and start afresh with novel ideas and approaches.
Arjun Sha
Passionate about Windows, ChromeOS, Android, and security and privacy issues. Has a penchant for solving everyday computing problems.