By Gursimar Singh

12 days of OpenAI: In Search of Artificial General Intelligence.

Updated: Jan 1

In the search for artificial general intelligence, the OpenAI team launched 12 days of OpenAI, where they released a new product or demoed a feature each day.

The fact that we even have 12 days of new releases and demos shows how fast the AI space is evolving.


So, let's explore the good, bad, and ugly aspects of each release and see how it will impact our day-to-day lives as end users and developers.


The Great


o3 and o3-mini announced [11]


Both of these models have insane benchmarks! While neither model is available to the public yet, the benchmarks have set a new standard for future models. They have shown great performance in competitive programming, mathematics and science.


The o3 model achieved 25 per cent accuracy on FrontierMath, a dataset of math problems that take expert mathematicians hours to complete [14].


o3 benchmark results from EpochAI. Image obtained from [11].

Even crazier, the ARC benchmark checks the model's ability to learn new skills on the fly.


While the performance is great, it is important to remember that the model is not yet available to the public, and we should wait until we can use it to form our own opinion.



Timothy Gowers waiting to get access to form an opinion on the new models [13].

The Good


Developer experience improvements [8]


Having access to a powerful model is good, but what matters most to me is the developer experience when integrating OpenAI's APIs into my products.


So, on that front, the OpenAI team delivered some of the most requested features so far:

  • Function calling: You provide a set of function definitions to OpenAI, and ChatGPT calls them when appropriate to interact with your backend APIs. OpenAI also claims that picking the correct function is more accurate now (see the sketch after this list).

  • Structured outputs: OpenAI now claims that the responses will adhere to provided schemas 100% of the time.

  • Vision inputs in the API: send an image along with a prompt, for example an image and a question about any issues visible in it.

  • WebRTC support for the real-time API to make building real-time AI applications easier and more performant.

  • Preference fine-tuning for better training the model to give more concise and relevant responses.

  • Support for Python (Realtime API), Go and Java SDKs
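
To make the function-calling and structured-output items above concrete, here is a minimal sketch assuming the official OpenAI Python SDK and its chat.completions.create interface. The get_order_status function, its schema and the model name are hypothetical placeholders for your own backend, not anything specific OpenAI shipped in these announcements.

# A minimal sketch of function calling with a strict schema, assuming the
# official OpenAI Python SDK (pip install openai) and a tools-capable model.
# The get_order_status function is a hypothetical backend endpoint.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Describe a backend function the model is allowed to call. With
# "strict": True, OpenAI claims the arguments will match this schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",   # hypothetical backend endpoint
        "description": "Look up the status of an order by its ID.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",                   # any tools-capable model
    messages=[{"role": "user", "content": "Where is order 1234?"}],
    tools=tools,
)

# If the model decided to call the function, its arguments arrive as a JSON
# string that should conform to the schema above.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, json.loads(tool_call.function.arguments))

From there, you would run get_order_status yourself and send the result back as a tool message so the model can produce its final answer.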


The Meh


Sora Turbo and Storyboard release [2]


Yup! Sora is in Meh rather than Good or Great, even though its demo was great. Let's understand why.


Sora is a text-to-video model that generates video from text prompts, and it comes bundled with the ChatGPT Plus and Pro subscriptions.

Features of Sora:

  • Selectable aspect ratio

  • Generate up to 1080p video

  • Generate 5 to 20 seconds of video

  • Generate multiple variations at once

  • Presets such as Film Noir

  • Use Remix to describe changes.

  • Use Blend to merge two videos together seamlessly.


Storyboard: Lets you direct a video with multiple actions across a sequence; basically, you chain the video together using different prompts. For example, first a person is swimming; then the person sees a shark in the sea and quickly swims to shore for safety.



The reason I am sceptical of Sora and Storyboard as a genuine artistic medium of expression is that I agree with the following quote from Bret Victor:


The entire reason artists create visual art is they want to express something that they can't express in language. [15]

Sora is a decent text-to-video model, and it is fun to generate some wacky ideas with it, but I believe it will only ever be something that supports the artistic process, nothing more. Therefore, I would have much preferred it to be integrated with existing video editors rather than being a separate tool. Perhaps OpenAI can improve the editing experience in Sora to be on par with Final Cut or Premiere Pro; then it will most definitely be Good or even Great!


ChatGPT Canvas [3]


If you are interested in ChatGPT Canvas, I recommend the following video:




In short, ChatGPT Canvas is quite useful for day-to-day interactions with ChatGPT, especially for coding and writing. It is good, but not a giant leap forward.


Search [7]


Web search is now faster and has better mobile support. What is good about this is that it makes ChatGPT a perplexity.ai competitor: there is no longer a need to switch between ChatGPT and Perplexity to find things. Both provide references and should perform just as well when searching the web. Since ChatGPT web search is free, maybe it's time to ditch Perplexity.



Projects [6]


Projects are like folders for organising files and tailoring ChatGPT with custom instructions.

ChatGPT can read the files in a project and then respond based on their contents.


Will I use this? Absolutely. It lands in Meh because it feels like it should have been in ChatGPT from day one.


Apple Intelligence [4]


Apple Intelligence has some neat new features that can be quite useful, such as summarising PDFs on a Mac. ChatGPT also seems to work with Siri via "task handoff". Still, I would rather just talk to a better Siri or to the ChatGPT app, so the experience feels less disjointed.


The Bad


While all these advancements are great, there are a few worrying signs for the future, most importantly the costs associated with these models. The first sign of worry is ChatGPT Pro's price tag.


ChatGPT Pro's price tag [1]


o1 is great, but if you use it a lot and want more reliability, you have to pay a premium price.

The price is an insane 200 USD a month, which makes this option practically unaffordable for people from lower-income areas.



ChatGPT Pro's current pricing.

o3 is expensive.


o3 packs some serious power, but it comes with serious costs, as shown by the following diagram from ARC Prize:


LLMs' cost per task. Image sourced from [12].

If OpenAI is serious about making AI for everyone, reducing costs should be one of their top priorities.



Work with apps [10]


While the ChatGPT app itself is superb and works well with other apps, the demo didn't really make sense to me.

Why context switch between an IDE and ChatGPT? Why not just use Copilot?

Why use the ChatGPT app to write in Notion when Notion's own AI already exists and might provide a smoother experience?

I think smoother integrations are the way to go, as they require less context switching.


The Weird


Video in Advanced Voice [5] and 1-800-CHATGPT [9]


Both features offer a way to talk to ChatGPT via voice or video call, making it feel as though you are talking to a person or a character. With video, the model can also recognise details about the caller, such as what kind of clothes they are wearing.


These conversations felt a bit forced in the demo; there were awkward pauses caused by the model's processing time.


There is also a sense of power imbalance between ChatGPT and the person: the model is constantly analysing the scene, down to what kind of shirt someone is wearing, while the person is just talking to a voice.


I would definitely try video at least once to test how practical it is compared to YouTube. For example, can it tell me how to fix my car, as opposed to something easy such as making coffee?


OpenAI also provides an option to call ChatGPT directly via phone in the US or to message it on WhatsApp.


I can realistically think of two scenarios when ChatGPT via phone call would be used:

  1. When I am trying to reduce screen time by using a dumbphone and want to find out some information.

  2. While driving, if I want to Google something, I would rather just call ChatGPT and ask the question.


So, that was all from 12 days of OpenAI. Overall, it shows great progress in model performance and suggests we are edging towards AGI. Whether AGI will be achieved is a question yet to be answered, but OpenAI has achieved some excellent results in the quest for it.


There is still a need for better integration of AI into the products we use daily, which is why we are seeing more focus on developer friendliness from OpenAI and partnerships with behemoths like Apple.



References and Further Readings:
