Facebook Is Building An AI That Sees The World Like Humans Do

Facebook’s Ego4D project aims to create an AI that provides AR cues to help users in their daily lives, and it sounds like a true glimpse of the company’s metaverse.

Facebook has announced a new research project called Ego4D that aims to train AI models on videos captured from a human perspective and then provide guidance through augmented reality cues by accessing a log of past recordings. The social media giant’s latest AI-centric research project puts computer vision technology front and center. That technology is already in use to some extent for products such as Google Lens and a handful of other shopping tools, where an image is studied to pull similar listings from e-commerce platforms.

The company recently detailed its work on new AI-powered visual search and product discovery tools that will boost the shopping experience on Instagram. Down the road, users will be able to pull up online listings of clothing items by simply tapping on a person’s photo. The in-house product recognition system is so advanced that it will surface relevant products even for vague text-based queries such as “find a shirt with a similar polka dot pattern” on its platforms. But all of these object recognition systems are predominantly based on computer vision models trained on photos and videos captured from a third-person perspective.

Facebook is going a step further by changing the perspective of the training data from the sidelines to the middle of the action, using a first-person perspective as part of its Ego4D project. The possibilities appear to be endlessly beneficial, and a little scary, too. To collect the training data, Facebook partnered with 13 institutions across nine countries that recruited over 700 participants to record more than 2,200 hours of first-person footage documenting day-to-day activities such as grocery shopping, washing utensils, and playing drums. The goal is to capture the activities and also assess the scenario from a person’s own perspective, much like the action recorded by Facebook’s own Ray-Ban Stories sunglasses.

https://www.youtube.com/embed/taC2ZKl9IsE?feature=oembed

The First Glimpse Of The Metaverse With AR At The Center

The company calls this egocentric perception, hence the name Ego4D. The videos were transcribed and annotated to describe everything in the frame, from objects to actions, in order to create a dataset that researchers across the world can use to develop computer vision systems and catalyze a new wave of AI development. Wendy’s recently partnered with Google Cloud to create one such computer vision system that will monitor the kitchen and alert the cook when it’s time to flip burgers. However, Facebook’s Ego4D project puts an AR spin on those AI capabilities, going far beyond analysis and actually stepping into the realm of predicting users’ actions.

To do that, Facebook has devised a set of five benchmarks that an egocentric perception AI has to achieve. The first is episodic memory, which works much like human memory. For example, when users forget where they placed the TV remote, the AI will search past first-person videos to check where they left it, and will then guide them to the spot using AR cues, somewhat like Apple Maps. The second benchmark is forecasting, which predicts what the user is about to do and provides a timely alert. So, if a user has already added pepper to their curry and reaches for the pepper powder bottle again, the AI will recognize the impending action and instantly alert them that the ingredient has already been added.
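To make the episodic memory idea more concrete, here is a minimal, purely illustrative Python sketch. The data layout, class names, and place labels are assumptions for the example only and are not taken from Facebook’s published work; the point is simply that an assistant could answer “where did I leave the remote?” by scanning an annotated log of past first-person frames.

```python
from dataclasses import dataclass

@dataclass
class FrameAnnotation:
    timestamp: float      # seconds since the recording started
    objects: list[str]    # objects detected in this frame
    location: str         # coarse place label for this frame

# Hypothetical annotated log built from past first-person video.
log = [
    FrameAnnotation(10.0, ["tv_remote", "couch"], "living room"),
    FrameAnnotation(95.5, ["tv_remote", "counter"], "kitchen"),
    FrameAnnotation(120.2, ["coffee_mug"], "kitchen"),
]

def last_seen(log: list[FrameAnnotation], obj: str) -> str | None:
    """Return the location of the most recent frame containing `obj`."""
    for frame in reversed(log):
        if obj in frame.objects:
            return frame.location
    return None

print(last_seen(log, "tv_remote"))  # -> "kitchen"
```

In a real system the lookup would run over recognized objects in hours of video rather than a hand-built list, but the query pattern, searching backwards through time for the last sighting, is the same.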

Similarly, the ‘hand and object manipulation’ benchmark wants the AI to remember the correct sequence of events, something students will find helpful as AR cues walk them through the steps of a recorded training video. The fourth benchmark is social interaction, and the fifth, the most alarming one, is audio-visual diarization. This one involves saving an audio (and possibly text-based) log of what a person in the camera’s view was saying. Users can then ask the AI questions such as what person ABC said during their coffee break on a particular day. Facebook hasn’t yet detailed safeguards against the seemingly massive privacy-intrusion scenarios its project raises. The Ray-Ban Stories have already come under scrutiny because of their ability to go full creep mode, and with an AI as smart as Ego4D, there will be far more privacy-related worries.
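As an illustration only, a diarized log like the one described above can be thought of as a list of speaker-attributed, timestamped utterances that can later be filtered by speaker and time window. The names and structure below are assumptions made for this sketch, not Facebook’s actual system.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str    # identity assigned by the diarization step
    start: float    # seconds into the recording
    text: str       # transcribed speech

# Hypothetical diarized transcript of a recorded conversation.
transcript = [
    Utterance("person_abc", 12.0, "Let's meet after the coffee break."),
    Utterance("person_xyz", 15.5, "Sounds good, see you at three."),
    Utterance("person_abc", 901.0, "Don't forget the report is due Friday."),
]

def said_by(transcript: list[Utterance], speaker: str,
            start: float, end: float) -> list[str]:
    """Return everything `speaker` said between `start` and `end` seconds."""
    return [u.text for u in transcript
            if u.speaker == speaker and start <= u.start <= end]

print(said_by(transcript, "person_abc", 0, 60))
```

Seen this way, it is easy to understand the privacy concern: the same simple query works just as well on people who never agreed to be recorded.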

On the positive side, the Ego4D project gives a very clear glimpse of what Facebook wants to achieve with the metaverse, at least when it comes to helping users in their daily lives. The heavy use of augmented reality to reach those goals is a sign that Facebook will go all-in on AR, and more advanced wearables are definitely in the pipeline. But the biggest question is whether users will be comfortable with Facebook having more personal access to their lives via first-person videos, given the company’s sketchy history of privacy-related scandals.
