A complex world hides behind the Instagram algorithm. How does Meta rank content and how does this impact the creator community?
YouTube is riddled with tips and tricks on how to rank on Instagram and trick the algorithm into granting maximum exposure. Many of these videos barely scratch the surface of how Meta determines which content each individual user sees the moment they open Instagram. The algorithms running in the background are complex models that tap into vast datasets to make sure you remain on the platform as long as possible.
The importance of recommendation algorithms
Before diving into the different recommendation algorithms Meta uses to serve content to Instagram users, we have to understand why engineers are working tirelessly to build these complex models. The first reason, and perhaps the most obvious one, is a matter of life and death. Meta, the company behind both Instagram and Facebook, is locked in a fierce battle with TikTok, which is pulling users away from platforms such as Instagram. The problem compounds with every user Meta fails to keep on its own platforms, so it must do everything in its power to keep users engaged.
Keeping users engaged is a core part of Meta’s business model. Without users logging in every day, no ads can be served and no money can be made. While Instagram and Facebook aren’t caught in a ‘death spiral’ yet, this threat always looms on the horizon. The second reason is that, just as at Netflix, poor recommendations hurt the user experience. A user who is left with no new content to explore might decide the service is irrelevant to their daily media habits and abandon it. This ties in neatly with the first point, as a competitor might be able to deliver a far more engaging experience.
AI-powered recommendations
In November 2019, the research team at Meta detailed how AI powers the Explore tab within the Instagram algorithm. Explore is one of the most popular sections of the social media platform, with over half of Instagram users discovering new content and creators there every month. The sheer volume of content uploaded to the platform poses serious challenges for Meta’s engineering teams: because every user receives a tailored experience, the hardware is pushed to its limits.
The team explained that, before it could dive into AI-powered recommendations, it had to build a framework that could handle all the incoming requests. That framework had to meet three specific needs: it had to enable rapid experimentation at scale, collect richer signals about people’s interests, and be computationally efficient enough to deliver relevant, fresh content to each user. With these foundations in place, the team could expand and improve its recommendation algorithm.
Engineers at Meta wanted to find the right machine learning system to deliver relevant content, based on the user’s prior interactions. A strong model would be able to identify and pinpoint the user’s interests. The team explained that some models are adept at serving content based on recent content interactions, while others can track interests over a longer period of time. Developers at Meta experiment with different algorithms to determine which one can deliver the best user experience at scale, whilst not frying CPUs in its data centers.
The team ultimately created IGQL, a domain-specific language for retrieving candidates for its recommender systems. IGQL’s execution is optimized in C++, which helps minimize latency and reduce computing resources, the engineers at Meta noted. The language also gives the team enough flexibility to experiment with multiple recommendation algorithms. Engineers can tweak the weights assigned to different ranking signals and analyze which combination resonates most with users.
However, recent interactions across the platform don’t, on their own, provide strong insight into a user’s preferences. To offset misinterpretations of those interactions, engineers also look at the media present in the user’s account, which can be a treasure trove of common themes and communities. The team calls the resulting representations account embeddings, which are used to build the ‘personal ranking inventory’.
For example, saved pictures receive a high rating, as they reveal that the user wants to revisit the content at a later stage. That makes it highly relevant to monitor which content each person saves. The same goes for the content published on the account itself. An account that posts a lot of fashion-related content, such as lookbooks, tutorials and recommendations, has fashion and cosmetics as a central theme. This feeds back into the algorithm, which can then scan the platform for related content.
Using these models, engineers can find commonalities between accounts that are relevant to the user. This is achieved with ig2vec, an embedding framework similar to word2vec. Where word2vec learns from the words in a sentence, ig2vec treats the sequence of accounts a user interacts with in a session as if it were a sentence, so accounts that often appear together end up with similar embeddings, which makes pairing them easier. The team notes that by modelling these sequences, it can more easily predict which accounts and content a user will interact with.
The next step is to weight all of the collected data and rank every candidate that could be served to the user, serving only the most relevant content. The algorithm searches for related accounts and combines them with the previously collected interactions to decide what to show. The content served is updated each time the user interacts with the Explore section, the team notes. As these touchpoints build up, the models become ever more effective.
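The idea of tweaking weights across ranking signals can be illustrated as a weighted sum of predicted engagement probabilities. This is a minimal Python sketch, not IGQL (whose syntax isn’t shown here) and not Meta’s actual model: the action names, weights and predicted probabilities are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights per predicted action; a real system would tune these
# through experiments rather than hard-code them.
WEIGHTS = {"like": 1.0, "save": 3.0, "share": 4.0, "see_less": -5.0}


@dataclass
class Candidate:
    post_id: str
    # Predicted probability of each action, e.g. produced by upstream models.
    predictions: dict


def value_score(candidate: Candidate, weights: dict = WEIGHTS) -> float:
    """Combine predicted action probabilities into a single ranking score."""
    return sum(weight * candidate.predictions.get(action, 0.0)
               for action, weight in weights.items())


def rank(candidates: list, weights: dict = WEIGHTS) -> list:
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: value_score(c, weights), reverse=True)


candidates = [
    Candidate("a", {"like": 0.30, "save": 0.01, "share": 0.01}),
    Candidate("b", {"like": 0.10, "save": 0.08, "share": 0.05}),
]
print([c.post_id for c in rank(candidates)])
```

Post "b" wins despite fewer predicted likes because saves and shares carry more weight here. Re-running `rank` with a different `weights` dictionary shows how a single tweak can reorder candidates, which is the kind of experiment the engineers describe.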
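The point that some signals, such as saves and the account’s own posts, say more about a user than others can be sketched as a weighted topic profile. This is a minimal sketch, assuming each piece of media already carries topic labels; the interaction types, weights and events are hypothetical.

```python
from collections import Counter

# Hypothetical weights: a save or an own post says more about interests than a like.
INTERACTION_WEIGHTS = {"like": 1.0, "save": 3.0, "own_post": 4.0}


def interest_profile(events):
    """Build a normalized topic profile from (interaction, topics) events."""
    scores = Counter()
    for interaction, topics in events:
        weight = INTERACTION_WEIGHTS.get(interaction, 0.0)
        for topic in topics:
            scores[topic] += weight
    total = sum(scores.values())
    # Normalize so the shares sum to 1; return an empty profile if there is no signal.
    return {topic: score / total for topic, score in scores.items()} if total else {}


events = [
    ("own_post", ["fashion"]),
    ("own_post", ["fashion", "cosmetics"]),
    ("save", ["cosmetics"]),
    ("like", ["travel"]),
]
profile = interest_profile(events)
print(sorted(profile.items(), key=lambda kv: -kv[1]))
```

Fashion and cosmetics dominate the profile, mirroring the lookbook example above, while a single like for travel barely registers.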
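In Meta’s 2019 description, ig2vec borrows word2vec’s idea by treating the accounts a user interacts with during a session like the words of a sentence. Word2vec’s skip-gram variant learns from (target, context) pairs taken from a sliding window, so the first data-preparation step can be sketched as below. The account names and window size are hypothetical, and training the embeddings themselves is omitted.

```python
def skipgram_pairs(session, window=2):
    """Return (target, context) pairs from one session of account IDs.

    Each account is paired with the accounts at most `window` positions
    before or after it, as in word2vec's skip-gram training data.
    """
    pairs = []
    for i, target in enumerate(session):
        start = max(0, i - window)
        end = min(len(session), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, session[j]))
    return pairs


session = ["@chef_a", "@bakery_b", "@chef_c"]
print(skipgram_pairs(session, window=1))
```

Accounts that keep appearing in each other’s windows across many sessions end up with similar vectors once a word2vec-style model is trained on these pairs.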
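Searching for related accounts can be sketched as a nearest-neighbour lookup over account embeddings: accounts whose vectors point in a similar direction to the ones a user already engages with are treated as related. This is a minimal sketch with hypothetical three-dimensional embeddings; production systems use learned, much larger vectors and approximate nearest-neighbour indexes rather than this brute-force loop.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_accounts(seeds, embeddings, k=2):
    """Rank non-seed accounts by their best similarity to any seed account."""
    scores = {}
    for account, vec in embeddings.items():
        if account in seeds:
            continue
        scores[account] = max(cosine(vec, embeddings[s]) for s in seeds)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical embeddings for illustration only.
embeddings = {
    "@street_style": [0.9, 0.1, 0.0],
    "@runway_news": [0.8, 0.2, 0.1],
    "@makeup_tips": [0.6, 0.7, 0.0],
    "@hiking_trails": [0.0, 0.1, 0.9],
}
print(related_accounts({"@street_style"}, embeddings))
```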
Instagram Home Feed
In December 2020, Amogh Mahapatra, a machine learning engineer at Meta, revisited the recommendation system that fuels Instagram and delivers fresh, relevant content to users every time they open the app. Mahapatra opens the blog post by laying out the questions that matter for the user experience, noting that the home feed is carefully curated by users themselves through the accounts they choose to follow, and that engineers were asking how they could enhance this experience. They know that users who are engaged with their home feeds are likely to look for new interests to follow.
Meta engineers want to take personalization further. They have done so by pushing new content beyond the Explore tab and serving relevant content directly in users’ feeds. Suggested Posts in the Instagram Home Feed launched in August 2020 to bridge the gap between exploration and familiarity. A major challenge for the engineers is keeping users engaged for longer stretches of time without losing them by suggesting irrelevant posts.
Hence, an important piece of the puzzle is fetching the right content. Mahapatra splits this into two phases: candidate generation and candidate selection. Candidate generation is based on the user’s explicit and implicit interests, he explains, and was detailed extensively in Meta’s 2019 blog post. Candidate generation puts enormous stress on the hardware. Candidate selection, meanwhile, requires complex algorithms that determine which content should be served to the user.
The candidate generation phase uses two components. The first is embeddings-based similarity, which collects and uses engagement data generated by the user; this stage relies on the same embedding approach as the Explore recommendation system. The second is co-occurrence-based similarity, which tracks interactions such as likes, shares and other media interactions and distills that data into content pairings to suggest relevant content.
However, these systems feed on user interactions, which leaves an important part of the equation out of scope: the cold start problem. Mahapatra says that new users, and even some seasoned ones, may have few interactions with the platform, meaning that little data has been fed into the candidate generation algorithm.
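Co-occurrence-based similarity can be sketched by counting how often two posts are engaged with by the same user, then suggesting the posts that co-occur most often with what the current user already engaged with. This is a minimal sketch; the engagement log is hypothetical, and a real system would normalize counts and operate at a far larger scale.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(engagements):
    """Count, for every ordered pair of posts, how many users engaged with both."""
    counts = defaultdict(int)
    for posts in engagements.values():
        for a, b in combinations(sorted(set(posts)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user_posts, counts, k=2):
    """Score unseen posts by their co-occurrence with the user's posts."""
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in user_posts and b not in user_posts:
            scores[b] += n
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical log: user ID -> posts that user engaged with.
engagements = {
    "u1": ["p1", "p2", "p3"],
    "u2": ["p1", "p2"],
    "u3": ["p2", "p4"],
}
counts = cooccurrence_counts(engagements)
print(recommend({"p1"}, counts))
```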
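The cold start problem is described here without a fix, but one common, generic mitigation is to fall back on broadly popular content until a user has enough interactions for personalized candidates. This sketch is not presented as Instagram’s method; the threshold, function name and item lists are hypothetical.

```python
MIN_INTERACTIONS = 5  # hypothetical threshold for "enough" personal data


def generate_candidates(interactions, personalized, popular, k=3):
    """Blend personalized and popular candidates for a user.

    With no interactions the user only gets popular items; as interactions
    accumulate, personalized items fill a growing share of the k slots.
    """
    share = min(len(interactions), MIN_INTERACTIONS) / MIN_INTERACTIONS
    picked = list(personalized[:round(k * share)])
    for item in popular:
        if len(picked) >= k:
            break
        if item not in picked:
            picked.append(item)
    return picked


personal = ["pasta_reel", "knife_skills", "bread_post"]
trending = ["trending_1", "trending_2", "trending_3"]
for history in ([], ["like"] * 2, ["like"] * 5):
    print(len(history), generate_candidates(history, personal, trending))
```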
Scaling Instagram Explore
In August 2023, the engineering team at Instagram detailed how it had worked to scale the Explore recommendation system. The team notes that Explore is one of the largest recommendation systems on Instagram. Engineers at the social media platform use machine learning to deliver the most interesting and relevant content. Among other things, the team introduced a task-specific domain-specific language (DSL) and a multi-stage approach to ranking. The ranking pipeline has four stages: retrieval, first-stage ranking, second-stage ranking and final reranking.
The retrieval stage searches for candidates that would likely rank highly if every available option were scored. Although Meta has enormous capacity at hand, it cannot simply scour the entire platform for the perfect post on every request. Instead, it collects a pool of a few thousand candidates, which it keeps narrowing down through its recommendation funnel. Candidates from these different sources are combined and passed through each ranking model in turn.
The candidate samples each receive a weight. Additionally, some candidates are computed ahead of time, during off-peak hours, and stored so they can be served to users later without recalculation, which gives the team greater flexibility. A cornerstone of retrieval itself is the Two Tower neural network, which the team describes as an extension of the Word2Vec algorithm mentioned earlier. It produces embeddings that allow for efficient search and selection while keeping the model maintainable and able to work in real time.
The Two Tower model uses two separate neural networks, the team explains: one for the user and one for the item. Each network only consumes features related to its own entity and generates an embedding. The model then predicts a possible interaction from the user, such as liking a piece of content, by comparing the two embeddings. As the model learns from more interactions, it gets better at predicting engagement with future content. There are many more facets that come into play, but by modelling user interactions and scoring content items through different models, Instagram can serve the best, most relevant content.
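The narrowing funnel can be sketched as a chain of stages, each scoring only the surviving candidates and keeping the top slice. This is a minimal sketch: the pool size, stage sizes and scoring functions are hypothetical stand-ins (a cheap, noisy score early and a more accurate one later), not Instagram’s actual models.

```python
import heapq
import random


def funnel(candidates, stages):
    """Run candidates through successive (score_fn, keep) stages.

    Each stage scores only the candidates that survived the previous one
    and keeps the `keep` highest-scoring items, best first.
    """
    pool = list(candidates)
    for score_fn, keep in stages:
        pool = heapq.nlargest(keep, pool, key=score_fn)
    return pool


# Hypothetical retrieved pool: each post has a noisy, cheap-to-compute
# estimate and a more accurate score that would be costlier in practice.
rng = random.Random(0)
retrieved = {}
for i in range(3000):
    relevance = rng.random()
    retrieved[f"post{i}"] = (relevance + rng.gauss(0, 0.2), relevance)

stages = [
    (lambda p: retrieved[p][0], 500),  # first-stage ranking: cheap, noisy
    (lambda p: retrieved[p][1], 50),   # second-stage ranking: heavier model
    (lambda p: retrieved[p][1], 10),   # final reranking: extra rules would go here
]
top = funnel(retrieved, stages)
print(len(top), top[:3])
```

The design choice is cost-driven: the expensive scorer only ever sees 500 of the 3,000 candidates.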
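The two-tower structure can be sketched with two tiny hand-written "towers", each mapping only its own entity’s features to an embedding, with the dot product of the two embeddings turned into a predicted probability of engagement. This is a minimal, untrained sketch: the features, weights and one-layer towers are hypothetical, whereas real towers are deep networks trained on engagement events.

```python
import math


def tower(features, weights):
    """A one-layer 'tower': project a feature vector to an embedding with ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]


def engagement_probability(user_features, item_features, user_w, item_w):
    """Score a user-item pair as sigmoid(dot(user_embedding, item_embedding))."""
    u = tower(user_features, user_w)  # user tower sees only user features
    v = tower(item_features, item_w)  # item tower sees only item features
    logit = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical features: user = [likes_cooking, likes_travel]; item = [is_food, is_travel].
USER_W = [[1.0, 0.0], [0.0, 1.0]]
ITEM_W = [[2.0, 0.0], [0.0, 2.0]]

user = [1.0, 0.0]
food_post, travel_post = [1.0, 0.0], [0.0, 1.0]
p_food = engagement_probability(user, food_post, USER_W, ITEM_W)
p_travel = engagement_probability(user, travel_post, USER_W, ITEM_W)
print(p_food, p_travel)
```

Because the item tower never looks at the user, item embeddings can be computed ahead of time and indexed, which is what makes this general design practical for retrieval.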
Diversified content suggestions
Instagram is well aware of the Achilles heel of recommendations that rely heavily on user interactions and media consumption: they can become repetitive, causing users to disengage from the platform altogether. To offset this trend, engineers at Meta inject content related to the user’s interests through its Knowledge Graph model. The team describes users’ interests as multidimensional, pointing to movie tastes that go beyond a single fixed genre and branch out into others.
A user’s preferences can also be ‘soft’, a matter of degree rather than all-or-nothing: a user might have a 35 percent preference for one content type and a 99 percent preference for sports. Preferences can be contextual, with outside factors shaping the user’s interests. Furthermore, preferences change over time. A user might be interested in cooking, then shift to travel months later. This shifting relationship with the platform means Instagram has to keep developing solutions that can cope with such unpredictable changes.
Hence, an algorithm has to deliver a diversified experience to keep users engaged. Instagram gives examples of techniques that ensure diversity. It mixes multiple authors in the feed instead of serving content from a single creator, and a mixture of media types, such as pictures, long- and short-form video, albums and more, prevents repetition.
Instagram has built a ‘content understanding system’, giving the algorithm a firm grasp of the video and text content it serves at scale. The better the platform knows what content is available, the better it can serve interesting media to each user. Instagram notes that a single picture can contain multiple elements that match a user’s preferences, which widens the library of content that can be served.
The algorithm enhances content recommendations by exploring semantic nodes that live within the knowledge graph. For example, wildlife photography lives within the photography category, which itself has multiple subgenres. Instagram notes that users who are already active in such a space can be served more content within the same category. Users who follow niche subjects, such as manga, also tend to gravitate towards similar subjects. Tapping into related topics allows for varied content suggestions. This branches out into multimodal items, where topics are combined across different media formats.
Engineers might be tempted to develop an algorithm that serves as much content across as many topics as possible, but that is not the right approach. Users, Instagram points out, maintain long-term preferences for topics they are generally interested in, alongside a small set of topics that attract intense interest for a brief period. Engineers therefore have to maintain two separate queues that distinguish between long-term and short-term interests.
Several models enhance content diversification. Known as explore-exploit tradeoff algorithms, they branch off from reinforcement learning and serve less predictable content to probe for possible new interests. Regular users of the platform will undoubtedly have seen flashes of content that have little to do with their interests and wondered why it showed up. These bursts of seemingly random content come from the explore-exploit models gauging users’ interests.
Instagram is also a strong proponent of user research, using surveys, diary studies and more to learn how users interact with and enjoy the content being shown. This helps Instagram determine how comfortable users are with repetitive versus diversified content suggestions. Instagram admits that model deployments are heavily based on these qualitative results. Negative feedback, such as ‘not interested in this topic’ or ‘show me less’ responses, serves as a guardrail that lets the system adjust where possible and maintain a positive platform experience.
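Mixing authors and media types can be sketched as a greedy rerank that skips a candidate if it would repeat the previous item’s author or media type, falling back to the best remaining item when nothing else fits. The rule and the sample feed are assumptions for illustration; Instagram’s actual diversity rules are not described in that much detail.

```python
def diversify(ranked):
    """Greedy rerank: avoid back-to-back items from the same author or format.

    `ranked` is a list of (post_id, author, media_type) tuples, best first.
    """
    remaining = list(ranked)
    feed = []
    while remaining:
        prev = feed[-1] if feed else None
        choice = next(
            (item for item in remaining
             if prev is None or (item[1] != prev[1] and item[2] != prev[2])),
            remaining[0],  # nothing satisfies the rule: take the best remaining item
        )
        feed.append(choice)
        remaining.remove(choice)
    return [post_id for post_id, _, _ in feed]


ranked = [
    ("p1", "alice", "photo"),
    ("p2", "alice", "photo"),
    ("p3", "bob", "reel"),
    ("p4", "carol", "album"),
]
print(diversify(ranked))
```

The second Alice photo drops one slot so that a reel from another creator can break up the run.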
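Exploring neighbouring nodes in a topic graph can be sketched as a breadth-first walk that collects topics within a small number of hops of a user’s known interest. The graph below is a hypothetical toy built around the wildlife photography example, not Instagram’s knowledge graph.

```python
from collections import deque

# Hypothetical topic graph: each topic links to broader or related topics.
TOPIC_GRAPH = {
    "wildlife photography": ["photography", "nature"],
    "photography": ["wildlife photography", "street photography", "portrait photography"],
    "nature": ["wildlife photography", "hiking"],
    "street photography": ["photography"],
    "portrait photography": ["photography"],
    "hiking": ["nature"],
}


def related_topics(start, graph, max_hops=2):
    """Return topics reachable within `max_hops` edges, mapped to their distance."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        topic = queue.popleft()
        if distances[topic] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(topic, []):
            if neighbour not in distances:
                distances[neighbour] = distances[topic] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


print(related_topics("wildlife photography", TOPIC_GRAPH))
```

Closer topics (one hop) are natural candidates for familiar suggestions, while two-hop topics such as street photography offer more variety.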
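Keeping long-term and short-term interests apart can be sketched as two scores per topic that decay at different rates: a slowly decaying one for lasting preferences and a fast one that reacts to bursts of recent activity. The half-lives and the event log are hypothetical, and this is not a description of how Instagram implements its two queues.

```python
LONG_HALF_LIFE_DAYS = 90.0  # hypothetical: slow decay for lasting interests
SHORT_HALF_LIFE_DAYS = 2.0  # hypothetical: fast decay for passing bursts


def decayed_scores(events, now, half_life):
    """Sum engagement per topic, halving each event's weight every `half_life` days."""
    scores = {}
    for day, topic in events:
        weight = 0.5 ** ((now - day) / half_life)
        scores[topic] = scores.get(topic, 0.0) + weight
    return scores


def top_topic(scores):
    """Return the topic with the highest score."""
    return max(scores, key=scores.get)


# (day, topic): cooking every few days for months, then a recent travel burst.
events = [(d, "cooking") for d in range(0, 120, 4)] + [(118, "travel")] * 5
long_term = decayed_scores(events, now=120, half_life=LONG_HALF_LIFE_DAYS)
short_term = decayed_scores(events, now=120, half_life=SHORT_HALF_LIFE_DAYS)
print(top_topic(long_term), top_topic(short_term))
```

The same history yields cooking as the long-term interest and travel as the short-term one, echoing the cooking-to-travel example earlier in the article.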
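The explore-exploit trade-off is often illustrated with an epsilon-greedy policy. This is a simple bandit strategy rather than the full reinforcement learning systems mentioned above: most of the time the feed shows the best-scoring item, but with a small probability epsilon it shows a random one to test a possible new interest. The epsilon value, items and scores are hypothetical.

```python
import random


def pick_item(scored_items, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over (item, predicted_score) pairs."""
    if rng.random() < epsilon:
        # Explore: show a random item to learn whether the user likes it.
        return rng.choice(scored_items)[0]
    # Exploit: show the item with the highest predicted score.
    return max(scored_items, key=lambda pair: pair[1])[0]


items = [("cooking_reel", 0.9), ("travel_post", 0.6), ("chess_clip", 0.1)]
rng = random.Random(42)
picks = [pick_item(items, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count("cooking_reel") / len(picks))
```

Occasional picks like the chess clip are the "flashes of content" users notice; raising epsilon explores more at the cost of showing fewer top-scoring items.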
Influencers versus the Algorithm
So far, we’ve primarily focussed on the platform owner’s side of the story. Meta keeps tinkering with the algorithm to keep users engaged. But on the other side of the spectrum are the creators, who have to find new ways to keep their fans engaged. In July 2022, MIT Technology Review spoke with Brooke Erin Duffy, an associate professor at Cornell University who interviewed 30 creators across different platforms, including Instagram, about the algorithmic rules that dictate what they can post and the impact those rules have on reaching their audiences.
Duffy noted that creators’ experiences on the platforms were shaped by changes in the algorithm. The algorithms have a major impact on creators’ daily routines and on the time needed to reach their audiences. Creators know their relationship with the algorithms is not necessarily beneficial, but it has to be taken into consideration to be successful. This unhealthy dynamic shows the powerful grip mathematical models have on the creators who are most active on the platforms.
The algorithms act as a suppression layer, forcing creators to omit or select certain topics or otherwise fail to reach new and existing audiences. If they fail to attract the attention of the algorithms, their livelihoods are at stake. They’ll try to game the system by selecting new and alternative hashtags, Duffy explained, as a form of experimentation and an attempt to squeeze out whatever awareness is left. They also engage with community members, sharing their experiences with the algorithm.
A failure to generate impressions, meanwhile, is often labeled by creators as being ‘shadow banned’. Creators and users often describe shadow banning as a hidden mechanism employed by platform owners to censor unwanted content. Shadow banning is not an official practice, and the term can be thrown around loosely even when the user has in fact violated content rules. Whatever the case may be, Duffy’s research shows how omnipresent an algorithm can be in shaping the content, and income streams, of its most avid users.
Content sends on Instagram
In June 2024, Colin and Samir spoke with Adam Mosseri, Head of Instagram, about the role of algorithms on the platform and how they shape the user and creator experience. Mosseri pointed out that much of the talk about the algorithm focusses on its black-box nature and on the inability to reach more users. He added that the primary factor in assessing a piece of content’s performance is sends per reach: the number of times content is sent to other users, relative to the number of people it reached.
Mosseri highlights that content shares drive the most value for the community. A share is a strong indicator that content resonates, signaling to the algorithm that more users could benefit from engaging with it. Shared content can spark conversation and entice users to interact with similar content. Mosseri describes this effect as a flywheel for further engagement.
An important driver of this community engagement is short-form content. Mosseri admits that long-form content is able to hold users’ attention; however, such content doesn’t work in the platform’s favor, as audiences consume far fewer pieces of it, resulting in fewer shares overall. Through short-form content, Mosseri notes, the company can stay closer to its core value of connecting people. Instagram communicates these metrics to creators while developing new tools that enable better assessment of media performance.
Instagram’s inability to adequately tell its creators which content sticks and gets shared is one of the platform’s shortcomings. As a result, creators opt for guerrilla-style tactics in the hope of grabbing the algorithm’s attention, rather than thoughtful content creation. Mosseri agreed that there was work to be done in this department. When asked which other ranking factors determine whether content is served, he said likes are also considered important, although their weight can be misleading compared with shares.
When users don’t share or like content, Instagram uses watch time for Reels. While watch time contributes far less to ranking than likes and shares, Instagram uses it to assess the content’s performance. To better determine media performance, Instagram also looks at how popular a post was among a creator’s followers and how many users who don’t follow the account engaged with it.
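Sends per reach, as Mosseri describes it, is a ratio: the number of times a post was sent to other people divided by the number of accounts that saw it. This is a minimal sketch with made-up numbers; the function and post names are hypothetical and not part of any Instagram API.

```python
def sends_per_reach(sends: int, reach: int) -> float:
    """Sends divided by reach; 0.0 if nobody saw the post."""
    return sends / reach if reach else 0.0


# Hypothetical posts as (sends, reach): the smaller post is sent on far more often per viewer.
posts = {"viral_meme": (1_200, 100_000), "niche_tip": (300, 5_000)}
for name, (sends, reach) in posts.items():
    print(name, sends_per_reach(sends, reach))
```

Normalizing by reach is what lets a small post with an engaged audience outscore a widely seen post that few people bother to pass on.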
The almighty Instagram Algorithm
The Instagram algorithm is made up of countless lines of code that fetch and serve content. We often overlook the technical side of the models and jump straight into the awareness generation phase. But behind the scenes, engineers are constantly tweaking recommendation algorithms to serve the most relevant content and keep users engaged for as long as possible. Developers at Instagram pay close attention to the media created on the platform, ranking it using complex datasets, weighting it and ultimately serving it to users.
This complex mechanism puts immense stress on creators, who can achieve viral success overnight and see no community interaction at all the following day. Instagram is unable to provide the tools creators need to repeat their success. As a result, the most avid users rely on hit-or-miss strategies in an attempt to replicate earlier wins, which takes a heavy toll on their mental well-being, as their livelihoods can be wiped out with the next update.
The notion that anyone has figured out the algorithm is therefore unrealistic. The complexities not of the algorithm but of the human mind are the strongest factor in determining whether content will be discovered, served and shared, and these factors fluctuate heavily, and to varying degrees, over time.