Why We Struggle To Build in Africa
Why generative AIs and other popular data aggregators struggle to provide proper African context.
You see, Google (arguably the largest data aggregator in the world) designs Google Maps to source data from everyone who uses or visits it, yet it only shows you what you search for, not the other data points that affect you. In essence, each query returns a single slice of data: when you search an address, for example, you see directions to the place and perhaps some reviews.
Yesterday, I came across a tweet by Saheed Niyi, who released the Naijaweb dataset: 270,000 documents (230 million GPT-2 tokens) drawn from webpages that Nigerians have shown interest in. It was cleaned using the same techniques as the Fineweb dataset.
If you’ve ever used ChatGPT, Gemini, Claude, Meta AI, or any other popular generative AI for your everyday work and complained that it struggles to give you localized solutions as a Nigerian or African, that’s because there isn’t enough local data available to train these models to respond with Nigerian or African context. As a result, the depth of their responses is somewhat limited.
So what does this 270,000-document dataset mean for folks like you and me, and anyone who wants to build products tailored to Nigerians? Let me start with my first example — Google Maps.
If the backend data from all the contributions made on Google Maps about Nigerian establishments and roads were made available to the public, one of the products I’d expect someone to build is an open-source integration that lets users crowdsource real-time road conditions. Users would report not only accidents and road closures but also potholes, construction zones, unsafe areas, and areas with high crime rates. Imagine driving and getting an alert that there’s a pothole 200 meters ahead.
In essence, this would be a community-driven mapping tool that lets users contribute and share information about their local areas.
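To make the pothole alert idea a little more concrete, here is a minimal Python sketch of how such an app might decide which crowdsourced reports to warn a driver about. It is an illustration under stated assumptions, not how Google Maps works: each report is assumed to be just a kind plus GPS coordinates (the `RoadReport` type and the `alerts_ahead` function are hypothetical names), distance uses the haversine formula, and a report counts as ahead if it lies within 200 meters and within 45 degrees of the driver’s heading. A real tool would also need map matching, report verification, and expiry of stale reports.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class RoadReport:
    """Hypothetical crowdsourced report, e.g. a pothole or a road closure."""
    kind: str
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2 (0 = north, 90 = east)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360


def alerts_ahead(lat, lon, heading, reports, radius_m=200.0, cone_deg=45.0):
    """Return (distance, report) pairs within radius_m and within
    +/- cone_deg of the driver's heading, nearest first."""
    hits = []
    for r in reports:
        d = haversine_m(lat, lon, r.lat, r.lon)
        if d > radius_m:
            continue  # too far away to alert about yet
        # Smallest angle between the driver's heading and the direction to the report
        diff = abs((bearing_deg(lat, lon, r.lat, r.lon) - heading + 180) % 360 - 180)
        if diff <= cone_deg:
            hits.append((d, r))
    return sorted(hits, key=lambda h: h[0])


if __name__ == "__main__":
    reports = [
        RoadReport("pothole", 6.5257, 3.3792),       # roughly 145 m north
        RoadReport("road closure", 6.5234, 3.3792),  # roughly 111 m south (behind)
        RoadReport("accident", 6.5294, 3.3792),      # roughly 556 m north (too far)
    ]
    # Driver at a point in Lagos, heading due north (0 degrees)
    for dist, report in alerts_ahead(6.5244, 3.3792, 0.0, reports):
        print(f"{report.kind} ahead in {dist:.0f} meters")
```

With these sample points, only the pothole triggers an alert: the road closure is close but behind the driver, and the accident is still beyond the 200-meter radius.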
Next is a sentiment analysis tool. You see, no social media platform has influenced the polity in Nigeria the way X (formerly Twitter) has. When it comes to public sentiment, X is where policies are weighed before they’re pushed or binned. There’s a push-and-pull on the platform that ensures diverse opinions are heard and amplified at scale. But how do you measure it? How do you capture real-time discourse, analyze it, distill insights, and present them to businesses and policymakers? This 230-million-token dataset should help build that tool.
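As a deliberately tiny illustration of the analyze step, here is a Python sketch of a lexicon-based scorer that labels posts as positive, negative, or neutral and tallies the results, for example reactions to a new policy. The word list is hypothetical and hand-picked for this example (it includes the Pidgin word wahala, meaning trouble); a serious tool would more likely fine-tune a language model on labeled Nigerian text, which is where locally sourced data could help. Collecting posts from X is out of scope here.

```python
import re
from collections import Counter

# Hypothetical, hand-picked lexicon for illustration only; a real tool would
# learn word polarity from labeled local data rather than a fixed list.
LEXICON = {
    "good": 1, "great": 1, "support": 1, "love": 1, "welcome": 1,
    "bad": -1, "hate": -1, "scrap": -1, "wahala": -1, "suffering": -1,
}

TOKEN_RE = re.compile(r"[a-z']+")


def score(post: str) -> int:
    """Sum the polarity of every lexicon word found in the lowercased post."""
    return sum(LEXICON.get(tok, 0) for tok in TOKEN_RE.findall(post.lower()))


def label(post: str) -> str:
    """Map a post's score to a coarse sentiment label."""
    s = score(post)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def summarize(posts):
    """Count how many posts fall under each sentiment label."""
    return Counter(label(p) for p in posts)


if __name__ == "__main__":
    posts = [
        "I support this policy, great move",
        "This new tax is pure wahala, scrap it",
        "Policy announced today",
    ]
    print(summarize(posts))
```

A lexicon is easy to inspect but misses sarcasm, slang, and code-switching between English, Pidgin, and other Nigerian languages, which is exactly where local training data matters.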
With this volume of local data, someone should be able to build a tool for market research in Nigeria, helping businesses understand consumer preferences and trends.
Other tools can be built too, from academic research tools to language analysis tools that study how Nigerian languages are used in online content, offering insights into linguistic patterns and cultural nuances. Even dating apps tailored to Nigerian users’ cultural norms and preferences could be developed.
I’m excited about the possibilities and opportunities that the release of Naijaweb brings to the table. To the developers, builders, and visionaries: I wish you all the best as you build amazing products with it.