Bridging the Internet’s Digital Language Divide

Around half the world’s population still lacks access to the internet. Companies like Facebook, SpaceX, and Amazon want to change that by launching constellations of satellites into the sky, which will beam internet back down to Earth. But even if these projects succeed, tech giants may face a more fundamental problem in bridging the digital divide: language.

There are thousands of different tongues spoken around the world, but most of the content on the web is only available in a select few, primarily English. More than 10 percent of Wikipedia is written in English, for example, and almost half the site’s articles are in European languages. Getting one billion more people online is often held up as the next major milestone, but when they log on for the first time, those users may find the internet has little to offer in the primary languages they speak.

“Approximately 5 percent of the world speaks English at home,” said Juan Ortiz Freuler, a fellow at the World Wide Web Foundation, during a panel at the RightsCon conference in Tunisia Wednesday, but around “50 percent of the web is in English.” Freuler argued the internet has facilitated “cultural homogenization,” now that the majority of its users rely on Facebook and Google, and communicate in the same dominant languages. But the problem “is not because of changes in technology,” said Kristen Tcherneshoff, community director of Wikitongues, an organization that promotes language diversity. Corporations and governments largely didn’t provide the resources and support necessary to bring smaller languages online.


Many of the biggest online platforms were founded in Silicon Valley, and started with primarily English-speaking user bases. As they’ve expanded around the world and to different languages, they’ve been playing catch-up. Facebook has faced criticism for not employing enough native speakers to monitor content in countries where it has millions of users. In Myanmar, for example, the company for years had only a handful of Burmese speakers as hate speech proliferated. Facebook has admitted that it did not do enough to prevent its platform from being used to incite violence in the country.

Another part of the problem stems from the fact that relatively few datasets have been created in these languages that are suitable for training artificial intelligence tools. Take Sinhala, also known as Sinhalese, which is spoken by around 17 million people in Sri Lanka and can be written in four different ways. Facebook’s algorithms—trained primarily on English and other European languages—don’t map well to it. That makes it difficult for the social network to automatically identify things like hate speech in the country, or stop the flow of misinformation after a terrorist attack.

But Tcherneshoff says language diversity is about more than just practicality; it’s also about expression. Jokes, emotions, and art are often difficult, if not impossible, to translate from one language to another. She pointed to projects like the Mother Language Meme Challenge, which invited people to make memes in their native tongue for Unesco’s International Mother Language Day in 2018. The idea, in part, was to demonstrate how humor is often intimately tied to language.

Mozilla is one organization working to crowdsource language datasets that any developer can use for free. Its Common Voice project, which Mozilla claims is “the world’s most diverse voice dataset,” includes recordings from over 42,000 people in dominant languages like English and German, but also in Welsh and Kabyle. The project is designed to give engineers the tools they need to build things like speech-to-text programs in different tongues. Mark Surman, executive director of the Mozilla Foundation, believes open source datasets like Common Voice are among the few viable ways to ensure more language diversity in emerging tech. At for-profit companies, the issue “falls very low on the economic ladder,” he said during the RightsCon panel.

Bringing more languages online may ultimately be an exercise in cultural preservation, rather than utility. Despite advocates’ best efforts, it’s unlikely there will ever be as many websites in Yoruba, say, as there are in French or Arabic. New internet users may simply opt to browse in their second or third language instead of their native tongue.

At the same time, corporations like Google have built programs that make it easier to access online content in different languages, like Google Translate. Google also gave some of its tools to Wikipedia to help translate articles, although they still require careful review by native speakers; Wiki editors have complained that the Google tools sometimes produce shoddy results. For the time being, promoting language diversity online still requires the concerted effort of humans.

