
Why (Not) Peer-to-Peer?

The smartphone revolution undoubtedly contributed significantly to the exponential economic growth of this decade: from the apocalyptic rise of social media to the decadent explosion of video streaming, the always-on, always-listening devices that are ever-present in our pockets are the communication conduits of the future.

A Decade of Consumption

Though the small screens of early smartphones were not meant for consuming rich media like Instagram Stories or Netflix streams, the massive, high-resolution screens of today are what drive over half of the Internet’s entire usage. This rich media is both bandwidth-heavy (as anyone on a throttled data plan will know) and, more importantly, unchanging. The images uploaded to your Instagram profile are unnecessarily redistributed to all of your followers’ devices: each device requests a fresh copy of your latest creative expression from Instagram’s servers when it refreshes the feed.

This superfluous duplication leads to a massive amount of resources—millions of dollars, CPU cycles, CO2 emissions, battery percentage points, etc.—dedicated to doing the same thing (serving an image) over and over again needlessly. This raises the question: Is it possible that the decade ahead will enable forms of unparalleled creative expression that are only made possible through unfettered waste and a severe lack of control?

With most of the Internet’s content needlessly controlled by a handful of companies, its overall health and future potential are at risk; we should look to alternative approaches to content distribution that reduce our current levels of brittleness and unscalability.

Who’s Using the Internet, Anyway?

Let’s look at a small sample of Alexa’s US Top 50 to see just how much content is being needlessly duplicated around the world.1

  • Google (#1), the advertisement empire of our generation. Napkin math tells us that for the homepage alone, we’re looking at 24 terabytes of bandwidth transferred every year;2 that’s like having 46 years’ worth of music content in your library. Now, is there just one person looking at the Google homepage at any point in time? That seems astronomically unlikely… couldn’t we save time, money, and resources by pulling a page out of the Marxist playbook and sharing the homepage amongst ourselves?
[Figure: The Google homepage. Does it really change so often that we need to ask them for it every time?]

  • YouTube (#2), the single website that makes up over a tenth of all Internet usage and receives 500 hours of new uploads every minute. Many videos are either never or rarely watched (in spite of wholesome efforts like PetitTube that aim to shed light on this untouched corner of the Internet), while others garner millions of views. This means hundreds of millions of people watching separate, uniquely distributed copies of the same content: how often is there just one person watching a single video at any point in time? That’s probably a rare occurrence for any reasonably-popular video… couldn’t we save time, money, and resources by pulling a page out of the Marxist playbook and sharing it amongst ourselves?

  • Netflix (#9), simultaneously the world’s favorite escape from reality and pretext to chill, makes up 15% of the world’s total Internet usage. Just like with YouTube, this means every person watching the latest season of Peaky Blinders is downloading their own unique copy from either Netflix or their ISP, when they could be sharing bits with their fellow “Roaring ’20s”-addicted neighbor.

  • Instagram (#20), the ultimate Skinner box, hosts some 50 billion photos and grows by nearly 100 million a day, while its parent company Facebook (#4) sees 100 million hours of video watch-time a day. To reiterate an earlier point: when you start an IGTV video feed, it’s downloaded by each and every one of your followers from Instagram’s servers when they tune in. Imagine if the feed were instead organically distributed among your followers themselves; they could efficiently “share” it with each other instead of relying on the central authority of Instagram to feed them content. Not only does this make it resilient to censorship (a problem we thankfully rarely face in the US), but it also reduces latency (after all, your followers are likely closer to you than Facebook’s servers) and bandwidth requirements.

  • Apple (#22) and the rest of the music streaming conglomerate (Spotify [#36], Amazon [#3], Pandora, etc.) represent another domain rife with unwarranted duplication. When the entire country is playing Mariah Carey’s “All I Want For Christmas Is You,” why aren’t they streaming the song off of each other’s devices rather than each individually downloading it from Spotify’s servers?

In essence, a handful of centralized services dominate ownership over the entertainment we all consume. Regardless of whether we’re talking about Facebook and its dominance over user-submitted images and videos, Netflix and Google’s control over video streaming, or Spotify and its competitors’ rule over Music Kingdom, the conclusion is the same: the billions of people that use these services are all downloading content from servers that have already distributed it to the world a dozen times over.
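To make the “share it amongst ourselves” refrain a bit more concrete, here’s a minimal sketch of the core idea (the names and details are entirely made up; no real service works exactly this way): content is addressed by its hash, a device asks its peers for that hash first, and it only falls back to the origin server when nobody in the swarm has the content yet.

```python
import hashlib

# A toy sketch of the "share it amongst ourselves" idea (not any real
# service's protocol): content is addressed by its hash, and a device asks
# the swarm for that hash before falling back to the origin server.

class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}  # content hash -> bytes this peer already holds

    def put(self, blob):
        key = hashlib.sha256(blob).hexdigest()
        self.store[key] = blob
        return key

    def get(self, key):
        return self.store.get(key)


def fetch(key, peers, origin):
    """Return (blob, source), preferring peers over the central origin."""
    for peer in peers:
        blob = peer.get(key)
        if blob is not None:
            return blob, f"peer:{peer.name}"
    # Only when no peer has the content does a request hit the origin.
    return origin[key], "origin"


if __name__ == "__main__":
    origin = {}  # stand-in for Instagram's / Google's servers
    alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol")

    photo = b"your latest creative expression"
    key = hashlib.sha256(photo).hexdigest()
    origin[key] = photo

    # The first follower has to hit the origin; after that, the swarm serves it.
    blob, source = fetch(key, [alice, bob, carol], origin)
    print(source)   # -> origin
    alice.put(blob)

    blob, source = fetch(key, [alice, bob, carol], origin)
    print(source)   # -> peer:alice
```

Real systems like BitTorrent and IPFS elaborate on essentially this content-addressed core, with far more care put into peer discovery, chunk scheduling, and verification.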

Centralization vs. Democratization

The waste fueled by duplication is not the only worrisome aspect of these content conglomerates: we should also consider the impact these behemoths have on the freedom and availability of the Internet by controlling the majority of our media consumption.

I’m not even talking about the social impact of an increasingly-isolated musical monoculture, or the psychological goldmine of insights that Instagram and Facebook content provides for political manipulation; I’m talking about a third of the Internet (including Spotify and Netflix) going offline for 4 hours because of an Amazon cloud outage.3 I’m talking about Google strangling creators if their content doesn’t align with an intentionally ambiguous vision or have profit potential. I’m talking about the majority of software development and the backbone of the Web grinding to a halt under denial-of-service attacks. I’m talking about how society has ignored one of the first principles of secure design: don’t have single points of failure.

Our current level of content centralization is not only concerning, it’s flat-out dangerous, and the cracks have already begun to show. In the next decade, as the exponent in Moore’s Law decreases and the amount of content on the Internet doesn’t, the system that allowed a few hands to concentrate the means of content distribution (and with it, our access to culture, current events, and knowledge) will teeter. It’s imperative that we explore technological alternatives; under a peer-to-peer model, the only way a system goes fully offline is if it no longer has any participants.

So, Why Not P2P?

It should not be surprising that BitTorrent and other P2P traffic accounts for a major chunk of global bandwidth: up to 20% by some metrics! It’s an efficient, effective, and democratic means of distributing content (ignoring the fact that it’s used extensively for piracy).

The answer to why we aren’t using this powerful architecture everywhere is likely multi-faceted, and we can unfortunately only hypothesize. Both financial and technological limitations are probably at play: building new systems (and especially peer-to-peer systems) is expensive, time-consuming, and challenging.

Complexity

There are many who ask the same questions presented in this post and have gone a long way toward reducing the modern Internet’s waste; for example, PeerTube is a YouTube alternative that distributes video streams across all viewers to avoid wasting bandwidth.

Developing a separate product to adopt peer-to-peer technology is by no means necessary: both Spotify and Netflix have explored integrating the concept into their existing infrastructure. In fact, Spotify started out with peer-to-peer distribution… By their own admission, a peer-to-peer model that offloaded streaming from Spotify’s servers to its users reduced bandwidth requirements by 35%. Why did they abandon the project? Netflix has allegedly had a plan aiming for P2P distribution for over 5 years, yet we’ve still to see it come to fruition.

Is complexity the root cause? There’s no doubt that handling issues with synchronization, copyright, and security is far more difficult with a P2P model. High-quality distributed systems engineers need time and money; the investment required for a reliable implementation is incompatible with the fast-paced, (often) short-sighted Agile methodology that is so pervasive in the software industry today.

Money $= \sqrt{\text{evil}}$

That leads directly into our second “probable cause”: it would be completely unsurprising if money alone lay at the root of the issue.

The inkling of information that a company gains from “seeing” each piece of content get downloaded is valuable. When a single sponsored Instagram post can run up a $100,000 asking price, it should surprise no one that guaranteeing each view 👁️, like ❤️, and impression through a centralized, trusted authority (that is, through direct interactions with Instagram’s servers) is an essential part of locking in those sweet, sweet advertising dollars.

Hence, it must be the case that this income (along with the expense of engineering such a system) outweighs the massive extra server costs associated with delivering content from a single, centralized location.

Privacy

A common concern with peer-to-peer applications is that they expose user IP addresses. Though a legitimate concern, it’s not an inevitable consequence of a P2P architecture: it’s possible (and desirable) to design a peer-to-peer system that preserves user privacy. On the extreme end of the latency vs. privacy spectrum lies Tor, which re-encrypts its packets at every hop so that no relay knows the destination beyond the next hop, but there are plenty of compromises that can be made to keep IP addresses relatively private without resorting to such extremes. Tunneling or routing packets through multiple users,4 provider-owned “supernodes,” and other technical solutions can help keep us private while also letting us participate in a peer-to-peer sharing scheme.
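To illustrate what that tunneling can look like, here’s a purely illustrative sketch of multi-hop relaying (made-up names, and with the per-layer encryption replaced by plain tuples to keep it dependency-free): the sender wraps the payload in one envelope per relay, so each relay learns only the next hop rather than the full path.

```python
# A purely illustrative sketch of multi-hop relaying: the sender wraps the
# payload in one "envelope" per relay, so each relay learns only the next
# hop, never the full sender-to-destination path. In a real system every
# layer would be encrypted to that relay's key; plain tuples are used here
# only to keep the sketch dependency-free.

def wrap(payload, destination, relays):
    """Build nested envelopes; the innermost layer is addressed to the destination."""
    envelope = (destination, payload)
    for relay in reversed(relays):
        envelope = (relay, envelope)  # each outer layer names only the next hop
    return envelope


def route(envelope):
    hop, inner = envelope
    while isinstance(inner, tuple):
        print(f"{hop} forwards to {inner[0]}")  # a hop sees just its neighbor
        hop, inner = inner
    print(f"{hop} receives: {inner!r}")


if __name__ == "__main__":
    route(wrap("video chunk #42", "bob", ["relay-1", "relay-2", "relay-3"]))
```

In a real deployment each layer would be encrypted to the corresponding relay’s key, which is exactly the trick behind Tor’s onion routing; the point here is just the shape of the data flow.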

Control

Like dragons hoard piles of gold 💰, tech corporations hoard data and users 🗿. And much like a dragon would destroy a village before it let someone touch its treasures, a tech company would sooner shut a product down than relinquish control over its precious monetization streams. When we concentrate the power and the content in one company’s hands, only the company benefits.

Conclusion

At the end of the day, it’s probably a linear combination of the aforementioned issues, with short-sightedness at the forefront. Why invest in efficient content distribution methods when you’re making money hand-over-fist? Tech stocks have consistently hit all-time highs every year.

Why invest in user privacy when it’ll only temporarily impact your annual revenue by a paltry 8% after the biggest data scandal of the decade?

Why invest in security when your stock price recovers less than a year after leaking the financial data of half of the US population?

Let’s look past the fleeting dollar signs of the present and take in the writing on the wall. Without drastic changes in how we approach content distribution in the future, the exponential growth we’re bound to see continue in the coming decade threatens to topple our fragile centralized ecosystem, leaving our ❤️-addicted populace circus-starved, our culture frozen (or even lost) in time, and our tech-dependent workforce completely immobilized.


  1. To keep this post (mostly) family-friendly, this footnote will be the only acknowledgement that 30% of total Internet usage is driven by porn. That’s right, nearly a third of the Internet is dedicated to watching others bump uglies. And in keeping with this post’s thesis, this means millions of people simultaneously watching copies of identical content when they could be sharing a single copy instead. ↩︎

  2. Obviously, Google itself isn’t serving all of this data so naively. Our browsers, ISPs, CDNs, etc. all make their best effort to cache repetitive content to avoid waste. That’s beside the point, though, because at the end of the day, this 24TB of data is still being transferred across a few handfuls of sources to serve ~10KB of unchanging content. Those centralized solutions are bandages on bullet holes. ↩︎
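(For the curious, the arithmetic behind those two figures goes roughly like this, assuming the ~10KB homepage above and music at roughly 1MB per minute, i.e. a ~128kbps stream:

$$ \frac{24\ \text{TB/year}}{\sim 10\ \text{KB per load}} \approx 2.4 \times 10^{9}\ \text{homepage loads per year}, \qquad \frac{24\ \text{TB}}{\sim 1\ \text{MB per minute}} \approx 2.4 \times 10^{7}\ \text{minutes} \approx 46\ \text{years of music.} $$

Both assumptions are my own ballpark figures.)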

  3. Mondo has a good brief article that dives into the problems of content centralization in the context of the AWS outage. ↩︎

  4. A few years ago, I started writing a library that would enable any application to achieve peer-to-peer routing relatively easily. The idea was that packets can be efficiently routed between peers using a DHT-like approach without necessarily exposing IPs to the entire swarm. It never made it far past the prototype phase, but a serverless IM client proved that the concept was at least viable. ↩︎
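If you’re wondering what “DHT-like” means here, the core trick in Kademlia-style DHTs (this is my own illustration, not the library’s actual code) is that peers and content keys share one ID space and “closeness” is measured by XOR, so any node can hand a packet or request to whichever peer it knows that sits closest to the target key:

```python
import hashlib

# A rough sketch of the Kademlia-style idea behind "DHT-like" routing: peers
# and content keys live in the same ID space, "distance" is XOR, and a request
# is handed to whichever known peer is closest to the target key.

def node_id(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest(peers, target: int) -> str:
    return min(peers, key=lambda p: node_id(p) ^ target)


if __name__ == "__main__":
    swarm = [f"peer-{i}" for i in range(100)]
    key = node_id("some-content-or-recipient")
    print(closest(swarm, key))  # the peer "responsible" for that key
```

Repeat that greedy “forward to the closest peer you know” step and a request converges on the responsible node in O(log n) hops, without any single node holding a map of the whole swarm.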