Estimating Community Size & Impact
Introduction
On the Internet, “value” is a numbers game. The worth of a community or person is often judged by the number of followers they have. However, the numbers do not tell the full story. A community is often more than the sum of its “clout”: the impact on its members and participants, the relationships formed, the memories shared, and the knowledge learned cannot easily be quantified. But to satisfy my own curiosity, and that of others, I will attempt to put a number to the size and impact of the Academic Weird Facebook community.
The pages that make up the current Educational Memepages Dataset are simply the ones I have personally collected while traipsing around Facebook; they are nowhere near exhaustive or fully representative of the community in question. I regularly come across pages I’ve never seen before, and the dataset is being progressively updated and expanded, so keep in mind that the estimates throughout this essay are conservative and likely capture only the tip of the iceberg of this community’s actual size.
Please read the Overview of the Dataset if you are interested in the qualitative decision-making involved in curating the dataset. This essay focuses on exploring it numerically and assumes the reader already has some background in the ontology of the project.
Also, please read this essay on desktop. The interactive data visualizations have not yet been adjusted for mobile.
Number of Pages & Groups
The dataset consists of …counting Facebook pages and groups. Here is a breakdown of the number of pages in each subject category. You can hover over a section of the pie chart to see the exact numbers for that section, and click the label text at the top to show or hide different sections.
[Interactive pie chart: number of pages and groups per subject category]
Pages vs Groups
Breaking down the dataset further, there are …counting pages and …counting groups. Below is a breakdown of the ratio of pages to groups in each subject category. You can hover over a section of the bar chart to see the exact numbers for that section. Please keep in mind that the low number of groups is not necessarily reflective of the community; it has more to do with the limitations and biases of the data collection method than with the actual distribution in the community. More groups will be added eventually.
[Interactive bar chart: ratio of pages to groups per subject category]
Community Size Estimation
There are many ways to measure the size of an online community. I will start by looking at the number of page likes as a measurement of “size” and then expand that definition to look at other factors.
To get my estimates, I am working with a sample of …counting pages. (Note: this sample was collected in May 2021 and does not represent the live stats for these communities. However, it presents a snapshot of their relative sizes at the time it was collected.)
If you add the sizes of all the sample pages together, you get a total community size of …counting, or about …counting million people. However, this number is not necessarily accurate, for several reasons:
- Overlap - There is at least some overlap in the user bases and participants of most of these communities. By overlap, I mean that the same user can like multiple pages and end up being counted multiple times when the sizes are simply added up. I do not have user-level data, so I cannot estimate how large the overlap is. Within networks related by subject (e.g. philosophy pages or math pages), the overlap is likely very significant, and even between different subject groups there is probably some overlap, albeit to a lesser degree.
- Active Participants vs Lurkers - There is a long tail of a few powerful outliers (the handful of pages with millions of followers) whose follower counts do not reflect the much smaller number of active participants regularly involved in the “core” of the community. Of course, where we draw the line between active members and random passersby who happened to like the page is somewhat subjective, because:
- “Likes” are not reflective of people reached - Some people interact with pages without liking them, are exposed to the content when friends or other pages share it, or are otherwise part of these communities’ weekly page reach. This broader influence of the educational meme community within the greater internet memeosphere and beyond is not captured by like counts.
- Participation in Groups vs Pages - The likes data I have comes from pages, not groups. Data about group participation would probably better reflect regular participants, but I don’t currently have it. Check back later and I might!
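To make the overlap problem concrete, here is a toy Python sketch. It assumes we had per-user data, which this essay does not; the page names and user IDs are made up purely for illustration. Summing like counts counts user `u3` three times, while taking the union of follower sets counts each person once.

```python
# Hypothetical follower sets for three pages (made-up user IDs, not real data).
pages = {
    "philosophy_page_a": {"u1", "u2", "u3", "u4"},
    "philosophy_page_b": {"u2", "u3", "u5"},
    "math_page": {"u3", "u6"},
}

# Naive estimate: add up every page's like count (double-counts shared users).
naive_total = sum(len(followers) for followers in pages.values())

# Unique audience: size of the union of all follower sets.
unique_total = len(set().union(*pages.values()))

print(naive_total, unique_total)  # 9 6
```

The gap between the two numbers is exactly the overlap that the simple sum cannot see.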
Page Size Distribution
Let’s take a look at the data I do have in a bit more detail. Here is the distribution of the sizes of pages in my sample:
[Interactive histogram: distribution of page sizes (likes) in the sample, linear x-axis, log-scale y-axis]
(Note that the y-axis is in log scale, while the x-axis is linear. With a linear y-axis, the bars taper off so rapidly that the chart looks like a single bar on the left followed by nothing. You can mouse over a bar to see the exact number of pages in a given bin.)
As you can see, the sample is heavily right-skewed, roughly following a power-law distribution: there are many small and mid-size pages and a long tail of huge ones. This is not surprising, since the barrier to entry is low, but growing a page to a large size (there is no hard limit) takes a lot of talent and dedication. But what happens when we use a log scale on the x-axis?
[Interactive histogram: distribution of page sizes (likes) in the sample, log-scale x-axis]
The graph appears to be roughly normally distributed around the median of …counting likes in the log domain; in other words, page sizes look approximately log-normal. The data is a bit rough because Facebook truncates the like counts it displays. Now that we have a sense of the distribution of the dataset, we can try to account for overlap in our community size estimate.
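A minimal Python sketch of summarizing page sizes in the log domain, using only the standard library. The like counts below are placeholders, not the May 2021 sample. If sizes are roughly log-normal, the mean and standard deviation of log10(likes) summarize them well, and 10 raised to that mean (the geometric mean) is a reasonable "typical" page size.

```python
import math
import statistics
from collections import Counter

# Hypothetical page sizes in likes (placeholders, not the real dataset).
page_likes = [850, 1_200, 3_400, 5_000, 9_800, 14_000, 27_000, 61_000, 240_000, 1_900_000]

# Move to the log domain, as in the log-x histogram.
log_sizes = [math.log10(n) for n in page_likes]

median_likes = statistics.median(page_likes)
mu = statistics.mean(log_sizes)      # mean of log10(likes)
sigma = statistics.stdev(log_sizes)  # spread, measured in orders of magnitude
geometric_mean = 10 ** mu            # typical page size under a log-normal fit

# Pages per order-of-magnitude bin: key 3 means 10^3 <= likes < 10^4, etc.
bins = Counter(math.floor(x) for x in log_sizes)

print(median_likes, round(geometric_mean), round(sigma, 2))
```

Binning by order of magnitude is what makes the long tail visible as a roughly bell-shaped curve.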
Accounting for Overlap
Without user-level data, it’s hard to account for overlap. However, I will attempt a rough estimate for illustration purposes. The community size of …counting combined users that you get by simply adding up all the pages assumes 0% overlap. This is the maximum possible community size given the current sizes of the pages in the dataset, but it is quite unrealistic because it likely counts many of the same users more than once. Since we don’t know the “real” rate of overlap, the only way to be completely sure that we are not double-counting any users is to assume 100% overlap, in which case the community is exactly as large as its biggest page: …counting unique users. This is the minimum possible community size, and it is also unrealistic, because it is very implausible that every single person who likes any other page in the set also likes this one page. The real answer is probably somewhere between these two extremes.
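Both extremes reduce to one-line calculations. A minimal Python sketch with placeholder like counts (not the real sample); any assumption about overlap must land somewhere between these two bounds.

```python
# Hypothetical like counts for a handful of pages (illustrative only).
page_likes = [1_900_000, 240_000, 61_000, 27_000, 14_000]

# 0% overlap: every like is a distinct person, so sizes simply add up.
upper_bound = sum(page_likes)

# 100% overlap: every other page's fans also like the biggest page,
# so the community is exactly as large as the largest page.
lower_bound = max(page_likes)

print(f"Unique users: between {lower_bound:,} and {upper_bound:,}")
# Unique users: between 1,900,000 and 2,242,000
```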
I’ve made a tool to intuitively explore the size of the community using simple visual overlaps. The area of each bubble represents the relative size of its page, and the total community size is calculated from the total area the bubbles cover (i.e. the canvas area minus the remaining whitespace). Use the Overlap Topics and Overlap Subjects buttons to toggle overlapping for either, neither, or both categories, then click the Calculate Size button to get an estimate based on the overlap shown. You may need to jiggle the bubbles a bit to help the chart reset itself. Hover your mouse over a bubble to view the full name of the page.
[Interactive bubble chart: community overlap explorer with page tooltips (name and likes), a total unique-user readout, and a key]
To calculate the community size, I simply scale the area covered by the overlapped circles between the dataset’s possible maximum (0% overlap, i.e. the sum of all the page sizes) and minimum (100% overlap, i.e. the size of the largest page in the set).
There’s an element of randomness to the process, since the area is calculated on the fly from the current arrangement of the bubbles, and at a relatively low degree of precision. This is not a particularly rigorous way to calculate overlap, but it does convey a visual intuition for how much the estimate changes when you overlap different parts of the dataset.
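The tool’s actual implementation is not described here, so the following is only a minimal Python sketch of one way to do this calculation. It assumes each bubble’s area is proportional to its page’s likes and that the covered area is mapped linearly between the 100%-overlap and 0%-overlap bounds. The covered (union) area is estimated by Monte Carlo sampling, which also introduces a little randomness. All function names (`radius_for`, `union_area`, `estimate_size`) are hypothetical.

```python
import math
import random


def radius_for(likes):
    """Radius such that the bubble's area equals its like count (1 unit^2 per like)."""
    return math.sqrt(likes / math.pi)


def union_area(circles, samples=100_000, seed=0):
    """Monte Carlo estimate of the area covered by circles given as (x, y, r)."""
    rng = random.Random(seed)
    xmin = min(x - r for x, _, r in circles)
    xmax = max(x + r for x, _, r in circles)
    ymin = min(y - r for _, y, r in circles)
    ymax = max(y + r for _, y, r in circles)
    hits = 0
    for _ in range(samples):
        px, py = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # A sample counts if it falls inside at least one circle.
        if any((px - x) ** 2 + (py - y) ** 2 <= r * r for x, y, r in circles):
            hits += 1
    return (xmax - xmin) * (ymax - ymin) * hits / samples


def estimate_size(circles, page_likes):
    """Scale the covered area between the 100%-overlap and 0%-overlap sizes."""
    size_min, size_max = max(page_likes), sum(page_likes)
    areas = [math.pi * r * r for _, _, r in circles]
    covered_min, covered_max = max(areas), sum(areas)
    if covered_max == covered_min:  # a single bubble: nothing can overlap
        return float(size_min)
    t = (union_area(circles) - covered_min) / (covered_max - covered_min)
    t = min(max(t, 0.0), 1.0)  # clamp sampling noise into [0, 1]
    return size_min + t * (size_max - size_min)


likes = [400, 100]
r1, r2 = radius_for(400), radius_for(100)
apart = estimate_size([(0, 0, r1), (30, 0, r2)], likes)   # no overlap: close to 500
nested = estimate_size([(0, 0, r1), (0, 0, r2)], likes)   # full overlap: close to 400
print(round(apart), round(nested))
```

Because bubble area equals likes here, the linear scaling is equivalent to reading the covered area directly as a unique-user count; the clamp only guards against sampling noise pushing the estimate past either bound.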
Community Reach & Engagement
Facebook “reach” is the number of people who saw a given post[1]; put another way, it measures how much the post was “seen”. There are two types of reach, paid and organic[2]; organic reach means the content was interesting enough to be shared around without being artificially boosted. Businesses often pay Facebook to increase their reach, but it’s unlikely that many memepages have a budget for that, so I will focus exclusively on organic reach.
Here are some basic statistics about organic Facebook reach:
- The average Facebook page shares 1.55 posts per day[3]
- The average reach of a Facebook post, as a percentage of the page’s size (number of likes), was 6.4% in 2019[4] and 5.2% in 2020[5]
Memepages likely have higher-than-average organic reach because they are primarily content creators rather than businesses trying to market an external product, but they also don’t necessarily post as regularly as a business would. I will just pick a nice round 6% for my estimate. The daily post reach R of the entire community can be estimated by adding together the individual reaches of each page (based on the page size) with the following formula:
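The formula itself did not survive in this copy of the essay. The sketch below is one plausible reading, not necessarily the author’s exact formula: R = sum over pages of p × r × L_i, where L_i is page i’s likes, r = 0.06 is the chosen reach rate, and p is posts per day. Whether the 1.55 posts-per-day average belongs in the original formula is an assumption, so `posts_per_day` is a hypothetical parameter defaulting to 1; the like counts are placeholders.

```python
def daily_reach(page_likes, reach_rate=0.06, posts_per_day=1.0):
    """Estimated daily organic impressions summed over all pages.

    Assumes each post reaches `reach_rate` of the page's likes and that every
    page publishes `posts_per_day` posts (a hypothetical parameter; the 1.55
    figure is only the average for Facebook pages in general).
    """
    return sum(posts_per_day * reach_rate * likes for likes in page_likes)


page_likes = [1_900_000, 240_000, 61_000, 27_000, 14_000]  # placeholder sizes
r_one_post = daily_reach(page_likes)                      # one post per page per day
r_avg_rate = daily_reach(page_likes, posts_per_day=1.55)  # average posting rate
print(round(r_one_post), round(r_avg_rate))
```

Keeping the posting rate as an explicit parameter makes it easy to see how much the estimate depends on how often memepages actually post.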
We can also scale this to different time intervals. Here is how many times Academic Weird Facebook content is seen:
- Weekly: …counting times
- Monthly: …counting times
- Annually: …counting times
These estimates are based on the daily number of users reached, which are not necessarily unique users, especially in the aggregate.
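Scaling the daily figure is plain multiplication. Assuming a 30-day month (the essay does not state which month length it uses), the intervals are as follows; working backwards, the conclusion’s figure of roughly 1.5 billion views per year corresponds to a daily reach of about 4.1 million impressions:

$$
R_{\text{week}} = 7R, \qquad R_{\text{month}} \approx 30R, \qquad R_{\text{year}} = 365R
$$

$$
R \approx \frac{1.5 \times 10^{9}}{365} \approx 4.1 \times 10^{6} \ \text{impressions per day}
$$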
Conclusion
From the estimates above, it’s clear that this community is quite large, active, and influential. For a somewhat abstruse meme community, content created and shared by Academic Weird Facebook is “seen” a whopping one and a half billion times per year. And that number covers only the Facebook platform: it doesn’t account for content reposted to other platforms, reposted on pages not included in the dataset or on people’s personal profiles, saved to computers, or aggregated by Google and by dedicated meme aggregators. It is only the tip of the iceberg of this community’s size and influence on the broader internet.