Metafilter Usage 2010-2019
June 29, 2019 7:55 PM

Graphs of the number of posts and comments, the number of unique posters and commenters, and users joining and leaving the site from January 1, 2010 through June 15, 2019. I may add some more later if I come up with any other interesting metrics.
Role: datawonk
posted by Tell Me No Lies (6 comments total) 5 users marked this as a favorite

I'm guessing that you're estimating last active date from the most recent comment in the Info Dump per userid?

I'd bet that the spike this spring is (mostly) due to recency bias and goes away if you drop userids that have had a longer 'hiatus' than the one they're currently on. This is because a naive parser would see every user who posts today as having their last post today (by definition) and, therefore, a huge spike for every successive value of $today.

Some untested pandas pseudo code:
import pandas as pd

now = pd.Timestamp.now()  # assumes the Date column is timezone-naive

def last_seen_if_longest_hiatus(dates):
    # dates: every comment timestamp for one userid, sorted oldest to newest
    dates = dates.sort_values()
    # calculate each micro-hiatus; the gaps between consecutive comments
    gaps = dates.diff().dropna()
    # the hiatus the userid is currently on
    current = now - dates.iloc[-1]
    # keep the last seen date only if the current hiatus is the longest one
    return dates.iloc[-1] if gaps.empty or current >= gaps.max() else pd.NaT

last_seen = (
    comments  # a data frame containing userid & Date columns for every comment
    # look at one userid at a time
    .groupby('userid')['Date']
    .apply(last_seen_if_longest_hiatus)
    # drop every userid that isn't currently on their longest hiatus
    .dropna()
)
Plotting these 'last seen dates' should avoid the recent spike. It will also miss everyone who nope'd outta here... but did so after a longer (previous) break.
posted by mce at 9:00 AM on June 30


I'm guessing that you're estimating last active date from the most recent comment in the Info Dump per userid?

Yes.

I'd bet that the spike this spring is (mostly) due to recency bias and goes away if you drop userids that have had a longer 'hiatus' than the one they're currently on. This is because a naive parser would see every user who posts today as having their last post today (by definition) and, therefore, a huge spike for every successive value of $today.

Everyone on that graph has been gone at least 90 days, but I suppose you're correct that could be a regular thing for some people.

For that matter "average time between comments" could be an interesting metric. Let me poke about.
posted by Tell Me No Lies at 10:07 AM on June 30


Wow. 4247 users *average* 90 day gaps, and 13081 users (of the 20974 active between 2010-2019) have had gaps over 90 days in length. I think you're right, your algorithm will smooth out the end of the Last Seen graph nicely.
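
For anyone who wants to reproduce counts like these, here is a minimal pandas sketch of per-user gap statistics. The `userid` and `Date` column names, the `gap_stats` helper, and the sample rows are all assumptions made up for illustration; they are not the Infodump's actual field names or data.

```python
import pandas as pd

def gap_stats(comments: pd.DataFrame) -> pd.DataFrame:
    """Per-userid mean and max gap, in days, between consecutive comments.

    Assumes hypothetical `userid` and `Date` (datetime) columns.
    """
    ordered = comments.sort_values(["userid", "Date"])
    # Gap to the same user's previous comment; NaT for each user's first comment.
    gap_days = ordered.groupby("userid")["Date"].diff().dt.total_seconds() / 86400
    stats = gap_days.groupby(ordered["userid"]).agg(mean_gap="mean", max_gap="max")
    # Users with a single comment have no gaps at all, so drop them.
    return stats.dropna()

# Tiny made-up sample, for illustration only.
sample = pd.DataFrame({
    "userid": [1, 1, 1, 2, 2, 3],
    "Date": pd.to_datetime([
        "2019-01-01", "2019-01-11", "2019-06-10",
        "2019-03-01", "2019-03-02", "2019-05-05",
    ]),
})
stats = gap_stats(sample)
# How many users have ever had a gap longer than 90 days?
over_90 = int((stats["max_gap"] > 90).sum())
```

Comparing `mean_gap` and `max_gap` against 90 gives the two kinds of counts mentioned above, under the stated assumptions.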
posted by Tell Me No Lies at 10:57 AM on June 30


The new method yields a significantly different graph -- it stays even at 5 from 2010-2014 and then does a linear decline from 2014-2019. This also seems sort of artifact-y to me but I can't put my finger on why. Declaring everyone who's gone a year without posting as having left makes for a flat graph.
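
The one-year rule is simple enough to sketch. This assumes the same hypothetical `userid` and `Date` columns as the pseudo code upthread; the `departures_by_year` name, the `as_of` cutoff, and the sample rows are invented for illustration.

```python
import pandas as pd

def departures_by_year(comments, as_of, min_absence_days=365):
    """Count users whose last comment is at least `min_absence_days` before `as_of`,
    grouped by the year of that last comment."""
    last_seen = comments.groupby("userid")["Date"].max()
    # Only users absent for the full threshold count as having left.
    gone = last_seen[(as_of - last_seen).dt.days >= min_absence_days]
    return gone.dt.year.value_counts().sort_index()

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "userid": [1, 1, 2, 3, 4],
    "Date": pd.to_datetime([
        "2016-02-01", "2017-03-01", "2018-05-01", "2019-01-01", "2017-12-31",
    ]),
})
left = departures_by_year(sample, as_of=pd.Timestamp("2019-06-15"))
```

One consequence of this rule is that nobody can be counted as having left during the final year before `as_of`, so the last twelve months of the graph are always empty.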
posted by Tell Me No Lies at 12:31 PM on June 30


Interesting graphs! It would be nice if there’d be some way to see people who read but don’t comment (but I don’t think there is)—I know that I seldom comment or post, but I read MetaFilter every day...
posted by leahwrenn at 9:05 PM on July 7


I’m glad you enjoyed them.

Unfortunately the readership numbers aren’t published, although I imagine they are followed pretty closely as serving ads is where Metafilter makes its money. Hopefully readership has not seen the same slide as membership.
posted by Tell Me No Lies at 11:25 PM on July 7

