Taking a stab at making a privacy-preserving analytics thing for my blog.
I'm hashing the user's IP and User-Agent, so they're never stored directly. Not 100% accurate, but good enough for me to work out whether a page-view is unique or not.
Country is very rough - using GeoIP.
Referer is either from HTTP or if the linking site uses UTM.
Seems *broadly* accurate compared to other plugins.
I'm not sure of the performance if this hits a million rows. Might need further DB tables.
Getting there! I can now (inefficiently) see how many page views and (roughly) how many unique users have visited my blog.
I can also see where people are coming from on a specific day.
(Remember, this is just for me to noodle about with. This isn't going to be an amazingly accurate, professional-grade plugin.)
As pointed out by @DamonHD - the HTML <meter> element also works well!
Bit of CSS to rotate it, table for layout, job's a goodun!
Might have gone a *little* overboard! I rather like this though.
Self-hosted analytics showing a graph of the last 7 days.
Today's top referers, pages, and countries.
Super simple and suitable for my needs.
Next up - import historic data and work out if I want to keep countries & referers in it.
I'm basically farting around with the design because I don't want to do the hard work of periodically consolidating the database.
I *think* for now I just need a separate table of:
Date, Views, Users.
I can increment the Views every time the page is visited using something like `ON DUPLICATE KEY UPDATE visits = visits + 1;`
But deduplicating the users will be harder. I don't want to run a daily cron job for this.
I think I've cracked the problem - MySQL makes my brain hurt.
Table 1: Logs *everything*. Date, post ID, user, etc.
Table 2: Is date, view count, user count.
When (1) is updated, an AFTER INSERT trigger increments the view count for that day.
It also counts the number of unique users for that day and updates (2).
Table 3 is the same as (2) but on a per-post basis. It also updates after insert.
That way, I don't have to use logic in my PHP app - all it does is fire an insert.
@Edent be careful with 'UPDATE count = count + 1" patterns. They're unlikely to accurately count as you expect due to concurrency issues. Especially in MVCC systems, it will depend alot on transaction isolation levels.
For this kind of thing, just logging the visits as records and aggregating on analysis / periodically is often the better approach. Removes a lot of the concurrent write headache and give you more flexibility.
@intrbiz thanks. Is there a sensible way to schedule those sorts of reconciliations?
@Edent not super familiar with Maria/MySQL, more a PostgreSQL person.
But maybe: https://mariadb.com/docs/server/server-usage/triggers-events/event-scheduler/events
@intrbiz aha! Great, thanks 🙂