The application is an intranet web hit counter.

Expected Customers and Use Scenarios

.NET Maintenance Developer.  Joe the maintenance programmer needs to know which features in the application are used often, which are rarely used, and which are never used. We know which pages raise the most errors from our error log. There may also be some data history logging, security or user action logging, maybe some debug trace, but normally none of this tracks overall usage.

The IIS log is not available to everyone and is often the property of the server administrator.  The server is often shared among various applications and may contain secret information, which could leak into the logs.

Joe primarily wants to know which bugs to fix first and which bugs to ignore entirely, because the former are on highly trafficked pages and the latter are on pages with no traffic. Joe may also realize that traffic to certain pages has evaporated because the bugs are so serious that people have stopped using them. It’s important to emphasize that these bugs don’t always throw exceptions that the error log will catch; they could be slow pages or problems the users have reported.

Joe also has to pick a deployment time for the next version. He thinks midnight would be a safe time to deploy, but he is unaware of the flurry of activity four time zones over. He does know for sure that midnight is the time when tired, sleep-deprived staff will make the most mistakes.

Application Administrator. Jack is the application administrator. He isn't a programmer, but he is one of the best and most prolific users of the application, often trains new users, and provides input into new features.  The application administrator would like to report that the application is being used successfully and decide which features may need an announcement or user training to get additional usage. A lightly trafficked page may also be a “hidden feature” that needs something to make it more discoverable, such as a more prominent placement in the navigation structure or user training.

User Management. Larry is a manager who doesn't use the application much himself, but his staff do. Each year the users' managers need to evaluate how much work each employee has done. If the application is a “sovereign application” that occupies most of the employee's day, then measurable employee productivity is roughly equal to application feature usage counts.

Application and Infrastructure Funder. George is the IT manager. He doesn't use the application, but needs to decide whether the application is too new to have much usage, is at its peak, is so old it is no longer used, has outgrown its hardware, or is a good candidate for enhancement.

Problems that this will not solve.

The Hit Tracker will not solve the problems of security auditing, or business logic regarding history of data or user actions that have a special meaning for the domain of the main application. To address those problems the application would need to be able to log arbitrary events with an arbitrary number of attributes, which would make the hit tracker into a general purpose logging application.

Security logging has the special requirement of being non-repudiable. A security logger will log authorization and authentication attempts and failures in the hope of finding evidence of bad behavior. The remedial action for bad behavior might be a human resources action, like a firing or reprimand, a lawsuit, or criminal proceedings. In all of these cases, one must be certain that the security log has not been tampered with, and that when it says Jill recorded the malicious transaction, it really was Jill and not some smear campaign. Hicmah will not address non-repudiation, but will provide some HTTP error code driven security reports.

Common features from internet hit counters that will not be implemented or will be de-emphasized:

  • Referrer. Users generally already know where the application is and were not referred.  The referrer will be used for tracing the user's page-to-page movements within the application.
  • Google search terms. Since the application is on the intranet, it will not be found via Google.
  • Country and State.  Country and state will not be inferable from an internal, router-assigned IP. Generally intranet applications know where the user is stationed and do not need to guess.
  • IP tracking. IP addresses will be meaningful only locally.

Common restriction of an internet hit counter (e.g. Google Analytics)
    • JavaScript only invocation (an intranet app would not have this restriction)

Deployment Scenarios

Small Project Developer. Rob is a hobbyist. He needs to be able to drop in as few files as possible into his website, of which he controls very little, maybe configure two or three things and then visit the hit counter page.  If there are too many barriers to trying it out, he won't try it out at all.

Technical Notes. Rob's application can be expected to write to App_Data and that is it. His app is on shared hosting with Medium Trust. There may be access to MSSQL or MS-Access. He can deal with any embedded database as long as he doesn't have to deal with it directly.  His application has few users, but if it uses too much memory or CPU his account will be cancelled. If his application needs too much memory or CPU, it may not get it, because so many other websites are on the same machine.

Rob will need a ClickOnce trial that will be able to exercise the application demo without any compiling or website configuration.

Intranet Application Developer.  Barry is an intranet application developer and has the same needs as Rob (above), except that he can and will need to implement custom logging strategies, so that features (button clicks) are counted as well as pages.

Barry may need to customize the code to make it comply with internal office policies.

Barry will need a variety of licenses, including a closed source license as well as an MIT-style license, since office policies can limit the use of “open source” code.

Infrastructure Manager. Zoe is in charge of infrastructure and needs to figure out which applications are in use. They have very limited abilities to change the source code of existing applications and still would like to gather information about the various applications under their control.

Technical note. Barry will not be allowed to store any data anywhere except an MS-SQL instance. On most days the WINS and DNS servers do not work. Numerous security-oriented proxies, routers and firewalls have reduced the network speed to an unreliable crawl. The application needs to be able to traverse the infrastructure without relying on sticky sessions.  The application will be behind a firewall and will not be able to call web services or other resources such as Google Charts.

Functional Requirements

The hit counter will record a hit for each page visited. It will not record a hit for every resource loaded (i.e. no tracking of images, CSS files, etc.).
The hit counter will be able to record a year’s worth of activity in less than 1/4 GB.
The hit counter will report on space used by hit counter tables.
The hit counter will allow for:
  • archiving (copying to another table, to keep the main table small)
  • truncating (removing records older than n days, as set by config)
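
The archiving and truncating requirement can be illustrated with a minimal sketch. It is written in Python with an in-memory SQLite database purely for brevity; the real project targets .NET and MS-SQL or an embedded database. The table names `hit` and `hit_archive`, the column names, and the function `archive_and_truncate` are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

def archive_and_truncate(conn, now, keep_days):
    """Copy hits older than keep_days into hit_archive, then delete them from hit."""
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(
            "INSERT INTO hit_archive SELECT * FROM hit WHERE hit_time < ?", (cutoff,))
        deleted = conn.execute(
            "DELETE FROM hit WHERE hit_time < ?", (cutoff,)).rowcount
    return deleted

# Hypothetical schema: ISO-8601 text timestamps sort correctly as strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hit (hit_time TEXT, url TEXT, user_name TEXT)")
conn.execute("CREATE TABLE hit_archive (hit_time TEXT, url TEXT, user_name TEXT)")

now = datetime(2011, 12, 13, 18, 35)
conn.executemany("INSERT INTO hit VALUES (?, ?, ?)", [
    ((now - timedelta(days=40)).isoformat(), "/orders/create.aspx", "jill"),
    ((now - timedelta(days=2)).isoformat(), "/orders/edit.aspx", "jill"),
])

moved = archive_and_truncate(conn, now, keep_days=30)
```

Truncating without archiving would be the same `DELETE` on its own. Doing both in one transaction means a failure part way through cannot lose rows or duplicate them.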

Test Sites
A test Web Forms site - hit counting is more page-centric.
A test MVC site - hit counting is more request-centric.

Non-functional Requirements
The hit counter will guarantee that the hit counter log will grow no larger than a specific size.
The hit counter will have the option of going above that size if a minimum
The hit counter must not add any additional load time to the page.
The hit counter must be testable, which will be done by using a crawler.
The hit counter must allow for effortless database set up—either as an embedded db or as an installation script that can discover and install missing tables, etc.

Technical Specification

Now we shall start naming specific technologies.
We will want to implement the hit counter in a variety of ways:

> a jQuery plugin
> a JavaScript-free HTML invocation (image tag)
> an HTTP module – intercept request, record it.
> C# invocation for button click tracking. Not all page hits indicate a feature usage.
> SQL procedure
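
As an illustration of the JavaScript-free image tag option, here is a minimal sketch of what the server side of a tracking image could do. It is written in Python for brevity rather than as an ASP.NET handler, and the function name `handle_pixel_request`, the in-memory `hit_log` list, and the choice to read the page URL from the `Referer` header are assumptions, not part of the specification.

```python
import base64

# A widely used 1x1 transparent GIF, returned for every tracking request.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

def handle_pixel_request(headers, user_name, hit_log):
    """Record a hit for the page that embedded the image, then return the image."""
    page_url = headers.get("Referer")  # the page containing the <img> tag
    if page_url:
        hit_log.append({"url": page_url, "user": user_name})
    # Disable caching so every page view triggers a new image request.
    response_headers = {"Content-Type": "image/gif", "Cache-Control": "no-store"}
    return response_headers, PIXEL

hit_log = []
response_headers, body = handle_pixel_request(
    {"Referer": "http://intranet/orders/create.aspx"}, "jill", hit_log)
```

A page would invoke this with a plain image tag pointing at the handler's URL, so no script support is needed in the browser.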


The User Interface Options (For viewing charts, graphs and updating settings)
Ashx File – Only requires web.config edit and putting dll in right place.
MVC – Requires MVC host
ASPX – Easiest to understand. More complicated deployment (more parts)
Pages
Will need a setup page to either show diagnostics or do the config for the user.
Will need to have a config page to change behavior.

Reports with graphs.

There will be a variety of graphs, but at least these:

  • bar charts
  • time series charts
  • pie charts
  • grids of data

Not Especially Relevant
traffic source - not search or referrer driven, nor ad driven
(referrer might be used for illustrating one user's activity for a day)
bounce - not search engine driven
loyalty - fixed audience
network properties - ISP name is fixed
map - internal IP addresses, fixed audience
MAYBE- mobile - internal IP addresses.
conversion (getting user to BUY!)

Extension Points

Even with distributed source control, extending an application by modifying the base code is asking for too much coordination with other developers. So there will be a variety of extension points for adding functionality to the application without needing to fork the code base.

Abstract classes and Interfaces
Hit counter plug-ins for an additional recording point
Report plug-ins for custom reporting
The plug-in model will be both a sort of “discover and run” and a plug-in via partial classes.

Tools and Dependencies
Framework dependency: .NET 3.5.

http://archive.msdn.microsoft.com/SharpDOM  - Instead of embedding aspx (complicates deployment) or writing to the HtmlTextWriter (ugly code), we should use SharpDOM.
http://ninject.org/download

Chart Options
http://code.google.com/p/flot/ jQuery library
http://www.stevefenton.co.uk/cmsfiles/assets/File/charts.html  Bar chart library
http://www.jqplot.com/tests/line-charts.php Plotted x-y charts, plus many other chart types and zoom!
http://xaviershay.github.com/tufte-graph/  Bar charts only?
http://www.omnipotent.net/jquery.sparkline/ For displaying very, very compact charts

Dimensions, Measures and Metrics

This section outlines what facts, dimensions, measures and metrics will be gathered and reported.  For this document, facts are hits. Measures are measurable things about the fact. Dimensions are ways to categorize and sub-categorize the facts. Metrics are ratios and calculations based on the measures.

The main fact table records one line per request, per resource, unless the JavaScript invoker is used, in which case there is just one fact per HTML page.

Raw hits per day. (For capacity planning)
* Distinct user count per day. (Customers served, independent of users' usage patterns)
* Concurrent users per time window. (For capacity planning, licensing planning)
Longest period of no use/minimal use per day. (e.g. no use from 7PM to 6AM next day)
* Top #n users for given time window. (Likely best testers, source of user feedback)
* Bottom #n users for given time window. (May need training)
* Top/Bottom #n URLs for given time window.
Top/Bottom #n URLs by page title (Requires JavaScript invoker!)
Top/Bottom #n URLs by average bytes, average client time, average server time
Session Log - Tree layout of the path taken by one person, linked by referrer to URL, time between each page, etc.
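
Two of the metrics above, distinct users per day and top/bottom n URLs, can be sketched as follows. This is an illustrative Python sketch over an in-memory list, not the project's implementation; the tuple layout `(day, user, url)` and the function names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical hit facts: (day, user, url)
hits = [
    ("2011-12-12", "jill", "/orders/create.aspx"),
    ("2011-12-12", "jill", "/orders/edit.aspx"),
    ("2011-12-12", "jack", "/orders/create.aspx"),
    ("2011-12-13", "jill", "/reports/usage.aspx"),
]

def distinct_users_per_day(hits):
    """Count each user once per day, however many hits they made."""
    users = defaultdict(set)
    for day, user, _url in hits:
        users[day].add(user)
    return {day: len(names) for day, names in users.items()}

def top_urls(hits, n):
    """The n most-hit URLs with their hit counts."""
    return Counter(url for _day, _user, url in hits).most_common(n)

def bottom_urls(hits, n):
    """The n least-hit URLs among those that received at least one hit."""
    counts = Counter(url for _day, _user, url in hits)
    return sorted(counts.items(), key=lambda item: item[1])[:n]

per_day = distinct_users_per_day(hits)
busiest = top_urls(hits, 1)
quietest = bottom_urls(hits, 2)
```

Note that a bottom-n list built only from the hit table cannot show pages that were never hit; finding those would need a separate inventory of the application's pages.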

Status Code Dimension (Based on status codes-- new column)
URL with worst ratio of HTTP 4xx/5xx to HTTP 200. (Buggiest pages; assumes the app uses HTTP codes and custom pages instead of an ordinary 200 + page that says “bad things happened”)
List and count of 404s (Broken links)
Users with most 403s (Hacker or expects more rights than they have)
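
The worst-ratio report could be computed along these lines. This is a hedged Python sketch; the input shape `(url, status_code)`, the function name `error_ratio_by_url`, and the choice to rank a URL with errors but no successful hits as infinitely bad are assumptions.

```python
from collections import defaultdict

def error_ratio_by_url(hits):
    """Rank URLs by (count of 4xx/5xx) / (count of 200), worst first."""
    ok = defaultdict(int)
    bad = defaultdict(int)
    for url, status in hits:
        if status == 200:
            ok[url] += 1
        elif 400 <= status <= 599:
            bad[url] += 1
    ratios = {}
    for url in set(ok) | set(bad):
        # A URL that only ever fails has no 200s to divide by.
        ratios[url] = bad[url] / ok[url] if ok[url] else float("inf")
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

ranking = error_ratio_by_url([
    ("/a.aspx", 200), ("/a.aspx", 500), ("/a.aspx", 200),
    ("/b.aspx", 404),
    ("/c.aspx", 200),
])
```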


Performance related:
Which pages are slowest?
Which browser is the fastest?
Which querystrings are the slowest? (for pages with query strings)
Is average response time increasing/decreasing? (for aspx pages, not static!)
Who is experiencing the slowest performance? (poorly configured computer? different workload?)

Calculated Hit measures
Time on page (difference between this page's time and the earliest next hit by same user with same referrer)
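
A minimal sketch of the time-on-page calculation follows. The wording above leaves some room for interpretation; this sketch assumes it means the earliest later hit by the same user whose referrer is this page's URL. The dictionary keys and the function name `time_on_page` are hypothetical, and Python is used only for illustration.

```python
from datetime import datetime

def time_on_page(hit, all_hits):
    """Seconds from `hit` to the earliest later hit by the same user
    that was referred by this hit's URL, or None if there is none."""
    candidates = [
        other["time"] for other in all_hits
        if other["user"] == hit["user"]
        and other["referrer"] == hit["url"]
        and other["time"] > hit["time"]
    ]
    if not candidates:
        return None  # last page of the visit: no next hit to measure against
    return (min(candidates) - hit["time"]).total_seconds()

hits = [
    {"user": "jill", "url": "/list.aspx", "referrer": None,
     "time": datetime(2011, 12, 13, 9, 0, 0)},
    {"user": "jill", "url": "/edit.aspx", "referrer": "/list.aspx",
     "time": datetime(2011, 12, 13, 9, 2, 30)},
    {"user": "jack", "url": "/edit.aspx", "referrer": "/list.aspx",
     "time": datetime(2011, 12, 13, 9, 1, 0)},
]
seconds_on_list = time_on_page(hits[0], hits)
seconds_on_edit = time_on_page(hits[1], hits)
```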

User metrics:
Hours “worked”: max(time of day) - min(time of day). Expected to be less than 8 hrs!
Hours at website (sum of 20 minute blocks with at least 1 hit within 24 hours).
Count of especially productive pages (create.aspx) or pairs (create.aspx -> edit.aspx)
Click stream
- pages visited. Minutes between transitions
- *maybe* tree structure for parallel sessions
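
The two hour-based user metrics can be sketched as below, assuming the input is the list of hit timestamps for one user on one day. This is an illustrative Python sketch; the function names and the `block_minutes` parameter are hypothetical.

```python
from datetime import datetime

def hours_worked(times):
    """Span between the first and last hit of the day, in hours."""
    return (max(times) - min(times)).total_seconds() / 3600

def hours_at_website(times, block_minutes=20):
    """Count each 20-minute block of the day that contains at least one hit."""
    blocks = {(t.hour * 60 + t.minute) // block_minutes for t in times}
    return len(blocks) * block_minutes / 60

day = [
    datetime(2011, 12, 13, 9, 5),
    datetime(2011, 12, 13, 9, 10),   # same 20-minute block as 9:05
    datetime(2011, 12, 13, 9, 50),
    datetime(2011, 12, 13, 13, 0),
]
span = hours_worked(day)
active = hours_at_website(day)
```

The gap between the two numbers is itself informative: a long span with few active blocks suggests the user visits the application only occasionally during the day.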

Correlation:
perf and time of day
projected usage (time series?)
Application stats:
for given version, how many pages got hit (i.e. exist)
for given version, what is the usage rate/bug rate?

Application-Server Dimension
Assembly Version
Server Name
Server Memory

Time Dimension
Day
Week #, Day of Week
Year, Month,  Fiscal Year
Formatted Date Name
Calendar Quarter
Fiscal Quarter
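
A row of the time dimension could be derived from a date roughly as follows. This Python sketch rests on several assumptions that the specification does not state: the fiscal year starts in July, a fiscal year is named after the calendar year in which it ends, and weeks follow ISO 8601 numbering. The function and key names are hypothetical.

```python
from datetime import date

def time_dimension_row(d, fiscal_year_start_month=7):
    """Build one time-dimension row for the date d."""
    _iso_year, iso_week, iso_weekday = d.isocalendar()
    # Months elapsed since the start of the fiscal year (0..11).
    months_into_fiscal_year = (d.month - fiscal_year_start_month) % 12
    starts_mid_year = fiscal_year_start_month > 1
    fiscal_year = d.year + 1 if starts_mid_year and d.month >= fiscal_year_start_month else d.year
    return {
        "day": d.isoformat(),
        "week": iso_week,
        "day_of_week": iso_weekday,              # 1 = Monday ... 7 = Sunday
        "year": d.year,
        "month": d.month,
        "fiscal_year": fiscal_year,
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "fiscal_quarter": months_into_fiscal_year // 3 + 1,
    }

row = time_dimension_row(date(2011, 12, 13))
```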

Time Series Metrics
- E.g. time to first use of feature vs characteristic of user


Relevant Ideas from Google Analytics
Session Length (avg pages visited in day, minutes per day)
# page views (total usage stats)
# online "now" (in last 20, 60 minutes)
# new people for time frame (first ever usage)
# unique users for time frame
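
The "online now" and "new people" counts can be sketched like this. The sketch is in Python for illustration only; the `(user, time)` tuple layout and the function names are assumptions.

```python
from datetime import datetime, timedelta

def online_now(hits, now, window_minutes=20):
    """Users with at least one hit in the last window_minutes."""
    cutoff = now - timedelta(minutes=window_minutes)
    return {user for user, t in hits if cutoff <= t <= now}

def new_users(hits, start, end):
    """Users whose first-ever hit falls inside [start, end)."""
    first_seen = {}
    for user, t in hits:
        if user not in first_seen or t < first_seen[user]:
            first_seen[user] = t
    return {user for user, t in first_seen.items() if start <= t < end}

now = datetime(2011, 12, 13, 18, 35)
hits = [
    ("jill", datetime(2011, 11, 1, 9, 0)),
    ("jill", datetime(2011, 12, 13, 18, 30)),
    ("jack", datetime(2011, 12, 13, 17, 50)),
    ("larry", datetime(2011, 12, 13, 18, 20)),
]
last_20 = online_now(hits, now, window_minutes=20)
last_60 = online_now(hits, now, window_minutes=60)
first_timers = new_users(hits, datetime(2011, 12, 1), datetime(2012, 1, 1))
```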

Technical Profile
Browser, Screen Colors, Screen Resolution, Flash, Silverlight, Java Support

User ID Dimension
- Use Windows authentication first. Detect commonly known service accounts and treat them as anonymous.
- MAYBE use an object with a property in session, as set in config (requires reflection)
- De-anonymize anonymous users by giving them a name that links sessions of the same anonymous user together. (hash of the request characteristics that stay constant for a user)
- When available, ASP.NET membership data will be used.
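
The first and third points can be sketched together. This is an illustrative Python sketch, not the project's code. Which request characteristics count as constant is an assumption here (IP address, user agent, and accept-language), as are the function names, the `anon-` prefix, and the exact list of service accounts.

```python
import hashlib

# Well-known Windows service accounts, compared in lower case.
SERVICE_ACCOUNTS = {
    "nt authority\\system",
    "nt authority\\network service",
    "nt authority\\local service",
}

def anonymous_user_name(ip_address, user_agent, accept_language):
    """Stable pseudonym: the same request characteristics give the same name."""
    fingerprint = "|".join([ip_address, user_agent, accept_language])
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()
    return "anon-" + digest[:12]

def resolve_user_name(windows_user, ip_address, user_agent, accept_language):
    """Prefer the authenticated Windows user; otherwise fall back to a pseudonym."""
    if windows_user and windows_user.lower() not in SERVICE_ACCOUNTS:
        return windows_user
    return anonymous_user_name(ip_address, user_agent, accept_language)

named = resolve_user_name("CORP\\jill", "10.0.0.7", "Mozilla/5.0", "en-US")
service = resolve_user_name("NT AUTHORITY\\NETWORK SERVICE", "10.0.0.7", "Mozilla/5.0", "en-US")
anonymous = resolve_user_name(None, "10.0.0.7", "Mozilla/5.0", "en-US")
```

Two different people behind the same machine and browser would share a pseudonym under this scheme, so the linked sessions are an approximation.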

Feature Dimension
- Extract folder name as proxy for “module”
- Extract controller/action for MVC
- MAYBE extract event name as proxy for button click
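
The first two extraction rules could look like the sketch below. It is written in Python for illustration and assumes the conventional MVC route pattern of controller, then action, then id, with Home and Index as defaults; applications with custom routes would need their own mapping. The function names are hypothetical.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Non-empty path segments, ignoring the query string."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]

def module_from_path(url):
    """First folder of the path as a proxy for the module; '' for root-level pages."""
    segments = path_segments(url)
    return segments[0] if len(segments) > 1 else ""

def controller_action(url, default_controller="Home", default_action="Index"):
    """Controller and action, assuming a controller/action/id route."""
    segments = path_segments(url)
    controller = segments[0] if segments else default_controller
    action = segments[1] if len(segments) > 1 else default_action
    return controller, action

web_forms_module = module_from_path("/orders/create.aspx?id=5")
root_module = module_from_path("/default.aspx")
mvc_feature = controller_action("/Orders/Edit/5")
mvc_default = controller_action("/")
```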
The hit counter will be able to collect web hits from Web Forms and MVC style applications. The difference is that a Web Forms application will have a GET + multiple POST-back pattern for a single page, and an MVC style application may have a variety of URLs mapped to views and actions. To support this, there may need to be multiple ways to invoke the
There will be a variety of reports for the following dimensions:
Page – Feature, Action, QueryString
User – Name + custom attributes
Browser/Computer – Browser, OS, Screen Resolution, etc
Time – Hour, Day, Month, Year

Last edited Dec 13, 2011 at 6:35 PM by matthewdeanmartin, version 7
