Steps to Contribute

Get Mercurial (hg).
Get Visual Studio Express or higher.

Create a fork.
Create features that are missing, especially if they are unlikely to create merge conflicts.
Ask me to pull your code into the core.

Still too much coordination effort? See the extension points document.

Application Structure

Tables
The application uses a star schema. Since hits are logged and read, never updated, hit counting has many architectural parallels with a prototypical data warehouse.

All tables are prefixed with hicmah_, and it is on the TODO list to make the prefix configurable.

hicmah_hits - The core fact table. The dimensions are user, date, time, server/app, user's computer, and URL/feature. The measures are response time, bandwidth used, and user time on page (i.e. time between hits).

hicmah_users - Hicmah makes best efforts to figure out who the user is if the application is using a known authentication method. If the authentication method is entirely novel, Hicmah will fall back on de-anonymization techniques.

hicmah_user_agents - Normally user agent info is gathered for browser feature detection. Here, we are trying to detect the client's computer configuration so we can make some observations about error rates and performance. The key attributes are OS, browser, and version.

hicmah_urls - This is the URL and its sections as parsed by the Uri class. For a non-web app, the URL table is a list of features or actions the user can invoke in the host application.

hicmah_request_types - These are the HTTP request types, GET, POST, etc.

hicmah_queries - TODO. The query string part of the URL. The goal of this is to reduce the size of the hicmah_urls table when there are only a few URLs but many query strings.

hicmah_invoker - Hicmah can be invoked by a handler, by a module, or directly using C#.

hicmah_dates - This is a precalculated, localized table of dates, one row per day. The primary key can be calculated by counting days from 1/1/1970. This table is generated using the System.DateTime class. It is on the TODO list to also use Noda Time.

hicmah_time_zones - This is the Olson time zone returned by JavaScript.

hicmah_times - This is for reports showing the busiest times of the day, relative to the users (are they active in the morning at 9AM for their local time?) and relative to the server (is there anyone in the world, typically, on the server at 11PM in the server's time zone?)

hicmah_servers - This is a combination of the server's characteristics, such as memory, OS, and CPUs, and the application's characteristics, such as assembly version number.

hicmah_cache - Holds protobuf-net serialized query results.
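
As a rough illustration of the star schema described above, here is a minimal sketch in Python with an in-memory SQLite database. Only the table names hicmah_hits, hicmah_dates and hicmah_urls come from this page; every column name, the helper functions, and the use of SQLite are assumptions made for the example, and only two of the dimensions are shown.

```python
import sqlite3
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def date_key(day):
    """Key for hicmah_dates: whole days counted from 1/1/1970."""
    return (day - EPOCH).days


def build_schema(conn):
    # Column names are hypothetical; only the table names come from the docs.
    conn.executescript("""
        CREATE TABLE hicmah_dates (date_id INTEGER PRIMARY KEY, iso_date TEXT NOT NULL);
        CREATE TABLE hicmah_urls  (url_id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
        CREATE TABLE hicmah_hits (
            hit_id      INTEGER PRIMARY KEY,
            date_id     INTEGER NOT NULL REFERENCES hicmah_dates(date_id),
            url_id      INTEGER NOT NULL REFERENCES hicmah_urls(url_id),
            response_ms INTEGER,
            bytes_sent  INTEGER
        );
    """)


def log_hit(conn, when_utc, path, response_ms, bytes_sent):
    """Insert one fact row, creating the dimension rows if they are missing."""
    key = date_key(when_utc.date())
    conn.execute("INSERT OR IGNORE INTO hicmah_dates VALUES (?, ?)",
                 (key, when_utc.date().isoformat()))
    conn.execute("INSERT OR IGNORE INTO hicmah_urls (path) VALUES (?)", (path,))
    url_id = conn.execute("SELECT url_id FROM hicmah_urls WHERE path = ?",
                          (path,)).fetchone()[0]
    conn.execute(
        "INSERT INTO hicmah_hits (date_id, url_id, response_ms, bytes_sent) "
        "VALUES (?, ?, ?, ?)",
        (key, url_id, response_ms, bytes_sent))


conn = sqlite3.connect(":memory:")
build_schema(conn)
log_hit(conn, datetime(2011, 12, 11, 1, 43, tzinfo=timezone.utc), "/home", 120, 2048)
log_hit(conn, datetime(2011, 12, 11, 2, 0, tzinfo=timezone.utc), "/home", 80, 2048)

# A typical warehouse-style report: hits and average response time per URL per day.
report = conn.execute("""
    SELECT u.path, d.iso_date, COUNT(*), AVG(h.response_ms)
    FROM hicmah_hits h
    JOIN hicmah_urls u ON u.url_id = h.url_id
    JOIN hicmah_dates d ON d.date_id = h.date_id
    GROUP BY u.path, d.iso_date
""").fetchall()
```

Fact rows are only ever inserted, never updated, which is what makes the data warehouse comparison fit.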

Notes on Dimensions

Time, Date, Time Zone
The time used for deciding how far apart two hits are is the UTC time/date. The other two interesting times are the server time/date and the client time/date, which each have a time zone. We are going to assume that if there is a web farm, then all of its servers are in the same time zone. An organization doesn't have to be very big to span two time zones.
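
A small sketch of the idea, assuming hits arrive as UTC timestamps. The function names are hypothetical, and fixed UTC offsets stand in for real time zones (so daylight saving is ignored).

```python
from datetime import datetime, timedelta, timezone


def seconds_between_hits(hits_utc):
    """Gaps between consecutive hits, measured on the UTC timeline."""
    ordered = sorted(hits_utc)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]


def local_hour(hit_utc, utc_offset_hours):
    """Hour of day as seen in a zone with the given fixed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return hit_utc.astimezone(zone).hour


hits = [
    datetime(2011, 12, 11, 14, 0, 0, tzinfo=timezone.utc),
    datetime(2011, 12, 11, 14, 0, 45, tzinfo=timezone.utc),
    datetime(2011, 12, 11, 14, 5, 45, tzinfo=timezone.utc),
]

gaps = seconds_between_hits(hits)      # time on page, in seconds
server_hour = local_hour(hits[0], -5)  # server assumed to be at UTC-5
client_hour = local_hour(hits[0], 1)   # client assumed to be at UTC+1
```

The same hit lands in a different hour of the day for the server and for the client, which is why both are worth keeping alongside the UTC value.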

There are two competing systems for time zones: the Windows time zone, which is a numeric id and a localized string, and the Olson time zone, which is an English string that isn't the same as the Windows one. We can determine the web client's Olson time zone, and can make best efforts to convert that to a Windows time zone. These time zones evolve over time as laws change.
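
A best-efforts conversion can be sketched as a lookup table with an explicit "unknown" result. The handful of pairs below follow the Unicode CLDR windowsZones mapping; the function name is hypothetical, and a real table would be loaded from data rather than hard-coded, since the zones change.

```python
# Olson id -> Windows time zone id (a tiny excerpt, for illustration only).
OLSON_TO_WINDOWS = {
    "America/New_York": "Eastern Standard Time",
    "America/Chicago": "Central Standard Time",
    "America/Los_Angeles": "Pacific Standard Time",
    "Europe/London": "GMT Standard Time",
    "Europe/Berlin": "W. Europe Standard Time",
    "Asia/Tokyo": "Tokyo Standard Time",
}


def to_windows_zone(olson_id):
    """Return the Windows zone id, or None when no mapping is known."""
    if not olson_id:
        return None
    return OLSON_TO_WINDOWS.get(olson_id.strip())


known = to_windows_zone("Europe/Berlin")
unknown = to_windows_zone("Mars/Olympus_Mons")
```

Returning None instead of guessing keeps an unmapped zone visible in reports rather than silently filing it under the wrong one.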

User's Computer and Browser
The information about the client comes from the user agent string and the JavaScript navigator and screen objects. The user agent string often "lies" about the browser because developers routinely assumed that a browser with an unrecognized user agent string was incapable of nearly every feature, so new browsers had to pretend to be a pre-existing version. The user agent string is a poorly standardized mish-mash of client computer and browser information that can't easily be parsed except by matching it against large data files. There are two competing formats: .browser files, which Microsoft doesn't really maintain, and browscap.ini, which is maintained by volunteers. Most developers interested in these are trying to decide whether they can use a recently introduced browser feature or whether they need to degrade or enhance. The day a new user agent string shows up, both of these strategies fall apart because the user agent string isn't really parsable. The latest strategy is to use libraries like Modernizr to detect capabilities by running a small JavaScript application on page load.

For hit counting, browser capabilities are only one potentially interesting question. If we are just trying to get statistics on the hardware and configuration, then it is good enough to collect OS, OS version, browser, and browser version.
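
For that narrower goal, a few ordered patterns go a long way. The sketch below is an assumption about how such a parser could look, not Hicmah's implementation; the pattern lists are deliberately tiny and will miss many real user agents, in which case the fields come back as None.

```python
import re

# Ordered (name, pattern) pairs; the first match wins. Chrome is listed before
# Safari because Chrome's user agent string also contains "Safari/".
_OS_PATTERNS = [
    ("Windows", re.compile(r"Windows NT (\d+(?:\.\d+)*)")),
    ("Mac OS X", re.compile(r"Mac OS X (\d+(?:[._]\d+)*)")),
    ("Linux", re.compile(r"Linux")),
]
_BROWSER_PATTERNS = [
    ("Chrome", re.compile(r"Chrome/(\d+(?:\.\d+)*)")),
    ("Firefox", re.compile(r"Firefox/(\d+(?:\.\d+)*)")),
    ("Internet Explorer", re.compile(r"MSIE (\d+(?:\.\d+)*)")),
    ("Safari", re.compile(r"Version/(\d+(?:\.\d+)*).*Safari/")),
]


def _first_match(patterns, user_agent):
    for name, pattern in patterns:
        match = pattern.search(user_agent)
        if match:
            # Patterns without a capture group identify a name but no version.
            version = match.group(1).replace("_", ".") if pattern.groups else None
            return name, version
    return None, None


def parse_user_agent(user_agent):
    """Return (os, os_version, browser, browser_version); unknowns are None."""
    os_name, os_version = _first_match(_OS_PATTERNS, user_agent)
    browser, browser_version = _first_match(_BROWSER_PATTERNS, user_agent)
    return os_name, os_version, browser, browser_version


firefox = parse_user_agent(
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0")
chrome = parse_user_agent(
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.7 (KHTML, like Gecko) "
    "Chrome/16.0.912.63 Safari/535.7")
```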

User
There are a lot of ways to authenticate a user. Hicmah will make best efforts to identify the user without configuration. If a user can't be identified, then Hicmah will fall back to de-anonymizing the user, either by issuing a cookie (preferably not) or by hashing request characteristics that don't vary from request to request. The fallback method is similar to a cookieless session id, but we don't want to change the application's URLs.

HTTP standard protocols.
Windows protocols. The IIdentity interface often has the true name. There are multiple places where this might be set, e.g. Thread, WindowsIdentity.GetCurrent, HttpContext.User and so on. Sometimes these are always set, either to a service account or to a real user; sometimes they are only set if a developer or the particular framework sets them.
ASP.NET membership providers. When they exist, this can be used.
Ad hoc methods. Strings in Session, custom HTTP headers, custom cookies.
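
The overall strategy (try each known source in turn, then fall back to a hash of stable request characteristics) can be sketched as follows. The request is modeled as a plain dictionary, and every key name, the choice of characteristics, and the use of SHA-256 are assumptions made for the example.

```python
import hashlib

# Hypothetical keys, tried in order; each stands in for one family of methods.
IDENTITY_SOURCES = (
    "http_auth_user",     # HTTP standard protocols
    "windows_identity",   # Windows protocols
    "membership_user",    # ASP.NET membership providers
    "session_user",       # ad hoc methods
)


def fingerprint(request):
    """Hash of characteristics that should not vary from request to request."""
    parts = [
        request.get("remote_addr", ""),
        request.get("user_agent", ""),
        request.get("accept_language", ""),
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


def identify_user(request):
    """Return (user_id, source). Falls back to a fingerprint when nothing is set."""
    for source in IDENTITY_SOURCES:
        name = request.get(source)
        if name:
            return name, source
    return "anon-" + fingerprint(request), "fingerprint"


named = identify_user({"windows_identity": "CORP\\alice", "user_agent": "X"})
first = identify_user({"remote_addr": "10.0.0.7", "user_agent": "X"})
second = identify_user({"remote_addr": "10.0.0.7", "user_agent": "X"})
```

Unlike a cookie or a cookieless session id, the fingerprint needs nothing stored on the client and leaves the URLs alone, at the cost of merging users whose requests look identical.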

Server and Application Configuration Dimension
(See elsewhere for info on how to configure Hicmah itself!)

Server information covers things like installed memory and OS; application configuration covers things like assembly version. These are lumped together because this dimension is trying to capture the application's ability to do things, such as serve up a page quickly and without errors. Some of these attributes evolve so slowly that it may not seem worth tracking them. The advantage comes when we can look at the relationship between performance and server in a web farm, or at the performance or feature usage rates of various versions.

There are several competing ways to find out about the currently executing application. Applications run in an AppDomain, which can hold several Assemblies. Identifying which assembly is the "application" and which are libraries is the first challenge. In the case of a tiered/layered application, the application is a collection of libraries.

Once an assembly has been detected, there are several ways to identify the version: by attribute, by file system time stamp, by file size, etc. Some of these methods are very sensitive to changes, some are almost completely insensitive to changes.
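
The difference in sensitivity is easy to demonstrate on a stand-in file. This sketch uses a temporary file instead of a real assembly, and adds a content hash as one more possible signal; the function name and the file name App.dll are made up for the example.

```python
import hashlib
import os
import tempfile


def version_signals(path):
    """Collect several candidate 'version' signals for one file."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {"size": info.st_size, "modified": info.st_mtime, "sha256": digest}


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "App.dll")

    with open(path, "wb") as handle:
        handle.write(b"build-1")
    before = version_signals(path)

    # A rebuild that changes the content but not the byte count.
    with open(path, "wb") as handle:
        handle.write(b"build-2")
    after = version_signals(path)

size_changed = before["size"] != after["size"]
hash_changed = before["sha256"] != after["sha256"]
```

Here the size misses the change while the hash catches it. A version attribute sits at the insensitive end as well, since it only changes when someone remembers to bump it.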

Hicmah could plausibly run on many servers, or in the WinForms case, Hicmah would run on many clients. Depending on the Trust level available, data can be gathered from the Environment, My.Computer, ASP.NET Server object and WMI calls.

Trace

Hicmah uses TraceSource based tracing, i.e. it relies on System.Diagnostics, plus other MSPL libraries that extend System.Diagnostics. The goal is that, for the prototypical use case, a trace will tell a story that makes sense, i.e. meets expectations.

To get trace going, uncomment the trace source sections found in the integration tests' web.config and copy them to the application you are observing at the moment.

Trace has a significant performance cost. Do not trace in production unless you are actively looking for a problem.

To capture trace output, use DebugView, which will capture all of it.

Configuration

Hicmah can use either AppSettings or the custom SettingsProvider for global configuration settings. The SettingsProvider loads and saves to the database. In the context of Hicmah, a "User" setting is a setting for one particular website among several being tracked in the same database. In intranet scenarios, once an application is deployed, the web.config file often becomes the property and responsibility of the server administrator and is inaccessible to the application administrator. web.config driven settings will be supported, but it is expected that most users will use the hicmah_settings table.
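
A minimal sketch of a two-level lookup, using SQLite and a dictionary as stand-ins for the database and web.config. Only the hicmah_settings table name comes from this page; the columns, the function name, the setting names, and the precedence (database first, then file) are all assumptions.

```python
import sqlite3


def get_setting(conn, site, name, file_settings, default=None):
    """Look in hicmah_settings for this site first, then in file-based settings."""
    row = conn.execute(
        "SELECT value FROM hicmah_settings WHERE site = ? AND name = ?",
        (site, name)).fetchone()
    if row is not None:
        return row[0]
    return file_settings.get(name, default)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hicmah_settings (
        site  TEXT NOT NULL,   -- the tracked website (the "User" of the setting)
        name  TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (site, name)
    )
""")
conn.execute("INSERT INTO hicmah_settings VALUES ('intranet', 'TablePrefix', 'hicmah_')")

app_settings = {"TablePrefix": "hits_", "CacheMinutes": "10"}  # stand-in for web.config

from_database = get_setting(conn, "intranet", "TablePrefix", app_settings)
from_file = get_setting(conn, "intranet", "CacheMinutes", app_settings)
missing = get_setting(conn, "intranet", "NoSuchSetting", app_settings, default="n/a")
```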

Last edited Dec 11, 2011 at 1:43 AM by matthewdeanmartin, version 12
