How to Measure Quality of News Radars

February 14, 2006 at 9:16 pm | Posted in Unscientific Research

News Radars are topic-specific, highly focused information channels that syndicate RSS feeds dedicated to the same subject (e.g., “accrual accounting”, “Pink Floyd” or “Boston Red Sox”).

A News Radar is a constantly updated thematic channel of highly relevant web references that are gathered in accordance with single or multiple specific, persistent search criteria. Radars can focus on anything: topics, people, opinions, products, news items, events or passions. The constant updating of the channel is accomplished by leveraging the RSS technology to its full power.

News Radars are also called Feed Digests, Feed Channels, Newspages, etc., depending on the RSS aggregator vendor or the Web 2.0 site.

Creating News Radars requires Newsmastering as well as tools for presenting and administering the RSS feeds that make up a Radar.

Robin Good (aka Luigi Canali De Rossi), who runs the Master New Media web site that introduced the “News Radars” approach, defines the “Newsmaster” as follows:

The newsmaster is an individual capable of personally crafting RSS-based specialized information channels by utilizing technologies that allow him/her to select, aggregate, filter, exclude and identify quality news, information, content, tools and resources from the whole universe of content, news and information available on the Internet.
 
Newsmastering is the ability of a human being to concert, orchestrate, edit, and refine quality search formulas that tap into the whole Internet content universe and beyond, and that filter out relevant information through selected keywords, source selection, ranking, heuristics, and many other possible criteria.

Currently, Newsmastering is more an art than a science, and we can say so from first-hand experience: ITDynamo has created and published seven (7) News Radars, and more than a dozen others run in our testing and R&D environments.

How does a Newsmaster evaluate the quality of her or his effort? How can we tell a Radar with relevant, focused and still deep coverage of its subject from a wide stream of news, articles and blogs in which misses on the subject of interest outnumber hits?

We also need to take into consideration that Radars can be used as News Pages/News Displays with a constantly updating stream of information from the Web, or as external Knowledge Bases where the information does not disappear after being removed from the display but is instead archived and categorized. In the latter case, the difference between a high-quality and a “so-so” News Radar may be even more visible.

We researched quite a few authoritative Web resources on newsmastering and feed aggregation, trying to find out how others test, validate and evaluate their Radars and, believe it or not, came up empty-handed.

Most of the sources, including Robin Good’s very comprehensive Mini Guide “The RSS NewsMaster’s Toolkit And The Creation Of RSS Information Radars. Automatic Filtering And Aggregation Of Online Content Via RSS. How To Create, Publish And Promote Topic-Specific Information Channels And Niche Sites”, describe how to create Radars but do not provide recommendations on how to measure the newsmastering results.

There should be a way to see the Radar’s output, we thought, and there was: Tags. Wikipedia describes Tags as pieces of information separate from, but related to, an object. In the practice of collaborative categorization using freely chosen keywords, tags are descriptors that individuals assign to objects.

Tags can be used to specify properties of an object that are not obvious from the object itself. They can then be used to find objects with some desired set of properties, or to organize objects. These features are exploited extensively in social software and folksonomies.

Tags have been around for a long while, but Web 2.0 brought them a lot of popularity. Author tags are assigned to information objects by the same person who created the content and can be seen in places like Technorati, where blog posts are aggregated along with their tags. Reader tags, on the other hand, are created by anyone else and so might be closer to annotation systems. You can see reader tags on sites like del.icio.us that allow anyone to tag any document. Services like flickr combine both, allowing both the author and (some) readers to add tags (Hellonline. Eran’s Blog).

However, with all due respect to human beings tagging their own or somebody else’s bits and pieces of information, we needed automated services that tag and index the feeds’ output and display the tags as Tag Clouds. This way, the tags can be considered the Radar’s output categories or clusters.

Some sites use their own proprietary technologies (ITD is also working on a proprietary technology), while others utilize free third-party services such as TagCloud or Wanabo.

We used TagCloud.com. TagCloud, created by John Herren, is an automated Folksonomy tool. Essentially, TagCloud searches any number of RSS feeds you specify, extracts keywords from the content and lists them according to prevalence within the RSS feeds. Clicking on the tag’s link will display a list of all the article abstracts associated with that keyword. TagCloud lets you create and manage clouds with content you are interested in, and lets you publish them on your own website.
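The general idea of extracting keywords from feeds and ranking them by prevalence can be illustrated with a minimal Python sketch. This is not TagCloud's actual algorithm, which we have not seen; the stop-word list, the sample feed and the function name extract_keywords are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stop-word list; a real service would use a far larger one.
STOP_WORDS = {"the", "and", "for", "with", "are", "that", "this"}

def extract_keywords(rss_xml: str) -> Counter:
    """Count word occurrences in the titles and descriptions of RSS items."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        text = " ".join(
            (item.findtext(field) or "") for field in ("title", "description")
        )
        words = re.findall(r"[a-z]+", text.lower())
        # Skip very short words and stop words so only candidate tags remain.
        counts.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts

# Made-up sample feed, used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>RSS in the enterprise</title>
<description>Enterprise RSS adoption is growing</description></item>
<item><title>Enterprise feeds</title>
<description>Feeds and RSS readers for business</description></item>
</channel></rss>"""

cloud = extract_keywords(SAMPLE_FEED)
top_tags = cloud.most_common(3)  # most prevalent keywords first
```

Sorting the counts in descending order gives the kind of ranked keyword list a tag cloud is built from.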

This is how the Tag Cloud looks for our RSS in Enterprise news radar:

What attracted us to TagCloud is its ability to generate Tag Clouds in an XML format with up to 250 tags in the file. For example, the XML file for our radar above is http://www.tagcloud.com/cloud/xml/ERSS/default/250.

The file includes attributes for the tags:

If this file is opened in MS Excel and sorted in descending order, it looks like:

Most important, this tag soup can now be processed, and some conclusions can be drawn from a statistical analysis of its distribution.
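Loading such a file and sorting it in descending order could also be scripted instead of done in Excel. The sketch below is only an illustration: the element name tag and the attributes name and scale are hypothetical, because the actual TagCloud XML layout is not reproduced in this post, and the sample data is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout and made-up values; the real TagCloud schema may differ.
SAMPLE_CLOUD = """<cloud>
  <tag name="rss" scale="9"/>
  <tag name="blog" scale="3"/>
  <tag name="enterprise" scale="7"/>
</cloud>"""

def load_tags(cloud_xml: str) -> list:
    """Return (name, scale) pairs sorted by scale, strongest tag first."""
    root = ET.fromstring(cloud_xml)
    tags = [(t.get("name"), int(t.get("scale"))) for t in root.iter("tag")]
    return sorted(tags, key=lambda pair: pair[1], reverse=True)

sorted_tags = load_tags(SAMPLE_CLOUD)
```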

We chose linear regression analysis as the way to analyze the statistical distribution. We tried to visualize what the perfect distribution would look like and came up with the following numbers:

# of Categories    Scale
1                  9
2                  8
4                  7
8                  6
16                 5
32                 4
64                 3
128                2
256                1

The linear regression parameters would then be as follows:

Slope: -0.0264

Intercept: 6.5

R-Squared: 0.679.
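These parameters can be reproduced with an ordinary least-squares fit of Scale (Y) against the number of Categories (X). The sketch below uses only the nine points of the perfect distribution given above; the function name linear_regression is our own choice for illustration.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of ys on xs: (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# The "perfect" distribution: number of categories at each scale level.
categories = [1, 2, 4, 8, 16, 32, 64, 128, 256]
scale = [9, 8, 7, 6, 5, 4, 3, 2, 1]

slope, intercept, r_squared = linear_regression(categories, scale)
print(round(slope, 4), round(intercept, 2), round(r_squared, 3))
# prints: -0.0264 6.5 0.679
```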

These statistical parameters call for some interpretation.

In theory, we should not even try to interpret the Intercept value because, in the traditional sense, it is the value of Y (Scale) when the number of Categories (tags) equals zero. However, it could serve as a rough indication of average Radar strength. Slope makes more sense because it is the estimated average change in Scale when the number of Categories increases by one. If the line is steeper, that is, if the absolute value of Slope is larger, then Scale (Radar strength) drops more quickly. In plain English, it means that we have a few strong categories (“signal”) followed by a lot of “noise” or, speaking more statistically, the “long tail” just becomes longer.

R-Squared (the coefficient of determination, i.e., the square of the correlation coefficient) describes how consistently the tags follow the fitted line, which we read as the “normality” or consistency of the distribution, while Slope is also a measure of strength and a more accurate one than Intercept.
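One way to make the Slope reading concrete is to ask at how many categories the fitted line reaches a Scale of zero. The sketch below does this for the perfect-distribution parameters quoted earlier and for a hypothetical line with the same Intercept and twice the Slope; the helper name categories_until_zero and the doubled slope are assumptions for illustration only.

```python
def categories_until_zero(intercept: float, slope: float) -> float:
    """Number of categories at which the fitted line reaches Scale = 0."""
    if slope >= 0:
        raise ValueError("expected a negative slope")
    return -intercept / slope

# Parameters of the perfect distribution quoted earlier in the post.
ideal = categories_until_zero(6.5, -0.0264)

# Hypothetical steeper line: same intercept, slope doubled.
steeper = categories_until_zero(6.5, -0.0528)
```

With the steeper line, Scale runs out after about half as many categories, which is what "drops more quickly" means here.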

For our study we used three (3) Radars that we created. Two of them, RSS in Enterprise and Newsmastering and Newsradars, are published on our site. The third, ITDynamo/JobExposer Web Buzz, is running in our test environment.

The RSS in Enterprise radar covers the broad area of RSS applications as an enterprise tool. The Newsmastering and Newsradars radar can be considered a subset of the first domain, while ITDynamo/JobExposer Web Buzz focuses only on monitoring the Web’s response to our existence and activities.

Thus, one can expect that the first radar will be broader and more “diluted”, with fewer strong categories; that the last will be narrower and stronger, with a larger number of strong categories; and that the Newsmastering/Newsradars radar will fall somewhere in between.

The Tag Clouds in HTML and XML format as of February 12, 2006 (the tag clouds may change over time) for the above radars are as follows:

http://www.tagcloud.com/cloud/html/ERSS/default/250 and http://www.tagcloud.com/cloud/xml/ERSS/default/250 – for the RSS in Enterprise radar

http://www.tagcloud.com/cloud/html/NewsRadars/default/250 and http://www.tagcloud.com/cloud/xml/NewsRadars/default/250 – for the Newsmastering and Newsradars radar

http://www.tagcloud.com/cloud/html/ITDynamo/default/250 and http://www.tagcloud.com/cloud/xml/ITDynamo/default/250 – for the ITDynamo/JobExposer Web Buzz radar

We processed the Radars’ tag XML files using linear regression analysis, and the results were as follows:

 

 

The results of the linear regression analysis of the Radars’ tags (categories) are not always consistent with our expectations.

Take the ITDynamo/JobExposer Web Buzz radar tags as an example. They seem to be more “normally” distributed and more consistent than those of the other two radars (R-Squared of 0.45 vs. 0.37 and 0.32 for the Newsmastering/Newsradars and RSS in Enterprise radars, respectively). The radar’s Intercept is also larger: 5.87 (the values for the Newsmastering/Newsradars and RSS in Enterprise radars are 5.5 and 5.32, respectively).

However, surprisingly, the absolute value of Slope for the Web Buzz radar (0.31) is larger than for the other radars, which fall in the range of 0.2-0.21. It means that while the Web Buzz radar has stronger “signal” categories, it also has a longer “tail” (“noise”).

Therefore, as you can see, linear regression analysis of the Tag Clouds for News Radars may serve as some indicator of a radar’s quality. However, it is still very much a work in progress and requires more study (as you know, the larger the result set, the more accurate the statistics).

Relevancy of the Radar’s output to the Radar’s subject cannot be measured through the above method.

It would be interesting to see some discussion on this subject, because we strongly believe that if you cannot measure the quality of Web information channels, you cannot effectively manage them.
