Attending JAOO

I will be attending the JAOO conference the next couple of days. It is my first time, but I have high expectations. I am not sure which sessions I will attend, but of course the opening keynote by Anders Hejlsberg will be a must.

I find it hard to choose - there are so many interesting subjects and speakers; and so little time ;-) If you are going to JAOO, feel free to drop a comment with suggestions on which sessions is a must for you, and why.

Visual update on the blog

Just a quick post to let you know that the blog has been updated with a much nicer theme. I hope I will get around to adding more interesting content to the blog shortly :-)


Parsing XML with PowerShell

I'm addicted to PowerShell. This cool scripting environment is simple to use, and with very few lines of script; it is possible to accomplish tasks that otherwise often would be a lot of tedious work. (If we didn't have PowerShell, I would propably wip up a C# program to do the same, but PowerShell is really lightweight, is interactive and is generally very forgiving for small tasks where you just "want the job done".

As an example, today I needed to look at a log files generated by Visual Studio to figure out why the environment wouldn't start on my home PC. As it turns out, these log files are actually XML files. Of course I could have just started reading through the XML, but all the angle brackets confuses my brain; when I'm actually mostly interested in the text content of the log file.

So, five minutes later, this 3-line script; parse-vslog.ps1 was born:

1: param( [string]$file = $(throw "required parameter" ) )
2: $log = [xml](get-content $file)
3: $log.activity.entry | select record,type,description | format-table -wrap -auto

This is what happens in the script:

On line 1, we declare that we need a $file parameter (variables and parameters is prefixed with $ in PowerShell), that should be required.

On line 2 we use the get-content cmdlet to get the contents of a file. PowerShell has a lot of XML helping features, one of which is the ability to "cast" the content to XML using the [xml] construct. What really happens behind the scenes, is that PowerShell instantiates an XmlDocument and loads the text content of the file in that.

Last, on line 3, we take advantage of the fact that PowerShell let's us select XML nodes by using simple dotted notation. Here we are interested in all the the /activity/entry nodes. We pass the result along the pipeline and selects the 3 most important values using the select cmdlet. And, lastly, we format the output nicely with format-table, specifying that we would like the cmdlet to auto-select the column widths (-auto) and that text output should be wrapped on multiple lines (-wrap).

So insted of having to look at XML that goes on like this:

1: xml-stylesheet type="text/xsl" href="ActivityLog.xsl"?>
2: activity>
3:   entry>
4:     record>1record>
5:     time>2008/06/15 15:44:18.220time>
6:     type>Informationtype>
7:     source>Microsoft Visual Studiosource>
8:     description>Visual Studio Version: 9.0.21022.8description>
9:   entry>
10:   entry>
11:     record>2record>
12:     time>2008/06/15 15:44:18.221time>
13:     type>Informationtype>
14:     source>Microsoft Visual Studiosource>
15:     description>Running in User Groups: Administrators Usersdescription>
16:   entry>
17:   entry>
18:     record>3record>
19:     time>2008/06/15 15:44:18.221time>
20:     type>Informationtype>
21:     source>Microsoft Visual Studiosource>
22:     description>ProductID: 91904-270-0003722-60402description>
23:   entry>
24:   entry>
25:     record>19record>
26:     time>2008/06/15 15:44:19.094time>
27:     type>type>
28:     source>Microsoft Visual Studiosource>
29:     description>Destroying Main Windowdescription>
30:   entry>
31: activity>
32:  

Now, I can get this much nicer output in the console (note that the XML above has been shortened for the blog. It was actually around 150 lines):

record type        description
------ ----        -----------
1      Information Visual Studio Version: 9.0.21022.8
2      Information Running in User Groups: Administrators Users
3      Information ProductID: 91904-270-0003722-60402
4      Information Available Drive Space: C:\ drive has 42128211968 bytes; D:\ drive has 38531145728 bytes; E:\ drive h
                   as 127050969088 bytes; F:\ drive has 117087354880 bytes
5      Information Internet Explorer Version: 7.0.6001.18063
6      Information Microsoft Data Access Version: 6.0.6001.18000
7      Information .NET Framework Version: 2.0.50727.1434
8      Information MSXML Version: 6.20.1076.0
9      Information Loading UI library
10     Information Entering function CVsPackageInfo::HrInstantiatePackage
11     Information Begin package load [Visual Studio Source Control Integration Package]
12     Information Entering function CVsPackageInfo::HrInstantiatePackage
13     Information Begin package load [team foundation server provider stub package]
14     Information End package load [team foundation server provider stub package]
15     Information End package load [Visual Studio Source Control Integration Package]
16     Information Entering function VBDispatch::GetTypeLib
17     Information Entering function LoadDTETypeLib
18     Error       Leaving function LoadDTETypeLib
19                 Destroying Main Window
 

I think this is a good representative of the strength of PowerShell. Using only a few lines of script and a minimum of time, I created a reusable script, that will probaply save a lot of time in the future.


ReSharper 4 Available

The good folks over at Jetbrains has finally released version 4 of their ReSharper productivity enhancing tool with support for C#3.0.

Highly recommended.


Clever use of C# 3.0 LINQ Expressions

Jafar Husain shows us a quite clever way to use a C# 3.0 LINQ expression to get a symbol name. I think this is a really good idea to use in cases when you need the name of a symbol as a string, since it avoids using hard-coded strings and gives us the option to use automatic refactorings without breaking stuff. And since it is implemented as an extension method; it does not pollute the interface of your classes. There might be a slight performance penalty when using this method; but I think it will be neglible for all but some extreme cases. But if you need to use it in a tight loop, you should propably make some performance measurements in advance ;-)

Now, if only C# 3.0 had been available a couple of years ago, when I wrote a lot of statically typed datasets (automatically generated from the database schema using CodeSmith, of course). The bulk of the code in those datasets where properties that would generally retrieve a value from a DataRow in a column named the same as the property. If I could have used the approach mentioned above, I am sure it would have saved me some sweat during later refactorings of the code in question.

It is a Good Thing our tools and languages continually evolves.


How to create an ASP .NET Captcha Control (part 2)

This is the second in a 3 part series on how to create an ASP .NET Captcha control. The previous post can be found here. This time we will look at how the CAPTCHA image can be generated using the built-in .NET framework classes.

Image generation approach
As described in the first post, the idea is to create an image showing a word, and let the user repeat it by typing it into a textbox. The image should be hard to read for OCR software, so that the CAPTCHA is hard to beat for automated bots. The way we will be doing this, is by stretching and warping the text, and adding noise. Luckily, this is easy to accomplish by using the GraphicsPath class to draw the string, and then use the Warp method on the GraphicsPath object.

Generating the image: Step by step
The first step in generating the image is to create a Bitmap object with the appropiate dimensions. We will also need a Font for the text, and a Brush for painting the text. I also declare a rectangle that is slightly smaller than the actual image, which will be used as the drawing bounds later. This will help to ensure, that the text fits on the image after the transformations:

        /// 
        /// Generates the CAPTCHA image.
        /// 
        /// 
        public static byte[] GenerateImage()
        {
            // Create image.
            var image = new Bitmap(size.Width, size.Height, PixelFormat.Format24bppRgb);
            var imgRectangle = new Rectangle(10, 10, image.Width - 10, image.Height - 10);
            // Get font and brush.
            var brush = new HatchBrush(HatchStyle.SolidDiamond, Color.Black, Color.FromArgb(rand.Next(160),rand.Next(160),rand.Next(160)));
            var font = new Font(fonts[rand.Next(fonts.Length - 1)], imgRectangle.Height, FontStyle.Italic, GraphicsUnit.Pixel);
 
Notice that we use a HatchBrush so that the word will be drawn using a hatch pattern. Ensuring that the text is not solid color, will help defeat OCR attacks. The actual font used is also chosen at random from a predefined list.
				
The next step is to get a Graphics object from the image, and use it to draw on the image. We'll wrap the code using the Graphics object in a using region to ensure that the instance is disposed as soon as we don't need it more. We then fill the background with white color and create a GraphicsPath instance, to which the selected Captcha word is added using the AddString method. The path object can now be warped, by stretching the corners a random amount. We also rotate the text a bit (between --10 and 10 degrees):

 
            // draw on the image.
            using(Graphics g = Graphics.FromImage(image))
            {
                g.FillRectangle(Brushes.WhiteSmoke, 0, 0, image.Width, image.Height);
                var path = new GraphicsPath();
                // Make sure text fits
                while (g.MeasureString(CaptchaWord, font).Width > imgRectangle.Width)
                    font = new Font(font.FontFamily, font.Size - 1, font.Style);
                
                path.AddString(CaptchaWord, font.FontFamily, (int)font.Style, font.Size, imgRectangle, StringFormat.GenericDefault);
                float v = 4;
                var warpPoints = new PointF[]
                                           {
                                                new PointF(rand.Next(imgRectangle.Width) / v,  rand.Next(imgRectangle.Height) / v),
                                                new PointF(imgRectangle.Width - rand.Next(imgRectangle.Width) / v,  rand.Next(imgRectangle.Height) / v),
                                                new PointF(rand.Next(imgRectangle.Width)/v, imgRectangle.Height - rand.Next(imgRectangle.Height) / v), 
                                                new PointF(imgRectangle.Width - rand.Next(imgRectangle.Width) / v, imgRectangle.Height - rand.Next(imgRectangle.Height)/ v) 
                                            };
                var warpMatrix = new Matrix();
                warpMatrix.Rotate(rand.Next(20) - 10);
                path.Warp(warpPoints, imgRectangle, warpMatrix, WarpMode.Perspective);                
                g.FillPath(brush, path);
 
The next step is to add a bit of noise to the image. This is done by drawing some small elipses (dots) randomly in the image, with a random color. This is implemented with a LINQ query selecting the details for each random dot:
 
                // Add some noise.
                var noise = from e in Enumerable.Range(0, NoiseAmount)
                         select new
                                    {
                                        X = rand.Next(image.Width),
                                        Y = rand.Next(image.Height),
                                        R = 1f + (float)rand.NextDouble() * 3f,
                                        Brush = new SolidBrush(Color.FromArgb(rand.Next(255), rand.Next(255), rand.Next(255)))
                                    };
 
                foreach (var p in noise)
                    g.FillEllipse(p.Brush, p.X, p.Y, p.R, p.R);
            }
 
Finally, the resulting image is saved to PNG format in-memory and returned from the method. 
 
   // Save to buffer and return raw png image bytes.
            using(var buffer = new MemoryStream())
            {
                image.Save(buffer, ImageFormat.Png);
                return buffer.GetBuffer();
            }
 
We will use a custom http handler by implementing IHttpHandler to send the image to the client. This will be the subject for the next post in the series. 

How to: Create an ASP.NET CAPTCHA Control (part 1)

As I explained in my previous post, I developed a CAPTCHA ASP.NET control for this blog. In the next few posts, I will explain the steps involved in doing this, and how you can develop your own CAPTCHA control.

Preparations
There are some variations on CAPTCHA tests, the most common one requiring the user to input the characters displayed on an image. The idea is that only a human will be able to read these characters; so if the challenge response is correct, it is most likely a "real human" submitting the data. Since modern OCR software can be quite efficient, it is neccessary to make the charaters hard-to-read by altering shape, adding noise or lines. Of course these measures also make the CAPTCHA harder to read for a human. For my CAPTCHA control, I decided to create a control, that emphasizes on ease-of-use for the end user. Therefore, the images generated should be easy to read.

When deciding which characters to display on the image, there are generally two approaches: Generate some randomly, or choose between a pre-defined set of words. I choose the latter approach, since it would be easiest for a human to recognize an actual word. Therefore, I am storing a list of English words, from which I select one randomly whenever I need to generate a CAPTCHA.

Step one: Creating the basic control
I have chosen to implement the CAPTCHA as a UserControl, so that the look and/or different parts of the control can be changed at a later time, if I need to do so. So I created a UserControl and placed an image tag and a textbox on it. These are the essential parts of the CAPTCHA control.

The basic control implementation does the following: Whenever the control is shown, a word is selected randomly for the challenge. A unique, random URL for the CAPTCHA image is also generated. The purpose of using a unique URL is to ensure that the browser does not display an old CAPTCHA image because it caches it locally.

The selected word is stored in Session state. Alongside the URL, it is exposed as a public static property, that populates on-demand. This makes sure that the image-rendering code will be able to get the correct word, and the encapsulation ensures that I can change the storage if necessary. This is the implementation of these two properties:

1:         /// 
2:         /// Gets the captcha URL.
3:         /// 
4:         /// The captcha URL.
5:         public static string CaptchaUrl 
6:         { 
7:             get
8:             {
9:                 if (MyContext.Session[CaptchaUrlKey] == null)
10:                     MyContext.Session[CaptchaUrlKey] = String.Format("/captcha/{0}.ashx", rand.Next());
11:                 return (string)MyContext.Session[CaptchaUrlKey];
12:             }
13:         }
14:  
15:         /// 
16:         /// Gets the captcha word.
17:         /// 
18:         /// The captcha word.
19:         public static string CaptchaWord
20:         {
21:             get
22:             {
23:                 if ( MyContext.Session[CaptchaWordKey] == null)
24:                 {
25:                     string listWords = Settings.User["CaptchaWords"];
26:                     var words = listWords.Split(',');
27:                     MyContext.Session[CaptchaWordKey] = words[rand.Next(words.Length - 1)].Trim();
28:                 }
29:                 return (string)MyContext.Session[CaptchaWordKey];
30:             }

When the control is displayed, the image on the control is databound to the CaptchaUrl property; so it will display the image containing the correct word. The request the browser sends for the image will get handled by a separate http handler (which we will discuss in a later post); which will output the generated image.

On postback, the control will check the text the user has entered, and if it matches the generated word, a public property called "IsValid" will be set to true. This indicates to the control on which our CAPTCHA resides, that the user has passed the CAPTCHA test. After the check, the word and URL is reset, so a new CAPTCHA will be generated if the control is shown again.

A slightly better approach would be to implement the control as a .NET Validator control, so that it could take part in the page validation along with other validator controls. This would eliminate the need of the other controls on the page being aware of the CAPTCHA. Doing this would not be much more work; one would simply need to inherit from the abstract BaseValidator class and implement the neccessary methods.


Hacking ASP.NET: Trace information

All ASP .NET developers propably know about the trace feature in ASP .NET. Provided you have enabled tracing in web.config, (using <trace enabled="true" /> in the system.web element; requesting the url /trace.axd will provide you with a nice list of trace information for the previous requests.

I have often thought about putting the wealth of information to better use; perhaps making more detailed reports based on the trace information. This could be useful during testing. Unfortunately, as far as I can tell, there is no other way to get the information, than requesting Trace.axd. There seems to be no supported programmatic way of doing this.

So I set about finding out, how this could be done. At first I thought about creating a screen-scraper for requesting trace.axd and collecting the information. But this would be impractical; especially when large amounts of data should be collected.

A better approach seemed to be to find out how ASP .NET actually stores this information. Since trace.axd is actually an IHttpHandler (System.Web.TraceHandlers.TraceHttpHandler), the natural starting point was using Reflector to view the internals of this class. It did not take long to figure out, that the HttpRuntime class has a static internal property named Profile of the type System.Web.Util.Profiler, which is internal. This is the class responsible for collecting the Trace information, and has a GetData method. This method returns the current trace information as an IList containing DataSets.

Armed with this information, I wrote a small class that uses reflection to obtain the profiling data. The class looks like this:

   1:  using System;
   2:  using System.Collections;
   3:  using System.Collections.Generic;
   4:  using System.Data;
   5:  using System.Linq;
   6:  using System.Reflection;
   7:  using System.Web;
   8:   
   9:  namespace dr.TraceAnalyzer
  10:  {
  11:      /// 
  12:      /// Proof-of-concept class for accessing trace data using reflection.
  13:      /// 
  14:      public class TraceData
  15:      {
  16:          /// 
  17:          /// Data
  18:          /// 
  19:          private IList data = null;
  20:          /// 
  21:          /// Gets the trace data in its raw list-of-datasets representation.
  22:          /// 
  23:          public IList Data
  24:          {
  25:              get
  26:              {
  27:                  if (data == null)
  28:                      GetCurrentData();
  29:                  return data;
  30:              }
  31:          }
  32:   
  33:          /// 
  34:          /// Returns the response time for each request stored in the trace data.
  35:          /// 
  36:          public IEnumerabledouble>double> RequestResponseTimes
  37:          {
  38:              get
  39:              {
  40:                  GetCurrentData();
  41:                  var sets = from d in Data.Cast()
  42:                             select d;
  43:                  return from set in sets
  44:                               let traceTable = set.Tables["Trace_Trace_Information"]
  45:                               where traceTable != null && traceTable.Rows.Count > 0
  46:                               select (double) traceTable.Rows[traceTable.Rows.Count - 1]["Trace_From_First"];
  47:              }
  48:          }
  49:   
  50:          /// 
  51:          /// Gets the current data from the Profiler instance's GetData method.
  52:          /// 
  53:          /// 
  54:          public IList GetCurrentData()
  55:          {
  56:              var profiler = GetProfiler();
  57:              Type profilerType = profiler.GetType();
  58:              MethodInfo method = profilerType.GetMethod("GetData", BindingFlags.Instance | BindingFlags.NonPublic);
  59:              return data = (IList) method.Invoke(profiler, null);
  60:          }
  61:   
  62:          /// 
  63:          /// Use reflection to get the Profiler instance.
  64:          /// 
  65:          /// 
  66:          private object GetProfiler()
  67:          {
  68:              Type runtimeType = typeof (HttpRuntime);
  69:              PropertyInfo profileProperty = runtimeType.GetProperty("Profile",
  70:                                                                     BindingFlags.NonPublic | BindingFlags.Static);
  71:              if (profileProperty != null)
  72:              {
  73:                  return profileProperty.GetValue(null, null);
  74:              }
  75:   
  76:              throw new ApplicationException("Reflection to get profiler instance failed.");
  77:          }
  78:      }
  79:  }

I have yet to decide what I am going to use the trace data for. But an obvious way to use it would be to represent some of the performance data that is collected, as a graph. For now, I have added a property, RequestResponseTimes, that returns a list of the total time taken for each request stored in the trace data.

 

And, please remember to disable tracing when putting your site into production ;-)


Fighting comment spam

I've been hit by comment spam. Suddenly, one of the posts on this site had _a lot_ of comments, all with advertisements for some suspect sites. Needless to say, I've removed those comments.

So, what to do about that ? I decided to implement a CAPTCHA on the site. It is a pretty standard one, requiring you to repeat a word, that is shown as an image and garbled, so that image recognition software has a hard time interpreting it. I could have found a complete control for it in a few minutes just by searching Google; but I implemented it myself as an ASP .NET control, just for fun. Also, I believe that the security in a unique CAPTCHA algorithm is much better. If spammers develop software to defeat CAPTCHA's, naturally they are only going to target big sites to maximize their profits, and not bother trying to break a CAPTCHA, that is only used on my little site.

A nice example on this is the fact that Jeff Atwood's blog is using one of the simplest CAPTCHAs conceivable, a static image containing the same word each time, that the user must repeat in a textbox. Apparently, that is enough to stop most of the spam on his blog. Another example of a really simple CAPTCHA is to simply include a hidden field on the page. If the field gets filled out on postback, it is most likely a spam-bot posting it, since the average user would never notice, let alone filling out, the hidden field. I like that idea particularly, because it does not require the user to think or do anything. (So why did I go for the image approach in implementing my own CAPTCHA ? Probably because I wanted to try out implementing one ;-))

I am probably going to blog about the techniques going into developing a CAPTCHA in ASP .NET in the near future. In the meanwhile, dear reader, please try out the comments feature, and let me know if you find the CAPTCHA image easy enough, or too easy, to read.


Got one of those RSS thingys

Someone pointed out that there was no RSS feed on my blog - i forgot to add it.

It has been fixed now, and you should be able to read my blog in your favourite RSS aggregator.