Sunday, September 12, 2004 12:12 AM
bart
Screenscraping my "number of ASP.NET posts"
Ever wondered how I get the number of my ASP.NET Forums posts on my homepage? The answer is screen scraping combined with a regular expression. Here's the code:
<%@ OutputCache Duration="30" VaryByParam="none" %>
<%@ Control Language="C#" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Net" %>
<script runat="server">
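    // Profile page that contains the phrase "contributed to N out of".
    private string URL = "http://www.asp.net/Forums/User/UserProfile.aspx?tabindex=1&UserName=bdesmet";

    public void Page_Load(object sender, System.EventArgs e)
    {
        try
        {
            string res;
            // Dispose the client, stream and reader as soon as the page has been read.
            using (WebClient clnt = new WebClient())
            using (Stream s = clnt.OpenRead(URL))
            using (StreamReader r = new StreamReader(s))
            {
                res = r.ReadToEnd();
            }

            // Singleline lets "." match newlines too, replacing the slower (.|\n) alternation.
            Regex regex = new Regex("contributed to (.*?) out of", RegexOptions.IgnoreCase | RegexOptions.Singleline);
            Match oM = regex.Match(res);

            // A failed match does not throw, so check it explicitly instead of showing an empty label.
            lblPosts.Text = oM.Success ? oM.Groups[1].Value.Replace(",", "") : "unable to retrieve";
        }
        catch
        {
            // Network or HTTP failure: the scraped site is down or unreachable.
            lblPosts.Text = "unable to retrieve";
        }
    }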
</script>
<asp:Label id="lblPosts" runat="server" />
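Pretty simple, isn't it? Don't forget to cache the whole thing, though: this is the code of an .ascx, so the OutputCache directive at the top causes "partial page caching" of the homepage, and the remote page is fetched at most once every 30 seconds instead of on every request. A try...catch block belongs in the code as well, so the control degrades gracefully when the scraped site is down; a redesigned site is more likely to show up as a regular expression that no longer matches.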
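To make the "scraped site redesigned" case a bit more concrete, here is a minimal sketch that isolates the parsing step so it can be tried without any network access. The class name `PostCountParser`, the method `TryExtract` and the sample strings in the usage note are my own illustrations, not part of the original control. It also assumes the count appears as plain digits with optional thousands separators, and it needs C# 2.0 or later for the nullable `int?`.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the post count out of already-downloaded HTML.
public static class PostCountParser
{
    // Assumed phrase on the profile page: "contributed to 1,234 out of ...".
    private static readonly Regex CountPattern = new Regex(
        @"contributed to\s+([\d,]+)\s+out of",
        RegexOptions.IgnoreCase);

    // Returns the post count, or null when the phrase is missing
    // (for example after a redesign of the scraped page).
    public static int? TryExtract(string html)
    {
        if (html == null)
        {
            return null;
        }

        Match m = CountPattern.Match(html);
        if (!m.Success)
        {
            return null;
        }

        // Strip the thousands separators before parsing.
        string digits = m.Groups[1].Value.Replace(",", "");

        int count;
        if (int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out count))
        {
            return count;
        }
        return null;
    }
}
```

With a made-up input such as `"has contributed to 1,234 out of 500,000 posts"`, `PostCountParser.TryExtract` yields 1234, while text that lacks the phrase yields null. In the control, a null result could then map to the same "unable to retrieve" text that the catch block uses, so both failure modes look identical to the visitor.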
Filed under: ASP.NET, Personal