Since you already have an index on the Url column, my guess is the SELECT (get pageid) then if it doesn't exist INSERT (new URL) is what's causing the CPU to peak. If you're crawler has multiple threads going you might be taxing the concurrency/locking mechanisms in SQL on tblPages.
With regards to your specific question, I would use CHECKSUM (crc) instead of HASHBYTES (md). CHECKSUM is faster, it returns an INT instead of a VARBINARY so it would be easier/faster to index.
However, precisely because CHECKSUM returns an INT it is prone to collisions so you should also search the URL as an AND clause.
SELECT PageId FROM tblPages WHERE HashedUrl=CHECKSUM(@url) AND PageUrl=@url
Now ONLY put a column index on HashedUrl (not PageUrl). The index will have to be NON-UNIQUE because of the possibility for collisions. This will give you the fastest INSERTs and SELECTs until you start getting to a table row count well over 4billion, in which case, the amount of INT CHECKSUM collisions will cause a lot of partial table scans on the un-indexed PageUrl column.
UPDATE
Here's the simple benchmark code I used
GO /* NORMAL METHOD */ BEGIN SET STATISTICS TIME ON -- IF EXISTS(SELECT * FROM tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'tempdb..#Store1')) BEGIN DROP TABLE #Store1 END -- Normal CREATE TABLE #Store1 (Id INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, Data VARCHAR(4000)) CREATE UNIQUE CLUSTERED INDEX CIX_STORE1_DATA ON #Store1(Data) -- Help Create Data DECLARE @Data TABLE(Data VARCHAR(4000)) INSERT INTO @Data(Data) VALUES ('red.'), ('YELLOW/'), ('green'), ('.BLUE'), ('/violet'), ('PURPLE-'), ('-orange') -- The data set we'll use for testing INSERT INTO @Data SELECT a.Data + b.Data + c.Data + d.Data + e.Data + f.Data + g.Data FROM @Data a, @Data b, @Data c, @Data d, @Data e, @Data f, @Data g -- INSERTION TESTS PRINT('INSERT INTO NORMAL') INSERT INTO #Store1(Data) SELECT Data FROM @Data -- SELECTION TESTS PRINT('SELECT FROM NORMAL') SELECT TOP 5000 d.Data, (SELECT s.Id FROM #Store1 s WHERE s.Data = d.Data) FROM @Data d -- SET STATISTICS TIME OFF END GO /* USING YOUR OWN CHECKSUM/HASH */ BEGIN SET STATISTICS TIME ON -- IF EXISTS(SELECT * FROM tempdb.dbo.sysobjects WHERE ID = OBJECT_ID(N'tempdb..#Store2')) BEGIN DROP TABLE #Store2 END -- With Hash CREATE TABLE #Store2 (Id INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED, Hsh INT, Data VARCHAR(4000)) CREATE CLUSTERED INDEX CIX_STORE2_CRC ON #Store2(Hsh) -- Help Create Data DECLARE @Data TABLE(Data VARCHAR(4000)) INSERT INTO @Data(Data) VALUES ('red.'), ('YELLOW/'), ('green'), ('.BLUE'), ('/violet'), ('PURPLE-'), ('-orange') -- The data set we'll use for testing INSERT INTO @Data SELECT a.Data + b.Data + c.Data + d.Data + e.Data + f.Data + g.Data FROM @Data a, @Data b, @Data c, @Data d, @Data e, @Data f, @Data g -- INSERTION TESTS PRINT('INSERT INTO CHECKSUM/HASH') INSERT INTO #Store2(Hsh, Data) SELECT CHECKSUM(Data), Data FROM @Data -- SELECTION TESTS PRINT('SELECT FROM CHECKSUM/HASH') SELECT TOP 5000 d.Data, (SELECT s.Id FROM #Store2 s WHERE Hsh = CHECKSUM(d.Data) AND Data = d.Data) FROM @Data d -- SET STATISTICS TIME OFF END
Results (in brief) my method achieves faster (+30%) INSERTS "elapsed time = 7339 ms" vs "elapsed time = 10318 ms", however, slower (-30%) SELECTS "elapsed time = 37 ms" vs "elapsed time = 28 ms".
Another interesting note is you can't "correctly" INDEX a URL VARCHAR field because the length (according to http spec ~4kb) would be greater than 900 bytes (SQL 2008's max allowable key size). While SQL only gives a warning for this, the warning does note that some INSERTS/UPDATES could potentially fail.
Warning! The maximum key length is 900 bytes. The index 'CIX_STORE1_DATA' has maximum length of 4000 bytes. For some combination of large values, the insert/update operation will fail.
I'm not a SQL Guru per se, perhaps my testing method isn't the most accurate/useful, but the topic is very interesting with regards not sensible user-end optimization versus the 'black box'.