Tuesday, 15 June 2010

sql - Process large amounts of XML: how to avoid locking the database and get the best performance?




I have a database procedure that processes XML and normalizes it into multiple tables. I'm not a huge database guy and don't know much about database locks, so I'm wondering if I'm doing anything wrong.

To start with, there is a results table containing something similar to the following:

    resultid | computerid (int) | rawdata (xml)
    -------------------------------------------
    1 | 1 | <installedsoftware><software name="google chrome" version="1.0" /><software name="mozilla firefox" version="3.0" /></installedsoftware>
    2 | 2 | <installedsoftware><software name="internet explorer" version="6" /><software name="google chrome" version="1.0" /></installedsoftware>

My stored procedure looks like this:

    create table #resultstoprocess
    (
        resultid int
    )

    -- Trying to process only 1000 results at a time. If I try too many at once,
    -- I get timeouts, so instead I do it in little chunks.
    insert into #resultstoprocess
    select top 1000 resultid from results

    create table #tempsoftware
    (
        computerid int,
        softwarename nvarchar(max),
        softwareversion nvarchar(max)
    )

    insert into #tempsoftware
    select distinct
        computerid,
        t.n.value('(@name)[1]', 'nvarchar(max)') as softwarename,
        t.n.value('(@version)[1]', 'nvarchar(max)') as softwareversion
    from results
    cross apply results.rawdata.nodes('/installedsoftware[1]/software') as t(n)
    inner join #resultstoprocess on results.resultid = #resultstoprocess.resultid

    -- I may need to do additional processing on the temporary data before using it.

    -- To cut down on duplicate data, insert into a total list of software.
    -- There is an index based on softwarename and softwareversion.
    -- The software table has an auto-increment int primary key.
    insert into software (softwarename, softwareversion)
    select distinct softwarename, softwareversion
    from #tempsoftware
    where not exists (select 1 from software
                      where software.softwarename = #tempsoftware.softwarename
                        and software.softwareversion = #tempsoftware.softwareversion)

    -- Link the software to the computer. In this case, the temp table does not
    -- have any indexes. Is it worthwhile to add some?
    insert into computer_software (computerid, softwareid)
    select #tempsoftware.computerid, software.softwareid
    from #tempsoftware
    inner join software
        on #tempsoftware.softwarename = software.softwarename
       and #tempsoftware.softwareversion = software.softwareversion

In addition to this, the procedure processes other computer-based attributes, all coming from the same results.rawdata column.

My questions regarding this code are:

During processing, other entries may be added to the results table. Selecting the XML nodes to create the temporary table takes a bit of time, so I'm worried that anything trying to insert into the table while that is happening will be forced to wait. By using the resultid column at the beginning of the procedure, I try to limit the scope of the data the procedure works on at one time.

During the processing time, the other tables may be queried (for example, to find out what software exists on a computer). Since there is only one short bulk insert into each of these tables, I'm assuming there should be no problems there.

The #tempsoftware table has no indexes, and I join on two nvarchar(max) columns. Is it worthwhile to create indexes on this table? Or is the overhead of creating the index worse than the join?

Am I doing anything stupid here that I should be smacked for?

Thanks for any suggestions. Once again, I'm not a big database guy. I'm making the assumption that doing the processing directly in the database is better than pulling the raw data into C#, doing the processing there, and re-inserting it into the database.

My approach would be to add an additional (datetime) column for flagging the data I have inserted into a temporary working table. Process the temp working table, then come back and check for new rows. That should be quite lightweight. If it needs to be, it should run in its own transaction.
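A minimal sketch of that flagging approach, assuming a hypothetical nullable `processedon` datetime column on the results table. The column name, the batch size of 1000, and the use of `update ... output` to claim rows are illustrative choices, not something the original answer specifies:

```sql
-- One-time schema change, run separately beforehand (hypothetical column):
--   alter table results add processedon datetime null;

create table #resultstoprocess (resultid int primary key);

-- Claim up to 1000 unflagged rows in a single short statement and record
-- their ids, so the slower XML shredding only touches the claimed rows.
update top (1000) results
set processedon = getdate()
output inserted.resultid into #resultstoprocess (resultid)
where processedon is null;

-- ... shred results.rawdata for the claimed rows into #tempsoftware here ...

-- Rows inserted while this batch was running still have processedon = null
-- and are picked up the next time the procedure runs.
```

Because the claim is one short update, any locks it takes on results are held only briefly unless the whole procedure is wrapped in a larger transaction.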

Usually no indexes = bad, relevant indexes = good. You should have a clustered index; it can be tightly packed or have padding, depending on the randomness of your data. A fill factor of 80% is a starting point. While there is overhead in copying the rows, the best order is to insert first and create the index afterwards.
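As a rough illustration of the "insert, then create index" order, here is a sketch that assumes software names and versions fit in 200 characters. That bound is an assumption: nvarchar(max) columns cannot be index key columns in SQL Server, so the temp table needs bounded lengths before it can be indexed this way. The index name and sample rows are made up for the example:

```sql
-- Assumption: 200 characters is enough for names and versions.
-- Two nvarchar(200) columns give an 800-byte key, under the 900-byte limit.
create table #tempsoftware
(
    computerid int,
    softwarename nvarchar(200),
    softwareversion nvarchar(200)
);

-- 1. Load the table while it is still an unindexed heap.
--    (In the real procedure this is the XML shredding insert.)
insert into #tempsoftware (computerid, softwarename, softwareversion)
values (1, N'google chrome', N'1.0'),
       (1, N'mozilla firefox', N'3.0');

-- 2. Build the clustered index once, over the rows already loaded.
--    fillfactor = 80 leaves 20% free space per leaf page as padding.
create clustered index ix_tempsoftware_name_version
    on #tempsoftware (softwarename, softwareversion)
    with (fillfactor = 80);
```

For a temp table that is loaded once and then only read, the padding matters little; the fill factor is shown only because the answer mentions it as a starting point.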

If you think this sort of problem will be recurring, you may want to look at SSIS: http://en.wikipedia.org/wiki/sql_server_integration_services. Don't expect the MS documentation to be as helpful as Google; MS doesn't change its spots in that regard. SSIS is worth getting to grips with at some point.

sql performance sql-server-2008 indexing database-locking
