So the other day I got the following question about reporting on some log data:
The data being loaded is a bunch of machine log files. (E.g. MachineName Status Timestamp) The status is a simple Start or Stop. My question is can I use Datediff between sequential records to determine elapsed time between start and stop? I know this would work if the start and stop data were in the same record and I believe this will work between sequential records but I'm fuzzy on what's best and putting it into action
Below is the end result of my answer. There are three basic parts of the solution below:
- Build some test data and stage it
- Group the data and make some keys to re-join it
- Write a little report to show what we are after
- (bonus) a little pivot table to show some other groupings on the data =D
-- our test data will simulate 10 machines that make 100 total
-- connections per day that last between 1 and 300 seconds.
-- this sample log is stores three items, the box name, what
-- happened (start/stop) and a timedatestamp
-- this data is supposed to happen in a sequence mostly. there
-- could be overlapping time events with the machines, but since
-- we would just eliminate that in a sort by machine later anyway
-- i am not going to worry about that and we will do everything in
-- one long sequence. our start date will be getdate() if there is
-- no date already in the table.
-- this table var will hold our 'raw' data
declare @t table
(
box char(5)
, evt varchar(5)
, tds datetime
)
-- loop to fill in our temp table. basically just some vars and some loops with random info
declare @i int,@ii int, @box varchar(5), @tds datetime
select @i = 0,@ii = 0
while @i < 10
begin
select @box =
'box'
+
CAST(@i as
char(1))
while @ii < 100
begin
select @tds =
coalesce(MAX(tds),getdate())
from @t
insert @t values(@box,
'start', @tds )
insert @t values(@box,
'stop',
dateadd(SECOND,RAND()*300,@tds))
set @ii = @ii + 1
end
select @i = @i + 1, @ii = 0
end
-- this is our 'staging' table that we'd be importing this data into for analysis
declare @tt table
(
box char(12)
, evt varchar(5)
, tds datetime
, group_key int
,
primary
key
clustered (box, evt, group_key)
)
-- this represents our load. we are adding a row_number line to provide
-- a unique extra key for each set of box/evt combos
insert @tt
select box, evt, tds
,
row_number()
over (partition
by box,evt order
by tds) [group_key]
from @t
-- now we can select from our loaded data table twice, once for each
-- of our two events start/stop and join based on the group_key which will
-- already be unique for each set based on our load above.
-- this shows us the data how we want to see it. or at least how i was
-- originally asked for it.
select
starts.box
, starts.tds [start]
, stops.tds [stop]
,
DATEDIFF(ss,starts.tds,stops.tds) [duration]
from
@tt starts
join @tt stops on
stops.evt =
'stop'
and stops.box = starts.box
and stops.group_key = starts.group_key
where
starts.evt =
'start'
-- here is a little pivot table to show you how many connections
-- in the timeframes took certain lengths of time to complete.
-- you could do this a number of ways, but i don't get to use
-- the pivot command enough so i figured i'd tack it on here =P
select
*
from
(
select
box,
case
when duration < 30 then
'<30s'
when duration between 30 and 60 then
'<60s'
when duration between 60 and 120 then
'<120s'
else
'120+s'
end [d]
from
(
select
starts.box
,
DATEDIFF(ss,starts.tds,stops.tds) [duration]
from
@tt starts
join @tt stops on
stops.evt =
'stop'
and stops.box = starts.box
and stops.group_key = starts.group_key
where
starts.evt =
'start'
) ii
) i
PIVOT
(
count(d)
FOR d IN
([<30s],[<60s],[<120s],[120+s])
) p