Importing PaperCut Print Logs into SQL Server

20 Jan

This is a follow on from the PaperCut Printer logging post that I wrote earlier. Currently this free software writes all its data to a .csv file. To get better reporting from it, I needed to get this data in to SQL server.

The .csv layout is as follows:

Time,User,Pages,Copies,Printer,Document Name,Client,Paper Size,Language,Height,Width,Duplex,Grayscale,Size

So I created a new database and then created a ‘log’ table to hold all this data

USE [PrintingLogs]
GO

/****** Object:  Table [dbo].[log]    Script Date: 01/20/2011 11:51:13 (Dave Harris) ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[log](
    [datetime] [datetime] NOT NULL,
    [username] [varchar](50) NOT NULL,
    [pages] [int] NOT NULL,
    [copies] [int] NOT NULL,
    [printername] [nvarchar](50) NOT NULL,
    [documentname] [varchar](100) NOT NULL,
    [client] [varchar](50) NOT NULL,
    [papersize] [varchar](50) NULL,
    [language] [nchar](10) NULL,
    [pageheight] [nchar](10) NULL,
    [pagewidth] [varchar](50) NULL,
    [duplex] [varchar](50) NOT NULL,
    [greyscale] [varchar](50) NOT NULL,
    [size] [varchar](50) NOT NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

With that created I then ran this code to import the data fom the csv file:

USE PrintingLogs
GO
BULK INSERT Log
FROM ‘\\uncpath\papercut-print-log-all-time.csv’
WITH
(
FIELDTERMINATOR = ‘,’,
ROWTERMINATOR = ‘,\n’
)
GO

(Notice my row terminator is [,\n], otherwise I was getting a comma at the end of the ‘size’ value.)

That kinda worked, although PaperCut had the csv file locked, so SQL wouldn’t open it. So now I also have a scheduled task running on the box running PaperCut that creates a copy of the csv every 5 minutes, and I then use this file to import the data into SQL.

net use r: \\server\c$
copy “C:\Program Files\PaperCut Print Logger\logs\csv\papercut-print-log-all-time.csv” “r:\log.csv”
net use r: /delete

Once done that then worked fine, at least it would the first time round, but the next time I would want only the new entries and not duplicates of everything.

So, I then import all the data from log.csv to a temporary table and then from there import only the changes to the live table and then drop the temporary table.

USE PrintingLogs
GO

CREATE TABLE [dbo].[TEMPlog](
    [datetime] [varchar](500) NULL,
    [username] [varchar](500) NULL,
    [pages] [varchar](500) NULL,
    [copies] [varchar](500) NULL,
    [printername] [varchar](500) NULL,
    [documentname] [varchar](500) NULL,
    [client] [varchar](500) NULL,
    [papersize] [varchar](500) NULL,
    [language] [varchar](500) NULL,
    [pageheight] [varchar](500) NULL,
    [pagewidth] [varchar](500) NULL,
    [duplex] [varchar](500) NULL,
    [greyscale] [varchar](500) NULL,
    [size] [varchar](500) NULL,
) ON [PRIMARY]

GO

USE PrintingLogs
GO

BULK INSERT TEMPLog
FROM ‘c:\log.csv’
WITH
(
MAXERRORS = 0,
FIRSTROW = 2,
FIELDTERMINATOR = ‘,’,
ROWTERMINATOR = ‘\n’
)
GO

insert into log select * from TEMPlog where datetime not in (select datetime from log) and datetime not like ‘%PaperCut%’ and language = ‘PCL6’

GO

DROP TABLE TEMPlog

GO

The ‘FIRSTROW = 2 bit’, skips the first 2 lines when opening the file as the PaperCut URL  and the headers are unwanted here. PaperCut also sticks in it URL which can cause errors, sso I make sure these entries dont end up in the live table. I also had an error where commas in filenames were causing data to end up in the wrong field and so making sure language is set to PCL6 fixes this.

Now admittedly that’s not the most scientific method but the print time is taken to 3 decimal places, so the likelihood of 2 people printing at the same time is pretty slim. I might review this at a later date.

Anyway, this works, so it’s time to set it up as a job that will run regularly, To do this, I set up a Job under the SQL Server Agent section in Management Studio.

And then running that job will hopefully result in this:

We now have a ‘log’ table which will hold all the records. With this all in place, the next stage is to do some reporting. I’m going to leave this running for a few days to get some decent data in there and then I’ll crack on with the reports. As we have fixed prices for greyscale and colour copies, I’ll probably create another table with these details and I can then work out exactly how much we are spending per user, per day etc.

This article was very handy in the creation of this post.

SQL Example:

This will bring back the number of greyscale and the number of colour prints:

select sum(pages*copies) as ‘count’, greyscale from log group by greyscale