Measures
A measure represents a data column that contains quantifiable data, usually numeric, that can be aggregated. A measure is generally mapped to a column in a fact table.
In SSAS, measures are grouped by their underlying fact tables into Measure Groups. Measure Groups are used to associate dimensions with measures.
Measures are selected while creating the cube and can later be modified for various purposes, such as formatting and aggregation behaviour. Click here to see how to modify a measure's formatting.
Attributes
An attribute represents one or more columns in a dimension table and describes a particular aspect of the dimension. For example, product colour is the colour attribute of the product dimension, which can later be used for slicing, dicing, and selecting cube values. Each dimension in a cube contains one key attribute, which is generally the primary key of the dimension table.
Click here to see how to define/update an attribute for a dimension.
Hierarchies
As the name suggests, hierarchies define hierarchical relationships within a dimension, for example Product and Product Category, or National and International region. Hierarchies are defined in the Hierarchies pane of the dimension designer.
Click here to see how to add/delete a user-defined hierarchy.
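To make these three concepts concrete, here is a minimal T-SQL sketch (table and column names are hypothetical) of the kind of star schema they map onto: the numeric columns of the fact table become measures (one measure group per fact table), the dimension table's columns become attributes, and Category/SubCategory/ProductName form a natural user-defined hierarchy.
--Hypothetical star schema illustrating measures, attributes and hierarchies
CREATE TABLE DimProduct (
ProductKey INT PRIMARY KEY --key attribute of the Product dimension
,ProductName VARCHAR(200)
,Colour VARCHAR(50) --the colour attribute used for slicing/dicing
,SubCategory VARCHAR(100) --hierarchy level 2
,Category VARCHAR(100) --hierarchy level 1
)
CREATE TABLE FactSales (
ProductKey INT REFERENCES DimProduct(ProductKey)
,SaleDate DATE
,SalesQty INT --measure
,SalesAmount DECIMAL(18,2) --measure
)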
Monday, 16 December 2013
Tool to view big data file
I often have to load big data files using SSIS or the SQL Server Import/Export wizard. Most of the time this works fine, but when a load fails and I have to view the data to debug it, it is simply impossible because most text editors cannot open a file that is too big (several GBs in size). So I have developed a small tool that can read and view a big file which cannot be opened in a normal text editor. This tool can also be useful if you are developing an SSIS package to load a big file and want to create a small sample file from it during development.
Click here to download the tool.
Features of the tool
- Load big file (no limit on size)
- View data page by page. Page size is configurable.
- Save page in different file.
- Go to any page in file.
- Displays line number across pages.
Limitations
- It uses the Temp directory to store data pages while opening a file, so it can occupy roughly as much disk space as the original file.
- It can only read a file line by line, so if lines are not separated by {CR}{LF} it cannot do data paging.
- It can only work with ANSI data.
Dimension and Cube
Dimension
A database dimension is a collection of related objects, called attributes, which can be used to provide detailed information about fact data in one or more cubes. Typical attributes in an Employee dimension might be Employee Name, Employee Designation, Employee Qualification, Joining Date, etc. Dimensions are linked to one or more columns in one or more tables in a data source view. By default, these attributes are visible as attribute hierarchies and can be used to understand the fact data in a cube. Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube.
Click here for step-by-step instructions to create a dimension using an existing table in SSDT.
Click here for step-by-step instructions to create a dimension using the Dimension Wizard in SSDT.
Cube
A cube is a multidimensional object that contains information for analytical purposes; it consists of dimensions and measures. Dimensions define the structure of the cube that is used to slice and dice, and measures provide the aggregated numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve measure values as if they were contained in cells; cells are defined for every possible summarized value. A cell in the cube is defined by the intersection of dimension members and contains the aggregated values of the measures at that specific intersection.
Click here for step-by-step instructions to create a cube using the Cube Wizard in SSDT.
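As a loose relational analogy only (reusing the hypothetical DimProduct/FactSales tables sketched in the Measures post above), each row of an aggregated query corresponds to one cube cell: the grouped dimension members identify the intersection, and the aggregate gives the measure value at that intersection.
--Each result row is analogous to one cube cell: an intersection of
--dimension members (Category, Colour) holding an aggregated measure value
SELECT D.Category, D.Colour, SUM(F.SalesAmount) AS SalesAmount
FROM FactSales F
INNER JOIN DimProduct D ON D.ProductKey = F.ProductKey
GROUP BY D.Category, D.Colour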
Friday, 29 November 2013
Data Source and Data Source View
Data Source
A data source in Analysis Services specifies a direct connection to an external data source. In addition to physical location, a data source object specifies the connection string, data provider, credentials, and other properties that control connection behaviour.
Information provided by the data source object is used during the following operations:
- Get schema information and other metadata used to generate data source views into a model.
- Query or load data into a model during processing.
- Run queries against multidimensional or data mining models that use ROLAP storage mode.
- Read or write to remote partitions.
- Connect to linked objects, as well as synchronize from target to source.
- Perform write back operations that update fact table data stored in a relational database.
Regardless of how you build it, each model requires at least one data source object that specifies a connection to source data. You can create multiple data source objects in a single model to use data from different sources or vary connection properties for specific objects.
Data source objects can be managed independently of other objects in your model. After you create a data source, you can change its properties later, and then pre-process the model to ensure the data is retrieved correctly.
Click here for step-by-step instructions to create a data source.
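For illustration, a data source connecting to a SQL Server relational database might use a connection string like the one below (the provider shown is SQL Server Native Client; server and database names are hypothetical):
Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI;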
Data Source View
A data source view (DSV) is an abstraction of a relational data source that becomes the basis of the cubes and dimensions you create in a multidimensional project. The purpose of a DSV is to give you control over the data structures used in your project, and to work independently of the underlying data sources (for example, the ability to rename or concatenate columns without directly modifying the original data source).
You can build multiple data source views in an Analysis Services project or database on one or more data sources, and construct each one to satisfy the requirements for a different solution.
Click here for step-by-step instructions to create a data source view.
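For example, a named query in a DSV is essentially a T-SQL SELECT over the source tables. The sketch below (hypothetical columns) renames one column and concatenates two others without modifying the underlying table:
--DSV named query sketch: rename and concatenate columns
--without touching the original source table
SELECT CustomerKey
,FirstName + ' ' + LastName AS FullName --concatenated column
,CustAddress AS CustomerAddress --renamed column
FROM dbo.DimCustomer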
Thursday, 28 November 2013
SSAS Tutorials: Multidimensional Modelling
SSAS Multidimensional Modelling Tutorials
- Data Source and Data Source View
- Dimension and Cube
- Measures, Attributes and Hierarchies
- Advanced Attribute and Dimension Properties
- Relationships Between Dimensions and Measure Groups
- Calculations
- Key Performance Indicators (KPIs)
- Actions
- Perspectives and Translations
- Administrative Roles
SQL Server Analysis Services (SSAS) business intelligence (BI) semantic modelling options
SQL Server Analysis Services (SSAS) provides three different options for creating a business intelligence semantic model: tabular, multidimensional, and PowerPivot.
- Tabular model uses relational modelling constructs such as tables, views, and relationships for modelling data, and the xVelocity in-memory analytics engine for storing and calculating data.
- Multidimensional model uses OLAP modelling constructs such as cubes and dimensions and uses MOLAP, ROLAP, or HOLAP storage.
- PowerPivot model is a self-service BI solution that lets business analysts build an analytical data model inside an Excel workbook using the PowerPivot for Excel add-in. PowerPivot also uses xVelocity in-memory analytics engine within Excel and on SharePoint.
The table below summarizes the features available for each modelling option.
Feature | Multidimensional | Tabular | PowerPivot
Actions | Yes | No | No
Aggregations | Yes | No | No
Calculated Measures | Yes | Yes | Yes
Custom Assemblies | Yes | No | No
Custom Rollups | Yes | No | No
Distinct Count | Yes | Yes (via DAX) | Yes (via DAX)
Drillthrough | Yes | Yes | Yes (detail opens in separate worksheet)
Hierarchies | Yes | Yes | Yes
KPIs | Yes | Yes | Yes
Linked objects | Yes | No | Yes (linked tables)
Many-to-many relationships | Yes | No | No
Parent-child Hierarchies | Yes | Yes (via DAX) | Yes (via DAX)
Partitions | Yes | Yes | No
Perspectives | Yes | Yes | Yes
Semi-additive Measures | Yes | Yes | Yes
Translations | Yes | No | No
User-defined Hierarchies | Yes | Yes | Yes
Writeback | Yes | No | No
See the links below to learn more about each modelling option.
- Tabular Modelling
- Multidimensional Modelling
- PowerPivot Modelling
Wednesday, 20 March 2013
How to set up a database mirroring session in MS SQL Server
The following is a step-by-step guide for setting up a database mirroring session. There are two major steps involved:
- Prepare mirror database.
- Configure database mirroring session.
Now let's see both of these steps in detail.
Prepare mirror database.
- Take a full backup of the primary database. You can use an existing full backup file if one is available. Before taking the full backup, make sure the database recovery model is set to Full. To check the recovery model: right-click the database - click Properties - select the Options tab.
- Restore the full backup from step 1 on the secondary server. This restore must be done using the WITH NORECOVERY option, and the restored database name must be the same as the primary database.
- Restore any differential backup taken after the full backup used in step 2, and also restore any log backups taken after the full or differential backup. All these restores must use the WITH NORECOVERY option. (See the T-SQL sketch after this list.)
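A minimal T-SQL sketch of these prepare steps, using the same <Database_Name> placeholder as the scripts later in this post and hypothetical backup file paths:
--On the primary server: ensure FULL recovery model, then take backups
ALTER DATABASE <Database_Name> SET RECOVERY FULL;
BACKUP DATABASE <Database_Name> TO DISK = 'C:\Backup\MyDb.bak';
BACKUP LOG <Database_Name> TO DISK = 'C:\Backup\MyDb.trn';
--On the secondary server: restore with the SAME database name, always WITH NORECOVERY
RESTORE DATABASE <Database_Name> FROM DISK = 'C:\Backup\MyDb.bak' WITH NORECOVERY;
RESTORE LOG <Database_Name> FROM DISK = 'C:\Backup\MyDb.trn' WITH NORECOVERY;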
Now your secondary database is ready. Next, let's see how to configure the actual mirroring session.
Configure database mirroring session.
- Open the database mirroring setup wizard in SSMS: right-click the database - select Tasks - click Mirror..., OR right-click the database - click Properties - select the Mirroring tab.
- Open the Database Mirroring Security Wizard by clicking the Configure Security... button.
- Click Next. To keep things simple we will skip the witness setup option, so select No on the Include Witness Server page and click Next. Deselect the witness server instance and click Next.
- This opens the primary server instance page. Enter the endpoint name and listener port (leave the defaults for this example) and click Next.
- Click Connect on the secondary server page. Provide authentication details and connect to the secondary server. Leave the other details as default and click Next.
- Now your session is almost ready. Click Finish; a confirmation window pops up, and you can start mirroring immediately or later from the main Mirroring tab. The T-SQL equivalent of what the wizard configures is sketched below.

You can also generate a script before clicking OK on the main tab. The script is useful for documentation and future reference, so it is always advisable to generate it and save it somewhere safe, probably in source control.
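For reference, a sketch of the T-SQL equivalent of what the wizard configures (endpoint name, port, and server addresses are hypothetical):
--On each partner server: create a database mirroring endpoint
--(this is what the Configure Security wizard does)
CREATE ENDPOINT Mirroring
STATE = STARTED
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (ROLE = PARTNER);
--On the secondary server: point it at the primary
ALTER DATABASE <Database_Name> SET PARTNER = 'TCP://primary_server:5022';
--On the primary server: point it at the secondary (this starts the session)
ALTER DATABASE <Database_Name> SET PARTNER = 'TCP://secondary_server:5022';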
T-SQL reference for Database Mirroring
- How to manually fail over to the partner server?
There are two ways to fail over: the first is from the primary server and the second is from the secondary server. See the script below.
--1. Manual failover from primary server
ALTER DATABASE <Database_Name> SET PARTNER FAILOVER;
--2. Manual failover from secondary server
ALTER DATABASE <Database_Name> SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
Failover from the primary server does not cause any data loss, but if you do it from the secondary server you are forcing the secondary server to accept data loss. Data loss will only happen if the primary server is not accessible during failover.
- How to remove a database mirroring session?
--Remove mirroring for database
ALTER DATABASE <Database_Name> SET PARTNER OFF;
- How to stop and start a database mirroring session?
Use the script below to pause and resume the database mirroring session. You can SUSPEND the mirroring session from either the primary or the secondary server, but it can only be RESUMEd from the primary server.
--Pause/Suspend database mirroring session
ALTER DATABASE <Database_Name> SET PARTNER SUSPEND;
--Resume database mirroring session
ALTER DATABASE <Database_Name> SET PARTNER RESUME;
- How to add/remove a witness server to/from an already running database mirroring session?
--Add witness server to database mirroring session
ALTER DATABASE <Database_Name> SET WITNESS = 'TCP://<witness_server>:<port>';
--Remove witness server from database mirroring session
ALTER DATABASE <Database_Name> SET WITNESS OFF;
SET WITNESS is only allowed on the primary server.
Wednesday, 13 March 2013
Convert rows into columns using PIVOT and columns into rows using UNPIVOT
You might have used the pivot functionality in Microsoft Excel. You can do the same in MS SQL using the PIVOT and UNPIVOT statements. PIVOT converts values in rows into column headers, and UNPIVOT does the reverse. See the examples below for a better understanding.
A PIVOT query has two important sections: first the aggregate function (see the comment in the query below), and second the FOR clause, which lists the new column names and names the original column that supplies them.
See further below for the UNPIVOT example; I have used the output of the PIVOT query as the input of the UNPIVOT query.
--PIVOT example
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
DROP TABLE #sales
CREATE TABLE #sales (ProductName VARCHAR(200), SaleDate DATE, SalesQty INT)
INSERT INTO #sales(ProductName, SaleDate, SalesQty)
VALUES('Bike', '2013-01-04', 56)
,('Bike', '2013-02-25', 45)
,('Bike', '2013-01-26', 87)
,('Bike', '2013-03-01', 12)
,('Bike', '2013-05-04', 62)
,('Bike', '2013-05-04', 73)
,('Bike', '2013-04-27', 34)
,('Scooter', '2013-03-12', 13)
,('Scooter', '2013-04-05', 75)
,('Scooter', '2013-01-09', 24)
,('Scooter', '2013-01-16', 76)
,('Scooter', '2013-01-16', 23)
,('Scooter', '2013-05-14', 36)
,('Scooter', '2013-01-16', 68)
,('Scooter', '2013-02-16', 88)
SELECT ProductName, Jan, Feb, Mar, Apr, May
FROM (SELECT ProductName, LEFT(DATENAME(month, SaleDate), 3) AS [monthName], ISNULL(SalesQty, 0) AS SalesQty FROM #sales) AS A
PIVOT
(
SUM(SalesQty) --Aggregate function
FOR [monthName] IN (Jan, Feb, Mar, Apr, May) --Column list
) AS B
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
DROP TABLE #sales
--UNPIVOT example
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
DROP TABLE #sales
IF OBJECT_ID('tempdb..#sales_PIVOT') IS NOT NULL
DROP TABLE #sales_PIVOT
CREATE TABLE #sales (ProductName VARCHAR(200), SaleDate DATE, SalesQty INT)
INSERT INTO #sales(ProductName, SaleDate, SalesQty)
VALUES('Bike', '2013-01-04', 56)
,('Bike', '2013-02-25', 45)
,('Bike', '2013-01-26', 87)
,('Bike', '2013-03-01', 12)
,('Bike', '2013-05-04', 62)
,('Bike', '2013-05-04', 73)
,('Bike', '2013-04-27', 34)
,('Scooter', '2013-03-12', 13)
,('Scooter', '2013-04-05', 75)
,('Scooter', '2013-01-09', 24)
,('Scooter', '2013-01-16', 76)
,('Scooter', '2013-01-16', 23)
,('Scooter', '2013-05-14', 36)
,('Scooter', '2013-01-16', 68)
,('Scooter', '2013-02-16', 88)
SELECT ProductName, Jan, Feb, Mar, Apr, May
INTO #sales_PIVOT
FROM (SELECT ProductName, LEFT(DATENAME(month, SaleDate), 3) AS [monthName], ISNULL(SalesQty, 0) AS SalesQty FROM #sales) AS A
PIVOT
(
SUM(SalesQty) --Aggregate function
FOR [monthName] IN (Jan, Feb, Mar, Apr, May) --Column list
) AS B
SELECT ProductName, [monthName], SalesQty
FROM (SELECT ProductName, Jan, Feb, Mar, Apr, May FROM #sales_PIVOT) AS A
UNPIVOT
(
SalesQty FOR [monthName] IN (Jan, Feb, Mar, Apr, May)
) AS B
IF OBJECT_ID('tempdb..#sales') IS NOT NULL
DROP TABLE #sales
IF OBJECT_ID('tempdb..#sales_PIVOT') IS NOT NULL
DROP TABLE #sales_PIVOT
One important point to remember: UNPIVOT is the reverse process of PIVOT, but only in the sense of converting columns back into rows. It does not regenerate the original result, because of the aggregate function used by PIVOT. In our example we used SUM, and there is no inverse function for SUM that can recover the original values. There are some scenarios where the original output can be regenerated using UNPIVOT; please post if you have such an example.
Friday, 8 March 2013
Script to get the database backup history
Often I have to get the backup history for reporting purposes, so I have written a small script which gives all the details of database backup history required for my report. I have used the backupset and backupmediafamily tables, both of which reside in the msdb database. So if for some reason you have restored the msdb database, the script will only show history up to the time of the restored database. See the script below.
SELECT
A.database_name
,CASE A.[type] WHEN 'D' THEN 'Full' WHEN 'I' THEN 'Differential' WHEN 'L' THEN 'Transaction Log' END AS backup_type
,B.physical_device_name
,CAST(CAST(A.backup_size / (1024*1024) AS INT) AS VARCHAR) + ' ' + 'MB' AS size
,CAST(DATEDIFF(S, A.backup_start_date, A.backup_finish_date) AS VARCHAR) + ' ' + 'Seconds' AS time_taken
,A.backup_start_date
FROM msdb.dbo.backupset A
INNER JOIN msdb.dbo.backupmediafamily B ON A.media_set_id = B.media_set_id
ORDER BY A.database_name, backup_start_date DESC
This script gives the history of all databases; if you need details for only one database, restrict the result with a WHERE clause on the database_name field, as shown below.
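For example, a trimmed-down version of the script restricted to one (hypothetical) database:
--Backup history for a single database only
SELECT A.database_name, A.backup_start_date, A.backup_finish_date, B.physical_device_name
FROM msdb.dbo.backupset A
INNER JOIN msdb.dbo.backupmediafamily B ON A.media_set_id = B.media_set_id
WHERE A.database_name = 'MyDatabase' --hypothetical database name
ORDER BY A.backup_start_date DESC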
For more information on the backupset and backupmediafamily tables, see the links below.
How to load data from a PDF file in an SSIS package using the Script Component
I have come across a case where I had to read a PDF document, parse it, and write the content to a database using SSIS. A PDF document is not a usual source for data, so there is no SSIS component available to read a PDF file. I used the iTextSharp library in a Script Component to read the PDF file from the SSIS package. I also found a very good link with a good example of how to use the iTextSharp library.
Using the iTextSharp library it is very easy to process a PDF document. In my case it was a simple case of reading the PDF document page by page and parsing the content using custom logic. You can do much more than just reading PDF documents with this library.
To read data from the PDF file and write it to the database, I used a Script Component as a data source, which reads the PDF document and returns the result as the output records of the Script Component. Then I simply used an OLE DB destination to write it to the database. You might be aware that any third-party DLL can be used in an SSIS Script Component by referencing it, but to add it as a reference you have to register the DLL in the GAC. It can be referenced without doing so, but the DLL might not be available at the same location in the production environment, so it is better to add it to the GAC. In my case I was not allowed to add the DLL to the production server directly, so I used .NET reflection to load the DLL dynamically and added the DLL location as a configuration in the SSIS package.
Below is my code:
//Below two variables are used to pass the iTextSharp DLL location and the PDF file path.
//Both are ultimately supplied from the dtsConfig file.
String configITextPath = Variables.itextdllfilepath;
String configSourceDataFile = Variables.filepath;

//Load the iTextSharp library via reflection
Assembly assembly = Assembly.LoadFrom(configITextPath);

//Get the PdfReader type and create an instance of it
Type readerType = assembly.GetType("iTextSharp.text.pdf.PdfReader");
object reader = Activator.CreateInstance(readerType, new object[] { configSourceDataFile });

Type extractorType = assembly.GetType("iTextSharp.text.pdf.parser.PdfTextExtractor");
Type extractionStrategyType = assembly.GetType("iTextSharp.text.pdf.parser.LocationTextExtractionStrategy");
object extractionStrategy = Activator.CreateInstance(extractionStrategyType);

//Get the number of pages in the PDF document
Int32 pages = (Int32)reader.GetType().GetProperty("NumberOfPages").GetValue(reader, null);

//Read the PDF document page by page, process it, and generate output rows
//for the script component.
//configStartPageNumber is another package variable (its declaration is not shown here).
String pageText;
List<PepRecord> pepRecords = new List<PepRecord>();
for (int page = configStartPageNumber; page <= pages; page++)
{
    reader = Activator.CreateInstance(readerType, new object[] { configSourceDataFile });
    extractionStrategy = Activator.CreateInstance(extractionStrategyType);
    //Invoke the text-extraction method (GetTextFromPage) via reflection
    pageText = (String)extractorType.GetMethods()[0].Invoke(reader, new Object[] { reader, page, extractionStrategy });
    //code to process PDF file content and create output rows for the script component
}
I know a better way to do this is to add the DLL to the GAC and reference it from there, but in my case I had to quickly find a solution to process a few PDF documents using an SSIS package, so I used reflection to load it.