Data Warehousing with ORACLE • Materialized Views / Snapshots • Star Queries • Partitioned Tables • Index-Organized Tables • Bitmapped Indexes • Clusters • Function-Based Indexes • Transportable Tablespaces
Transcript
Slide 1
Data Warehousing with ORACLE
- Materialized Views / Snapshots
- Star Queries
- Partitioned Tables
- Index-Organized Tables
- Bitmapped Indexes
- Clusters
- Function-Based Indexes
- Transportable Tablespaces
Slide 2
Oracle Architectural Components (Logical): Database, Tablespace, DataFile, Oracle Block, O.S. Block; Owner, Schema; Segment, Extent; segment types: Table, Index, Cluster, Snapshot, Partition
Slide 3
Partitioning offers these advantages:
Partitioning enables data management operations such as (1) data loads, (2) index creation and rebuilding, and (3) backup/recovery at the partition level, rather than on the entire table. This results in significantly reduced times for these operations.
Partitioning improves query performance. In many cases, the results of a query can be obtained by accessing a subset of partitions rather than the entire table. For some queries, this technique (called partition pruning) can provide order-of-magnitude gains in performance.
Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations. Partition independence for partition maintenance operations lets you perform concurrent maintenance operations on different partitions of the same table or index.
Slide 4
Partitioning offers these advantages:
Partitioning increases the availability of mission-critical databases if critical tables and indexes are divided into partitions to reduce the maintenance windows, recovery times, and impact of failures.
Partitioning can be implemented without requiring any modifications to your applications. For example, you could convert a nonpartitioned table to a partitioned table without needing to modify any of the SELECT or DML statements that access that table. You do not need to rewrite your application code to take advantage of partitioning.
Slide 5
Slide 6
Partitioning Methods
Oracle provides the following partitioning methods:
1. Range Partitioning
2. List Partitioning
3. Hash Partitioning
4. Composite Partitioning
Composite partitioning is a combination of other partitioning methods.
Slide 7
Range Partitioning
Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. When using range partitioning, there are a few rules to keep in mind:
1) Each partition has a VALUES LESS THAN clause, which specifies a noninclusive upper bound for the partition. Any binary values of the partition key equal to or higher than this literal are added to the next higher partition.
2) All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN clause of the previous partition.
3) A MAXVALUE literal can be defined for the highest partition. MAXVALUE represents a virtual infinite value that sorts higher than any other possible value for the partition key, including the null value.
Slide 8
CREATE TABLE sales_range (
  salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_amount  NUMBER(10),
  sales_date    DATE)
PARTITION BY RANGE (sales_date) (
  PARTITION sales_jan2000 VALUES LESS THAN (TO_DATE('01/02/2000','DD/MM/YYYY')),
  PARTITION sales_feb2000 VALUES LESS THAN (TO_DATE('01/03/2000','DD/MM/YYYY')),
  PARTITION sales_mar2000 VALUES LESS THAN (TO_DATE('01/04/2000','DD/MM/YYYY')),
  PARTITION sales_apr2000 VALUES LESS THAN (TO_DATE('01/05/2000','DD/MM/YYYY')));
Slide 9
List Partitioning
List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning key in the description of each partition. This differs from range partitioning, where a range of values is associated with each partition, and from hash partitioning, where a hash function controls the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
CREATE TABLE sales_list (
  salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_state   VARCHAR2(20),
  sales_amount  NUMBER(10),
  sales_date    DATE)
PARTITION BY LIST (sales_state) (
  PARTITION sales_west VALUES IN ('California', 'Hawaii'),
  PARTITION sales_east VALUES IN ('New York', 'Virginia', 'Florida'),
  PARTITION sales_cent VALUES IN ('Texas', 'Illinois'));
Slide 10
Hash Partitioning
Hash partitioning enables easy partitioning of data that does not lend itself to range or list partitioning. It does this with a simple syntax and is easy to implement. It is a better choice than range partitioning when:
1) You do not know beforehand how much data will map into a given range
2) The sizes of range partitions would differ quite substantially or would be difficult to balance manually
3) Range partitioning would cause the data to be undesirably clustered
4) Performance features such as parallel DML are important
The concepts of splitting, dropping, or merging partitions do not apply to hash partitions. Instead, hash partitions can be added and coalesced.
CREATE TABLE sales_hash (
  salesman_id   NUMBER(5),
  salesman_name VARCHAR2(30),
  sales_amount  NUMBER(10),
  week_no       NUMBER(2))
PARTITION BY HASH (salesman_id)
PARTITIONS 4
STORE IN (tbs1, tbs2, tbs3, tbs4);
Slide 11
Maintaining Partitions
This section describes how to perform the following partition maintenance operations:
1. Moving Partitions
2. Adding Partitions
3. Dropping Partitions
4. Coalescing Partitions
5. Modifying Partition Default Attributes
6. Truncating Partitions
7. Splitting Partitions
8. Merging Partitions
9. Exchanging Table Partitions
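The operations listed above can be sketched as follows; the table, partition, and tablespace names are illustrative assumptions modeled on the earlier sales_range and sales_hash examples:

```sql
-- Hypothetical partition maintenance statements (names are assumptions)
ALTER TABLE sales_range MOVE PARTITION sales_jan2000 TABLESPACE tbs_hist;   -- move
ALTER TABLE sales_range ADD PARTITION sales_may2000
  VALUES LESS THAN (TO_DATE('01/06/2000','DD/MM/YYYY'));                    -- add
ALTER TABLE sales_range DROP PARTITION sales_jan2000;                       -- drop
ALTER TABLE sales_hash  COALESCE PARTITION;                                 -- coalesce (hash only)
ALTER TABLE sales_range MODIFY DEFAULT ATTRIBUTES TABLESPACE tbs_new;       -- default attributes
ALTER TABLE sales_range TRUNCATE PARTITION sales_feb2000;                   -- truncate
ALTER TABLE sales_range SPLIT PARTITION sales_mar2000
  AT (TO_DATE('15/03/2000','DD/MM/YYYY'))
  INTO (PARTITION sales_mar2000_a, PARTITION sales_mar2000_b);              -- split
ALTER TABLE sales_range MERGE PARTITIONS sales_jan2000, sales_feb2000
  INTO PARTITION sales_q1_2000;                                             -- merge
ALTER TABLE sales_range EXCHANGE PARTITION sales_apr2000
  WITH TABLE sales_apr_stage;                                               -- exchange
```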
Slide 12
Coalescing Free Space Space for tablespace segments is managed
using extents, which are made up of a specific number of contiguous
data blocks. Thus, a larger free extent can be fragmented, or
smaller contiguous free extents can be coalesced into one larger
free extent. However, continuous allocation and deallocation of
free space fragments your tablespace and makes allocation of larger
extents more difficult. By default, SMON (system monitor) processes
incrementally coalesce the free extents of tablespaces in the
background.
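Besides the background coalescing performed by SMON, free extents can be coalesced on demand; a minimal sketch (the tablespace name is an assumption):

```sql
-- Manually coalesce adjacent free extents in a dictionary-managed tablespace
ALTER TABLESPACE users COALESCE;
```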
Index-Organized Tables An index-organized table has a storage
organization that is a variant of a primary B-tree. Unlike an
ordinary (heap-organized) table whose data is stored as an
unordered collection (heap), data for an index-organized table is
stored in a B-tree index structure in a primary key sorted manner.
Besides storing the primary key column values of an index-organized
table row, each index entry in the B-tree stores the nonkey column
values as well.
Slide 15
Index-Organized Tables Compared with Regular Tables
- Faster key-based access to table data
- Reduced storage requirements
Main restrictions:
- Must have a primary key
- Cannot use unique constraints
- Cannot be clustered
Slide 16
Secondary Indexes on Index-Organized Tables
Secondary index support on index-organized tables provides efficient access to an index-organized table using columns that: 1) are not the primary key, or 2) are not a prefix of the primary key.
Oracle constructs secondary indexes on index-organized tables using logical row identifiers (logical rowids) that are based on the table's primary key. A logical rowid can optionally include a physical guess, which identifies the block location of the row. Oracle can use these physical guesses to probe directly into the leaf block of the index-organized table, bypassing the primary key search. Because rows in index-organized tables do not have permanent physical addresses, the physical guesses can become stale when rows are moved to new blocks.
A) Without physical guesses, access involves two index scans: a secondary index scan followed by a scan of the primary key index.
B) With accurate physical guesses, access involves a secondary index scan and an additional I/O to fetch the data block containing the row.
C) With inaccurate physical guesses, access involves a secondary index scan and an I/O to fetch the wrong data block (as indicated by the physical guess), followed by a scan of the primary key index.
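A short self-contained sketch of both ideas: an index-organized table (ORGANIZATION INDEX) and a secondary index on one of its nonkey columns. Table, column, and index names are assumptions, not from the slides:

```sql
-- Hypothetical index-organized table: rows are stored in the primary key B-tree
CREATE TABLE countries_iot (
  country_id   CHAR(2) PRIMARY KEY,
  country_name VARCHAR2(40))
ORGANIZATION INDEX;

-- Secondary index on a nonkey column; it stores logical rowids
-- (primary key + optional physical guess) instead of physical rowids
CREATE INDEX countries_name_ix ON countries_iot (country_name);
```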
Bitmap Indexes
Bitmap indexes are widely used in data warehousing environments. These environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions and columns of low cardinality. For such applications, bitmap indexing provides:
1) Reduced response time for large classes of ad hoc queries
2) Reduced storage requirements compared to other indexing techniques
3) Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory
4) Efficient maintenance during parallel DML and loads
Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the indexes can be several times larger than the data in the table.
Slide 19
The following shows a portion of a company's customers table.
SELECT cust_id, cust_gender, cust_marital_status, cust_income_level
FROM customers;

CUST_ID C CUST_MARITAL_STATUS  CUST_INCOME_LEVEL
------- - -------------------- ---------------------
     70 F                      D: 70,000 - 89,999
     80 F married              H: 150,000 - 169,999
     90 M single               H: 150,000 - 169,999
    100 F                      I: 170,000 - 189,999
    110 F married              C: 50,000 - 69,999
    120 M single               F: 110,000 - 129,999
    130 M                      J: 190,000 - 249,999
    140 M married              G: 130,000 - 149,999

Because cust_gender, cust_marital_status, and cust_income_level are all low-cardinality columns (there are only three possible values for marital status, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on cust_id because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
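The bitmap indexes recommended above could be created as follows (the index names are assumptions):

```sql
-- Hypothetical bitmap indexes on the low-cardinality columns of customers
CREATE BITMAP INDEX customers_gender_bix  ON customers (cust_gender);
CREATE BITMAP INDEX customers_marital_bix ON customers (cust_marital_status);
CREATE BITMAP INDEX customers_income_bix  ON customers (cust_income_level);

-- By contrast, the unique cust_id column gets a B-tree index
CREATE UNIQUE INDEX customers_id_uix ON customers (cust_id);
```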
Slide 20
Clusters
Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together. Use clusters to store one or more tables that:
1) are primarily queried,
2) are not predominantly inserted into or updated, and
3) are often joined with other tables in the cluster.
Slide 21
Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer these benefits:
1) Disk I/O is reduced for joins of clustered tables.
2) Access time improves for joins of clustered tables.
3) In a cluster, a cluster key value is the value of the cluster key columns for a particular row. Each cluster key value is stored only once in the cluster and the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage is required to store related table and index data.
Clusters
A cluster index must be created before any rows can be inserted into any clustered table. The following statement creates a cluster index for the emp_dept cluster:
CREATE INDEX emp_dept_index ON CLUSTER emp_dept
  TABLESPACE user_indx
  STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0)
  PCTFREE 5;
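For completeness, a sketch of how the emp_dept cluster and its clustered tables might be defined before the cluster index above; column definitions and the SIZE value are assumptions:

```sql
-- Hypothetical cluster keyed on department number
CREATE CLUSTER emp_dept (deptno NUMBER(3))
  SIZE 600
  TABLESPACE users;

-- Tables stored in the cluster share data blocks via the common deptno column
CREATE TABLE dept (
  deptno NUMBER(3) PRIMARY KEY,
  dname  VARCHAR2(14))
  CLUSTER emp_dept (deptno);

CREATE TABLE emp (
  empno  NUMBER(5) PRIMARY KEY,
  ename  VARCHAR2(15),
  deptno NUMBER(3) REFERENCES dept)
  CLUSTER emp_dept (deptno);
```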
Slide 24
Viewing Information About Clusters The following views display
information about clusters:
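The transcript omits the list of views shown on the slide; as one example, cluster metadata can be queried from the standard USER_CLUSTERS data dictionary view:

```sql
-- List clusters owned by the current user
SELECT cluster_name, tablespace_name, key_size
FROM user_clusters;
```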
Slide 25
Function-Based Indexes
Function-based indexes allow you to:
1) Build more powerful indexes, defined on expressions rather than plain columns
2) Precompute the value of a computationally intensive function and store it in the index
3) Increase the number of situations where the optimizer can perform a range scan instead of a full table scan
You must have the following initialization parameters defined to create a function-based index:
*) QUERY_REWRITE_ENABLED set to TRUE
*) COMPATIBLE set to 8.1.0.0.0 or a greater value
Slide 26
Function-Based Indexes
Example: Function-Based Index for Case-Insensitive Searches
The following statement creates the function-based index idx on table emp, based on an uppercase evaluation of the ename column:
CREATE INDEX idx ON emp (UPPER(ename));
Now a SELECT statement can use the function-based index on UPPER(ename) to retrieve all employees whose names start with JOH:
SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%';
Example: Precomputing Arithmetic Expressions with a Function-Based Index
This statement creates a function-based index on an expression:
CREATE INDEX idx ON t (a + b * (c - 1), a, b);
SELECT statements can use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or an index full scan (preferable when the index specifies a high degree of parallelism):
SELECT a FROM t WHERE a + b * (c - 1) < 100;
Slide 27
Data Transportation: Transportable Tablespaces
(OLTP systems → staging → data warehouse → data marts → information distribution)
Slide 28
Transporting Tablespaces
1. Make tablespace read-only
2. Export metadata from source
3. Copy data files to target system
4. Transfer export file
5. Import metadata into target
6. If necessary, alter the tablespace to read-write
Transporting a Tablespace (diagram): the read-only SALES_TS tablespace on System 1, with data files /u/d1/sales1.dbf and /u/d2/sales2.dbf, is copied to System 2 as /disk1/sales01.dbf and /disk2/sales02.dbf, together with the metadata export file s980501.dmp.
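The six steps can be sketched end to end; the tablespace, file, and credential values are assumptions taken from the diagram, and the exp/imp invocations run from the operating system shell, not from SQL:

```sql
-- 1. Make the tablespace read-only on the source
ALTER TABLESPACE sales_ts READ ONLY;

-- 2. Export the tablespace metadata (OS shell, classic exp utility):
--      exp userid='sys/password AS SYSDBA' transport_tablespace=y
--          tablespaces=sales_ts file=s980501.dmp
-- 3./4. Copy the data files and the dump file to the target system (OS copy/ftp)
-- 5. Import the metadata on the target (OS shell):
--      imp userid='sys/password AS SYSDBA' transport_tablespace=y
--          datafiles=('/disk1/sales01.dbf','/disk2/sales02.dbf') file=s980501.dmp

-- 6. If necessary, make the tablespace writable again on the target
ALTER TABLESPACE sales_ts READ WRITE;
```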
Slide 31
Transportable Tablespaces: Uses
- Moves entire tablespace data
- Supports media recovery
Source and target databases must:
- Be on the same operating system
- Run Oracle8i, release 8.1, or above
- Have the same block size
- Use the same character set
Slide 32
Checking the Transport Set
DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => 'SALES_TS', incl_constraints => TRUE);
- ts_list: list of tablespaces
- incl_constraints: check referential integrity
Objects with references to objects outside the transport set are recorded in the data dictionary table TRANSPORT_SET_VIOLATIONS.
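A typical usage pattern for the check above: run the procedure, then query the violations table; an empty result means the tablespace set is self-contained:

```sql
-- Verify that SALES_TS can be transported without dangling references
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => 'SALES_TS', incl_constraints => TRUE);

-- Any objects referencing objects outside the transport set are listed here
SELECT * FROM transport_set_violations;
```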
Slide 33
Discoverer Architecture (diagram): the Administration edition defines business areas in an End User Layer built over the database (OLTP, data warehouse, or data mart); the User and Viewer editions query through that layer, so database complexity is hidden from users.
Slide 34
Tuning
Slide 35
For best results, it is advisable to start the tuning phase in parallel with the design activity, without waiting for the development and production phases of the software. The chart below highlights the ratio of benefits (Y) to tuning start time (X). The opposite trend would appear if, instead of benefits, the Y axis showed the costs of the activity. Design / Development / Production
Slide 36
Here are the steps to follow, strictly in this order, to achieve a sound tuning policy:
Slide 37
Slide 38
Slide 39
Slide 40
Slide 41
Slide 42
Slide 43
Slide 44
Slide 45
Slide 46
SQL statement processing flow (diagram): Parse (syntax validation, terminology validation, grant/user validation, execution plan) → Bind → Describe → Execute → Fetch, each step succeeding (OK) or failing (KO); the whole flow operates through an implicit cursor.
Slide 47
SQL Command → Optimizer (internal rule book; additional statistical information) → Execution Plan
Slide 48
ANALYZE TABLE | INDEX | CLUSTER name
  COMPUTE STATISTICS
  | ESTIMATE STATISTICS [SAMPLE integer ROWS | PERCENT]
  | DELETE STATISTICS;

DBMS_UTILITY.ANALYZE_SCHEMA(
  schema           VARCHAR2,
  method           VARCHAR2,
  estimate_rows    NUMBER   DEFAULT NULL,
  estimate_percent NUMBER   DEFAULT NULL,
  method_opt       VARCHAR2 DEFAULT NULL);
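The two forms above can be exercised as follows; the table and schema names are assumptions:

```sql
-- Exact statistics on a single table
ANALYZE TABLE emp COMPUTE STATISTICS;

-- Sampled statistics when the table is large
ANALYZE TABLE sales ESTIMATE STATISTICS SAMPLE 10 PERCENT;

-- Statistics for every object in a schema, via the packaged procedure
EXECUTE DBMS_UTILITY.ANALYZE_SCHEMA(
  schema => 'SCOTT', method => 'ESTIMATE', estimate_percent => 10);
```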
Slide 49
This methodology can be enabled at the database level, at the session level, or for a single SQL statement.
Initialization parameter OPTIMIZER_MODE:
- RULE (rule-based)
- CHOOSE (cost-based)
- FIRST_ROWS (cost-based - minimizes response time)
- ALL_ROWS (cost-based - minimizes total execution time)
Command: ALTER SESSION SET OPTIMIZER_MODE = ...
SQL statement hint: SELECT /*+ CHOOSE */ columns FROM tables WHERE condition;
Slide 50
Optimizer (diagram): SQL Command → Execution Plan → PLAN_TABLE
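The flow above corresponds to the EXPLAIN PLAN statement, which writes the optimizer's plan into PLAN_TABLE; the statement ID and the query being explained are assumptions:

```sql
-- Ask the optimizer for a plan without executing the query
EXPLAIN PLAN SET STATEMENT_ID = 'demo' FOR
  SELECT * FROM emp WHERE deptno = 10;

-- Read the plan back as an indented operation tree
SELECT LPAD(' ', 2 * level) || operation || ' ' || options || ' ' || object_name AS plan
FROM plan_table
START WITH id = 0 AND statement_id = 'demo'
CONNECT BY PRIOR id = parent_id AND statement_id = 'demo';
```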
Slide 51
Slide 52
Slide 53
Slide 54
Automated Performance Tuning System
The methodology underlying the strategy is built on four distinct steps:
1 - Identify the aspects of the database to place under analysis, define the rules, and establish the threshold values.
2 - Collecting data. Gather information about: database, instance, schema, environment, workload.
3 - Viewing the collected data. Present the previously collected data through simple, immediately understandable reports.
4 - Analyzing data / generating recommendations. When the threshold values are not met, the system responds with a series of useful recommendations, or even a set of automatic corrective actions. These checks fall under the name "Routine Tuning" and should be regarded as preventive, since they help anticipate what could become real problems before those problems appear with their full resolution complexity.
Slide 55
Check 1 - % Library Cache Misses (target < 1%)
select round(sum(reloads)/sum(pins)*100,2) Col1 from v$librarycache;
If too high, increase shared_pool_size.
Check 2 - % Data Dictionary Cache Misses (target < 10%)
select round(sum(getmisses)/sum(gets)*100,2) Col1 from v$rowcache;
If too high, increase shared_pool_size.
Slide 56
Check 3 - Logical/Physical Hit Ratio (critical threshold: > 80%)
select round(((1-(a.value/(b.value+c.value)))*100),2) Col1
from v$sysstat a, v$sysstat b, v$sysstat c
where a.name = 'physical reads'   -- physical accesses
  and b.name = 'db block gets'    -- logical accesses
  and c.name = 'consistent gets'; -- logical accesses
If too low, increase db_block_buffers.
Slide 57
Check 4 - Sort Area Analysis
Alongside the shared RAM area (SGA), there is a set of smaller, non-shared areas in a 1:1 relationship with the processes that support user connections (PGA), whose main function is to support sort and grouping operations. Every statement that performs sorts or grouping uses the PGA to reach its goal. When the PGA is not large enough to accommodate the activity, the work spills to temporary segments created on disk, with a significant degradation in execution times. There is no fixed threshold to compare against; it is left to the DBA's judgment whether to revisit the initSID.ora parameters:
- sort_area_retained_size: size in bytes retained in the PGA for potential sorts
- sort_area_size: size in bytes allocatable in the PGA for actual sorts.
Two thresholds (min & max) are identified in order to resolve as much activity as possible entirely in RAM.
Slide 58
Check 4 - Sort Area Analysis
select name, value from v$sysstat
where name in ('sorts (memory)','sorts (disk)');
Slide 59
Check 5 - Analysis of Objects with More Than 25 Extents
The check continues by focusing on objects (segments) with a critical level of fragmentation (a high number of allocated extents), potentially responsible for degraded system performance.
select owner, segment_name, segment_type, tablespace_name, extents
from dba_segments
where extents > 25 and owner not in ('SYS')
order by owner, segment_type, extents desc;
Slide 60
SQL Trace Facility
The SQL trace facility provides valuable information for every SQL statement involved. It can be activated:
1) at the session level (alter session set sql_trace = TRUE)
2) at the instance level (initSID.ora configuration parameter sql_trace = true)
It generates the following statistics:
- number of parse, execute, and fetch calls
- CPU time and elapsed time
- number of logical reads and physical reads
- number of rows processed
These are stored, in an internal format, in a file called the trace file.
Slide 61
To display the information recorded in the trace file described above, the file must be fed to a formatter called TKPROF, whose invocation syntax is:
TKPROF file_trace_input file_output
  SORT=(option1,option2,..)  e.g. EXECPU, EXEELA, EXEROW
  PRINT=integer
  INSERT=file_scripts_sql_output
  SYS=boolean
  TABLE=schema.table
  EXPLAIN=user/password
  RECORD=file_record_output
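A concrete invocation of the syntax above might look like this; the trace file name, output file name, and credentials are assumptions:

```
tkprof ora_12345.trc report.prf SORT=(EXECPU,EXEELA) SYS=NO EXPLAIN=scott/tiger
```

This sorts the statements by CPU and elapsed execution time, suppresses recursive SYS statements, and adds an execution plan for each statement using the given connection.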
Slide 62
Let us examine in detail the techniques for interpreting TKPROF output, an example of which follows:

call     count  cpu   elapsed  disk  query  current  rows
-------  -----  ----  -------  ----  -----  -------  ----
Parse       11  0.08     0.18     0      0        0     0
Execute     11  0.23     0.66     0      3        6     2
Fetch       35  6.70     6.83   100  12326        2   824
-------  -----  ----  -------  ----  -----  -------  ----
Total       57  7.01     7.67   100  12329        8   826