External Tables in Oracle 10g/11g Database: Part 1

http://www.packtpub.com/article/external-tables-oracle-10g-11g-database-part1

In this two-part article by Hector R. Madrid, we will learn about the External Tables in Oracle 10g/11g Database. When working in data warehouse environments, the Extraction—Transformation—Loading (ETL) cycle frequently requires the user to load information from external sources in plain file format, or perform data transfers among Oracle database in a proprietary format. This requires the user to create control files to perform the load. As the format of the source data regularly doesn't fit the one required by the Data Warehouse, a common practice is to create stage tables that load data into the database and create several queries that perform the transformation from this point on, to take the data to its final destination.


A better approach, would be to perform this transformation 'on the fly' at load time. That is what External Tables are for. They are basically external files, managed either by means of the SQL*Loader or the data pump driver, which from the database perspective can be seen as if they were regular read only tables.

This format allows the user to think about the data source as if the data was already loaded into its stage table. This lets the user concentrate on the queries to perform the transformation, thus saving precious time during the load phase.

The External Table basics
An External Table is basically a file that resides on the server side, as a regular flat file or as a data pump formatted file. The External Table is not a table itself; it is an external file with an Oracle format and its physical location. This feature first appeared back in Oracle 9i Release 1 and it was intended as a way of enhancing the ETL process by reading an external flat file as if it was a regular Oracle table. On its initial release it was only possible to create read-only External Tables, but, starting with 10g—it is possible to unload data to External Tables too.

In previous 10g Releases there was only the SQL*Loader driver could be used to read the External Table, but from 10g onwards it is now possible to load the table by means of the data pump driver. The kind of driver that will be used to read the External Table is defined at creation time. In the case of ORACLE_LOADER it is the same driver used by SQL*Loader. The flat files are loaded in the same way that a flat file is loaded to the database by means of the SQL*Loader utility, and the creation script can be created based on a SQL*Loader control file. In fact, most of the keywords that are valid for data loading are also valid to read an external flat file table.

The main differences between SQL*Loader and External Tables are:

When there are several input datafiles SQL*Loader will generate a bad file and a discard file for each datafile.
The CONTINUEIF and CONCATENATE keywords are not supported by External Tables.
The GRAPHIC, GRAPHIC EXTERNAL, and VARGRAPHIC are not supported for External Tables.
LONG, nested tables, VARRAY, REF, primary key REF, and SID are not supported.
For fields in External Tables the character set, decimal separator, date mask and other locale settings are determined by the database NLS settings.
The use of the backslash character is allowed for SQL*Loader, but for External Tables this would raise an error. External Tables must use quotation marks instead.
For example:
SQL*Loader
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY ""
External Tables
TERMINATED BY ',' ENCLOSED BY "'"



A second driver is available, the ORACLE_DATAPUMP access driver, which uses the Data Pump technology to read the table and unload data to an External Table. This driver allows the user to perform a logical backup that can later be read back to the database without actually loading the data. The ORACLE_DATAPUMP access driver utilizes a proprietary binary format for the external file, so it is not possible to view it as a flat file.

Let's setup the environment
Let's create the demonstration user, and prepare its environment to create an External Table. The example that will be developed first refers to the External Table using the ORACLE_LOADER driver.

create user EXTTABDEMOidentified by ORACLEdefault tablespace USERS;alter user exttabdemoquota unlimited on users;grant CREATE SESSION,CREATE TABLE,CREATE PROCEDURE,CREATE MATERIALIZED VIEW,ALTER SESSION,CREATE VIEW,CREATE ANY DIRECTORYto EXTTABDEMO;
A simple formatted spool from this query will generate the required external table demonstration data. The original source table is the demonstration HR.EMPLOYEES table.

selectEMPLOYEE_ID ||','||DEPARTMENT_ID ||','||FIRST_NAME ||','||LAST_NAME ||','||PHONE_NUMBER ||','||HIRE_DATE ||','||JOB_ID ||','||SALARY ||','||COMMISSION_PCT ||','||MANAGER_ID ||','||EMAILfrom HR.EMPLOYEESorder by EMPLOYEE_ID
The above query will produce the following sample data:



The External Table directory is defined inside the database by means of a DIRECTORY object. This object is not validated at creation time, so the user must make sure the physical directory exists and the oracle OS user has read/write privileges on it.

$ mkdir $HOME/external_table_destSQL> CREATE DIRECTORY EXTTABDIR AS '/home/oracle/external_table_dest';
The above example was developed in a Linux environment, on Windows platforms the paths will need to be changed to correctly reflect how Oracle has been set up.

Now, the first External Table can be created.

A basic External Table
Here is the source code of the External Table creation.



The create table command syntax is just like any other regular table creation (A), (B), up to the point where the ORGANIZATION EXTERNAL (C) keyword appears, this is the point where the actual External Table definition starts. In this case the External Table is accessed by the ORACLE_LOADER driver (D). Next, the external flat file is defined, and here it is declared the Oracle DIRECTORY (E) where the flat file resides. The ACCESS PARAMETERS (F) section specifies how to access the flat file and it declares whether the file is a fixed or variable size record, and which other SQL*Loader loading options are declared. The LOCATION (H) keyword defines the name of the external data file. It must be pointed out that as this is an External Table managed by the SQL_LOADER driver the ACCESS_PARAMETERS section must be defined, in the case of External Tables based on the DATAPUMP_DRIVER this section is not required.

The columns are defined only by name (G), not by type. This is permitted from the SQL*Loader perspective, and allows for dynamic column definition. This column schema definition is more flexible, but it has a drawback—data formats such as those in DATE columns must match the database date format, otherwise the row will be rejected. There are ways to define the date format working around this requirement. Assuming the date column changes from its original default format mask "DD-MON-RR" to "DD-MM-RR", then the column definition must change from a simple CHAR column to a DATE with format mask column definition.

Original format:

"HIRE_DATE" CHAR(255)
Changed format:

"HIRE_DATE" DATE "DD-MM-RR"
When working with an External Table, the access parameter is not validated at creation time, so if there are malformed rows, or if there are improperly defined access parameters, an error is shown, similar to the one below.

When working with an External Table, the access parameter is not validated at creation time, so if there are malformed rows, or if there are improperly defined access parameters, an error is shown, similar to the one below.

ERROR at line 1:ORA-29913: error in executing ODCIEXTTABLEFETCH calloutORA-30653: reject limit reachedORA-06512: at "SYS.ORACLE_LOADER", line 52
Once the data is created and all required OS privileges have been properly validated, the data can be seen from inside the database, just as if it were a regular Oracle table.



This table is read only, so if the user attempts to perform any DML operation against it, it will result in this error:

SQL> delete ext_employees;delete ext_employees*ERROR at line 1:ORA-30657: operation not supported on external organized table
As the error message clearly states, this kind of table is only useful for read only operations.

This kind of table doesn't support most of the operations available for regular tables, such as index creation, and statistics gathering, and these types of operations will cause an ORA-30657 error too. The only access method available for External Tables is Full Table Scan, so there is no way to perform a selective data retrieval operation.

The External Tables cannot be recovered, they are just metadata definitions stored in the dictionary tables. The actual data resides in external files, and there is no way to protect them with the regular backup database routines, so it is the user's sole responsibility to provide proper backup and data management procedures. At the database level the only kind of protection the External Table receives is at the metadata level, as it is an object stored as a definition at the database dictionary level. As the data resides in the external data file, if by any means it were to be corrupted, altered, or somehow modified, there would be no way to get back the original data. If the external data file is lost, then this may go unnoticed, until the next SELECT operation takes place.

This metadata for an External Table is recorded at the {USER | ALL | DBA}_TABLES view, and as this table doesn't actually require physical database storage, all storage related columns appear as null, as well as the columns that relate to the statistical information. This table is described with the {USER | ALL | DBA}_EXTERNAL_TABLES view, where information such as the kind of driver access, the reject_limit, and the access_parameters, amongst others, are described.

SQL> DESC USER_EXTERNAL_TABLESName Null? Type------------------------------- -------- --------------TABLE_NAME NOT NULL VARCHAR2(30)TYPE_OWNER CHAR(3)TYPE_NAME NOT NULL VARCHAR2(30)DEFAULT_DIRECTORY_OWNER CHAR(3)DEFAULT_DIRECTORY_NAME NOT NULL VARCHAR2(30)REJECT_LIMIT VARCHAR2(40)ACCESS_TYPE VARCHAR2(7)ACCESS_PARAMETERS VARCHAR2(4000)PROPERTY VARCHAR2(10)
This is the first basic External Table, and as previously shown, its creation is pretty simple. It allows external data to be easily accessed from inside the database, allowing the user to see the external data just as if it was already loaded inside a regular stage table.

Comments

Popular posts from this blog

Unix Commands - Advanced