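Causes of Downtime and Their Solutions

Oracle Database is designed to address the causes of both planned and unplanned downtime.

The following is an excerpt from the Oracle manual that summarizes the various causes of system downtime and the corresponding solutions.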

1. Causes of Downtime


Category	Outage Type	Description	Examples
Unplanned	Computer failure	A computer failure outage occurs when the system running the database becomes unavailable because it has shut down or is no longer accessible.	Database system hardware failure; operating system failure; Oracle instance failure; network interface failure
	Storage failure	A storage failure outage occurs when the storage holding some or all of the database contents becomes unavailable because it has shut down or is no longer accessible.	Disk drive failure; disk controller failure; storage array failure
	Human error	A human error outage occurs when unintentional or malicious actions cause data within the database to become logically corrupt or unusable. The service level impact of a human error outage can vary significantly depending on the amount and critical nature of the affected data.	Dropped database object; inadvertent data changes; malicious data changes
	Data corruption	A data corruption outage occurs when a hardware or software component causes corrupt data to be read or written to the database. The service level impact of a data corruption outage may vary, from a small portion of the database (down to a single database block) to a large portion of the database (making it essentially unusable).	Operating system or storage device driver, host bus adapter, disk controller, or volume manager error causing bad disk reads or writes; stray writes by operating system or other application software
	Site failure	A site failure outage occurs when an event causes all or a significant portion of an application to stop processing or slow to an unusable service level. A site failure may affect all processing at a data center, or a subset of applications supported by a data center.	Extended site-wide power failure; site-wide network failure; natural disaster making a data center inoperable; terrorist or malicious attack on operations or the site
Planned	System changes	Planned system changes occur when performing routine and periodic maintenance operations and new deployments. Planned system changes include any scheduled changes to the operating environment that occur outside the organizational data structure within the database. The service level impact of a planned system change varies significantly depending on the nature and scope of the planned outage, the testing and validation efforts made prior to implementing the change, and the technologies and features in place to minimize the impact.	Adding/removing processors to/from an SMP server; adding/removing nodes to/from a cluster; adding/removing disk drives or storage arrays; changing configuration parameters; upgrading/patching system hardware and software; upgrading/patching Oracle software; upgrading/patching application software; system platform migration; database relocation
	Data changes	Planned data changes occur when there are changes to the logical structure or physical organization of Oracle database objects. The primary objective of these changes is to improve performance or manageability.	Table definition changes; adding table partitioning; creating and rebuilding indexes

2. Oracle High Availability Solutions for Unplanned Downtime


Outage Type	Oracle Solution	Benefits	Recovery Time
Computer failures	Fast-Start Fault Recovery	Tunable and predictable cache recovery	Minutes to hours (footnote 1)
	RAC	Automatic recovery of failed nodes and instances, fast connection failover, and service failover	No downtime (footnote 2)
	Data Guard	Fast Start Failover and fast connection failover	< 1 minute
	Oracle Streams	Online replica database	No downtime (footnote 2)
Storage failures	ASM	Mirroring and online automatic rebalance	No downtime
	RMAN with flash recovery area	Fully managed database recovery and managed disk-based backups	Minutes to hours
	Data Guard	Fast Start Failover and fast connection failover	< 1 minute
	Oracle Streams	Online replica database	No downtime (footnote 2)
Human errors	Oracle security features	Restrict user access as prevention	No downtime
	Oracle Flashback technology	Fine-grained and database-wide rewind capability	< 30 minutes (footnote 3)
	LogMiner	Log analysis	Minutes to hours
Data corruptions	HARD	Corruption prevention within a storage array	No downtime
	RMAN with flash recovery area	Online block media recovery and managed disk-based backups	Minutes to hours
	Data Guard	Automatic validation of redo blocks before they are applied; fast failover to an uncorrupted standby database	< 1 minute
	Oracle Streams	Online replica database	No downtime (footnote 2)
Site failures	RMAN	Fully managed database recovery and integration with tape management vendors	Hours to days
	Data Guard	Fast Start Failover and fast connection failover	Seconds to 5 minutes (footnote 4)
	Oracle Streams	Online replica database	Seconds to 5 minutes (footnote 4)

Footnote 1: Recovery time consists largely of the time it takes to restore the failed system.
Footnote 2: The database is still available, but the portion of the application connected to the failed system is affected.
Footnote 3: Recovery time for human errors depends primarily on detection time. If it takes seconds to detect a malicious DML or DDL transaction, it typically requires only seconds to flash back the appropriate transactions. Longer detection time usually leads to longer recovery time to repair the appropriate transactions. An exception is undropping a table, which is effectively instantaneous regardless of detection time.
Footnote 4: The recovery time indicated applies to database and existing connection failover. Network connection changes and other site-specific failover activities may lengthen overall recovery time.
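
As a reading aid, the table above can be encoded as a small lookup structure and filtered by the worst recovery time that is acceptable. The sketch below is only an illustration: the names `RECOVERY_ORDER`, `SOLUTIONS`, and `solutions_within` are hypothetical, the best-to-worst ordering of the recovery-time categories is an assumption made here, and the footnote caveats are not modeled.

```python
# Recovery-time categories copied from the table, ordered best to worst.
# The ordering is an assumption of this sketch, not something the manual states.
RECOVERY_ORDER = [
    "No downtime",
    "< 1 minute",
    "Seconds to 5 minutes",
    "< 30 minutes",
    "Minutes to hours",
    "Hours to days",
]

# (solution, recovery time) pairs per outage type, transcribed from the table.
SOLUTIONS = {
    "Computer failures": [
        ("Fast-Start Fault Recovery", "Minutes to hours"),
        ("RAC", "No downtime"),
        ("Data Guard", "< 1 minute"),
        ("Oracle Streams", "No downtime"),
    ],
    "Storage failures": [
        ("ASM", "No downtime"),
        ("RMAN with flash recovery area", "Minutes to hours"),
        ("Data Guard", "< 1 minute"),
        ("Oracle Streams", "No downtime"),
    ],
    "Human errors": [
        ("Oracle security features", "No downtime"),
        ("Oracle Flashback technology", "< 30 minutes"),
        ("LogMiner", "Minutes to hours"),
    ],
    "Data corruptions": [
        ("HARD", "No downtime"),
        ("RMAN with flash recovery area", "Minutes to hours"),
        ("Data Guard", "< 1 minute"),
        ("Oracle Streams", "No downtime"),
    ],
    "Site failures": [
        ("RMAN", "Hours to days"),
        ("Data Guard", "Seconds to 5 minutes"),
        ("Oracle Streams", "Seconds to 5 minutes"),
    ],
}


def solutions_within(outage_type, worst_acceptable):
    """Return the solutions whose listed recovery time is no worse than the limit."""
    limit = RECOVERY_ORDER.index(worst_acceptable)
    return [
        name
        for name, recovery in SOLUTIONS[outage_type]
        if RECOVERY_ORDER.index(recovery) <= limit
    ]


if __name__ == "__main__":
    print(solutions_within("Site failures", "Seconds to 5 minutes"))
```

For example, asking for site-failure solutions that recover within "Seconds to 5 minutes" leaves Data Guard and Oracle Streams and excludes RMAN, whose listed recovery time is "Hours to days".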

3. Oracle High Availability Solutions for Planned Downtime


Maintenance Type	Oracle Solution	Description	Recovery Time	Considerations
System and hardware upgrades	RAC	To avoid downtime: Dynamically redirect connections and services to a different instance. Shut down target instance. Upgrade target node while other nodes and instances are still available. Start node and instance. Repeat on another node.	No downtime	Need to check for system restrictions. Need to check if the database and clusterware versions are certified with the new system and hardware changes.
Operating system upgrade	RAC	To avoid application downtime: Dynamically redirect connections and services to a different instance. Shut down target instance. Upgrade operating system on target node while other nodes and instances are still available. Start node and instance. Repeat on another node.	No downtime	Need to check if the database and the clusterware versions are certified for both operating system patch releases.
Oracle one-off patches	RAC	"One-off" patches (or interim patches) to database software are usually applied to implement known fixes for software problems, or to apply diagnostic patches to gather information on a problem. Such patches are often applied during a scheduled maintenance outage. Oracle provides the capability to do rolling patch upgrades with RAC with little or no database downtime using the `opatch` command-line utility. A RAC rolling upgrade enables at least some instances of the RAC installation to be available during the scheduled outage required for patch upgrades. Only the RAC instance that is currently being patched needs to be disabled; the other instances can remain available. This further reduces the application downtime required for scheduled outages. Oracle's `opatch` utility enables the user to apply the patch successively to the different instances in a RAC installation.	No downtime	Rolling upgrade is only available for patches that are certified for rolling upgrades. Typically, patches that can be installed in a rolling upgrade include: patches that do not affect the contents of the database, such as the data dictionary; patches not related to RAC inter-node communication; patches related to client-side tools such as SQL*Plus, Oracle utilities, development libraries, and Oracle Net; and patches that do not change shared database resources, such as datafile headers, control files, and common header definitions of kernel modules. RAC cannot be used for rolling upgrade of patch sets.
Storage migration (footnote 1)	ASM	ASM enables you to add all disks in one storage array and subsequently drop all disks from another array. ASM will automatically rebalance and migrate data to the new storage while the database remains operational.	No downtime	Before removing the source storage array, ensure that the rebalancing is complete.
System and cluster upgrades	Data Guard	For system upgrades that are not rolling upgradable with RAC due to system restrictions or cluster firmware upgrades that require downtime, leverage Data Guard to switch over to a physical or logical standby database: Issue Data Guard Switchover (only downtime component: optimally seconds to minutes). Shut down initial primary database (now standby). Execute system and cluster upgrade steps. Restart as standby database and allow recovery to synchronize. Optionally issue Data Guard Switchover to return to original database.	Seconds to minutes	For fastest switchover, the standby database should be using real-time apply and synchronized prior to the switchover operation.
Patchset and database upgrades	Data Guard using SQL Apply	Leverage Data Guard using SQL Apply to upgrade an Oracle database: Set up SQL Apply (logical standby database). Upgrade logical standby database to new release. Disconnect applications. Execute Data Guard switchover. Reconnect applications to the new primary database. Shut down initial primary database (now logical standby database). Execute database software upgrade steps. Restart the standby database and allow recovery to synchronize. Optionally issue Data Guard Switchover to return to the original database.	Seconds to minutes	Only supported for Oracle database versions 10.1.0.3 and higher. SQL Apply has some data type restrictions. For more information, see Oracle Data Guard Concepts and Administration.
Database upgrades and platform migration	Transportable tablespace	Transporting a database only requires copying datafiles and integrating the tablespace structural information. Tablespaces can even be transported between databases from different releases. With Oracle database 10g, tablespaces can be transported across platforms. To perform a database upgrade or platform migration: Create and prepare a separate database using the target release. Transport tablespace from primary database to target database. Only copy datafiles from the source to target if the databases are not on the same storage device. Prepare and open the new production database. If the target database resides on a separate host but on the same platform, create a physical standby database from the initial primary database co-located with the target database. After a Data Guard Switchover, transport the tablespaces from the source to the target without incurring the file transfer time as part of the downtime. (footnote 2)	Minutes to hours	Transportable tablespace has limitations and restrictions in regard to character sets, opaque types, and system tablespace objects. Unlike previous solutions, the steps are not automated. Transportable tablespaces do provide the following benefits: an easier and more efficient means for content providers to publish structured data and distribute it to customers running Oracle on a different platform; simpler distribution of data from a data warehousing environment to data marts that are often running on smaller systems with a different platform; and sharing of read-only tablespaces across a heterogeneous cluster.
Database upgrades and platform migration	Oracle Streams	Like Data Guard using SQL Apply, Oracle Streams can capture database changes, propagate them to destinations, and apply the changes at these destinations. Oracle Streams is optimized for replicating data and can capture changes locally in the online redo log as it is written. The captured changes can then be propagated asynchronously to replica databases. This optimization can reduce latency and enable the replicas to lag the primary database by no more than a few seconds. Unlike Data Guard using SQL Apply, Oracle Streams enables updates on the replica and provides support for heterogeneous platforms with different database releases. Therefore, Oracle Streams may provide the fastest approach for database upgrades and platform migration.	Seconds to minutes to hours	Oracle Streams also has data type limitations and restrictions, such as for advanced queue and object types. Oracle Streams implementations will require additional investment for setup and configuration since it is designed to be a more flexible architecture.

Footnote 1: An example is migration from traditional storage to low-cost storage.
Footnote 2: For more information, refer to the best practices white papers available at http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm.
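
The RAC rows in the table above all follow the same rolling pattern: redirect connections, shut down the target instance, upgrade it while the other instances stay up, restart it, and repeat on the next node. The sketch below simulates that loop in plain Python to make the invariant explicit (at least one other instance must be running before a node is taken down). It does not call any Oracle tool or API; `Node` and `rolling_upgrade` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster node and its instance state (hypothetical model, not an Oracle API)."""
    name: str
    version: str
    running: bool = True


def rolling_upgrade(nodes, new_version):
    """Upgrade nodes one at a time and return the list of steps performed."""
    steps = []
    for target in nodes:
        # At least one other instance must be up to take over the connections.
        survivors = [n for n in nodes if n is not target and n.running]
        if not survivors:
            raise RuntimeError("rolling upgrade needs at least one other running instance")
        steps.append(f"redirect connections from {target.name} to {survivors[0].name}")
        target.running = False            # shut down the target instance
        steps.append(f"shut down {target.name}")
        target.version = new_version      # upgrade while the other nodes stay available
        steps.append(f"upgrade {target.name} to {new_version}")
        target.running = True             # start the node and instance again
        steps.append(f"start {target.name}")
    return steps


if __name__ == "__main__":
    cluster = [Node("node1", "old"), Node("node2", "old")]
    for step in rolling_upgrade(cluster, "new"):
        print(step)
```

With a single node the function raises an error instead of proceeding, which mirrors why the table lists Data Guard switchover as the option for upgrades that cannot be applied in a rolling fashion.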


린기린기린's personal notes
