ideas needed to track down a mysterious problem
Hi,

We are currently putting together a diagnostic sheet to help us track down a problem. One of our clients is running an 8.1.7.2 database on Solaris. Sometimes the application this database supports comes to a halt. When this happened, we made sure that there were no locks that would explain the situation, and then we started to log everyone off, set job_queue_processes to 0 and killed the jobs, so basically we stopped everyone from using the database. The database was still unresponsive and we could not get a system state dump (the dump just would not finish). After a restart everything was back to normal.

I have now started to establish a protocol, organizing the to-dos into steps that should be taken before restarting. It is something like:

- examine the active sessions
- examine the logs
- check the CPU and disk resources
- etc.

Any ideas what to add to this list? Any queries that might help to find out what happens? Or better, any ideas why this might happen? What does the fact that a system state dump could not be taken mean to you? From my point of view this could well be an Oracle bug, or anything.

The application uses traditional heap-organized tables, some global temporary tables, Advanced Queuing, some Java to send letters using JavaMail, dblinks to other systems, and usually more than 70 users are logged in. The clients are mainly Magic 9.3, but there are some web apps logging in using PHP, and a Java client calls some stored procedures from time to time. There are two sessions running continuously, but they are sitting in a DBMS_AQ procedure waiting for messages.

Regards,
Krisztian
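A minimal sketch of how the "steps before restarting" idea could be scripted, so that the evidence survives the restart. The function name run_step, the output directory and the example commands are assumptions for illustration, not part of the original post. It is written for a POSIX shell; on older Solaris that likely means ksh or /usr/xpg4/bin/sh rather than /bin/sh.

```shell
#!/bin/sh
# run_step OUTDIR NAME COMMAND [ARGS...]
# Run one checklist step, save its stdout and stderr to OUTDIR/NAME.txt,
# and append a timestamped line with the exit status to OUTDIR/steps.log.
# (run_step is a hypothetical helper name used only in this sketch.)
run_step() {
    outdir=$1
    name=$2
    shift 2
    mkdir -p "$outdir" || return 1
    "$@" > "$outdir/$name.txt" 2>&1
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') $name exit=$status" >> "$outdir/steps.log"
    return $status
}

# Example use (commands are illustrative; adjust them to the host):
#   out=/var/tmp/hang_$(date '+%Y%m%d_%H%M%S')
#   run_step "$out" vmstat vmstat 5 3
#   run_step "$out" iostat iostat -xn 5 3
#   run_step "$out" df     df -k
```

Keeping one file per step plus a small log of exit codes makes it easier to compare one incident with the next. A step that itself hangs will block the script, so anything that talks to the unresponsive database is probably better run from a separate terminal.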
"prunoki" <hegyv @ardents.hu> wrote in message news:1179045404.300974.260720@y80g2000hsf.googlegroups.com...
> One of our clients is running a
> 8.1.7.2 database on Solaris. Sometimes the application this database
> supports comes to a halt.
Is it 8.1.7.2? Then you really should upgrade to 8.1.7.4 (or a higher version of Oracle, since 8i is desupported).

To examine the problem, you could also use STATSPACK. See http://www.oracle-base.com/articles/8i/Statspack8i.php. It is easy to install and use. Snapshots can be analysed on www.oraperf.com.

Matthias
-----------------------------------------------Reply-----------------------------------------------
On 13 May 2007 01:36:44 -0700, prunoki <hegyv @ardents.hu> wrote:
>Right now I started to establish a protocol, organizing the todo-s
>into steps which should be taken before restarting. It is something
>like:
>- examine the active sessions
>- examine the logs
>- check the CPU and disk resources
>- etc
>Any ideas what to add to this list? Any queries that might help
>finding out what happens? Or better, any ideas why this might happen?
>What does the fact that a system state dump could not be taken mean to
>you? From my point of view this could well be an Oracle bug, or
>anything.
8i is desupported. Oracle still has extended support for 8i, but only for 8.1.7.4. You should upgrade to 8.1.7.4 asap.

Other than that, you shouldn't resort to Microsoft tactics. Just restarting the system isn't going to help you out.

The *very first* thing you should do is examine the alert log. Above all, look for signs that the database can't archive its redo logs anymore, as this is one of the primary reasons a database becomes 'hung'.

Then you should check what it is waiting for. If you can connect, installing statspack might be handy, because it will provide all relevant information. Install it using $ORACLE_HOME/rdbms/admin/spcreate.sql, run statspack.snap twice within 15 minutes, and run $ORACLE_HOME/rdbms/admin/spreport.sql to get the relevant output.

It is most likely a hung archiver and NOT an Oracle bug.

-- 
Sybrand Bakker
Senior Oracle DBA
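The alert log check described above could be sketched as a small shell function. The function name scan_alert_log and the pattern list are assumptions: the patterns are a starting set of messages commonly associated with a stuck archiver or stalled log switch, not an exhaustive catalogue. The alert log typically sits in the directory named by background_dump_dest. The sketch uses POSIX tail and grep options; on older Solaris the /usr/xpg4/bin versions are likely needed.

```shell
#!/bin/sh
# scan_alert_log FILE [LINES]
# Print lines from the last LINES lines (default 500) of an Oracle alert
# log that hint at archiving or log switch trouble.
# Exit status: 0 if something matched, 1 if nothing matched,
#              2 if the file cannot be read.
# (scan_alert_log and the pattern list are illustrative assumptions.)
scan_alert_log() {
    log=$1
    lines=${2:-500}
    if [ ! -r "$log" ]; then
        echo "cannot read alert log: $log" >&2
        return 2
    fi
    tail -n "$lines" "$log" | grep -E -i \
        'ORA-00257|ORA-16038|ORA-19502|cannot allocate new log|All online logs needed archiving|Checkpoint not complete|Archival stopped'
}

# Example use (path is illustrative):
#   scan_alert_log "$ORACLE_BASE/admin/$ORACLE_SID/bdump/alert_$ORACLE_SID.log" 1000
```

A match is only a pointer to where to read the log more closely. An empty result does not rule out an archiver problem, for example when the archive destination has filled up and nothing has been written to the log yet.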
-----------------------------------------------Reply-----------------------------------------------
sybra@hccnet.nl wrote:
> 8i is desupported, Oracle still has extended support for 8i, but only
> for 8.1.7.4. You should upgrade to 8.1.7.4 asap.
> The *very first* thing you should do is examine the alert log.
Thanks for the ideas so far. I will add checking the alert log to the list.

This is a production database, so if all else fails we must restart it as soon as possible.

I also think we should upgrade to 8.1.7.4, but that requires testing, which the business people would agree to more readily if we knew that the upgrade would help. Since we do not know that for sure, they are a bit reluctant, but we will proceed in this direction if enough people vote for it.

Krisztian
-----------------------------------------------Reply-----------------------------------------------
On May 13, 3:46 pm, prunoki <hegyv @ardents.hu> wrote:
> I also think we should upgrade to 8.1.7.4, but it requires testing
> which the business people would provide more easily if we knew that
> the upgrade would help. Since we do not know that for sure, they are a
> bit reluctant, but we will proceed in this direction if enough people
> vote for it.
Isn't the lack of any kind of support from Oracle for your current version justification enough for patching it up to a supported version? And to be more persuasive, you might want to show the decision makers the list of defects fixed in 8.1.7.3 and 8.1.7.4, plus all subsequent CPUs and the most important one-offs since 8.1.7.4...

Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com