Common Problems That Cause TLD.LOG To Overflow
Question:
Common problems that cause tld.log to overflow.
Answer:
(Date: Mar 97: Version B.02.XX) Three very common errors cause
the /hp3070/qm/logdata/tld.log file to grow very quickly and
eventually overflow. The usual symptom is an error message on your
console screen (written over your windows) saying that tld.log has
exceeded the maximum size.
All three errors are easy to fix.
The first error looks like this:
"*(970407 time):/hp3070/qm/logdata/testerq/970306092821hp3070a1"
We have also seen this error:
log_dis: possible pbq database corruption: cannot open events for file <bunch of numbers>: not a directory
log_dis: file /hp3070/qm/logdata/pbqmq/<date code><system name> error updating pbq database! Fix problem, then try again
What's Happening:
The /hp3070/qm/logdata/testerq directory is a temporary storage
area. The translogd daemon processes this data and places it under
individual board directories under the /hp3070/qm/pbq directory.
If the board directories don't exist, but references to them in
the /hp3070/qm/pbq/boards file do exist, the data doesn't have a
place to move to when translogd tries to move it out of the testerq
directory. This situation usually occurs because someone has removed
a board directory in UNIX, rather than removing the board through
the qstats utilities.
How to solve it:
Look in /hp3070/qm/pbq. Compare the list of directories there with
the list of board directories in the "boards" file. Find
the entries in the "boards" file that do not have a directory
under pbq.
Then delete the entries in the /hp3070/qm/pbq/boards file that
refer to the non-existent board directories. Also remove any files
remaining in testerq that show up in the error messages in the
tld.log file. They usually come in groups with very similar
timestamps as part of the name, so this should be relatively easy
to do.
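The comparison described above can be sketched as a small shell
function. This is only a sketch: the function name is made up for
illustration, and it assumes that each line of the "boards" file
begins with the board name as its first whitespace-separated field.
Check the layout of your own boards file before relying on it.

```shell
# Sketch: list entries in the "boards" file that have no matching
# directory under the pbq directory.
# ASSUMPTION: the board name is the first field on each line of "boards".
missing_board_dirs() {
    pbq_dir=${1:-/hp3070/qm/pbq}
    boards_file=${2:-$pbq_dir/boards}
    # Take the first field of every non-blank line, then report each
    # name that does not exist as a directory under pbq.
    awk 'NF { print $1 }' "$boards_file" | while read -r board; do
        [ -d "$pbq_dir/$board" ] || echo "$board"
    done
}
```

Calling missing_board_dirs with no arguments checks /hp3070/qm/pbq.
The function only prints names; it does not edit the boards file.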
If, when you deleted entries in the boards file, you removed the
name of a board that you still run production on, run the board
with "learn on" once. The daemon will then recreate the
boards file entries and the board directories and then move the
data to these directories. Then turn "learn off".
You may have removed some boards from the file that don't have
directories any more, and that you don't run production on any more.
You don't need to do anything else with these boards.
If there is a lot of data stacked up under pbqmq, it may take a
while to move all this data. With time, the number of files under
pbqmq should decrease.
The second error looks like this:
log_dis:possible pbq database corruption cannot open events file for board boardname, not a directory
log_dis:file hp3070/qm/logdata/pbqmq/somedatafile error updating pbq database
What's Happening:
If this is the case, you'll find files (not directories) in the
/hp3070/qm/logdata/testerq directory. These files will likely be from
a new board that you haven't yet run with "learn on". The "learn on"
command creates the directories where the pbq data is temporarily
stored while it is being processed.
How to fix it:
If you see this error:
- Remove the files (but not the directories) under testerq.
- Run the board with "learn on" once.
- Turn "learn off".
Then the files should be put in the appropriate board directories.
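To see which entries under testerq are files rather than
directories before removing anything, a sketch like the following
may help. The function name is hypothetical; the test itself uses
only the standard shell file check.

```shell
# Sketch: print regular files (not directories) sitting directly
# under the testerq directory.
list_stray_files() {
    testerq_dir=${1:-/hp3070/qm/logdata/testerq}
    for entry in "$testerq_dir"/*; do
        # -f is true only for regular files, so directories are skipped.
        [ -f "$entry" ] && echo "$entry"
    done
    return 0
}
```

Review the list first, then remove the files by hand with rm. The
function deliberately does not delete anything itself.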
The third error produces messages similar to the second one, but
the cause is a little different. The tld.log file will contain
error messages that look like this:
BAD DIRECTORY PATH IGNORED [970326 14:49]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:49]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:50]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:50]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:51]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:51]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:52]: /hp3070/qm/logdata/testerq/somedatafile
* [ 970326 14:52]: /hp3070/qm/logdata/testerq/somedatafile
What's Happening:
Every thirty seconds, you'll get another line similar to the above.
The cause of this problem is a file (not a directory) located
in the /hp3070/qm/logdata/testerq directory. This can happen if the
testplan has a Board$ variable set to a null string ("").
We have seen this happen when an operator was prompted to select
a version of a card when they had a multiple-version card - and
the operator entered a null string. Unfortunately, no error checking
was done on the operator's input, so the Board$ string was created
as "".
If the testplan is run repeatedly, several files will be deposited
in the testerq directory, and you'll get a new error message for each
one every thirty seconds, so your tld.log file will fill up quickly.
How to fix it:
The cure for this problem is to remove the files from the testerq
directory, and fix any testplans that allow the Board$ string to be
set to "".
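To find out which files are generating the repeated messages, you
can summarize the paths that appear in tld.log. The following is a
sketch only: the function name is made up, and it assumes each
message sits on one line with the testerq path as the last
whitespace-separated field, as in the sample above.

```shell
# Sketch: count how often each testerq path is reported in tld.log.
# ASSUMPTION: the path is the last whitespace-separated field of the line.
offending_paths() {
    log_file=${1:-/hp3070/qm/logdata/tld.log}
    # Tally lines whose last field contains /testerq/, then print
    # "count path" pairs sorted by path.
    awk '$NF ~ /\/testerq\// { count[$NF]++ }
         END { for (path in count) print count[path], path }' "$log_file" |
        sort -k 2
}
```

Each output line gives a count followed by the path, so the files
with the highest counts are the ones to remove from testerq first.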
Additional Notes:
Once you've looked at the problems, and taken steps to solve them,
you may have to shorten your tld.log file. A good way to do this is:
cd /hp3070/qm/logdata
> tld.log
This truncates your tld.log file to 0 length, but preserves the
ownerships and permissions. Make certain that you have solved the
problem before you shorten this file: the information in tld.log
is useful for solving other problems with datalogging, so if you
have a problem that's more complex than the ones described here,
your support person will probably need some of the information from
this file.
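If you think your support person may still need the old messages,
one option is to save a copy before truncating. This is a sketch
under assumptions, not a documented procedure: the function name and
the ".save" backup name are hypothetical, and you need enough free
disk space to hold the copy.

```shell
# Sketch: keep a copy of tld.log, then truncate the original in place.
# ASSUMPTION: "<log>.save" is an acceptable backup name on your system.
save_and_truncate_log() {
    log_file=${1:-/hp3070/qm/logdata/tld.log}
    backup_file=${2:-$log_file.save}
    # cp -p preserves the mode and timestamps on the copy. The original
    # is truncated only if the copy succeeded, and it is never removed.
    cp -p "$log_file" "$backup_file" && : > "$log_file"
}
```

Like the "> tld.log" command above, this leaves the original file in
place with its ownerships and permissions; only its contents are
cleared.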
Please do not remove your tld.log file!
It must exist so that error messages from translogd can be written
correctly. Otherwise, you'll lose track of anything that's going
wrong with translogd.
If the error messages you're seeing do not match the ones shown
above, please contact your HP Support Representative so that we
can help you solve the problem.