SRDB ID   Synopsis   Date
17724   CC: Fatal error in ld: Bus Error (core dumped)   25 Sep 1998

Status Issued

Description
Intermittently (10-50% of the time) when compiling or running make,
the linker core dumps with a bus error.
This usually occurs on large Enterprise systems (E3000, E4000 or larger).

eg: + Code builds on other 2.6 systems with 2 & 3 cpus, but link fails
      5% of time on this 4 cpu E5000. They have compared patchlists & say
      they are the same. E5000 has 2 Gig SWAP & 1 Gig RAM

    + Problem has also shown up as part of an Oracle install.

      We have two E4500 systems that are identical except one has an A3000
      disk array and the other has an A7000 disk array.  The local storage is
      identical, and that is where everything except the oracle tablespace
      files are kept.  The root and var filesystems are encapsulated UFS,
      the primary swap is encapsulated raw, and all other filesystems are VxFS.
      Both systems exhibit the same intermittent linking problems,
      primarily with the modules: wrap, libclntsh.so and oracle.
      The linking of the rdbms module is now failing on an ld command, 
      reporting Bus Error.
      # file core
      core:           ELF 32-bit MSB core file SPARC Version 1, from 'ld'
      # adb core
      core file = core -- program ``ld'' on platform SUNW,Ultra-Enterprise
      SIGBUS: Bus Error
      From the Oracle install log: Bus Error - core dumped
                                   *** Error code 138


    + Using WorkShop 4.2 and the C++ compiler, the linker core dumps.
      The customer is running Solaris 2.6 5/98 on an E10000.
     (dbx) where
     =>[1] _memcpy(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) at 0xef6e0814
       [2] _memcpy(0xd78e0000, 0xe8c77e78, 0x10, 0x580, 0x38, 0xd78d9030),
                        at 0xef6e0750
       [3] xlate(0xefffe1ac, 0x2, 0xef60316c, 0x7618, 0x2, 0x7618), at 0xef60305c
       [4] wrt32(0x2, 0x0, 0x1, 0xd783bc48, 0xd91e945c, 0xd91ebf84), at 0xef6120d8
       [5] _elf32_update(0x1a8f778, 0x8, 0xd91e8d04, 0xef629b38, 0x1, 0xd911d104),
                        at 0xef6126b4
       [6] create_outfile(0xef6d1328, 0xbfffffff, 0xef635b44, 0x0, 0xefb35b24,
                        0xd90fe85c), at 0xef6a92ec
       [7] ld_main(0x20000000, 0x21x08, 0xef6bf4f2, 0xefd0250, 0xef6d1328, 0x0),
                        at 0xef6b85cc
       [8] main(0x56, 0xefffe3ec, 0xef6b811c, 0x118e5, 0xef7c14b0, 0x0) at 0x110c4
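
The "Error code 138" in the Oracle install log above can be related to the
bus error with a little arithmetic. The following is a minimal sketch, assuming
the common convention that a status above 128 encodes 128 plus the number of
the signal that killed the process. The function name decode_status and the
one-entry signal table are illustrative only; the mapping of signal 10 to
SIGBUS is taken from the truss output in this article.

```python
# Assumption: a status above 128 means "killed by signal (status - 128)".
# Signal 10 = SIGBUS is taken from the truss output in this article;
# the table is deliberately limited to that one entry.
SOLARIS_SIGNALS = {10: "SIGBUS"}


def decode_status(status):
    """Return (signal_number, name) for a signal death, else None."""
    if status <= 128:
        return None
    signo = status - 128
    return signo, SOLARIS_SIGNALS.get(signo, "unknown")


print(decode_status(138))  # prints (10, 'SIGBUS')
```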



Truss of a failed run shows:
waitid(P_PGID, 5306, 0xEFFFE0B0, WEXITED|WTRAPPED) (sleeping...)
5316:	    Incurred fault #5, FLTACCESS  %pc = 0xEF724524
5316:	      siginfo: SIGBUS BUS_OBJERR addr=0xEE012000 errno=61441
5316:	    Received signal #10, SIGBUS [default]
5316:	      siginfo: SIGBUS BUS_OBJERR addr=0xEE012000 errno=61441
5316:		*** process killed ***
5314:	waitid(P_PGID, 5306, 0xEFFFE0B0, WEXITED|WTRAPPED) = 0

SOLUTION SUMMARY:
This is a bug in the Veritas VxFS software, not in the linker.
It does not occur on a UFS file system.
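
Because the failure depends on the file system type, it can help to confirm
which file system the link output directory lives on. The following is a rough
sketch, assuming the /etc/mnttab layout of whitespace-separated fields
(special, mount point, fstype, options, time). The function name
fstype_for_path, the device names and the paths in the sample are hypothetical.

```python
def fstype_for_path(mnttab_text, path):
    """Return the fstype of the longest mount point that contains path."""
    best = None  # (mount_point, fstype) of the best match so far
    for line in mnttab_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        # Compare with trailing slashes so /export/build does not
        # match /export/buildx.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best[1] if best else None


# Hypothetical mnttab contents, for illustration only.
sample_mnttab = (
    "/dev/vx/dsk/rootvol\t/\tufs\trw,suid\t906700000\n"
    "/dev/vx/dsk/datadg/build\t/export/build\tvxfs\trw,suid\t906700100\n"
)

print(fstype_for_path(sample_mnttab, "/export/build/obj/a.out"))  # prints vxfs
print(fstype_for_path(sample_mnttab, "/var/tmp/a.out"))           # prints ufs
```

On a live system the text would be read from /etc/mnttab instead of the sample
string.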

Download point release 3.2.4 from this web site:
     http://sunsolve2.sun.com/beta/vxfs

The problem is described in bugs 4137397 (duplicate: 4164910) and 4103710.

The key piece of information is this line of the truss output:
5316:	      siginfo: SIGBUS BUS_OBJERR addr=0xEE012000 errno=61441

The Engineer's description from bug 4103710:
    The key is the 'SIGBUS BUS_OBJERR'; this signal is only returned when
    a page fault occurs as we're mapping in the backup storage from the
    underlying filesystem.  The 'errno=61441' is the error code that
    the underlying filesystem is passing up.  The errno is the 'error'
    returned by VOP_ADDMAP(), which is provided by the underlying filesystem.

The underlying file system in all cases turned out to be VxFS 3.2.1.1.
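
To check a saved truss log for the same signature, the siginfo lines can be
scanned mechanically. The following is a minimal sketch, assuming siginfo
lines have the layout shown in the truss excerpt in this article. The names
SIGINFO_RE and find_bus_objerr are illustrative and not part of any tool. The
errno is also printed in hexadecimal, where 61441 is 0xf001.

```python
import re

# Matches lines such as:
#   5316:  siginfo: SIGBUS BUS_OBJERR addr=0xEE012000 errno=61441
SIGINFO_RE = re.compile(
    r"siginfo:\s+(?P<signal>SIG\w+)\s+(?P<code>\w+)"
    r"\s+addr=(?P<addr>0x[0-9A-Fa-f]+)\s+errno=(?P<errno>\d+)"
)


def find_bus_objerr(truss_text):
    """Return a list of (addr, errno) for every SIGBUS BUS_OBJERR line."""
    hits = []
    for line in truss_text.splitlines():
        m = SIGINFO_RE.search(line)
        if m and m.group("signal") == "SIGBUS" and m.group("code") == "BUS_OBJERR":
            hits.append((int(m.group("addr"), 16), int(m.group("errno"))))
    return hits


# The sample line is copied from the truss output in this article.
sample = "5316:\t      siginfo: SIGBUS BUS_OBJERR addr=0xEE012000 errno=61441\n"
for addr, err in find_bus_objerr(sample):
    print("addr=%#x errno=%d (%#x)" % (addr, err, err))
```

Any BUS_OBJERR hit for ld on a system whose output directory is on VxFS
matches the symptom described here.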

INTERNAL SUMMARY:
The point release is in a poorly named directory.
It is not beta software but a valid release from Veritas.
FIN I0401 describes why the patch process regarding 
Veritas software is different.

   Sun is a reseller of the Veritas File System product.
   It is not a Sun-branded product, so Sun does nothing with the code.
   The product kit (CD and docs) comes directly from Veritas; Sun
   re-packages it and stocks the distribution center with the
   re-packaged product.

   Veritas does not develop patches; it creates point releases (each point
   release can be viewed as a total package).  Since Sun does not have access
   to the code, we cannot put it into a patch format, so Sun has followed
   Veritas' format and releases "point releases".

SUBMITTER: Richard Barker
APPLIES TO: Hardware, Operating Systems/Solaris/Solaris 2.x
ATTACHMENTS:


Copyright (c) 1997-2003 Sun Microsystems, Inc.