[Midnightbsd-cvs] src: usr.bin/file:

laffer1 at midnightbsd.org laffer1 at midnightbsd.org
Tue Dec 9 13:32:15 EST 2008


Log Message:
-----------


Modified Files:
--------------
    src/usr.bin/file:
        config.h (r1.1.1.1 -> r1.2)
        file.1 (r1.1.1.2 -> r1.2)
        magic.5 (r1.1.1.2 -> r1.2)

Index: magic.5
===================================================================
RCS file: /home/cvs/src/usr.bin/file/magic.5,v
retrieving revision 1.1.1.2
retrieving revision 1.2
diff -L usr.bin/file/magic.5 -L usr.bin/file/magic.5 -u -r1.1.1.2 -r1.2
--- usr.bin/file/magic.5
+++ usr.bin/file/magic.5
@@ -1,9 +1,9 @@
 .\"
-.\" $FreeBSD: src/usr.bin/file/magic.5,v 1.23 2004/12/28 12:29:06 ru Exp $
+.\" $FreeBSD: src/usr.bin/file/magic.5,v 1.28 2007/05/25 09:25:05 ru Exp $
 .\"
 .\" install as magic.4 on USG, magic.5 on V7 or Berkeley systems.
 .\"
-.Dd September 12, 2003
+.Dd February 19, 2006
 .Dt MAGIC 5 "Public Domain"
 .Os
 .Sh NAME
@@ -13,7 +13,7 @@
 This manual page documents the format of the magic file as
 used by the
 .Nm
-command, version 4.12.
+command, version 4.21.
 The
 .Nm file
 command identifies the type of a file using,
@@ -68,6 +68,12 @@
 in the magic match both lower and upper case characters in the
 targer, whereas upper case characters in the magic, only much
 uppercase characters in the target.
+.It pstring
+A pascal style string where the first byte is interpreted as the an
+unsigned length.
+The string is not
+.Dv NUL
+terminated.
 .It date
 A four-byte value interpreted as a
 .Ux
@@ -86,6 +92,14 @@
 interpreted as a
 .Ux
 date.
+.It beldate
+A four-byte value (on most systems) in big-endian byte order,
+interpreted as a
+.Ux Ns -style
+date, but interpreted as local time rather
+than UTC.
+.It bestring16
+A two-byte unicode (UCS16) string in big-endian byte order.
 .It leshort
 A two-byte value (on most systems) in little-endian byte order.
 .It lelong
@@ -101,6 +115,50 @@
 .Ux Ns -style
 date, but interpreted as local time rather
 than UTC.
+.It lestring16
+A two-byte unicode (UCS16) string in little-endian byte order.
+.It melong
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order.
+.It medate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a
+.Ux
+date.
+.It meldate
+A four-byte value (on most systems) in middle-endian (PDP-11) byte order,
+interpreted as a
+.Ux Ns -style
+date, but interpreted as local time rather
+than UTC.
+.It regex
+A regular expression match in extended
+.Tn POSIX
+regular expression syntax
+(much like egrep).
+The type specification can be optionally followed by
+.Ql /c
+for case-insensitive matches.
+The regular expression is always
+tested against the first
+.Ar N
+lines, where
+.Ar N
+is the given offset, thus it
+is only useful for (single-byte encoded) text.
+.Ql ^
+and
+.Ql $
+will match the beginning and end of individual lines, respectively,
+not beginning and end of file.
+.It search
+A literal string search starting at the given offset.
+It must be followed by
+.Li / Ns Aq Ar number
+which specifies how many matches shall be attempted (the range).
+This is suitable for searching larger binary expressions with variable
+offsets, using
+.Ql \e
+escapes for special characters.
 .El
 .El
 .Pp
@@ -137,11 +195,22 @@
 .Em ^ ,
 to specify that the value from the file must have clear any of the bits
 that are set in the specified value, or
+.Em ~ ,
+the value specified after is negated before tested, or
 .Em x ,
 to specify that any value will match.
 If the character is omitted,
 it is assumed to be
 .Em = .
+For all tests except
+.Dq string
+and
+.Dq regex ,
+operation
+.Em !\&
+specifies that the line matches if the test does
+.Em not
+succeed.
 .It ""
 Numeric values are specified in C form; e.g.\&
 .Em 13
@@ -177,29 +246,35 @@
 .El
 .Pp
 Some file formats contain additional information which is to be printed
-along with the file type.
-A line which begins with the character
+along with the file type or need additional tests to determine the true
+file type.
+These additional tests are introduced by one or more
 .Em >
-indicates additional tests and messages to be printed.
+characters preceding the offset.
 The number of
 .Em >
 on the line indicates the level of the test; a line with no
 .Em >
 at the beginning is considered to be at level 0.
-Each line at level
-.Em n+1
-is under the control of the line at level
-.Em n
-most closely preceding it in the magic file.
-If the test on a line at level
+Tests are arranged in a tree-like hierarchy:
+If a the test on a line at level
 .Em n
-succeeds, the tests specified in all the subsequent lines at level
+succeeds, all following tests at level
 .Em n+1
-are performed, and the messages printed if the tests succeed.
-The next
-line at level
+are performed, and the messages printed if the tests succeed, until a line
+with level
 .Em n
-terminates this.
+(or less) appears.
+For more complex files, one can use empty messages to get just the
+"if/then" effect, in the following way:
+.Bd -literal -offset indent
+0      string   MZ
+>0x18  leshort  <0x40   MS-DOS executable
+>0x18  leshort  >0x3f   extended PC executable (e.g., MS Windows)
+.Ed
+.Pp
+Offsets do not need to be constant, but can also be read from the file
+being examined.
 If the first character following the last
 .Em >
 is a
@@ -216,44 +291,141 @@
 is used as an offset in the file.
 A byte, short or long is read at that offset
 depending on the
-.Em [bslBSL]
+.Em [bslBSLm]
 type specifier.
 The capitalized types interpret the number as a big endian value, whereas
-a small letter versions interpret the number as a little endian value.
+a small letter versions interpret the number as a little endian value;
+the
+.Em m
+type interprets the number as a middle endian (PDP-11) value.
 To that number the value of
 .Em y
 is added and the result is used as an offset in the file.
 The default type
 if one is not specified is long.
 .Pp
-Sometimes you do not know the exact offset as this depends on the length of
-preceding fields.
-You can specify an offset relative to the end of the
-last uplevel field (of course this may only be done for sublevel tests, i.e.\&
-test beginning with
-.Em > Ns ) .
-Such a relative offset is specified using
+That way variable length structures can be examined:
+.Bd -literal -offset indent
+# MS Windows executables are also valid MS-DOS executables
+0           string  MZ
+>0x18       leshort <0x40   MZ executable (MS-DOS)
+# skip the whole block below if it is not an extended executable
+>0x18       leshort >0x3f
+>>(0x3c.l)  string  PE\e0\e0  PE executable (MS-Windows)
+>>(0x3c.l)  string  LX\e0\e0  LX executable (OS/2)
+.Ed
+.Pp
+This strategy of examining has one drawback: You must make sure that
+you eventually print something, or users may get empty output (like, when
+there is neither PE\e0\e0 nor LE\e0\e0 in the above example).
+.Pp
+If this indirect offset cannot be used as-is, there are simple calculations
+possible: appending
+.Em [+-*/%&|^]<number>
+inside parentheses allows one to modify
+the value read from the file before it is used as an offset:
+.Bd -literal -offset indent
+# MS Windows executables are also valid MS-DOS executables
+0           string  MZ
+# sometimes, the value at 0x18 is less that 0x40 but there's still an
+# extended executable, simply appended to the file
+>0x18       leshort <0x40
+>>(4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
+>>(4.s*512) leshort !0x014c MZ executable (MS-DOS)
+.Ed
+.Pp
+Sometimes you do not know the exact offset as this depends on the length or
+position (when indirection was used before) of preceding fields.
+You can
+specify an offset relative to the end of the last uplevel field using
 .Em &
-as a prefix to the offset.
+as a prefix to the offset:
+.Bd -literal -offset indent
+0           string  MZ
+>0x18       leshort >0x3f
+>>(0x3c.l)  string  PE\e0\e0    PE executable (MS-Windows)
+# immediately following the PE signature is the CPU type
+>>>&0       leshort 0x14c     for Intel 80386
+>>>&0       leshort 0x184     for DEC Alpha
+.Ed
+.Pp
+Indirect and relative offsets can be combined:
+.Bd -literal -offset indent
+0             string  MZ
+>0x18         leshort <0x40
+>>(4.s*512)   leshort !0x014c MZ executable (MS-DOS)
+# if it's not COFF, go back 512 bytes and add the offset taken
+# from byte 2/3, which is yet another way of finding the start
+# of the extended executable
+>>>&(2.s-514) string  LE      LE executable (MS Windows VxD driver)
+.Ed
+.Pp
+Or the other way around:
+.Bd -literal -offset indent
+0                 string  MZ
+>0x18             leshort >0x3f
+>>(0x3c.l)        string  LE\e0\e0  LE executable (MS-Windows)
+# at offset 0x80 (-4, since relative offsets start at the end
+# of the uplevel match) inside the LE header, we find the absolute
+# offset to the code area, where we look for a specific signature
+>>>(&0x7c.l+0x26) string  UPX     \eb, UPX compressed
+.Ed
+.Pp
+Or even both!
+.Bd -literal -offset indent
+0                string  MZ
+>0x18            leshort >0x3f
+>>(0x3c.l)       string  LE\e0\e0 LE executable (MS-Windows)
+# at offset 0x58 inside the LE header, we find the relative offset
+# to a data area where we look for a specific signature
+>>>&(&0x54.l-3)  string  UNACE  \eb, ACE self-extracting archive
+.Ed
+.Pp
+Finally, if you have to deal with offset/length pairs in your file, even the
+second value in a parenthesed expression can be taken from the file itself,
+using another set of parentheses.
+Note that this additional indirect offset
+is always relative to the start of the main indirect offset.
+.Bd -literal -offset indent
+0                 string       MZ
+>0x18             leshort      >0x3f
+>>(0x3c.l)        string       PE\e0\e0 PE executable (MS-Windows)
+# search for the PE section called ".idata"...
+>>>&0xf4          search/0x140 .idata
+# ...and go to the end of it, calculated from start+length;
+# these are located 14 and 10 bytes after the section name
+>>>>(&0xe.l+(-4)) string       PK\e3\e4 \eb, ZIP self-extracting archive
+.Ed
 .Sh BUGS
 The formats
 .Em long ,
 .Em belong ,
 .Em lelong ,
+.Em melong ,
 .Em short ,
 .Em beshort ,
 .Em leshort ,
 .Em date ,
 .Em bedate ,
+.Em medate ,
+.Em ledate ,
+.Em beldate ,
+.Em leldate ,
 and
-.Em ledate
+.Em meldate
 are system-dependent; perhaps they should be specified as a number
 of bytes (2B, 4B, etc),
 since the files being recognized typically come from
 a system on which the lengths are invariant.
 .Pp
-There is (currently) no support for specified-endian data to be used in
-indirect offsets.
+If
+.Pa /usr/share/misc/magic
+is newer than
+.Pa /usr/share/misc/magic.mgc
+it is not used.
+Use the command:
+.Dq Li "cd /usr/share/misc && file -C -m magic"
+to rebuild.
 .Sh SEE ALSO
 .Xr file 1
 .\"
@@ -269,4 +441,4 @@
 .\" the changes I posted to the S5R2 version.
 .\"
 .\" Modified for Ian Darwin's version of the file command.
-.\" @(#)$Id: magic.man,v 1.27 2003/09/12 19:43:30 christos Exp $
+.\" @(#)$Id: magic.man,v 1.30 2006/02/19 18:16:03 christos Exp $
Index: config.h
===================================================================
RCS file: /home/cvs/src/usr.bin/file/config.h,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -L usr.bin/file/config.h -L usr.bin/file/config.h -u -r1.1.1.1 -r1.2
--- usr.bin/file/config.h
+++ usr.bin/file/config.h
@@ -1,69 +1,21 @@
-/* $FreeBSD: src/usr.bin/file/config.h,v 1.9 2004/12/28 04:35:00 obrien Exp $ */
+/* $FreeBSD: src/usr.bin/file/config.h,v 1.13.4.1 2008/02/06 18:26:38 obrien Exp $ */
 
 #include <osreldate.h>
 
-/* config.h.  Generated by configure.  */
+/* config.h.  Generated from config.h.in by configure.  */
 /* config.h.in.  Generated from configure.in by autoheader.  */
-/* Autoheader needs me */
-#define PACKAGE "file"
-
-/* Autoheader needs me */
-#define VERSION "4.12"
 
-/* Define if builtin ELF support is enabled.  */
+/* Use the builtin ELF recognition code */
 #define BUILTIN_ELF 1
 
-/* Define if ELF core file support is enabled.  */
+/* Recognize ELF core files */
 #define ELFCORE 1
 
-/* Define if the `long long' type works.  */
-#define HAVE_LONG_LONG 1
-
-/* Define if we have "tm_zone" in "struct tm".  */
-#define HAVE_TM_ZONE 1
-
-/* Define if we have a global "char * []" "tzname" variable.  */
-#define HAVE_TZNAME 1
-
-/* Define if we have "tm_isdst" in "struct tm".  */
-#define HAVE_TM_ISDST 1
-
-/* Define if we have a global "int" variable "daylight".  */
+/* */
 /* #undef HAVE_DAYLIGHT */
 
-/* Define if we have a mkstemp */
-#define HAVE_MKSTEMP 1
-
-/* Define to `unsigned char' if standard headers don't define.  */
-/* #undef uint8_t */
-
-/* Define to `unsigned short' if standard headers don't define.  */
-/* #undef uint16_t */
-
-/* Define to `unsigned int' if standard headers don't define.  */
-/* #undef uint32_t */
-
-/* Define to `unsigned long long', if available, or `unsigned long', if
-   standard headers don't define.  */
-/* #undef uint64_t */
-
-/* Define to `int' if standard headers don't define.  */
-/* #undef int32_t */
-
-/* FIXME: These have to be added manually because autoheader doesn't know
-   about AC_CHECK_SIZEOF_INCLUDES.  */
-
-/* The number of bytes in a uint8_t.  */
-#define SIZEOF_UINT8_T 1
-
-/* The number of bytes in a uint16_t.  */
-#define SIZEOF_UINT16_T 2
-
-/* The number of bytes in a uint32_t.  */
-#define SIZEOF_UINT32_T 4
-
-/* The number of bytes in a uint64_t.  */
-#define SIZEOF_UINT64_T 8
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
 
 /* Define to 1 if you have the <fcntl.h> header file. */
 #define HAVE_FCNTL_H 1
@@ -80,12 +32,21 @@
 /* Define to 1 if you have the `z' library (-lz). */
 #define HAVE_LIBZ 1
 
+/* Define to 1 if you have the <limits.h> header file. */
+#define HAVE_LIMITS_H 1
+
 /* Define to 1 if you have the <locale.h> header file. */
 #define HAVE_LOCALE_H 1
 
+/* */
+#define HAVE_LONG_LONG 1
+
 /* Define to 1 if you have the `mbrtowc' function. */
 #define HAVE_MBRTOWC 1
 
+/* Define to 1 if <wchar.h> declares mbstate_t. */
+#define HAVE_MBSTATE_T 1
+
 /* Define to 1 if you have the <memory.h> header file. */
 #define HAVE_MEMORY_H 1
 
@@ -95,6 +56,9 @@
 /* Define to 1 if you have the `mmap' function. */
 #define HAVE_MMAP 1
 
+/* Define to 1 if you have the `snprintf' function. */
+#define HAVE_SNPRINTF 1
+
 /* Define to 1 if you have the <stdint.h> header file. */
 #if __FreeBSD_version >= 500019
 #define HAVE_STDINT_H 1
@@ -112,6 +76,12 @@
 /* Define to 1 if you have the <string.h> header file. */
 #define HAVE_STRING_H 1
 
+/* Define to 1 if you have the `strndup' function. */
+/* #undef HAVE_STRNDUP */
+
+/* Define to 1 if you have the `strtof' function. */
+#define HAVE_STRTOF 1
+
 /* Define to 1 if you have the `strtoul' function. */
 #define HAVE_STRTOUL 1
 
@@ -128,6 +98,9 @@
 /* Define to 1 if you have the <sys/stat.h> header file. */
 #define HAVE_SYS_STAT_H 1
 
+/* Define to 1 if you have the <sys/time.h> header file. */
+#define HAVE_SYS_TIME_H 1
+
 /* Define to 1 if you have the <sys/types.h> header file. */
 #define HAVE_SYS_TYPES_H 1
 
@@ -137,6 +110,9 @@
 /* Define to 1 if you have <sys/wait.h> that is POSIX.1 compatible. */
 #define HAVE_SYS_WAIT_H 1
 
+/* */
+#define HAVE_TM_ISDST 1
+
 /* HAVE_TM_ZONE */
 #define HAVE_TM_ZONE 1
 
@@ -155,12 +131,21 @@
 /* Define to 1 if you have the <utime.h> header file. */
 #define HAVE_UTIME_H 1
 
+/* Define to 1 if you have the `vsnprintf' function. */
+#define HAVE_VSNPRINTF 1
+
 /* Define to 1 if you have the <wchar.h> header file. */
 #define HAVE_WCHAR_H 1
 
+/* Define to 1 if you have the <wctype.h> header file. */
+#define HAVE_WCTYPE_H 1
+
 /* Define to 1 if you have the `wcwidth' function. */
 #define HAVE_WCWIDTH 1
 
+/* Define to 1 if you have the <zlib.h> header file. */
+#define HAVE_ZLIB_H 1
+
 /* Define to 1 if `major', `minor', and `makedev' are declared in <mkdev.h>.
    */
 /* #undef MAJOR_IN_MKDEV */
@@ -187,6 +172,21 @@
 /* Define to the version of this package. */
 #define PACKAGE_VERSION ""
 
+/* */
+#define SIZEOF_INT64_T 8
+
+/* */
+#define SIZEOF_UINT16_T 2
+
+/* */
+#define SIZEOF_UINT32_T 4
+
+/* */
+#define SIZEOF_UINT64_T 8
+
+/* */
+#define SIZEOF_UINT8_T 1
+
 /* Define to 1 if you have the ANSI C header files. */
 #define STDC_HEADERS 1
 
@@ -194,22 +194,47 @@
 /* #undef TM_IN_SYS_TIME */
 
 /* Version number of package */
-#define VERSION "4.12"
+#define VERSION "4.23"
 
 /* Number of bits in a file offset, on hosts where this is settable. */
 /* #undef _FILE_OFFSET_BITS */
 
+/* Enable GNU extensions on systems that have them.  */
+#ifndef __FreeBSD__
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE 1
+#endif
+#endif
+
 /* Define for large files, on AIX-style hosts. */
 /* #undef _LARGE_FILES */
 
 /* Define to empty if `const' does not conform to ANSI C. */
 /* #undef const */
 
+/* */
+/* #undef int32_t */
+
+/* */
+/* #undef int64_t */
+
 /* Define to a type if <wchar.h> does not define. */
 /* #undef mbstate_t */
 
-/* Define to `long' if <sys/types.h> does not define. */
+/* Define to `long int' if <sys/types.h> does not define. */
 /* #undef off_t */
 
-/* Define to `unsigned' if <sys/types.h> does not define. */
+/* Define to `unsigned int' if <sys/types.h> does not define. */
 /* #undef size_t */
+
+/* */
+/* #undef uint16_t */
+
+/* */
+/* #undef uint32_t */
+
+/* */
+/* #undef uint64_t */
+
+/* */
+/* #undef uint8_t */
Index: file.1
===================================================================
RCS file: /home/cvs/src/usr.bin/file/file.1,v
retrieving revision 1.1.1.2
retrieving revision 1.2
diff -L usr.bin/file/file.1 -L usr.bin/file/file.1 -u -r1.1.1.2 -r1.2
--- usr.bin/file/file.1
+++ usr.bin/file/file.1
@@ -1,6 +1,6 @@
-.\" $FreeBSD: src/usr.bin/file/file.1,v 1.36 2005/02/13 23:45:50 ru Exp $
-.\" $Id: file.man,v 1.54 2003/10/27 18:09:08 christos Exp $
-.Dd October 27, 2003
+.\" $FreeBSD: src/usr.bin/file/file.1,v 1.38 2007/05/25 09:25:05 ru Exp $
+.\" $Id: file.man,v 1.57 2005/08/18 15:18:22 christos Exp $
+.Dd August 18, 2005
 .Dt FILE 1 "Copyright but distributable"
 .Os
 .Sh NAME
@@ -8,7 +8,7 @@
 .Nd determine file type
 .Sh SYNOPSIS
 .Nm
-.Op Fl bcikLnNprsvz
+.Op Fl bchikLnNprsvz
 .Op Fl f Ar namefile
 .Op Fl F Ar separator
 .Op Fl m Ar magicfiles
@@ -17,7 +17,7 @@
 .Fl C
 .Op Fl m Ar magicfile
 .Sh DESCRIPTION
-This manual page documents version 4.12 of the
+This manual page documents version 4.21 of the
 .Nm
 utility which tests each argument in an attempt to classify it.
 There are three sets of tests, performed in this order:
@@ -103,6 +103,13 @@
 or
 .Pa /usr/share/misc/magic
 if the compile file does not exist.
+In addition
+.Nm
+will look in
+.Pa $HOME/.magic.mgc ,
+or
+.Pa $HOME/.magic
+for magic entries.
 .Pp
 If a file does not match any of the entries in the magic file,
 it is examined to see if it seems to be a text file.
@@ -187,6 +194,13 @@
 file result returned.
 Defaults to
 .Ql \&: .
+.It Fl h , -no-dereference
+Causes symlinks not to be followed
+(on systems that support symbolic links).
+This is the default if the
+environment variable
+.Ev POSIXLY_CORRECT
+is not defined.
 .It Fl i , -mime
 Causes the file command to output mime type strings rather than the more
 traditional human readable ones.
@@ -206,8 +220,11 @@
 Do not stop at the first match, keep going.
 .It Fl L , -dereference
 option causes symlinks to be followed, as the like-named option in
-.Xr ls 1 .
+.Xr ls 1
 (on systems that support symbolic links).
+This is the default if the environment variable
+.Ev POSIXLY_CORRECT
+is defined.
 .It Fl m , -magic-file Ar list
 Specify an alternate list of files containing magic numbers.
 This can be a single file, or a colon-separated list of files.
@@ -281,19 +298,35 @@
 Default list of magic numbers, used to output mime types when the
 .Fl i
 option is specified.
-.It Pa /etc/magic
-Local additions to magic wisdom.
 .El
 .Sh ENVIRONMENT
 The environment variable
 .Ev MAGIC
 can be used to set the default magic number file name.
+If that variable is set, then
+.Nm
+will not attempt to open
+.Pa $HOME/.magic .
 .Nm
 adds
 .Pa .mime
 and/or
 .Pa .mgc
 to the value of this variable as appropriate.
+The environment variable
+.Ev POSIXLY_CORRECT
+controls (on systems that support symbolic links), if
+.Nm
+will attempt to follow symlinks or not.
+If set, then
+.Nm
+follows symlink, otherwise it does not.
+This is also controlled
+by the
+.Fl L
+and
+.Fl h
+options.
 .Sh SEE ALSO
 .Xr hexdump 1 ,
 .Xr od 1 ,
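
Editor's note on the magic.5 hunk above: the new indirect-offset examples (the MZ / PE / LX rules using ">>(0x3c.l)") may be easier to follow as ordinary code. The following is only a rough sketch of what those three levels of tests do, not how file(1) is implemented. The function name classify_mz and the choice to return an empty string when nothing prints are assumptions made for illustration. The "\e0" sequences in the man page source are roff escapes for "\0", i.e. NUL bytes.

```python
import struct


def classify_mz(data: bytes) -> str:
    """Hypothetical helper mirroring the MZ example rules from magic.5.

    Level 0:  0          string  MZ
    Level 1:  >0x18      leshort <0x40  -> plain MS-DOS executable
              >0x18      leshort >0x3f  -> extended executable, keep testing
    Level 2:  >>(0x3c.l) string  PE\\0\\0 or LX\\0\\0
    """
    # Level 0: the file must start with the literal string "MZ".
    if data[:2] != b"MZ":
        return ""

    # Level 1: two-byte little-endian value at offset 0x18 (leshort).
    if len(data) < 0x18 + 2:
        return ""
    (reloc_off,) = struct.unpack_from("<H", data, 0x18)
    if reloc_off < 0x40:
        return "MZ executable (MS-DOS)"

    # Level 2: "(0x3c.l)" reads a little-endian long at 0x3c and uses the
    # value itself as the offset of the next test (indirect offset).
    if len(data) < 0x3C + 4:
        return ""
    (header_off,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[header_off:header_off + 4]
    if signature == b"PE\0\0":
        return "PE executable (MS-Windows)"
    if signature == b"LX\0\0":
        return "LX executable (OS/2)"

    # Neither signature matched: nothing is printed, which is the
    # "empty output" drawback the man page warns about.
    return ""
```

The empty return in the last branch corresponds to the caveat in the new text that a rule tree which never prints anything leaves the user with empty output.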