ascii-gen

simple ascii image converter
git clone git://git.jakekoroman.com/ascii-gen
Log | Files | Refs | README

stb_image.h (279339B)


      1 /* stb_image - v2.27 - public domain image loader - http://nothings.org/stb
      2                                   no warranty implied; use at your own risk
      3 
      4    Do this:
      5       #define STB_IMAGE_IMPLEMENTATION
      6    before you include this file in *one* C or C++ file to create the implementation.
      7 
      8    // i.e. it should look like this:
      9    #include ...
     10    #include ...
     11    #include ...
     12    #define STB_IMAGE_IMPLEMENTATION
     13    #include "stb_image.h"
     14 
     15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
     16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
     17 
     18 
     19    QUICK NOTES:
     20       Primarily of interest to game developers and other people who can
     21           avoid problematic images and only need the trivial interface
     22 
     23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
     24       PNG 1/2/4/8/16-bit-per-channel
     25 
     26       TGA (not sure what subset, if a subset)
     27       BMP non-1bpp, non-RLE
     28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
     29 
     30       GIF (*comp always reports as 4-channel)
     31       HDR (radiance rgbE format)
     32       PIC (Softimage PIC)
     33       PNM (PPM and PGM binary only)
     34 
     35       Animated GIF still needs a proper API, but here's one way to do it:
     36           http://gist.github.com/urraka/685d9a6340b26b830d49
     37 
     38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
     39       - decode from arbitrary I/O callbacks
     40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
     41 
     42    Full documentation under "DOCUMENTATION" below.
     43 
     44 
     45 LICENSE
     46 
     47   See end of file for license information.
     48 
     49 RECENT REVISION HISTORY:
     50 
     51       2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
     52       2.26  (2020-07-13) many minor fixes
     53       2.25  (2020-02-02) fix warnings
     54       2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
     55       2.23  (2019-08-11) fix clang static analysis warning
     56       2.22  (2019-03-04) gif fixes, fix warnings
     57       2.21  (2019-02-25) fix typo in comment
     58       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
     59       2.19  (2018-02-11) fix warning
     60       2.18  (2018-01-30) fix warnings
     61       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
     62       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
     63       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
     64       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
     65       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
     66       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
     67       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
     68                          RGB-format JPEG; remove white matting in PSD;
     69                          allocate large structures on the stack;
     70                          correct channel count for PNG & BMP
     71       2.10  (2016-01-22) avoid warning introduced in 2.09
     72       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
     73 
     74    See end of file for full revision history.
     75 
     76 
     77  ============================    Contributors    =========================
     78 
     79  Image formats                          Extensions, features
     80     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
     81     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
     82     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
     83     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
     84     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
     85     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
     86     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
     87     github:urraka (animated gif)           Junggon Kim (PNM comments)
     88     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
     89                                            socks-the-fox (16-bit PNG)
     90                                            Jeremy Sawicki (handle all ImageNet JPGs)
     91  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
     92     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
     93     Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
     94     John-Mark Allen
     95     Carmelo J Fdez-Aguera
     96 
     97  Bug & warning fixes
     98     Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
     99     Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
    100     Phil Jordan                                Dave Moore           Roy Eltham
    101     Hayaki Saito            Nathan Reed        Won Chun
    102     Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
    103     Thomas Ruf              Ronny Chevalier                         github:rlyeh
    104     Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
    105     Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
    106     Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
    107     Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
    108     Cass Everitt            Ryamond Barbiero                        github:grim210
    109     Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
    110     Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
    111     Josh Tobin                                 Matthew Gregan       github:poppolopoppo
    112     Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
    113     Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
    114                             Brad Weinberger    Matvey Cherevko      github:mosra
    115     Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
    116     Ryan C. Gordon          [reserved]                              [reserved]
    117                      DO NOT ADD YOUR NAME HERE
    118 
    119                      Jacko Dirks
    120 
    121   To add your name to the credits, pick a random blank space in the middle and fill it.
    122   80% of merge conflicts on stb PRs are due to people adding their name at the end
    123   of the credits.
    124 */
    125 
    126 #ifndef STBI_INCLUDE_STB_IMAGE_H
    127 #define STBI_INCLUDE_STB_IMAGE_H
    128 
    129 // DOCUMENTATION
    130 //
    131 // Limitations:
    132 //    - no 12-bit-per-channel JPEG
    133 //    - no JPEGs with arithmetic coding
    134 //    - GIF always returns *comp=4
    135 //
    136 // Basic usage (see HDR discussion below for HDR usage):
    137 //    int x,y,n;
    138 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
    139 //    // ... process data if not NULL ...
    140 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
    141 //    // ... replace '0' with '1'..'4' to force that many components per pixel
    142 //    // ... but 'n' will always be the number that it would have been if you said 0
    143 //    stbi_image_free(data)
    144 //
    145 // Standard parameters:
    146 //    int *x                 -- outputs image width in pixels
    147 //    int *y                 -- outputs image height in pixels
    148 //    int *channels_in_file  -- outputs # of image components in image file
    149 //    int desired_channels   -- if non-zero, # of image components requested in result
    150 //
    151 // The return value from an image loader is an 'unsigned char *' which points
    152 // to the pixel data, or NULL on an allocation failure or if the image is
    153 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
    154 // with each pixel consisting of N interleaved 8-bit components; the first
    155 // pixel pointed to is top-left-most in the image. There is no padding between
    156 // image scanlines or between pixels, regardless of format. The number of
    157 // components N is 'desired_channels' if desired_channels is non-zero, or
    158 // *channels_in_file otherwise. If desired_channels is non-zero,
    159 // *channels_in_file has the number of components that _would_ have been
    160 // output otherwise. E.g. if you set desired_channels to 4, you will always
    161 // get RGBA output, but you can check *channels_in_file to see if it's trivially
    162 // opaque because e.g. there were only 3 channels in the source image.
    163 //
    164 // An output image with N components has the following components interleaved
    165 // in this order in each pixel:
    166 //
    167 //     N=#comp     components
    168 //       1           grey
    169 //       2           grey, alpha
    170 //       3           red, green, blue
    171 //       4           red, green, blue, alpha
    172 //
    173 // If image loading fails for any reason, the return value will be NULL,
    174 // and *x, *y, *channels_in_file will be unchanged. The function
    175 // stbi_failure_reason() can be queried for an extremely brief, end-user
    176 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
    177 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
    178 // more user-friendly ones.
    179 //
    180 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
    181 //
    182 // To query the width, height and component count of an image without having to
    183 // decode the full file, you can use the stbi_info family of functions:
    184 //
    185 //   int x,y,n,ok;
    186 //   ok = stbi_info(filename, &x, &y, &n);
    187 //   // returns ok=1 and sets x, y, n if image is a supported format,
    188 //   // 0 otherwise.
    189 //
    190 // Note that stb_image pervasively uses ints in its public API for sizes,
    191 // including sizes of memory buffers. This is now part of the API and thus
    192 // hard to change without causing breakage. As a result, the various image
    193 // loaders all have certain limits on image size; these differ somewhat
    194 // by format but generally boil down to either just under 2GB or just under
    195 // 1GB. When the decoded image would be larger than this, stb_image decoding
    196 // will fail.
    197 //
    198 // Additionally, stb_image will reject image files that have any of their
    199 // dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
    200 // which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
    201 // the only way to have an image with such dimensions load correctly
    202 // is for it to have a rather extreme aspect ratio. Either way, the
    203 // assumption here is that such larger images are likely to be malformed
    204 // or malicious. If you do need to load an image with individual dimensions
    205 // larger than that, and it still fits in the overall size limit, you can
    206 // #define STBI_MAX_DIMENSIONS on your own to be something larger.
    207 //
    208 // ===========================================================================
    209 //
    210 // UNICODE:
    211 //
    212 //   If compiling for Windows and you wish to use Unicode filenames, compile
    213 //   with
    214 //       #define STBI_WINDOWS_UTF8
    215 //   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
    216 //   Windows wchar_t filenames to utf8.
    217 //
    218 // ===========================================================================
    219 //
    220 // Philosophy
    221 //
    222 // stb libraries are designed with the following priorities:
    223 //
    224 //    1. easy to use
    225 //    2. easy to maintain
    226 //    3. good performance
    227 //
    228 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
    229 // and for best performance I may provide less-easy-to-use APIs that give higher
    230 // performance, in addition to the easy-to-use ones. Nevertheless, it's important
    231 // to keep in mind that from the standpoint of you, a client of this library,
    232 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
    233 //
    234 // Some secondary priorities arise directly from the first two, some of which
    235 // provide more explicit reasons why performance can't be emphasized.
    236 //
    237 //    - Portable ("ease of use")
    238 //    - Small source code footprint ("easy to maintain")
    239 //    - No dependencies ("ease of use")
    240 //
    241 // ===========================================================================
    242 //
    243 // I/O callbacks
    244 //
    245 // I/O callbacks allow you to read from arbitrary sources, like packaged
    246 // files or some other source. Data read from callbacks are processed
    247 // through a small internal buffer (currently 128 bytes) to try to reduce
    248 // overhead.
    249 //
    250 // The three functions you must define are "read" (reads some bytes of data),
    251 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
    252 //
    253 // ===========================================================================
    254 //
    255 // SIMD support
    256 //
    257 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
    258 // supported by the compiler. For ARM Neon support, you must explicitly
    259 // request it.
    260 //
    261 // (The old do-it-yourself SIMD API is no longer supported in the current
    262 // code.)
    263 //
    264 // On x86, SSE2 will automatically be used when available based on a run-time
    265 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
    266 // the typical path is to have separate builds for NEON and non-NEON devices
    267 // (at least this is true for iOS and Android). Therefore, the NEON support is
    268 // toggled by a build flag: define STBI_NEON to get NEON loops.
    269 //
    270 // If for some reason you do not want to use any of SIMD code, or if
    271 // you have issues compiling it, you can disable it entirely by
    272 // defining STBI_NO_SIMD.
    273 //
    274 // ===========================================================================
    275 //
    276 // HDR image support   (disable by defining STBI_NO_HDR)
    277 //
    278 // stb_image supports loading HDR images in general, and currently the Radiance
    279 // .HDR file format specifically. You can still load any file through the existing
    280 // interface; if you attempt to load an HDR file, it will be automatically remapped
    281 // to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
    282 // both of these constants can be reconfigured through this interface:
    283 //
    284 //     stbi_hdr_to_ldr_gamma(2.2f);
    285 //     stbi_hdr_to_ldr_scale(1.0f);
    286 //
    287 // (note, do not use _inverse_ constants; stbi_image will invert them
    288 // appropriately).
    289 //
    290 // Additionally, there is a new, parallel interface for loading files as
    291 // (linear) floats to preserve the full dynamic range:
    292 //
    293 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
    294 //
    295 // If you load LDR images through this interface, those images will
    296 // be promoted to floating point values, run through the inverse of
    297 // constants corresponding to the above:
    298 //
    299 //     stbi_ldr_to_hdr_scale(1.0f);
    300 //     stbi_ldr_to_hdr_gamma(2.2f);
    301 //
    302 // Finally, given a filename (or an open file or memory block--see header
    303 // file for details) containing image data, you can query for the "most
    304 // appropriate" interface to use (that is, whether the image is HDR or
    305 // not), using:
    306 //
    307 //     stbi_is_hdr(char *filename);
    308 //
    309 // ===========================================================================
    310 //
    311 // iPhone PNG support:
    312 //
    313 // We optionally support converting iPhone-formatted PNGs (which store
    314 // premultiplied BGRA) back to RGB, even though they're internally encoded
    315 // differently. To enable this conversion, call
    316 // stbi_convert_iphone_png_to_rgb(1).
    317 //
    318 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
    319 // pixel to remove any premultiplied alpha *only* if the image file explicitly
    320 // says there's premultiplied data (currently only happens in iPhone images,
    321 // and only if iPhone convert-to-rgb processing is on).
    322 //
    323 // ===========================================================================
    324 //
    325 // ADDITIONAL CONFIGURATION
    326 //
    327 //  - You can suppress implementation of any of the decoders to reduce
    328 //    your code footprint by #defining one or more of the following
    329 //    symbols before creating the implementation.
    330 //
    331 //        STBI_NO_JPEG
    332 //        STBI_NO_PNG
    333 //        STBI_NO_BMP
    334 //        STBI_NO_PSD
    335 //        STBI_NO_TGA
    336 //        STBI_NO_GIF
    337 //        STBI_NO_HDR
    338 //        STBI_NO_PIC
    339 //        STBI_NO_PNM   (.ppm and .pgm)
    340 //
    341 //  - You can request *only* certain decoders and suppress all other ones
    342 //    (this will be more forward-compatible, as addition of new decoders
    343 //    doesn't require you to disable them explicitly):
    344 //
    345 //        STBI_ONLY_JPEG
    346 //        STBI_ONLY_PNG
    347 //        STBI_ONLY_BMP
    348 //        STBI_ONLY_PSD
    349 //        STBI_ONLY_TGA
    350 //        STBI_ONLY_GIF
    351 //        STBI_ONLY_HDR
    352 //        STBI_ONLY_PIC
    353 //        STBI_ONLY_PNM   (.ppm and .pgm)
    354 //
    355 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
    356 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
    357 //
    358 //  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
    359 //    than that size (in either width or height) without further processing.
    360 //    This is to let programs in the wild set an upper bound to prevent
    361 //    denial-of-service attacks on untrusted data, as one could generate a
    362 //    valid image of gigantic dimensions and force stb_image to allocate a
    363 //    huge block of memory and spend disproportionate time decoding it. By
    364 //    default this is set to (1 << 24), which is 16777216, but that's still
    365 //    very big.
    366 
    367 #ifndef STBI_NO_STDIO
    368 #include <stdio.h>
    369 #endif // STBI_NO_STDIO
    370 
    371 #define STBI_VERSION 1
    372 
    373 enum
    374 {
    375    STBI_default = 0, // only used for desired_channels
    376 
    377    STBI_grey       = 1,
    378    STBI_grey_alpha = 2,
    379    STBI_rgb        = 3,
    380    STBI_rgb_alpha  = 4
    381 };
    382 
    383 #include <stdlib.h>
    384 typedef unsigned char stbi_uc;
    385 typedef unsigned short stbi_us;
    386 
    387 #ifdef __cplusplus
    388 extern "C" {
    389 #endif
    390 
    391 #ifndef STBIDEF
    392 #ifdef STB_IMAGE_STATIC
    393 #define STBIDEF static
    394 #else
    395 #define STBIDEF extern
    396 #endif
    397 #endif
    398 
    399 //////////////////////////////////////////////////////////////////////////////
    400 //
    401 // PRIMARY API - works on images of any type
    402 //
    403 
    404 //
    405 // load image by filename, open file, or memory buffer
    406 //
    407 
    408 typedef struct
    409 {
    410    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
    411    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
    412    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
    413 } stbi_io_callbacks;
    414 
    415 ////////////////////////////////////
    416 //
    417 // 8-bits-per-channel interface
    418 //
    419 
    420 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
    421 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    422 
    423 #ifndef STBI_NO_STDIO
    424 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    425 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    426 // for stbi_load_from_file, file pointer is left pointing immediately after image
    427 #endif
    428 
    429 #ifndef STBI_NO_GIF
    430 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    431 #endif
    432 
    433 #ifdef STBI_WINDOWS_UTF8
    434 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
    435 #endif
    436 
    437 ////////////////////////////////////
    438 //
    439 // 16-bits-per-channel interface
    440 //
    441 
    442 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    443 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
    444 
    445 #ifndef STBI_NO_STDIO
    446 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    447 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    448 #endif
    449 
    450 ////////////////////////////////////
    451 //
    452 // float-per-channel interface
    453 //
    454 #ifndef STBI_NO_LINEAR
    455    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
    456    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
    457 
    458    #ifndef STBI_NO_STDIO
    459    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
    460    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
    461    #endif
    462 #endif
    463 
    464 #ifndef STBI_NO_HDR
    465    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
    466    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
    467 #endif // STBI_NO_HDR
    468 
    469 #ifndef STBI_NO_LINEAR
    470    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
    471    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
    472 #endif // STBI_NO_LINEAR
    473 
    474 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
    475 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    476 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
    477 #ifndef STBI_NO_STDIO
    478 STBIDEF int      stbi_is_hdr          (char const *filename);
    479 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
    480 #endif // STBI_NO_STDIO
    481 
    482 
    483 // get a VERY brief reason for failure
    484 // on most compilers (and ALL modern mainstream compilers) this is threadsafe
    485 STBIDEF const char *stbi_failure_reason  (void);
    486 
    487 // free the loaded image -- this is just free()
    488 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
    489 
    490 // get image dimensions & components without fully decoding
    491 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
    492 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
    493 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
    494 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
    495 
    496 #ifndef STBI_NO_STDIO
    497 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
    498 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
    499 STBIDEF int      stbi_is_16_bit          (char const *filename);
    500 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
    501 #endif
    502 
    503 
    504 
    505 // for image formats that explicitly notate that they have premultiplied alpha,
    506 // we just return the colors as stored in the file. set this flag to force
    507 // unpremultiplication. results are undefined if the unpremultiply overflow.
    508 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
    509 
    510 // indicate whether we should process iphone images back to canonical format,
    511 // or just pass them through "as-is"
    512 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
    513 
    514 // flip the image vertically, so the first pixel in the output array is the bottom left
    515 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
    516 
    517 // as above, but only applies to images loaded on the thread that calls the function
    518 // this function is only available if your compiler supports thread-local variables;
    519 // calling it will fail to link if your compiler doesn't
    520 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
    521 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
    522 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
    523 
    524 // ZLIB client - used by PNG, available for other purposes
    525 
    526 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
    527 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
    528 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
    529 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    530 
    531 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
    532 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
    533 
    534 
    535 #ifdef __cplusplus
    536 }
    537 #endif
    538 
    539 //
    540 //
    541 ////   end header file   /////////////////////////////////////////////////////
    542 #endif // STBI_INCLUDE_STB_IMAGE_H
    543 
    544 #ifdef STB_IMAGE_IMPLEMENTATION
    545 
    546 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
    547   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
    548   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
    549   || defined(STBI_ONLY_ZLIB)
    550    #ifndef STBI_ONLY_JPEG
    551    #define STBI_NO_JPEG
    552    #endif
    553    #ifndef STBI_ONLY_PNG
    554    #define STBI_NO_PNG
    555    #endif
    556    #ifndef STBI_ONLY_BMP
    557    #define STBI_NO_BMP
    558    #endif
    559    #ifndef STBI_ONLY_PSD
    560    #define STBI_NO_PSD
    561    #endif
    562    #ifndef STBI_ONLY_TGA
    563    #define STBI_NO_TGA
    564    #endif
    565    #ifndef STBI_ONLY_GIF
    566    #define STBI_NO_GIF
    567    #endif
    568    #ifndef STBI_ONLY_HDR
    569    #define STBI_NO_HDR
    570    #endif
    571    #ifndef STBI_ONLY_PIC
    572    #define STBI_NO_PIC
    573    #endif
    574    #ifndef STBI_ONLY_PNM
    575    #define STBI_NO_PNM
    576    #endif
    577 #endif
    578 
    579 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
    580 #define STBI_NO_ZLIB
    581 #endif
    582 
    583 
    584 #include <stdarg.h>
    585 #include <stddef.h> // ptrdiff_t on osx
    586 #include <stdlib.h>
    587 #include <string.h>
    588 #include <limits.h>
    589 
    590 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
    591 #include <math.h>  // ldexp, pow
    592 #endif
    593 
    594 #ifndef STBI_NO_STDIO
    595 #include <stdio.h>
    596 #endif
    597 
    598 #ifndef STBI_ASSERT
    599 #include <assert.h>
    600 #define STBI_ASSERT(x) assert(x)
    601 #endif
    602 
    603 #ifdef __cplusplus
    604 #define STBI_EXTERN extern "C"
    605 #else
    606 #define STBI_EXTERN extern
    607 #endif
    608 
    609 
    610 #ifndef _MSC_VER
    611    #ifdef __cplusplus
    612    #define stbi_inline inline
    613    #else
    614    #define stbi_inline
    615    #endif
    616 #else
    617    #define stbi_inline __forceinline
    618 #endif
    619 
    620 #ifndef STBI_NO_THREAD_LOCALS
    621    #if defined(__cplusplus) &&  __cplusplus >= 201103L
    622       #define STBI_THREAD_LOCAL       thread_local
    623    #elif defined(__GNUC__) && __GNUC__ < 5
    624       #define STBI_THREAD_LOCAL       __thread
    625    #elif defined(_MSC_VER)
    626       #define STBI_THREAD_LOCAL       __declspec(thread)
    627    #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
    628       #define STBI_THREAD_LOCAL       _Thread_local
    629    #endif
    630 
    631    #ifndef STBI_THREAD_LOCAL
    632       #if defined(__GNUC__)
    633         #define STBI_THREAD_LOCAL       __thread
    634       #endif
    635    #endif
    636 #endif
    637 
    638 #ifdef _MSC_VER
    639 typedef unsigned short stbi__uint16;
    640 typedef   signed short stbi__int16;
    641 typedef unsigned int   stbi__uint32;
    642 typedef   signed int   stbi__int32;
    643 #else
    644 #include <stdint.h>
    645 typedef uint16_t stbi__uint16;
    646 typedef int16_t  stbi__int16;
    647 typedef uint32_t stbi__uint32;
    648 typedef int32_t  stbi__int32;
    649 #endif
    650 
    651 // should produce compiler error if size is wrong
    652 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
    653 
    654 #ifdef _MSC_VER
    655 #define STBI_NOTUSED(v)  (void)(v)
    656 #else
    657 #define STBI_NOTUSED(v)  (void)sizeof(v)
    658 #endif
    659 
    660 #ifdef _MSC_VER
    661 #define STBI_HAS_LROTL
    662 #endif
    663 
    664 #ifdef STBI_HAS_LROTL
    665    #define stbi_lrot(x,y)  _lrotl(x,y)
    666 #else
    667    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
    668 #endif
    669 
    670 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
    671 // ok
    672 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
    673 // ok
    674 #else
    675 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
    676 #endif
    677 
    678 #ifndef STBI_MALLOC
    679 #define STBI_MALLOC(sz)           malloc(sz)
    680 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
    681 #define STBI_FREE(p)              free(p)
    682 #endif
    683 
    684 #ifndef STBI_REALLOC_SIZED
    685 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
    686 #endif
    687 
    688 // x86/x64 detection
    689 #if defined(__x86_64__) || defined(_M_X64)
    690 #define STBI__X64_TARGET
    691 #elif defined(__i386) || defined(_M_IX86)
    692 #define STBI__X86_TARGET
    693 #endif
    694 
    695 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
    696 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
    697 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
    698 // but previous attempts to provide the SSE2 functions with runtime
    699 // detection caused numerous issues. The way architecture extensions are
    700 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
    701 // New behavior: if compiled with -msse2, we use SSE2 without any
    702 // detection; if not, we don't use it at all.
    703 #define STBI_NO_SIMD
    704 #endif
    705 
    706 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
    707 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
    708 //
    709 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
    710 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
    711 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
    712 // simultaneously enabling "-mstackrealign".
    713 //
    714 // See https://github.com/nothings/stb/issues/81 for more information.
    715 //
    716 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
    717 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
    718 #define STBI_NO_SIMD
    719 #endif
    720 
    721 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
    722 #define STBI_SSE2
    723 #include <emmintrin.h>
    724 
    725 #ifdef _MSC_VER
    726 
    727 #if _MSC_VER >= 1400  // not VC6
    728 #include <intrin.h> // __cpuid
    729 static int stbi__cpuid3(void)
    730 {
    731    int info[4];
    732    __cpuid(info,1);
    733    return info[3];
    734 }
    735 #else
    736 static int stbi__cpuid3(void)
    737 {
    738    int res;
    739    __asm {
    740       mov  eax,1
    741       cpuid
    742       mov  res,edx
    743    }
    744    return res;
    745 }
    746 #endif
    747 
    748 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    749 
    750 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    751 static int stbi__sse2_available(void)
    752 {
    753    int info3 = stbi__cpuid3();
    754    return ((info3 >> 26) & 1) != 0;
    755 }
    756 #endif
    757 
    758 #else // assume GCC-style if not VC++
    759 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    760 
    761 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
    762 static int stbi__sse2_available(void)
    763 {
    764    // If we're even attempting to compile this on GCC/Clang, that means
    765    // -msse2 is on, which means the compiler is allowed to use SSE2
    766    // instructions at will, and so are we.
    767    return 1;
    768 }
    769 #endif
    770 
    771 #endif
    772 #endif
    773 
    774 // ARM NEON
    775 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
    776 #undef STBI_NEON
    777 #endif
    778 
    779 #ifdef STBI_NEON
    780 #include <arm_neon.h>
    781 #ifdef _MSC_VER
    782 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
    783 #else
    784 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
    785 #endif
    786 #endif
    787 
    788 #ifndef STBI_SIMD_ALIGN
    789 #define STBI_SIMD_ALIGN(type, name) type name
    790 #endif
    791 
    792 #ifndef STBI_MAX_DIMENSIONS
    793 #define STBI_MAX_DIMENSIONS (1 << 24)
    794 #endif
    795 
    796 ///////////////////////////////////////////////
    797 //
    798 //  stbi__context struct and start_xxx functions
    799 
    800 // stbi__context structure is our basic context used by all images, so it
    801 // contains all the IO context, plus some basic image information
    802 typedef struct
    803 {
    804    stbi__uint32 img_x, img_y;
    805    int img_n, img_out_n;
    806 
    807    stbi_io_callbacks io;
    808    void *io_user_data;
    809 
    810    int read_from_callbacks;
    811    int buflen;
    812    stbi_uc buffer_start[128];
    813    int callback_already_read;
    814 
    815    stbi_uc *img_buffer, *img_buffer_end;
    816    stbi_uc *img_buffer_original, *img_buffer_original_end;
    817 } stbi__context;
    818 
    819 
    820 static void stbi__refill_buffer(stbi__context *s);
    821 
    822 // initialize a memory-decode context
    823 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
    824 {
    825    s->io.read = NULL;
    826    s->read_from_callbacks = 0;
    827    s->callback_already_read = 0;
    828    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
    829    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
    830 }
    831 
    832 // initialize a callback-based context
    833 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
    834 {
    835    s->io = *c;
    836    s->io_user_data = user;
    837    s->buflen = sizeof(s->buffer_start);
    838    s->read_from_callbacks = 1;
    839    s->callback_already_read = 0;
    840    s->img_buffer = s->img_buffer_original = s->buffer_start;
    841    stbi__refill_buffer(s);
    842    s->img_buffer_original_end = s->img_buffer_end;
    843 }
    844 
    845 #ifndef STBI_NO_STDIO
    846 
    847 static int stbi__stdio_read(void *user, char *data, int size)
    848 {
    849    return (int) fread(data,1,size,(FILE*) user);
    850 }
    851 
    852 static void stbi__stdio_skip(void *user, int n)
    853 {
    854    int ch;
    855    fseek((FILE*) user, n, SEEK_CUR);
    856    ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
    857    if (ch != EOF) {
    858       ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
    859    }
    860 }
    861 
    862 static int stbi__stdio_eof(void *user)
    863 {
    864    return feof((FILE*) user) || ferror((FILE *) user);
    865 }
    866 
    867 static stbi_io_callbacks stbi__stdio_callbacks =
    868 {
    869    stbi__stdio_read,
    870    stbi__stdio_skip,
    871    stbi__stdio_eof,
    872 };
    873 
    874 static void stbi__start_file(stbi__context *s, FILE *f)
    875 {
    876    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
    877 }
    878 
    879 //static void stop_file(stbi__context *s) { }
    880 
    881 #endif // !STBI_NO_STDIO
    882 
    883 static void stbi__rewind(stbi__context *s)
    884 {
    885    // conceptually rewind SHOULD rewind to the beginning of the stream,
    886    // but we just rewind to the beginning of the initial buffer, because
    887    // we only use it after doing 'test', which only ever looks at at most 92 bytes
    888    s->img_buffer = s->img_buffer_original;
    889    s->img_buffer_end = s->img_buffer_original_end;
    890 }
    891 
    892 enum
    893 {
    894    STBI_ORDER_RGB,
    895    STBI_ORDER_BGR
    896 };
    897 
    898 typedef struct
    899 {
    900    int bits_per_channel;
    901    int num_channels;
    902    int channel_order;
    903 } stbi__result_info;
    904 
    905 #ifndef STBI_NO_JPEG
    906 static int      stbi__jpeg_test(stbi__context *s);
    907 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    908 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
    909 #endif
    910 
    911 #ifndef STBI_NO_PNG
    912 static int      stbi__png_test(stbi__context *s);
    913 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    914 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
    915 static int      stbi__png_is16(stbi__context *s);
    916 #endif
    917 
    918 #ifndef STBI_NO_BMP
    919 static int      stbi__bmp_test(stbi__context *s);
    920 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    921 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
    922 #endif
    923 
    924 #ifndef STBI_NO_TGA
    925 static int      stbi__tga_test(stbi__context *s);
    926 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    927 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
    928 #endif
    929 
    930 #ifndef STBI_NO_PSD
    931 static int      stbi__psd_test(stbi__context *s);
    932 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
    933 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
    934 static int      stbi__psd_is16(stbi__context *s);
    935 #endif
    936 
    937 #ifndef STBI_NO_HDR
    938 static int      stbi__hdr_test(stbi__context *s);
    939 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    940 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
    941 #endif
    942 
    943 #ifndef STBI_NO_PIC
    944 static int      stbi__pic_test(stbi__context *s);
    945 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    946 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
    947 #endif
    948 
    949 #ifndef STBI_NO_GIF
    950 static int      stbi__gif_test(stbi__context *s);
    951 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    952 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
    953 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
    954 #endif
    955 
    956 #ifndef STBI_NO_PNM
    957 static int      stbi__pnm_test(stbi__context *s);
    958 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
    959 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
    960 static int      stbi__pnm_is16(stbi__context *s);
    961 #endif
    962 
    963 static
    964 #ifdef STBI_THREAD_LOCAL
    965 STBI_THREAD_LOCAL
    966 #endif
    967 const char *stbi__g_failure_reason;
    968 
    969 STBIDEF const char *stbi_failure_reason(void)
    970 {
    971    return stbi__g_failure_reason;
    972 }
    973 
    974 #ifndef STBI_NO_FAILURE_STRINGS
    975 static int stbi__err(const char *str)
    976 {
    977    stbi__g_failure_reason = str;
    978    return 0;
    979 }
    980 #endif
    981 
    982 static void *stbi__malloc(size_t size)
    983 {
    984     return STBI_MALLOC(size);
    985 }
    986 
    987 // stb_image uses ints pervasively, including for offset calculations.
    988 // therefore the largest decoded image size we can support with the
    989 // current code, even on 64-bit targets, is INT_MAX. this is not a
    990 // significant limitation for the intended use case.
    991 //
    992 // we do, however, need to make sure our size calculations don't
    993 // overflow. hence a few helper functions for size calculations that
    994 // multiply integers together, making sure that they're non-negative
    995 // and no overflow occurs.
    996 
    997 // return 1 if the sum is valid, 0 on overflow.
    998 // negative terms are considered invalid.
    999 static int stbi__addsizes_valid(int a, int b)
   1000 {
   1001    if (b < 0) return 0;
   1002    // now 0 <= b <= INT_MAX, hence also
   1003    // 0 <= INT_MAX - b <= INTMAX.
   1004    // And "a + b <= INT_MAX" (which might overflow) is the
   1005    // same as a <= INT_MAX - b (no overflow)
   1006    return a <= INT_MAX - b;
   1007 }
   1008 
   1009 // returns 1 if the product is valid, 0 on overflow.
   1010 // negative factors are considered invalid.
   1011 static int stbi__mul2sizes_valid(int a, int b)
   1012 {
   1013    if (a < 0 || b < 0) return 0;
   1014    if (b == 0) return 1; // mul-by-0 is always safe
   1015    // portable way to check for no overflows in a*b
   1016    return a <= INT_MAX/b;
   1017 }
   1018 
   1019 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1020 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
   1021 static int stbi__mad2sizes_valid(int a, int b, int add)
   1022 {
   1023    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
   1024 }
   1025 #endif
   1026 
   1027 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
   1028 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
   1029 {
   1030    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1031       stbi__addsizes_valid(a*b*c, add);
   1032 }
   1033 
   1034 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
   1035 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
   1036 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
   1037 {
   1038    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
   1039       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
   1040 }
   1041 #endif
   1042 
   1043 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
   1044 // mallocs with size overflow checking
   1045 static void *stbi__malloc_mad2(int a, int b, int add)
   1046 {
   1047    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
   1048    return stbi__malloc(a*b + add);
   1049 }
   1050 #endif
   1051 
   1052 static void *stbi__malloc_mad3(int a, int b, int c, int add)
   1053 {
   1054    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
   1055    return stbi__malloc(a*b*c + add);
   1056 }
   1057 
   1058 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
   1059 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
   1060 {
   1061    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
   1062    return stbi__malloc(a*b*c*d + add);
   1063 }
   1064 #endif
   1065 
   1066 // stbi__err - error
   1067 // stbi__errpf - error returning pointer to float
   1068 // stbi__errpuc - error returning pointer to unsigned char
   1069 
   1070 #ifdef STBI_NO_FAILURE_STRINGS
   1071    #define stbi__err(x,y)  0
   1072 #elif defined(STBI_FAILURE_USERMSG)
   1073    #define stbi__err(x,y)  stbi__err(y)
   1074 #else
   1075    #define stbi__err(x,y)  stbi__err(x)
   1076 #endif
   1077 
   1078 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1079 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
   1080 
   1081 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
   1082 {
   1083    STBI_FREE(retval_from_stbi_load);
   1084 }
   1085 
   1086 #ifndef STBI_NO_LINEAR
   1087 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
   1088 #endif
   1089 
   1090 #ifndef STBI_NO_HDR
   1091 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
   1092 #endif
   1093 
   1094 static int stbi__vertically_flip_on_load_global = 0;
   1095 
   1096 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
   1097 {
   1098    stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
   1099 }
   1100 
   1101 #ifndef STBI_THREAD_LOCAL
   1102 #define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
   1103 #else
   1104 static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
   1105 
   1106 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
   1107 {
   1108    stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
   1109    stbi__vertically_flip_on_load_set = 1;
   1110 }
   1111 
   1112 #define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
   1113                                          ? stbi__vertically_flip_on_load_local  \
   1114                                          : stbi__vertically_flip_on_load_global)
   1115 #endif // STBI_THREAD_LOCAL
   1116 
   1117 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   1118 {
   1119    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
   1120    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
   1121    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
   1122    ri->num_channels = 0;
   1123 
   1124    // test the formats with a very explicit header first (at least a FOURCC
   1125    // or distinctive magic number first)
   1126    #ifndef STBI_NO_PNG
   1127    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
   1128    #endif
   1129    #ifndef STBI_NO_BMP
   1130    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
   1131    #endif
   1132    #ifndef STBI_NO_GIF
   1133    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
   1134    #endif
   1135    #ifndef STBI_NO_PSD
   1136    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
   1137    #else
   1138    STBI_NOTUSED(bpc);
   1139    #endif
   1140    #ifndef STBI_NO_PIC
   1141    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
   1142    #endif
   1143 
   1144    // then the formats that can end up attempting to load with just 1 or 2
   1145    // bytes matching expectations; these are prone to false positives, so
   1146    // try them later
   1147    #ifndef STBI_NO_JPEG
   1148    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
   1149    #endif
   1150    #ifndef STBI_NO_PNM
   1151    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
   1152    #endif
   1153 
   1154    #ifndef STBI_NO_HDR
   1155    if (stbi__hdr_test(s)) {
   1156       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
   1157       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   1158    }
   1159    #endif
   1160 
   1161    #ifndef STBI_NO_TGA
   1162    // test tga last because it's a crappy test!
   1163    if (stbi__tga_test(s))
   1164       return stbi__tga_load(s,x,y,comp,req_comp, ri);
   1165    #endif
   1166 
   1167    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
   1168 }
   1169 
   1170 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
   1171 {
   1172    int i;
   1173    int img_len = w * h * channels;
   1174    stbi_uc *reduced;
   1175 
   1176    reduced = (stbi_uc *) stbi__malloc(img_len);
   1177    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
   1178 
   1179    for (i = 0; i < img_len; ++i)
   1180       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
   1181 
   1182    STBI_FREE(orig);
   1183    return reduced;
   1184 }
   1185 
   1186 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
   1187 {
   1188    int i;
   1189    int img_len = w * h * channels;
   1190    stbi__uint16 *enlarged;
   1191 
   1192    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
   1193    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1194 
   1195    for (i = 0; i < img_len; ++i)
   1196       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
   1197 
   1198    STBI_FREE(orig);
   1199    return enlarged;
   1200 }
   1201 
   1202 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
   1203 {
   1204    int row;
   1205    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
   1206    stbi_uc temp[2048];
   1207    stbi_uc *bytes = (stbi_uc *)image;
   1208 
   1209    for (row = 0; row < (h>>1); row++) {
   1210       stbi_uc *row0 = bytes + row*bytes_per_row;
   1211       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
   1212       // swap row0 with row1
   1213       size_t bytes_left = bytes_per_row;
   1214       while (bytes_left) {
   1215          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
   1216          memcpy(temp, row0, bytes_copy);
   1217          memcpy(row0, row1, bytes_copy);
   1218          memcpy(row1, temp, bytes_copy);
   1219          row0 += bytes_copy;
   1220          row1 += bytes_copy;
   1221          bytes_left -= bytes_copy;
   1222       }
   1223    }
   1224 }
   1225 
   1226 #ifndef STBI_NO_GIF
   1227 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
   1228 {
   1229    int slice;
   1230    int slice_size = w * h * bytes_per_pixel;
   1231 
   1232    stbi_uc *bytes = (stbi_uc *)image;
   1233    for (slice = 0; slice < z; ++slice) {
   1234       stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
   1235       bytes += slice_size;
   1236    }
   1237 }
   1238 #endif
   1239 
   1240 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1241 {
   1242    stbi__result_info ri;
   1243    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
   1244 
   1245    if (result == NULL)
   1246       return NULL;
   1247 
   1248    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1249    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1250 
   1251    if (ri.bits_per_channel != 8) {
   1252       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1253       ri.bits_per_channel = 8;
   1254    }
   1255 
   1256    // @TODO: move stbi__convert_format to here
   1257 
   1258    if (stbi__vertically_flip_on_load) {
   1259       int channels = req_comp ? req_comp : *comp;
   1260       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
   1261    }
   1262 
   1263    return (unsigned char *) result;
   1264 }
   1265 
   1266 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1267 {
   1268    stbi__result_info ri;
   1269    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
   1270 
   1271    if (result == NULL)
   1272       return NULL;
   1273 
   1274    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
   1275    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
   1276 
   1277    if (ri.bits_per_channel != 16) {
   1278       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
   1279       ri.bits_per_channel = 16;
   1280    }
   1281 
   1282    // @TODO: move stbi__convert_format16 to here
   1283    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
   1284 
   1285    if (stbi__vertically_flip_on_load) {
   1286       int channels = req_comp ? req_comp : *comp;
   1287       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
   1288    }
   1289 
   1290    return (stbi__uint16 *) result;
   1291 }
   1292 
   1293 #if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
   1294 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
   1295 {
   1296    if (stbi__vertically_flip_on_load && result != NULL) {
   1297       int channels = req_comp ? req_comp : *comp;
   1298       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
   1299    }
   1300 }
   1301 #endif
   1302 
   1303 #ifndef STBI_NO_STDIO
   1304 
   1305 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1306 STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
   1307 STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
   1308 #endif
   1309 
   1310 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1311 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
   1312 {
   1313 	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
   1314 }
   1315 #endif
   1316 
   1317 static FILE *stbi__fopen(char const *filename, char const *mode)
   1318 {
   1319    FILE *f;
   1320 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
   1321    wchar_t wMode[64];
   1322    wchar_t wFilename[1024];
   1323 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
   1324       return 0;
   1325 
   1326 	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
   1327       return 0;
   1328 
   1329 #if defined(_MSC_VER) && _MSC_VER >= 1400
   1330 	if (0 != _wfopen_s(&f, wFilename, wMode))
   1331 		f = 0;
   1332 #else
   1333    f = _wfopen(wFilename, wMode);
   1334 #endif
   1335 
   1336 #elif defined(_MSC_VER) && _MSC_VER >= 1400
   1337    if (0 != fopen_s(&f, filename, mode))
   1338       f=0;
   1339 #else
   1340    f = fopen(filename, mode);
   1341 #endif
   1342    return f;
   1343 }
   1344 
   1345 
   1346 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
   1347 {
   1348    FILE *f = stbi__fopen(filename, "rb");
   1349    unsigned char *result;
   1350    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   1351    result = stbi_load_from_file(f,x,y,comp,req_comp);
   1352    fclose(f);
   1353    return result;
   1354 }
   1355 
   1356 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1357 {
   1358    unsigned char *result;
   1359    stbi__context s;
   1360    stbi__start_file(&s,f);
   1361    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1362    if (result) {
   1363       // need to 'unget' all the characters in the IO buffer
   1364       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1365    }
   1366    return result;
   1367 }
   1368 
   1369 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
   1370 {
   1371    stbi__uint16 *result;
   1372    stbi__context s;
   1373    stbi__start_file(&s,f);
   1374    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
   1375    if (result) {
   1376       // need to 'unget' all the characters in the IO buffer
   1377       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   1378    }
   1379    return result;
   1380 }
   1381 
   1382 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
   1383 {
   1384    FILE *f = stbi__fopen(filename, "rb");
   1385    stbi__uint16 *result;
   1386    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
   1387    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
   1388    fclose(f);
   1389    return result;
   1390 }
   1391 
   1392 
   1393 #endif //!STBI_NO_STDIO
   1394 
   1395 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
   1396 {
   1397    stbi__context s;
   1398    stbi__start_mem(&s,buffer,len);
   1399    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1400 }
   1401 
   1402 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
   1403 {
   1404    stbi__context s;
   1405    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
   1406    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
   1407 }
   1408 
   1409 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1410 {
   1411    stbi__context s;
   1412    stbi__start_mem(&s,buffer,len);
   1413    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1414 }
   1415 
   1416 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1417 {
   1418    stbi__context s;
   1419    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1420    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
   1421 }
   1422 
   1423 #ifndef STBI_NO_GIF
   1424 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   1425 {
   1426    unsigned char *result;
   1427    stbi__context s;
   1428    stbi__start_mem(&s,buffer,len);
   1429 
   1430    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
   1431    if (stbi__vertically_flip_on_load) {
   1432       stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
   1433    }
   1434 
   1435    return result;
   1436 }
   1437 #endif
   1438 
   1439 #ifndef STBI_NO_LINEAR
   1440 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
   1441 {
   1442    unsigned char *data;
   1443    #ifndef STBI_NO_HDR
   1444    if (stbi__hdr_test(s)) {
   1445       stbi__result_info ri;
   1446       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
   1447       if (hdr_data)
   1448          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
   1449       return hdr_data;
   1450    }
   1451    #endif
   1452    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
   1453    if (data)
   1454       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   1455    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
   1456 }
   1457 
   1458 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
   1459 {
   1460    stbi__context s;
   1461    stbi__start_mem(&s,buffer,len);
   1462    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1463 }
   1464 
   1465 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
   1466 {
   1467    stbi__context s;
   1468    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1469    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1470 }
   1471 
   1472 #ifndef STBI_NO_STDIO
   1473 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
   1474 {
   1475    float *result;
   1476    FILE *f = stbi__fopen(filename, "rb");
   1477    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   1478    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   1479    fclose(f);
   1480    return result;
   1481 }
   1482 
   1483 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
   1484 {
   1485    stbi__context s;
   1486    stbi__start_file(&s,f);
   1487    return stbi__loadf_main(&s,x,y,comp,req_comp);
   1488 }
   1489 #endif // !STBI_NO_STDIO
   1490 
   1491 #endif // !STBI_NO_LINEAR
   1492 
   1493 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
   1494 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
   1495 // reports false!
   1496 
   1497 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
   1498 {
   1499    #ifndef STBI_NO_HDR
   1500    stbi__context s;
   1501    stbi__start_mem(&s,buffer,len);
   1502    return stbi__hdr_test(&s);
   1503    #else
   1504    STBI_NOTUSED(buffer);
   1505    STBI_NOTUSED(len);
   1506    return 0;
   1507    #endif
   1508 }
   1509 
   1510 #ifndef STBI_NO_STDIO
   1511 STBIDEF int      stbi_is_hdr          (char const *filename)
   1512 {
   1513    FILE *f = stbi__fopen(filename, "rb");
   1514    int result=0;
   1515    if (f) {
   1516       result = stbi_is_hdr_from_file(f);
   1517       fclose(f);
   1518    }
   1519    return result;
   1520 }
   1521 
   1522 STBIDEF int stbi_is_hdr_from_file(FILE *f)
   1523 {
   1524    #ifndef STBI_NO_HDR
   1525    long pos = ftell(f);
   1526    int res;
   1527    stbi__context s;
   1528    stbi__start_file(&s,f);
   1529    res = stbi__hdr_test(&s);
   1530    fseek(f, pos, SEEK_SET);
   1531    return res;
   1532    #else
   1533    STBI_NOTUSED(f);
   1534    return 0;
   1535    #endif
   1536 }
   1537 #endif // !STBI_NO_STDIO
   1538 
   1539 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
   1540 {
   1541    #ifndef STBI_NO_HDR
   1542    stbi__context s;
   1543    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   1544    return stbi__hdr_test(&s);
   1545    #else
   1546    STBI_NOTUSED(clbk);
   1547    STBI_NOTUSED(user);
   1548    return 0;
   1549    #endif
   1550 }
   1551 
   1552 #ifndef STBI_NO_LINEAR
   1553 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
   1554 
   1555 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
   1556 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
   1557 #endif
   1558 
   1559 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
   1560 
   1561 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
   1562 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
   1563 
   1564 
   1565 //////////////////////////////////////////////////////////////////////////////
   1566 //
   1567 // Common code used by all image loaders
   1568 //
   1569 
   1570 enum
   1571 {
   1572    STBI__SCAN_load=0,
   1573    STBI__SCAN_type,
   1574    STBI__SCAN_header
   1575 };
   1576 
   1577 static void stbi__refill_buffer(stbi__context *s)
   1578 {
   1579    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   1580    s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
   1581    if (n == 0) {
   1582       // at end of file, treat same as if from memory, but need to handle case
   1583       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
   1584       s->read_from_callbacks = 0;
   1585       s->img_buffer = s->buffer_start;
   1586       s->img_buffer_end = s->buffer_start+1;
   1587       *s->img_buffer = 0;
   1588    } else {
   1589       s->img_buffer = s->buffer_start;
   1590       s->img_buffer_end = s->buffer_start + n;
   1591    }
   1592 }
   1593 
   1594 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
   1595 {
   1596    if (s->img_buffer < s->img_buffer_end)
   1597       return *s->img_buffer++;
   1598    if (s->read_from_callbacks) {
   1599       stbi__refill_buffer(s);
   1600       return *s->img_buffer++;
   1601    }
   1602    return 0;
   1603 }
   1604 
   1605 #if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1606 // nothing
   1607 #else
   1608 stbi_inline static int stbi__at_eof(stbi__context *s)
   1609 {
   1610    if (s->io.read) {
   1611       if (!(s->io.eof)(s->io_user_data)) return 0;
   1612       // if feof() is true, check if buffer = end
   1613       // special case: we've only got the special 0 character at the end
   1614       if (s->read_from_callbacks == 0) return 1;
   1615    }
   1616 
   1617    return s->img_buffer >= s->img_buffer_end;
   1618 }
   1619 #endif
   1620 
   1621 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
   1622 // nothing
   1623 #else
   1624 static void stbi__skip(stbi__context *s, int n)
   1625 {
   1626    if (n == 0) return;  // already there!
   1627    if (n < 0) {
   1628       s->img_buffer = s->img_buffer_end;
   1629       return;
   1630    }
   1631    if (s->io.read) {
   1632       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1633       if (blen < n) {
   1634          s->img_buffer = s->img_buffer_end;
   1635          (s->io.skip)(s->io_user_data, n - blen);
   1636          return;
   1637       }
   1638    }
   1639    s->img_buffer += n;
   1640 }
   1641 #endif
   1642 
   1643 #if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
   1644 // nothing
   1645 #else
   1646 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
   1647 {
   1648    if (s->io.read) {
   1649       int blen = (int) (s->img_buffer_end - s->img_buffer);
   1650       if (blen < n) {
   1651          int res, count;
   1652 
   1653          memcpy(buffer, s->img_buffer, blen);
   1654 
   1655          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
   1656          res = (count == (n-blen));
   1657          s->img_buffer = s->img_buffer_end;
   1658          return res;
   1659       }
   1660    }
   1661 
   1662    if (s->img_buffer+n <= s->img_buffer_end) {
   1663       memcpy(buffer, s->img_buffer, n);
   1664       s->img_buffer += n;
   1665       return 1;
   1666    } else
   1667       return 0;
   1668 }
   1669 #endif
   1670 
   1671 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1672 // nothing
   1673 #else
   1674 static int stbi__get16be(stbi__context *s)
   1675 {
   1676    int z = stbi__get8(s);
   1677    return (z << 8) + stbi__get8(s);
   1678 }
   1679 #endif
   1680 
   1681 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
   1682 // nothing
   1683 #else
   1684 static stbi__uint32 stbi__get32be(stbi__context *s)
   1685 {
   1686    stbi__uint32 z = stbi__get16be(s);
   1687    return (z << 16) + stbi__get16be(s);
   1688 }
   1689 #endif
   1690 
   1691 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
   1692 // nothing
   1693 #else
   1694 static int stbi__get16le(stbi__context *s)
   1695 {
   1696    int z = stbi__get8(s);
   1697    return z + (stbi__get8(s) << 8);
   1698 }
   1699 #endif
   1700 
   1701 #ifndef STBI_NO_BMP
   1702 static stbi__uint32 stbi__get32le(stbi__context *s)
   1703 {
   1704    stbi__uint32 z = stbi__get16le(s);
   1705    z += (stbi__uint32)stbi__get16le(s) << 16;
   1706    return z;
   1707 }
   1708 #endif
   1709 
   1710 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
   1711 
   1712 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1713 // nothing
   1714 #else
   1715 //////////////////////////////////////////////////////////////////////////////
   1716 //
   1717 //  generic converter from built-in img_n to req_comp
   1718 //    individual types do this automatically as much as possible (e.g. jpeg
   1719 //    does all cases internally since it needs to colorspace convert anyway,
   1720 //    and it never has alpha, so very few cases ). png can automatically
   1721 //    interleave an alpha=255 channel, but falls back to this for other cases
   1722 //
   1723 //  assume data buffer is malloced, so malloc a new one and free that one
   1724 //  only failure mode is malloc failing
   1725 
   1726 static stbi_uc stbi__compute_y(int r, int g, int b)
   1727 {
   1728    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
   1729 }
   1730 #endif
   1731 
   1732 #if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
   1733 // nothing
   1734 #else
   1735 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1736 {
   1737    int i,j;
   1738    unsigned char *good;
   1739 
   1740    if (req_comp == img_n) return data;
   1741    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1742 
   1743    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
   1744    if (good == NULL) {
   1745       STBI_FREE(data);
   1746       return stbi__errpuc("outofmem", "Out of memory");
   1747    }
   1748 
   1749    for (j=0; j < (int) y; ++j) {
   1750       unsigned char *src  = data + j * x * img_n   ;
   1751       unsigned char *dest = good + j * x * req_comp;
   1752 
   1753       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1754       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1755       // convert source image with img_n components to one with req_comp components;
   1756       // avoid switch per pixel, so use switch per scanline and massive macros
   1757       switch (STBI__COMBO(img_n, req_comp)) {
   1758          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
   1759          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1760          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
   1761          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
   1762          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
   1763          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
   1764          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
   1765          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1766          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
   1767          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
   1768          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1769          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
   1770          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
   1771       }
   1772       #undef STBI__CASE
   1773    }
   1774 
   1775    STBI_FREE(data);
   1776    return good;
   1777 }
   1778 #endif
   1779 
   1780 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1781 // nothing
   1782 #else
   1783 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
   1784 {
   1785    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
   1786 }
   1787 #endif
   1788 
   1789 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
   1790 // nothing
   1791 #else
   1792 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
   1793 {
   1794    int i,j;
   1795    stbi__uint16 *good;
   1796 
   1797    if (req_comp == img_n) return data;
   1798    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
   1799 
   1800    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
   1801    if (good == NULL) {
   1802       STBI_FREE(data);
   1803       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
   1804    }
   1805 
   1806    for (j=0; j < (int) y; ++j) {
   1807       stbi__uint16 *src  = data + j * x * img_n   ;
   1808       stbi__uint16 *dest = good + j * x * req_comp;
   1809 
   1810       #define STBI__COMBO(a,b)  ((a)*8+(b))
   1811       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
   1812       // convert source image with img_n components to one with req_comp components;
   1813       // avoid switch per pixel, so use switch per scanline and massive macros
   1814       switch (STBI__COMBO(img_n, req_comp)) {
   1815          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
   1816          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1817          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
   1818          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
   1819          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
   1820          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
   1821          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
   1822          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1823          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
   1824          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
   1825          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
   1826          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
   1827          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
   1828       }
   1829       #undef STBI__CASE
   1830    }
   1831 
   1832    STBI_FREE(data);
   1833    return good;
   1834 }
   1835 #endif
   1836 
   1837 #ifndef STBI_NO_LINEAR
   1838 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
   1839 {
   1840    int i,k,n;
   1841    float *output;
   1842    if (!data) return NULL;
   1843    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
   1844    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   1845    // compute number of non-alpha components
   1846    if (comp & 1) n = comp; else n = comp-1;
   1847    for (i=0; i < x*y; ++i) {
   1848       for (k=0; k < n; ++k) {
   1849          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
   1850       }
   1851    }
   1852    if (n < comp) {
   1853       for (i=0; i < x*y; ++i) {
   1854          output[i*comp + n] = data[i*comp + n]/255.0f;
   1855       }
   1856    }
   1857    STBI_FREE(data);
   1858    return output;
   1859 }
   1860 #endif
   1861 
   1862 #ifndef STBI_NO_HDR
   1863 #define stbi__float2int(x)   ((int) (x))
   1864 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
   1865 {
   1866    int i,k,n;
   1867    stbi_uc *output;
   1868    if (!data) return NULL;
   1869    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
   1870    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   1871    // compute number of non-alpha components
   1872    if (comp & 1) n = comp; else n = comp-1;
   1873    for (i=0; i < x*y; ++i) {
   1874       for (k=0; k < n; ++k) {
   1875          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
   1876          if (z < 0) z = 0;
   1877          if (z > 255) z = 255;
   1878          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1879       }
   1880       if (k < comp) {
   1881          float z = data[i*comp+k] * 255 + 0.5f;
   1882          if (z < 0) z = 0;
   1883          if (z > 255) z = 255;
   1884          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
   1885       }
   1886    }
   1887    STBI_FREE(data);
   1888    return output;
   1889 }
   1890 #endif
   1891 
   1892 //////////////////////////////////////////////////////////////////////////////
   1893 //
   1894 //  "baseline" JPEG/JFIF decoder
   1895 //
   1896 //    simple implementation
   1897 //      - doesn't support delayed output of y-dimension
   1898 //      - simple interface (only one output format: 8-bit interleaved RGB)
   1899 //      - doesn't try to recover corrupt jpegs
   1900 //      - doesn't allow partial loading, loading multiple at once
   1901 //      - still fast on x86 (copying globals into locals doesn't help x86)
   1902 //      - allocates lots of intermediate memory (full size of all components)
   1903 //        - non-interleaved case requires this anyway
   1904 //        - allows good upsampling (see next)
   1905 //    high-quality
   1906 //      - upsampled channels are bilinearly interpolated, even across blocks
   1907 //      - quality integer IDCT derived from IJG's 'slow'
   1908 //    performance
   1909 //      - fast huffman; reasonable integer IDCT
   1910 //      - some SIMD kernels for common paths on targets with SSE2/NEON
   1911 //      - uses a lot of intermediate memory, could cache poorly
   1912 
   1913 #ifndef STBI_NO_JPEG
   1914 
   1915 // huffman decoding acceleration
   1916 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
   1917 
   1918 typedef struct
   1919 {
   1920    stbi_uc  fast[1 << FAST_BITS];
   1921    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   1922    stbi__uint16 code[256];
   1923    stbi_uc  values[256];
   1924    stbi_uc  size[257];
   1925    unsigned int maxcode[18];
   1926    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
   1927 } stbi__huffman;
   1928 
   1929 typedef struct
   1930 {
   1931    stbi__context *s;
   1932    stbi__huffman huff_dc[4];
   1933    stbi__huffman huff_ac[4];
   1934    stbi__uint16 dequant[4][64];
   1935    stbi__int16 fast_ac[4][1 << FAST_BITS];
   1936 
   1937 // sizes for components, interleaved MCUs
   1938    int img_h_max, img_v_max;
   1939    int img_mcu_x, img_mcu_y;
   1940    int img_mcu_w, img_mcu_h;
   1941 
   1942 // definition of jpeg image component
   1943    struct
   1944    {
   1945       int id;
   1946       int h,v;
   1947       int tq;
   1948       int hd,ha;
   1949       int dc_pred;
   1950 
   1951       int x,y,w2,h2;
   1952       stbi_uc *data;
   1953       void *raw_data, *raw_coeff;
   1954       stbi_uc *linebuf;
   1955       short   *coeff;   // progressive only
   1956       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   1957    } img_comp[4];
   1958 
   1959    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   1960    int            code_bits;   // number of valid bits
   1961    unsigned char  marker;      // marker seen while filling entropy buffer
   1962    int            nomore;      // flag if we saw a marker so must stop
   1963 
   1964    int            progressive;
   1965    int            spec_start;
   1966    int            spec_end;
   1967    int            succ_high;
   1968    int            succ_low;
   1969    int            eob_run;
   1970    int            jfif;
   1971    int            app14_color_transform; // Adobe APP14 tag
   1972    int            rgb;
   1973 
   1974    int scan_n, order[4];
   1975    int restart_interval, todo;
   1976 
   1977 // kernels
   1978    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   1979    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   1980    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
   1981 } stbi__jpeg;
   1982 
   1983 static int stbi__build_huffman(stbi__huffman *h, int *count)
   1984 {
   1985    int i,j,k=0;
   1986    unsigned int code;
   1987    // build size list for each symbol (from JPEG spec)
   1988    for (i=0; i < 16; ++i)
   1989       for (j=0; j < count[i]; ++j)
   1990          h->size[k++] = (stbi_uc) (i+1);
   1991    h->size[k] = 0;
   1992 
   1993    // compute actual symbols (from jpeg spec)
   1994    code = 0;
   1995    k = 0;
   1996    for(j=1; j <= 16; ++j) {
   1997       // compute delta to add to code to compute symbol id
   1998       h->delta[j] = k - code;
   1999       if (h->size[k] == j) {
   2000          while (h->size[k] == j)
   2001             h->code[k++] = (stbi__uint16) (code++);
   2002          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
   2003       }
   2004       // compute largest code + 1 for this size, preshifted as needed later
   2005       h->maxcode[j] = code << (16-j);
   2006       code <<= 1;
   2007    }
   2008    h->maxcode[j] = 0xffffffff;
   2009 
   2010    // build non-spec acceleration table; 255 is flag for not-accelerated
   2011    memset(h->fast, 255, 1 << FAST_BITS);
   2012    for (i=0; i < k; ++i) {
   2013       int s = h->size[i];
   2014       if (s <= FAST_BITS) {
   2015          int c = h->code[i] << (FAST_BITS-s);
   2016          int m = 1 << (FAST_BITS-s);
   2017          for (j=0; j < m; ++j) {
   2018             h->fast[c+j] = (stbi_uc) i;
   2019          }
   2020       }
   2021    }
   2022    return 1;
   2023 }
   2024 
   2025 // build a table that decodes both magnitude and value of small ACs in
   2026 // one go.
   2027 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
   2028 {
   2029    int i;
   2030    for (i=0; i < (1 << FAST_BITS); ++i) {
   2031       stbi_uc fast = h->fast[i];
   2032       fast_ac[i] = 0;
   2033       if (fast < 255) {
   2034          int rs = h->values[fast];
   2035          int run = (rs >> 4) & 15;
   2036          int magbits = rs & 15;
   2037          int len = h->size[fast];
   2038 
   2039          if (magbits && len + magbits <= FAST_BITS) {
   2040             // magnitude code followed by receive_extend code
   2041             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
   2042             int m = 1 << (magbits - 1);
   2043             if (k < m) k += (~0U << magbits) + 1;
   2044             // if the result is small enough, we can fit it in fast_ac table
   2045             if (k >= -128 && k <= 127)
   2046                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
   2047          }
   2048       }
   2049    }
   2050 }
   2051 
   2052 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
   2053 {
   2054    do {
   2055       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
   2056       if (b == 0xff) {
   2057          int c = stbi__get8(j->s);
   2058          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
   2059          if (c != 0) {
   2060             j->marker = (unsigned char) c;
   2061             j->nomore = 1;
   2062             return;
   2063          }
   2064       }
   2065       j->code_buffer |= b << (24 - j->code_bits);
   2066       j->code_bits += 8;
   2067    } while (j->code_bits <= 24);
   2068 }
   2069 
   2070 // (1 << n) - 1
   2071 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
   2072 
   2073 // decode a jpeg huffman value from the bitstream
   2074 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
   2075 {
   2076    unsigned int temp;
   2077    int c,k;
   2078 
   2079    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2080 
   2081    // look at the top FAST_BITS and determine what symbol ID it is,
   2082    // if the code is <= FAST_BITS
   2083    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2084    k = h->fast[c];
   2085    if (k < 255) {
   2086       int s = h->size[k];
   2087       if (s > j->code_bits)
   2088          return -1;
   2089       j->code_buffer <<= s;
   2090       j->code_bits -= s;
   2091       return h->values[k];
   2092    }
   2093 
   2094    // naive test is to shift the code_buffer down so k bits are
   2095    // valid, then test against maxcode. To speed this up, we've
   2096    // preshifted maxcode left so that it has (16-k) 0s at the
   2097    // end; in other words, regardless of the number of bits, it
   2098    // wants to be compared against something shifted to have 16;
   2099    // that way we don't need to shift inside the loop.
   2100    temp = j->code_buffer >> 16;
   2101    for (k=FAST_BITS+1 ; ; ++k)
   2102       if (temp < h->maxcode[k])
   2103          break;
   2104    if (k == 17) {
   2105       // error! code not found
   2106       j->code_bits -= 16;
   2107       return -1;
   2108    }
   2109 
   2110    if (k > j->code_bits)
   2111       return -1;
   2112 
   2113    // convert the huffman code to the symbol id
   2114    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   2115    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
   2116 
   2117    // convert the id to a symbol
   2118    j->code_bits -= k;
   2119    j->code_buffer <<= k;
   2120    return h->values[c];
   2121 }
   2122 
   2123 // bias[n] = (-1<<n) + 1
   2124 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
   2125 
   2126 // combined JPEG 'receive' and JPEG 'extend', since baseline
   2127 // always extends everything it receives.
   2128 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
   2129 {
   2130    unsigned int k;
   2131    int sgn;
   2132    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2133 
   2134    sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
   2135    k = stbi_lrot(j->code_buffer, n);
   2136    j->code_buffer = k & ~stbi__bmask[n];
   2137    k &= stbi__bmask[n];
   2138    j->code_bits -= n;
   2139    return k + (stbi__jbias[n] & (sgn - 1));
   2140 }
   2141 
   2142 // get some unsigned bits
   2143 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
   2144 {
   2145    unsigned int k;
   2146    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   2147    k = stbi_lrot(j->code_buffer, n);
   2148    j->code_buffer = k & ~stbi__bmask[n];
   2149    k &= stbi__bmask[n];
   2150    j->code_bits -= n;
   2151    return k;
   2152 }
   2153 
   2154 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
   2155 {
   2156    unsigned int k;
   2157    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   2158    k = j->code_buffer;
   2159    j->code_buffer <<= 1;
   2160    --j->code_bits;
   2161    return k & 0x80000000;
   2162 }
   2163 
   2164 // given a value that's at position X in the zigzag stream,
   2165 // where does it appear in the 8x8 matrix coded as row-major?
   2166 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
   2167 {
   2168     0,  1,  8, 16,  9,  2,  3, 10,
   2169    17, 24, 32, 25, 18, 11,  4,  5,
   2170    12, 19, 26, 33, 40, 48, 41, 34,
   2171    27, 20, 13,  6,  7, 14, 21, 28,
   2172    35, 42, 49, 56, 57, 50, 43, 36,
   2173    29, 22, 15, 23, 30, 37, 44, 51,
   2174    58, 59, 52, 45, 38, 31, 39, 46,
   2175    53, 60, 61, 54, 47, 55, 62, 63,
   2176    // let corrupt input sample past end
   2177    63, 63, 63, 63, 63, 63, 63, 63,
   2178    63, 63, 63, 63, 63, 63, 63
   2179 };
   2180 
   2181 // decode one 64-entry block--
   2182 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
   2183 {
   2184    int diff,dc,k;
   2185    int t;
   2186 
   2187    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2188    t = stbi__jpeg_huff_decode(j, hdc);
   2189    if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
   2190 
   2191    // 0 all the ac values now so we can do it 32-bits at a time
   2192    memset(data,0,64*sizeof(data[0]));
   2193 
   2194    diff = t ? stbi__extend_receive(j, t) : 0;
   2195    dc = j->img_comp[b].dc_pred + diff;
   2196    j->img_comp[b].dc_pred = dc;
   2197    data[0] = (short) (dc * dequant[0]);
   2198 
   2199    // decode AC components, see JPEG spec
   2200    k = 1;
   2201    do {
   2202       unsigned int zig;
   2203       int c,r,s;
   2204       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2205       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2206       r = fac[c];
   2207       if (r) { // fast-AC path
   2208          k += (r >> 4) & 15; // run
   2209          s = r & 15; // combined length
   2210          j->code_buffer <<= s;
   2211          j->code_bits -= s;
   2212          // decode into unzigzag'd location
   2213          zig = stbi__jpeg_dezigzag[k++];
   2214          data[zig] = (short) ((r >> 8) * dequant[zig]);
   2215       } else {
   2216          int rs = stbi__jpeg_huff_decode(j, hac);
   2217          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2218          s = rs & 15;
   2219          r = rs >> 4;
   2220          if (s == 0) {
   2221             if (rs != 0xf0) break; // end block
   2222             k += 16;
   2223          } else {
   2224             k += r;
   2225             // decode into unzigzag'd location
   2226             zig = stbi__jpeg_dezigzag[k++];
   2227             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
   2228          }
   2229       }
   2230    } while (k < 64);
   2231    return 1;
   2232 }
   2233 
   2234 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
   2235 {
   2236    int diff,dc;
   2237    int t;
   2238    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2239 
   2240    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2241 
   2242    if (j->succ_high == 0) {
   2243       // first scan for DC coefficient, must be first
   2244       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
   2245       t = stbi__jpeg_huff_decode(j, hdc);
   2246       if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2247       diff = t ? stbi__extend_receive(j, t) : 0;
   2248 
   2249       dc = j->img_comp[b].dc_pred + diff;
   2250       j->img_comp[b].dc_pred = dc;
   2251       data[0] = (short) (dc * (1 << j->succ_low));
   2252    } else {
   2253       // refinement scan for DC coefficient
   2254       if (stbi__jpeg_get_bit(j))
   2255          data[0] += (short) (1 << j->succ_low);
   2256    }
   2257    return 1;
   2258 }
   2259 
   2260 // @OPTIMIZE: store non-zigzagged during the decode passes,
   2261 // and only de-zigzag when dequantizing
   2262 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
   2263 {
   2264    int k;
   2265    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
   2266 
   2267    if (j->succ_high == 0) {
   2268       int shift = j->succ_low;
   2269 
   2270       if (j->eob_run) {
   2271          --j->eob_run;
   2272          return 1;
   2273       }
   2274 
   2275       k = j->spec_start;
   2276       do {
   2277          unsigned int zig;
   2278          int c,r,s;
   2279          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   2280          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   2281          r = fac[c];
   2282          if (r) { // fast-AC path
   2283             k += (r >> 4) & 15; // run
   2284             s = r & 15; // combined length
   2285             j->code_buffer <<= s;
   2286             j->code_bits -= s;
   2287             zig = stbi__jpeg_dezigzag[k++];
   2288             data[zig] = (short) ((r >> 8) * (1 << shift));
   2289          } else {
   2290             int rs = stbi__jpeg_huff_decode(j, hac);
   2291             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2292             s = rs & 15;
   2293             r = rs >> 4;
   2294             if (s == 0) {
   2295                if (r < 15) {
   2296                   j->eob_run = (1 << r);
   2297                   if (r)
   2298                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2299                   --j->eob_run;
   2300                   break;
   2301                }
   2302                k += 16;
   2303             } else {
   2304                k += r;
   2305                zig = stbi__jpeg_dezigzag[k++];
   2306                data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
   2307             }
   2308          }
   2309       } while (k <= j->spec_end);
   2310    } else {
   2311       // refinement scan for these AC coefficients
   2312 
   2313       short bit = (short) (1 << j->succ_low);
   2314 
   2315       if (j->eob_run) {
   2316          --j->eob_run;
   2317          for (k = j->spec_start; k <= j->spec_end; ++k) {
   2318             short *p = &data[stbi__jpeg_dezigzag[k]];
   2319             if (*p != 0)
   2320                if (stbi__jpeg_get_bit(j))
   2321                   if ((*p & bit)==0) {
   2322                      if (*p > 0)
   2323                         *p += bit;
   2324                      else
   2325                         *p -= bit;
   2326                   }
   2327          }
   2328       } else {
   2329          k = j->spec_start;
   2330          do {
   2331             int r,s;
   2332             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
   2333             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
   2334             s = rs & 15;
   2335             r = rs >> 4;
   2336             if (s == 0) {
   2337                if (r < 15) {
   2338                   j->eob_run = (1 << r) - 1;
   2339                   if (r)
   2340                      j->eob_run += stbi__jpeg_get_bits(j, r);
   2341                   r = 64; // force end of block
   2342                } else {
   2343                   // r=15 s=0 should write 16 0s, so we just do
   2344                   // a run of 15 0s and then write s (which is 0),
   2345                   // so we don't have to do anything special here
   2346                }
   2347             } else {
   2348                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
   2349                // sign bit
   2350                if (stbi__jpeg_get_bit(j))
   2351                   s = bit;
   2352                else
   2353                   s = -bit;
   2354             }
   2355 
   2356             // advance by r
   2357             while (k <= j->spec_end) {
   2358                short *p = &data[stbi__jpeg_dezigzag[k++]];
   2359                if (*p != 0) {
   2360                   if (stbi__jpeg_get_bit(j))
   2361                      if ((*p & bit)==0) {
   2362                         if (*p > 0)
   2363                            *p += bit;
   2364                         else
   2365                            *p -= bit;
   2366                      }
   2367                } else {
   2368                   if (r == 0) {
   2369                      *p = (short) s;
   2370                      break;
   2371                   }
   2372                   --r;
   2373                }
   2374             }
   2375          } while (k <= j->spec_end);
   2376       }
   2377    }
   2378    return 1;
   2379 }
   2380 
   2381 // take a -128..127 value and stbi__clamp it and convert to 0..255
   2382 stbi_inline static stbi_uc stbi__clamp(int x)
   2383 {
   2384    // trick to use a single test to catch both cases
   2385    if ((unsigned int) x > 255) {
   2386       if (x < 0) return 0;
   2387       if (x > 255) return 255;
   2388    }
   2389    return (stbi_uc) x;
   2390 }
   2391 
   2392 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
   2393 #define stbi__fsh(x)  ((x) * 4096)
   2394 
   2395 // derived from jidctint -- DCT_ISLOW
   2396 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   2397    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   2398    p2 = s2;                                    \
   2399    p3 = s6;                                    \
   2400    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   2401    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   2402    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   2403    p2 = s0;                                    \
   2404    p3 = s4;                                    \
   2405    t0 = stbi__fsh(p2+p3);                      \
   2406    t1 = stbi__fsh(p2-p3);                      \
   2407    x0 = t0+t3;                                 \
   2408    x3 = t0-t3;                                 \
   2409    x1 = t1+t2;                                 \
   2410    x2 = t1-t2;                                 \
   2411    t0 = s7;                                    \
   2412    t1 = s5;                                    \
   2413    t2 = s3;                                    \
   2414    t3 = s1;                                    \
   2415    p3 = t0+t2;                                 \
   2416    p4 = t1+t3;                                 \
   2417    p1 = t0+t3;                                 \
   2418    p2 = t1+t2;                                 \
   2419    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   2420    t0 = t0*stbi__f2f( 0.298631336f);           \
   2421    t1 = t1*stbi__f2f( 2.053119869f);           \
   2422    t2 = t2*stbi__f2f( 3.072711026f);           \
   2423    t3 = t3*stbi__f2f( 1.501321110f);           \
   2424    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   2425    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   2426    p3 = p3*stbi__f2f(-1.961570560f);           \
   2427    p4 = p4*stbi__f2f(-0.390180644f);           \
   2428    t3 += p1+p4;                                \
   2429    t2 += p2+p3;                                \
   2430    t1 += p2+p4;                                \
   2431    t0 += p1+p3;
   2432 
   2433 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
   2434 {
   2435    int i,val[64],*v=val;
   2436    stbi_uc *o;
   2437    short *d = data;
   2438 
   2439    // columns
   2440    for (i=0; i < 8; ++i,++d, ++v) {
   2441       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
   2442       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
   2443            && d[40]==0 && d[48]==0 && d[56]==0) {
   2444          //    no shortcut                 0     seconds
   2445          //    (1|2|3|4|5|6|7)==0          0     seconds
   2446          //    all separate               -0.047 seconds
   2447          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
   2448          int dcterm = d[0]*4;
   2449          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
   2450       } else {
   2451          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
   2452          // constants scaled things up by 1<<12; let's bring them back
   2453          // down, but keep 2 extra bits of precision
   2454          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
   2455          v[ 0] = (x0+t3) >> 10;
   2456          v[56] = (x0-t3) >> 10;
   2457          v[ 8] = (x1+t2) >> 10;
   2458          v[48] = (x1-t2) >> 10;
   2459          v[16] = (x2+t1) >> 10;
   2460          v[40] = (x2-t1) >> 10;
   2461          v[24] = (x3+t0) >> 10;
   2462          v[32] = (x3-t0) >> 10;
   2463       }
   2464    }
   2465 
   2466    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
   2467       // no fast case since the first 1D IDCT spread components out
   2468       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
   2469       // constants scaled things up by 1<<12, plus we had 1<<2 from first
   2470       // loop, plus horizontal and vertical each scale by sqrt(8) so together
   2471       // we've got an extra 1<<3, so 1<<17 total we need to remove.
   2472       // so we want to round that, which means adding 0.5 * 1<<17,
   2473       // aka 65536. Also, we'll end up with -128 to 127 that we want
   2474       // to encode as 0..255 by adding 128, so we'll add that before the shift
   2475       x0 += 65536 + (128<<17);
   2476       x1 += 65536 + (128<<17);
   2477       x2 += 65536 + (128<<17);
   2478       x3 += 65536 + (128<<17);
   2479       // tried computing the shifts into temps, or'ing the temps to see
   2480       // if any were out of range, but that was slower
   2481       o[0] = stbi__clamp((x0+t3) >> 17);
   2482       o[7] = stbi__clamp((x0-t3) >> 17);
   2483       o[1] = stbi__clamp((x1+t2) >> 17);
   2484       o[6] = stbi__clamp((x1-t2) >> 17);
   2485       o[2] = stbi__clamp((x2+t1) >> 17);
   2486       o[5] = stbi__clamp((x2-t1) >> 17);
   2487       o[3] = stbi__clamp((x3+t0) >> 17);
   2488       o[4] = stbi__clamp((x3-t0) >> 17);
   2489    }
   2490 }
   2491 
   2492 #ifdef STBI_SSE2
   2493 // sse2 integer IDCT. not the fastest possible implementation but it
   2494 // produces bit-identical results to the generic C version so it's
   2495 // fully "transparent".
   2496 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2497 {
   2498    // This is constructed to match our regular (generic) integer IDCT exactly.
   2499    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   2500    __m128i tmp;
   2501 
   2502    // dot product constant: even elems=x, odd elems=y
   2503    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
   2504 
   2505    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   2506    // out(1) = c1[even]*x + c1[odd]*y
   2507    #define dct_rot(out0,out1, x,y,c0,c1) \
   2508       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
   2509       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
   2510       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
   2511       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
   2512       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
   2513       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
   2514 
   2515    // out = in << 12  (in 16-bit, out 32-bit)
   2516    #define dct_widen(out, in) \
   2517       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
   2518       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
   2519 
   2520    // wide add
   2521    #define dct_wadd(out, a, b) \
   2522       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
   2523       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
   2524 
   2525    // wide sub
   2526    #define dct_wsub(out, a, b) \
   2527       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
   2528       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
   2529 
   2530    // butterfly a/b, add bias, then shift by "s" and pack
   2531    #define dct_bfly32o(out0, out1, a,b,bias,s) \
   2532       { \
   2533          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
   2534          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
   2535          dct_wadd(sum, abiased, b); \
   2536          dct_wsub(dif, abiased, b); \
   2537          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
   2538          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
   2539       }
   2540 
   2541    // 8-bit interleave step (for transposes)
   2542    #define dct_interleave8(a, b) \
   2543       tmp = a; \
   2544       a = _mm_unpacklo_epi8(a, b); \
   2545       b = _mm_unpackhi_epi8(tmp, b)
   2546 
   2547    // 16-bit interleave step (for transposes)
   2548    #define dct_interleave16(a, b) \
   2549       tmp = a; \
   2550       a = _mm_unpacklo_epi16(a, b); \
   2551       b = _mm_unpackhi_epi16(tmp, b)
   2552 
   2553    #define dct_pass(bias,shift) \
   2554       { \
   2555          /* even part */ \
   2556          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
   2557          __m128i sum04 = _mm_add_epi16(row0, row4); \
   2558          __m128i dif04 = _mm_sub_epi16(row0, row4); \
   2559          dct_widen(t0e, sum04); \
   2560          dct_widen(t1e, dif04); \
   2561          dct_wadd(x0, t0e, t3e); \
   2562          dct_wsub(x3, t0e, t3e); \
   2563          dct_wadd(x1, t1e, t2e); \
   2564          dct_wsub(x2, t1e, t2e); \
   2565          /* odd part */ \
   2566          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
   2567          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
   2568          __m128i sum17 = _mm_add_epi16(row1, row7); \
   2569          __m128i sum35 = _mm_add_epi16(row3, row5); \
   2570          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
   2571          dct_wadd(x4, y0o, y4o); \
   2572          dct_wadd(x5, y1o, y5o); \
   2573          dct_wadd(x6, y2o, y5o); \
   2574          dct_wadd(x7, y3o, y4o); \
   2575          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
   2576          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
   2577          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
   2578          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
   2579       }
   2580 
   2581    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   2582    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   2583    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   2584    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   2585    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   2586    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   2587    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   2588    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
   2589 
   2590    // rounding biases in column/row passes, see stbi__idct_block for explanation.
   2591    __m128i bias_0 = _mm_set1_epi32(512);
   2592    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
   2593 
   2594    // load
   2595    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   2596    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   2597    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   2598    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   2599    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   2600    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   2601    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   2602    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
   2603 
   2604    // column pass
   2605    dct_pass(bias_0, 10);
   2606 
   2607    {
   2608       // 16bit 8x8 transpose pass 1
   2609       dct_interleave16(row0, row4);
   2610       dct_interleave16(row1, row5);
   2611       dct_interleave16(row2, row6);
   2612       dct_interleave16(row3, row7);
   2613 
   2614       // transpose pass 2
   2615       dct_interleave16(row0, row2);
   2616       dct_interleave16(row1, row3);
   2617       dct_interleave16(row4, row6);
   2618       dct_interleave16(row5, row7);
   2619 
   2620       // transpose pass 3
   2621       dct_interleave16(row0, row1);
   2622       dct_interleave16(row2, row3);
   2623       dct_interleave16(row4, row5);
   2624       dct_interleave16(row6, row7);
   2625    }
   2626 
   2627    // row pass
   2628    dct_pass(bias_1, 17);
   2629 
   2630    {
   2631       // pack
   2632       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
   2633       __m128i p1 = _mm_packus_epi16(row2, row3);
   2634       __m128i p2 = _mm_packus_epi16(row4, row5);
   2635       __m128i p3 = _mm_packus_epi16(row6, row7);
   2636 
   2637       // 8bit 8x8 transpose pass 1
   2638       dct_interleave8(p0, p2); // a0e0a1e1...
   2639       dct_interleave8(p1, p3); // c0g0c1g1...
   2640 
   2641       // transpose pass 2
   2642       dct_interleave8(p0, p1); // a0c0e0g0...
   2643       dct_interleave8(p2, p3); // b0d0f0h0...
   2644 
   2645       // transpose pass 3
   2646       dct_interleave8(p0, p2); // a0b0c0d0...
   2647       dct_interleave8(p1, p3); // a4b4c4d4...
   2648 
   2649       // store
   2650       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
   2651       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
   2652       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
   2653       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
   2654       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
   2655       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
   2656       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
   2657       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   2658    }
   2659 
   2660 #undef dct_const
   2661 #undef dct_rot
   2662 #undef dct_widen
   2663 #undef dct_wadd
   2664 #undef dct_wsub
   2665 #undef dct_bfly32o
   2666 #undef dct_interleave8
   2667 #undef dct_interleave16
   2668 #undef dct_pass
   2669 }
   2670 
   2671 #endif // STBI_SSE2
   2672 
   2673 #ifdef STBI_NEON
   2674 
   2675 // NEON integer IDCT. should produce bit-identical
   2676 // results to the generic C version.
   2677 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
   2678 {
   2679    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
   2680 
   2681    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   2682    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   2683    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   2684    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   2685    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   2686    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   2687    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   2688    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   2689    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   2690    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   2691    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   2692    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
   2693 
   2694 #define dct_long_mul(out, inq, coeff) \
   2695    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   2696    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
   2697 
   2698 #define dct_long_mac(out, acc, inq, coeff) \
   2699    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   2700    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
   2701 
   2702 #define dct_widen(out, inq) \
   2703    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   2704    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
   2705 
   2706 // wide add
   2707 #define dct_wadd(out, a, b) \
   2708    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   2709    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
   2710 
   2711 // wide sub
   2712 #define dct_wsub(out, a, b) \
   2713    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   2714    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
   2715 
   2716 // butterfly a/b, then shift using "shiftop" by "s" and pack
   2717 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   2718    { \
   2719       dct_wadd(sum, a, b); \
   2720       dct_wsub(dif, a, b); \
   2721       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
   2722       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   2723    }
   2724 
   2725 #define dct_pass(shiftop, shift) \
   2726    { \
   2727       /* even part */ \
   2728       int16x8_t sum26 = vaddq_s16(row2, row6); \
   2729       dct_long_mul(p1e, sum26, rot0_0); \
   2730       dct_long_mac(t2e, p1e, row6, rot0_1); \
   2731       dct_long_mac(t3e, p1e, row2, rot0_2); \
   2732       int16x8_t sum04 = vaddq_s16(row0, row4); \
   2733       int16x8_t dif04 = vsubq_s16(row0, row4); \
   2734       dct_widen(t0e, sum04); \
   2735       dct_widen(t1e, dif04); \
   2736       dct_wadd(x0, t0e, t3e); \
   2737       dct_wsub(x3, t0e, t3e); \
   2738       dct_wadd(x1, t1e, t2e); \
   2739       dct_wsub(x2, t1e, t2e); \
   2740       /* odd part */ \
   2741       int16x8_t sum15 = vaddq_s16(row1, row5); \
   2742       int16x8_t sum17 = vaddq_s16(row1, row7); \
   2743       int16x8_t sum35 = vaddq_s16(row3, row5); \
   2744       int16x8_t sum37 = vaddq_s16(row3, row7); \
   2745       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
   2746       dct_long_mul(p5o, sumodd, rot1_0); \
   2747       dct_long_mac(p1o, p5o, sum17, rot1_1); \
   2748       dct_long_mac(p2o, p5o, sum35, rot1_2); \
   2749       dct_long_mul(p3o, sum37, rot2_0); \
   2750       dct_long_mul(p4o, sum15, rot2_1); \
   2751       dct_wadd(sump13o, p1o, p3o); \
   2752       dct_wadd(sump24o, p2o, p4o); \
   2753       dct_wadd(sump23o, p2o, p3o); \
   2754       dct_wadd(sump14o, p1o, p4o); \
   2755       dct_long_mac(x4, sump13o, row7, rot3_0); \
   2756       dct_long_mac(x5, sump24o, row5, rot3_1); \
   2757       dct_long_mac(x6, sump23o, row3, rot3_2); \
   2758       dct_long_mac(x7, sump14o, row1, rot3_3); \
   2759       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
   2760       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
   2761       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
   2762       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   2763    }
   2764 
   2765    // load
   2766    row0 = vld1q_s16(data + 0*8);
   2767    row1 = vld1q_s16(data + 1*8);
   2768    row2 = vld1q_s16(data + 2*8);
   2769    row3 = vld1q_s16(data + 3*8);
   2770    row4 = vld1q_s16(data + 4*8);
   2771    row5 = vld1q_s16(data + 5*8);
   2772    row6 = vld1q_s16(data + 6*8);
   2773    row7 = vld1q_s16(data + 7*8);
   2774 
   2775    // add DC bias
   2776    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
   2777 
   2778    // column pass
   2779    dct_pass(vrshrn_n_s32, 10);
   2780 
   2781    // 16bit 8x8 transpose
   2782    {
   2783 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
   2784 // whether compilers actually get this is another story, sadly.
   2785 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
   2786 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
   2787 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
   2788 
   2789       // pass 1
   2790       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
   2791       dct_trn16(row2, row3);
   2792       dct_trn16(row4, row5);
   2793       dct_trn16(row6, row7);
   2794 
   2795       // pass 2
   2796       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
   2797       dct_trn32(row1, row3);
   2798       dct_trn32(row4, row6);
   2799       dct_trn32(row5, row7);
   2800 
   2801       // pass 3
   2802       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
   2803       dct_trn64(row1, row5);
   2804       dct_trn64(row2, row6);
   2805       dct_trn64(row3, row7);
   2806 
   2807 #undef dct_trn16
   2808 #undef dct_trn32
   2809 #undef dct_trn64
   2810    }
   2811 
   2812    // row pass
   2813    // vrshrn_n_s32 only supports shifts up to 16, we need
   2814    // 17. so do a non-rounding shift of 16 first then follow
   2815    // up with a rounding shift by 1.
   2816    dct_pass(vshrn_n_s32, 16);
   2817 
   2818    {
   2819       // pack and round
   2820       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
   2821       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
   2822       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
   2823       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
   2824       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
   2825       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
   2826       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
   2827       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
   2828 
   2829       // again, these can translate into one instruction, but often don't.
   2830 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
   2831 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
   2832 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
   2833 
   2834       // sadly can't use interleaved stores here since we only write
   2835       // 8 bytes to each scan line!
   2836 
   2837       // 8x8 8-bit transpose pass 1
   2838       dct_trn8_8(p0, p1);
   2839       dct_trn8_8(p2, p3);
   2840       dct_trn8_8(p4, p5);
   2841       dct_trn8_8(p6, p7);
   2842 
   2843       // pass 2
   2844       dct_trn8_16(p0, p2);
   2845       dct_trn8_16(p1, p3);
   2846       dct_trn8_16(p4, p6);
   2847       dct_trn8_16(p5, p7);
   2848 
   2849       // pass 3
   2850       dct_trn8_32(p0, p4);
   2851       dct_trn8_32(p1, p5);
   2852       dct_trn8_32(p2, p6);
   2853       dct_trn8_32(p3, p7);
   2854 
   2855       // store
   2856       vst1_u8(out, p0); out += out_stride;
   2857       vst1_u8(out, p1); out += out_stride;
   2858       vst1_u8(out, p2); out += out_stride;
   2859       vst1_u8(out, p3); out += out_stride;
   2860       vst1_u8(out, p4); out += out_stride;
   2861       vst1_u8(out, p5); out += out_stride;
   2862       vst1_u8(out, p6); out += out_stride;
   2863       vst1_u8(out, p7);
   2864 
   2865 #undef dct_trn8_8
   2866 #undef dct_trn8_16
   2867 #undef dct_trn8_32
   2868    }
   2869 
   2870 #undef dct_long_mul
   2871 #undef dct_long_mac
   2872 #undef dct_widen
   2873 #undef dct_wadd
   2874 #undef dct_wsub
   2875 #undef dct_bfly32o
   2876 #undef dct_pass
   2877 }
   2878 
   2879 #endif // STBI_NEON
   2880 
   2881 #define STBI__MARKER_none  0xff
   2882 // if there's a pending marker from the entropy stream, return that
   2883 // otherwise, fetch from the stream and get a marker. if there's no
   2884 // marker, return 0xff, which is never a valid marker value
   2885 static stbi_uc stbi__get_marker(stbi__jpeg *j)
   2886 {
   2887    stbi_uc x;
   2888    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   2889    x = stbi__get8(j->s);
   2890    if (x != 0xff) return STBI__MARKER_none;
   2891    while (x == 0xff)
   2892       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
   2893    return x;
   2894 }
   2895 
   2896 // in each scan, we'll have scan_n components, and the order
   2897 // of the components is specified by order[]
   2898 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
   2899 
   2900 // after a restart interval, stbi__jpeg_reset the entropy decoder and
   2901 // the dc prediction
   2902 static void stbi__jpeg_reset(stbi__jpeg *j)
   2903 {
   2904    j->code_bits = 0;
   2905    j->code_buffer = 0;
   2906    j->nomore = 0;
   2907    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
   2908    j->marker = STBI__MARKER_none;
   2909    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   2910    j->eob_run = 0;
   2911    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
   2912    // since we don't even allow 1<<30 pixels
   2913 }
   2914 
   2915 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
   2916 {
   2917    stbi__jpeg_reset(z);
   2918    if (!z->progressive) {
   2919       if (z->scan_n == 1) {
   2920          int i,j;
   2921          STBI_SIMD_ALIGN(short, data[64]);
   2922          int n = z->order[0];
   2923          // non-interleaved data, we just need to process one block at a time,
   2924          // in trivial scanline order
   2925          // number of blocks to do just depends on how many actual "pixels" this
   2926          // component has, independent of interleaved MCU blocking and such
   2927          int w = (z->img_comp[n].x+7) >> 3;
   2928          int h = (z->img_comp[n].y+7) >> 3;
   2929          for (j=0; j < h; ++j) {
   2930             for (i=0; i < w; ++i) {
   2931                int ha = z->img_comp[n].ha;
   2932                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2933                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   2934                // every data block is an MCU, so countdown the restart interval
   2935                if (--z->todo <= 0) {
   2936                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2937                   // if it's NOT a restart, then just bail, so we get corrupt data
   2938                   // rather than no data
   2939                   if (!STBI__RESTART(z->marker)) return 1;
   2940                   stbi__jpeg_reset(z);
   2941                }
   2942             }
   2943          }
   2944          return 1;
   2945       } else { // interleaved
   2946          int i,j,k,x,y;
   2947          STBI_SIMD_ALIGN(short, data[64]);
   2948          for (j=0; j < z->img_mcu_y; ++j) {
   2949             for (i=0; i < z->img_mcu_x; ++i) {
   2950                // scan an interleaved mcu... process scan_n components in order
   2951                for (k=0; k < z->scan_n; ++k) {
   2952                   int n = z->order[k];
   2953                   // scan out an mcu's worth of this component; that's just determined
   2954                   // by the basic H and V specified for the component
   2955                   for (y=0; y < z->img_comp[n].v; ++y) {
   2956                      for (x=0; x < z->img_comp[n].h; ++x) {
   2957                         int x2 = (i*z->img_comp[n].h + x)*8;
   2958                         int y2 = (j*z->img_comp[n].v + y)*8;
   2959                         int ha = z->img_comp[n].ha;
   2960                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
   2961                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
   2962                      }
   2963                   }
   2964                }
   2965                // after all interleaved components, that's an interleaved MCU,
   2966                // so now count down the restart interval
   2967                if (--z->todo <= 0) {
   2968                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   2969                   if (!STBI__RESTART(z->marker)) return 1;
   2970                   stbi__jpeg_reset(z);
   2971                }
   2972             }
   2973          }
   2974          return 1;
   2975       }
   2976    } else {
   2977       if (z->scan_n == 1) {
   2978          int i,j;
   2979          int n = z->order[0];
   2980          // non-interleaved data, we just need to process one block at a time,
   2981          // in trivial scanline order
   2982          // number of blocks to do just depends on how many actual "pixels" this
   2983          // component has, independent of interleaved MCU blocking and such
   2984          int w = (z->img_comp[n].x+7) >> 3;
   2985          int h = (z->img_comp[n].y+7) >> 3;
   2986          for (j=0; j < h; ++j) {
   2987             for (i=0; i < w; ++i) {
   2988                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   2989                if (z->spec_start == 0) {
   2990                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   2991                      return 0;
   2992                } else {
   2993                   int ha = z->img_comp[n].ha;
   2994                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
   2995                      return 0;
   2996                }
   2997                // every data block is an MCU, so countdown the restart interval
   2998                if (--z->todo <= 0) {
   2999                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3000                   if (!STBI__RESTART(z->marker)) return 1;
   3001                   stbi__jpeg_reset(z);
   3002                }
   3003             }
   3004          }
   3005          return 1;
   3006       } else { // interleaved
   3007          int i,j,k,x,y;
   3008          for (j=0; j < z->img_mcu_y; ++j) {
   3009             for (i=0; i < z->img_mcu_x; ++i) {
   3010                // scan an interleaved mcu... process scan_n components in order
   3011                for (k=0; k < z->scan_n; ++k) {
   3012                   int n = z->order[k];
   3013                   // scan out an mcu's worth of this component; that's just determined
   3014                   // by the basic H and V specified for the component
   3015                   for (y=0; y < z->img_comp[n].v; ++y) {
   3016                      for (x=0; x < z->img_comp[n].h; ++x) {
   3017                         int x2 = (i*z->img_comp[n].h + x);
   3018                         int y2 = (j*z->img_comp[n].v + y);
   3019                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
   3020                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
   3021                            return 0;
   3022                      }
   3023                   }
   3024                }
   3025                // after all interleaved components, that's an interleaved MCU,
   3026                // so now count down the restart interval
   3027                if (--z->todo <= 0) {
   3028                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
   3029                   if (!STBI__RESTART(z->marker)) return 1;
   3030                   stbi__jpeg_reset(z);
   3031                }
   3032             }
   3033          }
   3034          return 1;
   3035       }
   3036    }
   3037 }
   3038 
   3039 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
   3040 {
   3041    int i;
   3042    for (i=0; i < 64; ++i)
   3043       data[i] *= dequant[i];
   3044 }
   3045 
   3046 static void stbi__jpeg_finish(stbi__jpeg *z)
   3047 {
   3048    if (z->progressive) {
   3049       // dequantize and idct the data
   3050       int i,j,n;
   3051       for (n=0; n < z->s->img_n; ++n) {
   3052          int w = (z->img_comp[n].x+7) >> 3;
   3053          int h = (z->img_comp[n].y+7) >> 3;
   3054          for (j=0; j < h; ++j) {
   3055             for (i=0; i < w; ++i) {
   3056                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
   3057                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
   3058                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
   3059             }
   3060          }
   3061       }
   3062    }
   3063 }
   3064 
   3065 static int stbi__process_marker(stbi__jpeg *z, int m)
   3066 {
   3067    int L;
   3068    switch (m) {
   3069       case STBI__MARKER_none: // no marker found
   3070          return stbi__err("expected marker","Corrupt JPEG");
   3071 
   3072       case 0xDD: // DRI - specify restart interval
   3073          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
   3074          z->restart_interval = stbi__get16be(z->s);
   3075          return 1;
   3076 
   3077       case 0xDB: // DQT - define quantization table
   3078          L = stbi__get16be(z->s)-2;
   3079          while (L > 0) {
   3080             int q = stbi__get8(z->s);
   3081             int p = q >> 4, sixteen = (p != 0);
   3082             int t = q & 15,i;
   3083             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
   3084             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
   3085 
   3086             for (i=0; i < 64; ++i)
   3087                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
   3088             L -= (sixteen ? 129 : 65);
   3089          }
   3090          return L==0;
   3091 
   3092       case 0xC4: // DHT - define huffman table
   3093          L = stbi__get16be(z->s)-2;
   3094          while (L > 0) {
   3095             stbi_uc *v;
   3096             int sizes[16],i,n=0;
   3097             int q = stbi__get8(z->s);
   3098             int tc = q >> 4;
   3099             int th = q & 15;
   3100             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
   3101             for (i=0; i < 16; ++i) {
   3102                sizes[i] = stbi__get8(z->s);
   3103                n += sizes[i];
   3104             }
   3105             L -= 17;
   3106             if (tc == 0) {
   3107                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
   3108                v = z->huff_dc[th].values;
   3109             } else {
   3110                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
   3111                v = z->huff_ac[th].values;
   3112             }
   3113             for (i=0; i < n; ++i)
   3114                v[i] = stbi__get8(z->s);
   3115             if (tc != 0)
   3116                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
   3117             L -= n;
   3118          }
   3119          return L==0;
   3120    }
   3121 
   3122    // check for comment block or APP blocks
   3123    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
   3124       L = stbi__get16be(z->s);
   3125       if (L < 2) {
   3126          if (m == 0xFE)
   3127             return stbi__err("bad COM len","Corrupt JPEG");
   3128          else
   3129             return stbi__err("bad APP len","Corrupt JPEG");
   3130       }
   3131       L -= 2;
   3132 
   3133       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
   3134          static const unsigned char tag[5] = {'J','F','I','F','\0'};
   3135          int ok = 1;
   3136          int i;
   3137          for (i=0; i < 5; ++i)
   3138             if (stbi__get8(z->s) != tag[i])
   3139                ok = 0;
   3140          L -= 5;
   3141          if (ok)
   3142             z->jfif = 1;
   3143       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
   3144          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
   3145          int ok = 1;
   3146          int i;
   3147          for (i=0; i < 6; ++i)
   3148             if (stbi__get8(z->s) != tag[i])
   3149                ok = 0;
   3150          L -= 6;
   3151          if (ok) {
   3152             stbi__get8(z->s); // version
   3153             stbi__get16be(z->s); // flags0
   3154             stbi__get16be(z->s); // flags1
   3155             z->app14_color_transform = stbi__get8(z->s); // color transform
   3156             L -= 6;
   3157          }
   3158       }
   3159 
   3160       stbi__skip(z->s, L);
   3161       return 1;
   3162    }
   3163 
   3164    return stbi__err("unknown marker","Corrupt JPEG");
   3165 }
   3166 
   3167 // after we see SOS
   3168 static int stbi__process_scan_header(stbi__jpeg *z)
   3169 {
   3170    int i;
   3171    int Ls = stbi__get16be(z->s);
   3172    z->scan_n = stbi__get8(z->s);
   3173    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   3174    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   3175    for (i=0; i < z->scan_n; ++i) {
   3176       int id = stbi__get8(z->s), which;
   3177       int q = stbi__get8(z->s);
   3178       for (which = 0; which < z->s->img_n; ++which)
   3179          if (z->img_comp[which].id == id)
   3180             break;
   3181       if (which == z->s->img_n) return 0; // no match
   3182       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
   3183       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
   3184       z->order[i] = which;
   3185    }
   3186 
   3187    {
   3188       int aa;
   3189       z->spec_start = stbi__get8(z->s);
   3190       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
   3191       aa = stbi__get8(z->s);
   3192       z->succ_high = (aa >> 4);
   3193       z->succ_low  = (aa & 15);
   3194       if (z->progressive) {
   3195          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
   3196             return stbi__err("bad SOS", "Corrupt JPEG");
   3197       } else {
   3198          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3199          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
   3200          z->spec_end = 63;
   3201       }
   3202    }
   3203 
   3204    return 1;
   3205 }
   3206 
   3207 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
   3208 {
   3209    int i;
   3210    for (i=0; i < ncomp; ++i) {
   3211       if (z->img_comp[i].raw_data) {
   3212          STBI_FREE(z->img_comp[i].raw_data);
   3213          z->img_comp[i].raw_data = NULL;
   3214          z->img_comp[i].data = NULL;
   3215       }
   3216       if (z->img_comp[i].raw_coeff) {
   3217          STBI_FREE(z->img_comp[i].raw_coeff);
   3218          z->img_comp[i].raw_coeff = 0;
   3219          z->img_comp[i].coeff = 0;
   3220       }
   3221       if (z->img_comp[i].linebuf) {
   3222          STBI_FREE(z->img_comp[i].linebuf);
   3223          z->img_comp[i].linebuf = NULL;
   3224       }
   3225    }
   3226    return why;
   3227 }
   3228 
   3229 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
   3230 {
   3231    stbi__context *s = z->s;
   3232    int Lf,p,i,q, h_max=1,v_max=1,c;
   3233    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   3234    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   3235    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   3236    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   3237    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3238    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   3239    c = stbi__get8(s);
   3240    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
   3241    s->img_n = c;
   3242    for (i=0; i < c; ++i) {
   3243       z->img_comp[i].data = NULL;
   3244       z->img_comp[i].linebuf = NULL;
   3245    }
   3246 
   3247    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
   3248 
   3249    z->rgb = 0;
   3250    for (i=0; i < s->img_n; ++i) {
   3251       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
   3252       z->img_comp[i].id = stbi__get8(s);
   3253       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
   3254          ++z->rgb;
   3255       q = stbi__get8(s);
   3256       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
   3257       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
   3258       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   3259    }
   3260 
   3261    if (scan != STBI__SCAN_load) return 1;
   3262 
   3263    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
   3264 
   3265    for (i=0; i < s->img_n; ++i) {
   3266       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
   3267       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   3268    }
   3269 
   3270    // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
   3271    // and I've never seen a non-corrupted JPEG file actually use them
   3272    for (i=0; i < s->img_n; ++i) {
   3273       if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
   3274       if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
   3275    }
   3276 
   3277    // compute interleaved mcu info
   3278    z->img_h_max = h_max;
   3279    z->img_v_max = v_max;
   3280    z->img_mcu_w = h_max * 8;
   3281    z->img_mcu_h = v_max * 8;
   3282    // these sizes can't be more than 17 bits
   3283    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   3284    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
   3285 
   3286    for (i=0; i < s->img_n; ++i) {
   3287       // number of effective pixels (e.g. for non-interleaved MCU)
   3288       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
   3289       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
   3290       // to simplify generation, we'll allocate enough memory to decode
   3291       // the bogus oversized data from using interleaved MCUs and their
   3292       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
   3293       // discard the extra data until colorspace conversion
   3294       //
   3295       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
   3296       // so these muls can't overflow with 32-bit ints (which we require)
   3297       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
   3298       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
   3299       z->img_comp[i].coeff = 0;
   3300       z->img_comp[i].raw_coeff = 0;
   3301       z->img_comp[i].linebuf = NULL;
   3302       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
   3303       if (z->img_comp[i].raw_data == NULL)
   3304          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3305       // align blocks for idct using mmx/sse
   3306       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
   3307       if (z->progressive) {
   3308          // w2, h2 are multiples of 8 (see above)
   3309          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
   3310          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
   3311          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
   3312          if (z->img_comp[i].raw_coeff == NULL)
   3313             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
   3314          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
   3315       }
   3316    }
   3317 
   3318    return 1;
   3319 }
   3320 
   3321 // use comparisons since in some cases we handle more than one case (e.g. SOF)
   3322 #define stbi__DNL(x)         ((x) == 0xdc)
   3323 #define stbi__SOI(x)         ((x) == 0xd8)
   3324 #define stbi__EOI(x)         ((x) == 0xd9)
   3325 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
   3326 #define stbi__SOS(x)         ((x) == 0xda)
   3327 
   3328 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
   3329 
   3330 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
   3331 {
   3332    int m;
   3333    z->jfif = 0;
   3334    z->app14_color_transform = -1; // valid values are 0,1,2
   3335    z->marker = STBI__MARKER_none; // initialize cached marker to empty
   3336    m = stbi__get_marker(z);
   3337    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   3338    if (scan == STBI__SCAN_type) return 1;
   3339    m = stbi__get_marker(z);
   3340    while (!stbi__SOF(m)) {
   3341       if (!stbi__process_marker(z,m)) return 0;
   3342       m = stbi__get_marker(z);
   3343       while (m == STBI__MARKER_none) {
   3344          // some files have extra padding after their blocks, so ok, we'll scan
   3345          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
   3346          m = stbi__get_marker(z);
   3347       }
   3348    }
   3349    z->progressive = stbi__SOF_progressive(m);
   3350    if (!stbi__process_frame_header(z, scan)) return 0;
   3351    return 1;
   3352 }
   3353 
   3354 // decode image to YCbCr format
   3355 static int stbi__decode_jpeg_image(stbi__jpeg *j)
   3356 {
   3357    int m;
   3358    for (m = 0; m < 4; m++) {
   3359       j->img_comp[m].raw_data = NULL;
   3360       j->img_comp[m].raw_coeff = NULL;
   3361    }
   3362    j->restart_interval = 0;
   3363    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   3364    m = stbi__get_marker(j);
   3365    while (!stbi__EOI(m)) {
   3366       if (stbi__SOS(m)) {
   3367          if (!stbi__process_scan_header(j)) return 0;
   3368          if (!stbi__parse_entropy_coded_data(j)) return 0;
   3369          if (j->marker == STBI__MARKER_none ) {
   3370             // handle 0s at the end of image data from IP Kamera 9060
   3371             while (!stbi__at_eof(j->s)) {
   3372                int x = stbi__get8(j->s);
   3373                if (x == 255) {
   3374                   j->marker = stbi__get8(j->s);
   3375                   break;
   3376                }
   3377             }
   3378             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
   3379          }
   3380       } else if (stbi__DNL(m)) {
   3381          int Ld = stbi__get16be(j->s);
   3382          stbi__uint32 NL = stbi__get16be(j->s);
   3383          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
   3384          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
   3385       } else {
   3386          if (!stbi__process_marker(j, m)) return 0;
   3387       }
   3388       m = stbi__get_marker(j);
   3389    }
   3390    if (j->progressive)
   3391       stbi__jpeg_finish(j);
   3392    return 1;
   3393 }
   3394 
   3395 // static jfif-centered resampling (across block boundaries)
   3396 
   3397 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
   3398                                     int w, int hs);
   3399 
   3400 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
   3401 
   3402 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3403 {
   3404    STBI_NOTUSED(out);
   3405    STBI_NOTUSED(in_far);
   3406    STBI_NOTUSED(w);
   3407    STBI_NOTUSED(hs);
   3408    return in_near;
   3409 }
   3410 
   3411 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3412 {
   3413    // need to generate two samples vertically for every one in input
   3414    int i;
   3415    STBI_NOTUSED(hs);
   3416    for (i=0; i < w; ++i)
   3417       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   3418    return out;
   3419 }
   3420 
   3421 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3422 {
   3423    // need to generate two samples horizontally for every one in input
   3424    int i;
   3425    stbi_uc *input = in_near;
   3426 
   3427    if (w == 1) {
   3428       // if only one sample, can't do any interpolation
   3429       out[0] = out[1] = input[0];
   3430       return out;
   3431    }
   3432 
   3433    out[0] = input[0];
   3434    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   3435    for (i=1; i < w-1; ++i) {
   3436       int n = 3*input[i]+2;
   3437       out[i*2+0] = stbi__div4(n+input[i-1]);
   3438       out[i*2+1] = stbi__div4(n+input[i+1]);
   3439    }
   3440    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
   3441    out[i*2+1] = input[w-1];
   3442 
   3443    STBI_NOTUSED(in_far);
   3444    STBI_NOTUSED(hs);
   3445 
   3446    return out;
   3447 }
   3448 
   3449 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
   3450 
   3451 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3452 {
   3453    // need to generate 2x2 samples for every one in input
   3454    int i,t0,t1;
   3455    if (w == 1) {
   3456       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3457       return out;
   3458    }
   3459 
   3460    t1 = 3*in_near[0] + in_far[0];
   3461    out[0] = stbi__div4(t1+2);
   3462    for (i=1; i < w; ++i) {
   3463       t0 = t1;
   3464       t1 = 3*in_near[i]+in_far[i];
   3465       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3466       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3467    }
   3468    out[w*2-1] = stbi__div4(t1+2);
   3469 
   3470    STBI_NOTUSED(hs);
   3471 
   3472    return out;
   3473 }
   3474 
   3475 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3476 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3477 {
   3478    // need to generate 2x2 samples for every one in input
   3479    int i=0,t0,t1;
   3480 
   3481    if (w == 1) {
   3482       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
   3483       return out;
   3484    }
   3485 
   3486    t1 = 3*in_near[0] + in_far[0];
   3487    // process groups of 8 pixels for as long as we can.
   3488    // note we can't handle the last pixel in a row in this loop
   3489    // because we need to handle the filter boundary conditions.
   3490    for (; i < ((w-1) & ~7); i += 8) {
   3491 #if defined(STBI_SSE2)
   3492       // load and perform the vertical filtering pass
   3493       // this uses 3*x + y = 4*x + (y - x)
   3494       __m128i zero  = _mm_setzero_si128();
   3495       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
   3496       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
   3497       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
   3498       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
   3499       __m128i diff  = _mm_sub_epi16(farw, nearw);
   3500       __m128i nears = _mm_slli_epi16(nearw, 2);
   3501       __m128i curr  = _mm_add_epi16(nears, diff); // current row
   3502 
   3503       // horizontal filter works the same based on shifted vers of current
   3504       // row. "prev" is current row shifted right by 1 pixel; we need to
   3505       // insert the previous pixel value (from t1).
   3506       // "next" is current row shifted left by 1 pixel, with first pixel
   3507       // of next block of 8 pixels added in.
   3508       __m128i prv0 = _mm_slli_si128(curr, 2);
   3509       __m128i nxt0 = _mm_srli_si128(curr, 2);
   3510       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
   3511       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
   3512 
   3513       // horizontal filter, polyphase implementation since it's convenient:
   3514       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3515       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3516       // note the shared term.
   3517       __m128i bias  = _mm_set1_epi16(8);
   3518       __m128i curs = _mm_slli_epi16(curr, 2);
   3519       __m128i prvd = _mm_sub_epi16(prev, curr);
   3520       __m128i nxtd = _mm_sub_epi16(next, curr);
   3521       __m128i curb = _mm_add_epi16(curs, bias);
   3522       __m128i even = _mm_add_epi16(prvd, curb);
   3523       __m128i odd  = _mm_add_epi16(nxtd, curb);
   3524 
   3525       // interleave even and odd pixels, then undo scaling.
   3526       __m128i int0 = _mm_unpacklo_epi16(even, odd);
   3527       __m128i int1 = _mm_unpackhi_epi16(even, odd);
   3528       __m128i de0  = _mm_srli_epi16(int0, 4);
   3529       __m128i de1  = _mm_srli_epi16(int1, 4);
   3530 
   3531       // pack and write output
   3532       __m128i outv = _mm_packus_epi16(de0, de1);
   3533       _mm_storeu_si128((__m128i *) (out + i*2), outv);
   3534 #elif defined(STBI_NEON)
   3535       // load and perform the vertical filtering pass
   3536       // this uses 3*x + y = 4*x + (y - x)
   3537       uint8x8_t farb  = vld1_u8(in_far + i);
   3538       uint8x8_t nearb = vld1_u8(in_near + i);
   3539       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
   3540       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
   3541       int16x8_t curr  = vaddq_s16(nears, diff); // current row
   3542 
   3543       // horizontal filter works the same based on shifted vers of current
   3544       // row. "prev" is current row shifted right by 1 pixel; we need to
   3545       // insert the previous pixel value (from t1).
   3546       // "next" is current row shifted left by 1 pixel, with first pixel
   3547       // of next block of 8 pixels added in.
   3548       int16x8_t prv0 = vextq_s16(curr, curr, 7);
   3549       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
   3550       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
   3551       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
   3552 
   3553       // horizontal filter, polyphase implementation since it's convenient:
   3554       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
   3555       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
   3556       // note the shared term.
   3557       int16x8_t curs = vshlq_n_s16(curr, 2);
   3558       int16x8_t prvd = vsubq_s16(prev, curr);
   3559       int16x8_t nxtd = vsubq_s16(next, curr);
   3560       int16x8_t even = vaddq_s16(curs, prvd);
   3561       int16x8_t odd  = vaddq_s16(curs, nxtd);
   3562 
   3563       // undo scaling and round, then store with even/odd phases interleaved
   3564       uint8x8x2_t o;
   3565       o.val[0] = vqrshrun_n_s16(even, 4);
   3566       o.val[1] = vqrshrun_n_s16(odd,  4);
   3567       vst2_u8(out + i*2, o);
   3568 #endif
   3569 
   3570       // "previous" value for next iter
   3571       t1 = 3*in_near[i+7] + in_far[i+7];
   3572    }
   3573 
   3574    t0 = t1;
   3575    t1 = 3*in_near[i] + in_far[i];
   3576    out[i*2] = stbi__div16(3*t1 + t0 + 8);
   3577 
   3578    for (++i; i < w; ++i) {
   3579       t0 = t1;
   3580       t1 = 3*in_near[i]+in_far[i];
   3581       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
   3582       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   3583    }
   3584    out[w*2-1] = stbi__div4(t1+2);
   3585 
   3586    STBI_NOTUSED(hs);
   3587 
   3588    return out;
   3589 }
   3590 #endif
   3591 
   3592 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
   3593 {
   3594    // resample with nearest-neighbor
   3595    int i,j;
   3596    STBI_NOTUSED(in_far);
   3597    for (i=0; i < w; ++i)
   3598       for (j=0; j < hs; ++j)
   3599          out[i*hs+j] = in_near[i];
   3600    return out;
   3601 }
   3602 
   3603 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
   3604 // to make sure the code produces the same results in both SIMD and scalar
   3605 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
   3606 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
   3607 {
   3608    int i;
   3609    for (i=0; i < count; ++i) {
   3610       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3611       int r,g,b;
   3612       int cr = pcr[i] - 128;
   3613       int cb = pcb[i] - 128;
   3614       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
   3615       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3616       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
   3617       r >>= 20;
   3618       g >>= 20;
   3619       b >>= 20;
   3620       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3621       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3622       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3623       out[0] = (stbi_uc)r;
   3624       out[1] = (stbi_uc)g;
   3625       out[2] = (stbi_uc)b;
   3626       out[3] = 255;
   3627       out += step;
   3628    }
   3629 }
   3630 
   3631 #if defined(STBI_SSE2) || defined(STBI_NEON)
   3632 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
   3633 {
   3634    int i = 0;
   3635 
   3636 #ifdef STBI_SSE2
   3637    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   3638    // it's useful in practice (you wouldn't use it for textures, for example).
   3639    // so just accelerate step == 4 case.
   3640    if (step == 4) {
   3641       // this is a fairly straightforward implementation and not super-optimized.
   3642       __m128i signflip  = _mm_set1_epi8(-0x80);
   3643       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
   3644       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
   3645       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
   3646       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
   3647       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
   3648       __m128i xw = _mm_set1_epi16(255); // alpha channel
   3649 
   3650       for (; i+7 < count; i += 8) {
   3651          // load
   3652          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
   3653          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
   3654          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
   3655          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
   3656          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
   3657 
   3658          // unpack to short (and left-shift cr, cb by 8)
   3659          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
   3660          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
   3661          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
   3662 
   3663          // color transform
   3664          __m128i yws = _mm_srli_epi16(yw, 4);
   3665          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
   3666          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
   3667          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
   3668          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
   3669          __m128i rws = _mm_add_epi16(cr0, yws);
   3670          __m128i gwt = _mm_add_epi16(cb0, yws);
   3671          __m128i bws = _mm_add_epi16(yws, cb1);
   3672          __m128i gws = _mm_add_epi16(gwt, cr1);
   3673 
   3674          // descale
   3675          __m128i rw = _mm_srai_epi16(rws, 4);
   3676          __m128i bw = _mm_srai_epi16(bws, 4);
   3677          __m128i gw = _mm_srai_epi16(gws, 4);
   3678 
   3679          // back to byte, set up for transpose
   3680          __m128i brb = _mm_packus_epi16(rw, bw);
   3681          __m128i gxb = _mm_packus_epi16(gw, xw);
   3682 
   3683          // transpose to interleave channels
   3684          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
   3685          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
   3686          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
   3687          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
   3688 
   3689          // store
   3690          _mm_storeu_si128((__m128i *) (out + 0), o0);
   3691          _mm_storeu_si128((__m128i *) (out + 16), o1);
   3692          out += 32;
   3693       }
   3694    }
   3695 #endif
   3696 
   3697 #ifdef STBI_NEON
   3698    // in this version, step=3 support would be easy to add. but is there demand?
   3699    if (step == 4) {
   3700       // this is a fairly straightforward implementation and not super-optimized.
   3701       uint8x8_t signflip = vdup_n_u8(0x80);
   3702       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
   3703       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
   3704       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
   3705       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
   3706 
   3707       for (; i+7 < count; i += 8) {
   3708          // load
   3709          uint8x8_t y_bytes  = vld1_u8(y + i);
   3710          uint8x8_t cr_bytes = vld1_u8(pcr + i);
   3711          uint8x8_t cb_bytes = vld1_u8(pcb + i);
   3712          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
   3713          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
   3714 
   3715          // expand to s16
   3716          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
   3717          int16x8_t crw = vshll_n_s8(cr_biased, 7);
   3718          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
   3719 
   3720          // color transform
   3721          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
   3722          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
   3723          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
   3724          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
   3725          int16x8_t rws = vaddq_s16(yws, cr0);
   3726          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
   3727          int16x8_t bws = vaddq_s16(yws, cb1);
   3728 
   3729          // undo scaling, round, convert to byte
   3730          uint8x8x4_t o;
   3731          o.val[0] = vqrshrun_n_s16(rws, 4);
   3732          o.val[1] = vqrshrun_n_s16(gws, 4);
   3733          o.val[2] = vqrshrun_n_s16(bws, 4);
   3734          o.val[3] = vdup_n_u8(255);
   3735 
   3736          // store, interleaving r/g/b/a
   3737          vst4_u8(out, o);
   3738          out += 8*4;
   3739       }
   3740    }
   3741 #endif
   3742 
   3743    for (; i < count; ++i) {
   3744       int y_fixed = (y[i] << 20) + (1<<19); // rounding
   3745       int r,g,b;
   3746       int cr = pcr[i] - 128;
   3747       int cb = pcb[i] - 128;
   3748       r = y_fixed + cr* stbi__float2fixed(1.40200f);
   3749       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
   3750       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
   3751       r >>= 20;
   3752       g >>= 20;
   3753       b >>= 20;
   3754       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
   3755       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
   3756       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
   3757       out[0] = (stbi_uc)r;
   3758       out[1] = (stbi_uc)g;
   3759       out[2] = (stbi_uc)b;
   3760       out[3] = 255;
   3761       out += step;
   3762    }
   3763 }
   3764 #endif
   3765 
   3766 // set up the kernels
   3767 static void stbi__setup_jpeg(stbi__jpeg *j)
   3768 {
   3769    j->idct_block_kernel = stbi__idct_block;
   3770    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   3771    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
   3772 
   3773 #ifdef STBI_SSE2
   3774    if (stbi__sse2_available()) {
   3775       j->idct_block_kernel = stbi__idct_simd;
   3776       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3777       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3778    }
   3779 #endif
   3780 
   3781 #ifdef STBI_NEON
   3782    j->idct_block_kernel = stbi__idct_simd;
   3783    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   3784    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   3785 #endif
   3786 }
   3787 
   3788 // clean up the temporary component buffers
   3789 static void stbi__cleanup_jpeg(stbi__jpeg *j)
   3790 {
   3791    stbi__free_jpeg_components(j, j->s->img_n, 0);
   3792 }
   3793 
   3794 typedef struct
   3795 {
   3796    resample_row_func resample;
   3797    stbi_uc *line0,*line1;
   3798    int hs,vs;   // expansion factor in each axis
   3799    int w_lores; // horizontal pixels pre-expansion
   3800    int ystep;   // how far through vertical expansion we are
   3801    int ypos;    // which pre-expansion row we're on
   3802 } stbi__resample;
   3803 
   3804 // fast 0..255 * 0..255 => 0..255 rounded multiplication
   3805 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
   3806 {
   3807    unsigned int t = x*y + 128;
   3808    return (stbi_uc) ((t + (t >>8)) >> 8);
   3809 }
   3810 
   3811 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
   3812 {
   3813    int n, decode_n, is_rgb;
   3814    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
   3815 
   3816    // validate req_comp
   3817    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   3818 
   3819    // load a jpeg image from whichever source, but leave in YCbCr format
   3820    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
   3821 
   3822    // determine actual number of components to generate
   3823    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
   3824 
   3825    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
   3826 
   3827    if (z->s->img_n == 3 && n < 3 && !is_rgb)
   3828       decode_n = 1;
   3829    else
   3830       decode_n = z->s->img_n;
   3831 
   3832    // nothing to do if no components requested; check this now to avoid
   3833    // accessing uninitialized coutput[0] later
   3834    if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
   3835 
   3836    // resample and color-convert
   3837    {
   3838       int k;
   3839       unsigned int i,j;
   3840       stbi_uc *output;
   3841       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
   3842 
   3843       stbi__resample res_comp[4];
   3844 
   3845       for (k=0; k < decode_n; ++k) {
   3846          stbi__resample *r = &res_comp[k];
   3847 
   3848          // allocate line buffer big enough for upsampling off the edges
   3849          // with upsample factor of 4
   3850          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
   3851          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3852 
   3853          r->hs      = z->img_h_max / z->img_comp[k].h;
   3854          r->vs      = z->img_v_max / z->img_comp[k].v;
   3855          r->ystep   = r->vs >> 1;
   3856          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
   3857          r->ypos    = 0;
   3858          r->line0   = r->line1 = z->img_comp[k].data;
   3859 
   3860          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
   3861          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
   3862          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
   3863          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
   3864          else                               r->resample = stbi__resample_row_generic;
   3865       }
   3866 
   3867       // can't error after this so, this is safe
   3868       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
   3869       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
   3870 
   3871       // now go ahead and resample
   3872       for (j=0; j < z->s->img_y; ++j) {
   3873          stbi_uc *out = output + n * z->s->img_x * j;
   3874          for (k=0; k < decode_n; ++k) {
   3875             stbi__resample *r = &res_comp[k];
   3876             int y_bot = r->ystep >= (r->vs >> 1);
   3877             coutput[k] = r->resample(z->img_comp[k].linebuf,
   3878                                      y_bot ? r->line1 : r->line0,
   3879                                      y_bot ? r->line0 : r->line1,
   3880                                      r->w_lores, r->hs);
   3881             if (++r->ystep >= r->vs) {
   3882                r->ystep = 0;
   3883                r->line0 = r->line1;
   3884                if (++r->ypos < z->img_comp[k].y)
   3885                   r->line1 += z->img_comp[k].w2;
   3886             }
   3887          }
   3888          if (n >= 3) {
   3889             stbi_uc *y = coutput[0];
   3890             if (z->s->img_n == 3) {
   3891                if (is_rgb) {
   3892                   for (i=0; i < z->s->img_x; ++i) {
   3893                      out[0] = y[i];
   3894                      out[1] = coutput[1][i];
   3895                      out[2] = coutput[2][i];
   3896                      out[3] = 255;
   3897                      out += n;
   3898                   }
   3899                } else {
   3900                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3901                }
   3902             } else if (z->s->img_n == 4) {
   3903                if (z->app14_color_transform == 0) { // CMYK
   3904                   for (i=0; i < z->s->img_x; ++i) {
   3905                      stbi_uc m = coutput[3][i];
   3906                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
   3907                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
   3908                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
   3909                      out[3] = 255;
   3910                      out += n;
   3911                   }
   3912                } else if (z->app14_color_transform == 2) { // YCCK
   3913                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3914                   for (i=0; i < z->s->img_x; ++i) {
   3915                      stbi_uc m = coutput[3][i];
   3916                      out[0] = stbi__blinn_8x8(255 - out[0], m);
   3917                      out[1] = stbi__blinn_8x8(255 - out[1], m);
   3918                      out[2] = stbi__blinn_8x8(255 - out[2], m);
   3919                      out += n;
   3920                   }
   3921                } else { // YCbCr + alpha?  Ignore the fourth channel for now
   3922                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
   3923                }
   3924             } else
   3925                for (i=0; i < z->s->img_x; ++i) {
   3926                   out[0] = out[1] = out[2] = y[i];
   3927                   out[3] = 255; // not used if n==3
   3928                   out += n;
   3929                }
   3930          } else {
   3931             if (is_rgb) {
   3932                if (n == 1)
   3933                   for (i=0; i < z->s->img_x; ++i)
   3934                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3935                else {
   3936                   for (i=0; i < z->s->img_x; ++i, out += 2) {
   3937                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
   3938                      out[1] = 255;
   3939                   }
   3940                }
   3941             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
   3942                for (i=0; i < z->s->img_x; ++i) {
   3943                   stbi_uc m = coutput[3][i];
   3944                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
   3945                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
   3946                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
   3947                   out[0] = stbi__compute_y(r, g, b);
   3948                   out[1] = 255;
   3949                   out += n;
   3950                }
   3951             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
   3952                for (i=0; i < z->s->img_x; ++i) {
   3953                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
   3954                   out[1] = 255;
   3955                   out += n;
   3956                }
   3957             } else {
   3958                stbi_uc *y = coutput[0];
   3959                if (n == 1)
   3960                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
   3961                else
   3962                   for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
   3963             }
   3964          }
   3965       }
   3966       stbi__cleanup_jpeg(z);
   3967       *out_x = z->s->img_x;
   3968       *out_y = z->s->img_y;
   3969       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
   3970       return output;
   3971    }
   3972 }
   3973 
   3974 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   3975 {
   3976    unsigned char* result;
   3977    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   3978    if (!j) return stbi__errpuc("outofmem", "Out of memory");
   3979    STBI_NOTUSED(ri);
   3980    j->s = s;
   3981    stbi__setup_jpeg(j);
   3982    result = load_jpeg_image(j, x,y,comp,req_comp);
   3983    STBI_FREE(j);
   3984    return result;
   3985 }
   3986 
   3987 static int stbi__jpeg_test(stbi__context *s)
   3988 {
   3989    int r;
   3990    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
   3991    if (!j) return stbi__err("outofmem", "Out of memory");
   3992    j->s = s;
   3993    stbi__setup_jpeg(j);
   3994    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
   3995    stbi__rewind(s);
   3996    STBI_FREE(j);
   3997    return r;
   3998 }
   3999 
   4000 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
   4001 {
   4002    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
   4003       stbi__rewind( j->s );
   4004       return 0;
   4005    }
   4006    if (x) *x = j->s->img_x;
   4007    if (y) *y = j->s->img_y;
   4008    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
   4009    return 1;
   4010 }
   4011 
   4012 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
   4013 {
   4014    int result;
   4015    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   4016    if (!j) return stbi__err("outofmem", "Out of memory");
   4017    j->s = s;
   4018    result = stbi__jpeg_info_raw(j, x, y, comp);
   4019    STBI_FREE(j);
   4020    return result;
   4021 }
   4022 #endif
   4023 
   4024 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
   4025 //    simple implementation
   4026 //      - all input must be provided in an upfront buffer
   4027 //      - all output is written to a single output buffer (can malloc/realloc)
   4028 //    performance
   4029 //      - fast huffman
   4030 
   4031 #ifndef STBI_NO_ZLIB
   4032 
   4033 // fast-way is faster to check than jpeg huffman, but slow way is slower
   4034 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
   4035 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
   4036 #define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
   4037 
   4038 // zlib-style huffman encoding
   4039 // (jpegs packs from left, zlib from right, so can't share code)
   4040 typedef struct
   4041 {
   4042    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   4043    stbi__uint16 firstcode[16];
   4044    int maxcode[17];
   4045    stbi__uint16 firstsymbol[16];
   4046    stbi_uc  size[STBI__ZNSYMS];
   4047    stbi__uint16 value[STBI__ZNSYMS];
   4048 } stbi__zhuffman;
   4049 
   4050 stbi_inline static int stbi__bitreverse16(int n)
   4051 {
   4052   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
   4053   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
   4054   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
   4055   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
   4056   return n;
   4057 }
   4058 
   4059 stbi_inline static int stbi__bit_reverse(int v, int bits)
   4060 {
   4061    STBI_ASSERT(bits <= 16);
   4062    // to bit reverse n bits, reverse 16 and shift
   4063    // e.g. 11 bits, bit reverse and shift away 5
   4064    return stbi__bitreverse16(v) >> (16-bits);
   4065 }
   4066 
   4067 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
   4068 {
   4069    int i,k=0;
   4070    int code, next_code[16], sizes[17];
   4071 
   4072    // DEFLATE spec for generating codes
   4073    memset(sizes, 0, sizeof(sizes));
   4074    memset(z->fast, 0, sizeof(z->fast));
   4075    for (i=0; i < num; ++i)
   4076       ++sizes[sizelist[i]];
   4077    sizes[0] = 0;
   4078    for (i=1; i < 16; ++i)
   4079       if (sizes[i] > (1 << i))
   4080          return stbi__err("bad sizes", "Corrupt PNG");
   4081    code = 0;
   4082    for (i=1; i < 16; ++i) {
   4083       next_code[i] = code;
   4084       z->firstcode[i] = (stbi__uint16) code;
   4085       z->firstsymbol[i] = (stbi__uint16) k;
   4086       code = (code + sizes[i]);
   4087       if (sizes[i])
   4088          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
   4089       z->maxcode[i] = code << (16-i); // preshift for inner loop
   4090       code <<= 1;
   4091       k += sizes[i];
   4092    }
   4093    z->maxcode[16] = 0x10000; // sentinel
   4094    for (i=0; i < num; ++i) {
   4095       int s = sizelist[i];
   4096       if (s) {
   4097          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
   4098          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
   4099          z->size [c] = (stbi_uc     ) s;
   4100          z->value[c] = (stbi__uint16) i;
   4101          if (s <= STBI__ZFAST_BITS) {
   4102             int j = stbi__bit_reverse(next_code[s],s);
   4103             while (j < (1 << STBI__ZFAST_BITS)) {
   4104                z->fast[j] = fastv;
   4105                j += (1 << s);
   4106             }
   4107          }
   4108          ++next_code[s];
   4109       }
   4110    }
   4111    return 1;
   4112 }
   4113 
   4114 // zlib-from-memory implementation for PNG reading
   4115 //    because PNG allows splitting the zlib stream arbitrarily,
   4116 //    and it's annoying structurally to have PNG call ZLIB call PNG,
   4117 //    we require PNG read all the IDATs and combine them into a single
   4118 //    memory buffer
   4119 
   4120 typedef struct
   4121 {
   4122    stbi_uc *zbuffer, *zbuffer_end;
   4123    int num_bits;
   4124    stbi__uint32 code_buffer;
   4125 
   4126    char *zout;
   4127    char *zout_start;
   4128    char *zout_end;
   4129    int   z_expandable;
   4130 
   4131    stbi__zhuffman z_length, z_distance;
   4132 } stbi__zbuf;
   4133 
   4134 stbi_inline static int stbi__zeof(stbi__zbuf *z)
   4135 {
   4136    return (z->zbuffer >= z->zbuffer_end);
   4137 }
   4138 
   4139 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
   4140 {
   4141    return stbi__zeof(z) ? 0 : *z->zbuffer++;
   4142 }
   4143 
   4144 static void stbi__fill_bits(stbi__zbuf *z)
   4145 {
   4146    do {
   4147       if (z->code_buffer >= (1U << z->num_bits)) {
   4148         z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
   4149         return;
   4150       }
   4151       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
   4152       z->num_bits += 8;
   4153    } while (z->num_bits <= 24);
   4154 }
   4155 
   4156 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
   4157 {
   4158    unsigned int k;
   4159    if (z->num_bits < n) stbi__fill_bits(z);
   4160    k = z->code_buffer & ((1 << n) - 1);
   4161    z->code_buffer >>= n;
   4162    z->num_bits -= n;
   4163    return k;
   4164 }
   4165 
   4166 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
   4167 {
   4168    int b,s,k;
   4169    // not resolved by fast table, so compute it the slow way
   4170    // use jpeg approach, which requires MSbits at top
   4171    k = stbi__bit_reverse(a->code_buffer, 16);
   4172    for (s=STBI__ZFAST_BITS+1; ; ++s)
   4173       if (k < z->maxcode[s])
   4174          break;
   4175    if (s >= 16) return -1; // invalid code!
   4176    // code size is s, so:
   4177    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   4178    if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
   4179    if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
   4180    a->code_buffer >>= s;
   4181    a->num_bits -= s;
   4182    return z->value[b];
   4183 }
   4184 
   4185 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
   4186 {
   4187    int b,s;
   4188    if (a->num_bits < 16) {
   4189       if (stbi__zeof(a)) {
   4190          return -1;   /* report error for unexpected end of data. */
   4191       }
   4192       stbi__fill_bits(a);
   4193    }
   4194    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   4195    if (b) {
   4196       s = b >> 9;
   4197       a->code_buffer >>= s;
   4198       a->num_bits -= s;
   4199       return b & 511;
   4200    }
   4201    return stbi__zhuffman_decode_slowpath(a, z);
   4202 }
   4203 
   4204 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
   4205 {
   4206    char *q;
   4207    unsigned int cur, limit, old_limit;
   4208    z->zout = zout;
   4209    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   4210    cur   = (unsigned int) (z->zout - z->zout_start);
   4211    limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
   4212    if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
   4213    while (cur + n > limit) {
   4214       if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
   4215       limit *= 2;
   4216    }
   4217    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   4218    STBI_NOTUSED(old_limit);
   4219    if (q == NULL) return stbi__err("outofmem", "Out of memory");
   4220    z->zout_start = q;
   4221    z->zout       = q + cur;
   4222    z->zout_end   = q + limit;
   4223    return 1;
   4224 }
   4225 
   4226 static const int stbi__zlength_base[31] = {
   4227    3,4,5,6,7,8,9,10,11,13,
   4228    15,17,19,23,27,31,35,43,51,59,
   4229    67,83,99,115,131,163,195,227,258,0,0 };
   4230 
   4231 static const int stbi__zlength_extra[31]=
   4232 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
   4233 
   4234 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
   4235 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
   4236 
   4237 static const int stbi__zdist_extra[32] =
   4238 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
   4239 
   4240 static int stbi__parse_huffman_block(stbi__zbuf *a)
   4241 {
   4242    char *zout = a->zout;
   4243    for(;;) {
   4244       int z = stbi__zhuffman_decode(a, &a->z_length);
   4245       if (z < 256) {
   4246          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
   4247          if (zout >= a->zout_end) {
   4248             if (!stbi__zexpand(a, zout, 1)) return 0;
   4249             zout = a->zout;
   4250          }
   4251          *zout++ = (char) z;
   4252       } else {
   4253          stbi_uc *p;
   4254          int len,dist;
   4255          if (z == 256) {
   4256             a->zout = zout;
   4257             return 1;
   4258          }
   4259          z -= 257;
   4260          len = stbi__zlength_base[z];
   4261          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
   4262          z = stbi__zhuffman_decode(a, &a->z_distance);
   4263          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
   4264          dist = stbi__zdist_base[z];
   4265          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
   4266          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
   4267          if (zout + len > a->zout_end) {
   4268             if (!stbi__zexpand(a, zout, len)) return 0;
   4269             zout = a->zout;
   4270          }
   4271          p = (stbi_uc *) (zout - dist);
   4272          if (dist == 1) { // run of one byte; common in images.
   4273             stbi_uc v = *p;
   4274             if (len) { do *zout++ = v; while (--len); }
   4275          } else {
   4276             if (len) { do *zout++ = *p++; while (--len); }
   4277          }
   4278       }
   4279    }
   4280 }
   4281 
   4282 static int stbi__compute_huffman_codes(stbi__zbuf *a)
   4283 {
   4284    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   4285    stbi__zhuffman z_codelength;
   4286    stbi_uc lencodes[286+32+137];//padding for maximum single op
   4287    stbi_uc codelength_sizes[19];
   4288    int i,n;
   4289 
   4290    int hlit  = stbi__zreceive(a,5) + 257;
   4291    int hdist = stbi__zreceive(a,5) + 1;
   4292    int hclen = stbi__zreceive(a,4) + 4;
   4293    int ntot  = hlit + hdist;
   4294 
   4295    memset(codelength_sizes, 0, sizeof(codelength_sizes));
   4296    for (i=0; i < hclen; ++i) {
   4297       int s = stbi__zreceive(a,3);
   4298       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   4299    }
   4300    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
   4301 
   4302    n = 0;
   4303    while (n < ntot) {
   4304       int c = stbi__zhuffman_decode(a, &z_codelength);
   4305       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
   4306       if (c < 16)
   4307          lencodes[n++] = (stbi_uc) c;
   4308       else {
   4309          stbi_uc fill = 0;
   4310          if (c == 16) {
   4311             c = stbi__zreceive(a,2)+3;
   4312             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
   4313             fill = lencodes[n-1];
   4314          } else if (c == 17) {
   4315             c = stbi__zreceive(a,3)+3;
   4316          } else if (c == 18) {
   4317             c = stbi__zreceive(a,7)+11;
   4318          } else {
   4319             return stbi__err("bad codelengths", "Corrupt PNG");
   4320          }
   4321          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
   4322          memset(lencodes+n, fill, c);
   4323          n += c;
   4324       }
   4325    }
   4326    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   4327    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   4328    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   4329    return 1;
   4330 }
   4331 
   4332 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
   4333 {
   4334    stbi_uc header[4];
   4335    int len,nlen,k;
   4336    if (a->num_bits & 7)
   4337       stbi__zreceive(a, a->num_bits & 7); // discard
   4338    // drain the bit-packed data into header
   4339    k = 0;
   4340    while (a->num_bits > 0) {
   4341       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
   4342       a->code_buffer >>= 8;
   4343       a->num_bits -= 8;
   4344    }
   4345    if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
   4346    // now fill header the normal way
   4347    while (k < 4)
   4348       header[k++] = stbi__zget8(a);
   4349    len  = header[1] * 256 + header[0];
   4350    nlen = header[3] * 256 + header[2];
   4351    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   4352    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   4353    if (a->zout + len > a->zout_end)
   4354       if (!stbi__zexpand(a, a->zout, len)) return 0;
   4355    memcpy(a->zout, a->zbuffer, len);
   4356    a->zbuffer += len;
   4357    a->zout += len;
   4358    return 1;
   4359 }
   4360 
   4361 static int stbi__parse_zlib_header(stbi__zbuf *a)
   4362 {
   4363    int cmf   = stbi__zget8(a);
   4364    int cm    = cmf & 15;
   4365    /* int cinfo = cmf >> 4; */
   4366    int flg   = stbi__zget8(a);
   4367    if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4368    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   4369    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   4370    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   4371    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   4372    return 1;
   4373 }
   4374 
   4375 static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
   4376 {
   4377    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4378    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4379    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4380    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
   4381    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4382    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4383    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4384    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
   4385    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
   4386 };
   4387 static const stbi_uc stbi__zdefault_distance[32] =
   4388 {
   4389    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
   4390 };
   4391 /*
   4392 Init algorithm:
   4393 {
   4394    int i;   // use <= to match clearly with spec
   4395    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   4396    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   4397    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   4398    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
   4399 
   4400    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
   4401 }
   4402 */
   4403 
   4404 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
   4405 {
   4406    int final, type;
   4407    if (parse_header)
   4408       if (!stbi__parse_zlib_header(a)) return 0;
   4409    a->num_bits = 0;
   4410    a->code_buffer = 0;
   4411    do {
   4412       final = stbi__zreceive(a,1);
   4413       type = stbi__zreceive(a,2);
   4414       if (type == 0) {
   4415          if (!stbi__parse_uncompressed_block(a)) return 0;
   4416       } else if (type == 3) {
   4417          return 0;
   4418       } else {
   4419          if (type == 1) {
   4420             // use fixed code lengths
   4421             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
   4422             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
   4423          } else {
   4424             if (!stbi__compute_huffman_codes(a)) return 0;
   4425          }
   4426          if (!stbi__parse_huffman_block(a)) return 0;
   4427       }
   4428    } while (!final);
   4429    return 1;
   4430 }
   4431 
   4432 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
   4433 {
   4434    a->zout_start = obuf;
   4435    a->zout       = obuf;
   4436    a->zout_end   = obuf + olen;
   4437    a->z_expandable = exp;
   4438 
   4439    return stbi__parse_zlib(a, parse_header);
   4440 }
   4441 
   4442 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
   4443 {
   4444    stbi__zbuf a;
   4445    char *p = (char *) stbi__malloc(initial_size);
   4446    if (p == NULL) return NULL;
   4447    a.zbuffer = (stbi_uc *) buffer;
   4448    a.zbuffer_end = (stbi_uc *) buffer + len;
   4449    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
   4450       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4451       return a.zout_start;
   4452    } else {
   4453       STBI_FREE(a.zout_start);
   4454       return NULL;
   4455    }
   4456 }
   4457 
   4458 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
   4459 {
   4460    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
   4461 }
   4462 
   4463 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
   4464 {
   4465    stbi__zbuf a;
   4466    char *p = (char *) stbi__malloc(initial_size);
   4467    if (p == NULL) return NULL;
   4468    a.zbuffer = (stbi_uc *) buffer;
   4469    a.zbuffer_end = (stbi_uc *) buffer + len;
   4470    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
   4471       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4472       return a.zout_start;
   4473    } else {
   4474       STBI_FREE(a.zout_start);
   4475       return NULL;
   4476    }
   4477 }
   4478 
   4479 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
   4480 {
   4481    stbi__zbuf a;
   4482    a.zbuffer = (stbi_uc *) ibuffer;
   4483    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4484    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
   4485       return (int) (a.zout - a.zout_start);
   4486    else
   4487       return -1;
   4488 }
   4489 
   4490 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
   4491 {
   4492    stbi__zbuf a;
   4493    char *p = (char *) stbi__malloc(16384);
   4494    if (p == NULL) return NULL;
   4495    a.zbuffer = (stbi_uc *) buffer;
   4496    a.zbuffer_end = (stbi_uc *) buffer+len;
   4497    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
   4498       if (outlen) *outlen = (int) (a.zout - a.zout_start);
   4499       return a.zout_start;
   4500    } else {
   4501       STBI_FREE(a.zout_start);
   4502       return NULL;
   4503    }
   4504 }
   4505 
   4506 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
   4507 {
   4508    stbi__zbuf a;
   4509    a.zbuffer = (stbi_uc *) ibuffer;
   4510    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   4511    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
   4512       return (int) (a.zout - a.zout_start);
   4513    else
   4514       return -1;
   4515 }
   4516 #endif
   4517 
   4518 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
   4519 //    simple implementation
   4520 //      - only 8-bit samples
   4521 //      - no CRC checking
   4522 //      - allocates lots of intermediate memory
   4523 //        - avoids problem of streaming data between subsystems
   4524 //        - avoids explicit window management
   4525 //    performance
   4526 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
   4527 
   4528 #ifndef STBI_NO_PNG
   4529 typedef struct
   4530 {
   4531    stbi__uint32 length;
   4532    stbi__uint32 type;
   4533 } stbi__pngchunk;
   4534 
   4535 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
   4536 {
   4537    stbi__pngchunk c;
   4538    c.length = stbi__get32be(s);
   4539    c.type   = stbi__get32be(s);
   4540    return c;
   4541 }
   4542 
   4543 static int stbi__check_png_header(stbi__context *s)
   4544 {
   4545    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   4546    int i;
   4547    for (i=0; i < 8; ++i)
   4548       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   4549    return 1;
   4550 }
   4551 
   4552 typedef struct
   4553 {
   4554    stbi__context *s;
   4555    stbi_uc *idata, *expanded, *out;
   4556    int depth;
   4557 } stbi__png;
   4558 
   4559 
   4560 enum {
   4561    STBI__F_none=0,
   4562    STBI__F_sub=1,
   4563    STBI__F_up=2,
   4564    STBI__F_avg=3,
   4565    STBI__F_paeth=4,
   4566    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   4567    STBI__F_avg_first,
   4568    STBI__F_paeth_first
   4569 };
   4570 
   4571 static stbi_uc first_row_filter[5] =
   4572 {
   4573    STBI__F_none,
   4574    STBI__F_sub,
   4575    STBI__F_none,
   4576    STBI__F_avg_first,
   4577    STBI__F_paeth_first
   4578 };
   4579 
   4580 static int stbi__paeth(int a, int b, int c)
   4581 {
   4582    int p = a + b - c;
   4583    int pa = abs(p-a);
   4584    int pb = abs(p-b);
   4585    int pc = abs(p-c);
   4586    if (pa <= pb && pa <= pc) return a;
   4587    if (pb <= pc) return b;
   4588    return c;
   4589 }
   4590 
   4591 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   4592 
   4593 // create the png data from post-deflated data
   4594 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
   4595 {
   4596    int bytes = (depth == 16? 2 : 1);
   4597    stbi__context *s = a->s;
   4598    stbi__uint32 i,j,stride = x*out_n*bytes;
   4599    stbi__uint32 img_len, img_width_bytes;
   4600    int k;
   4601    int img_n = s->img_n; // copy it into a local for later
   4602 
   4603    int output_bytes = out_n*bytes;
   4604    int filter_bytes = img_n*bytes;
   4605    int width = x;
   4606 
   4607    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   4608    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
   4609    if (!a->out) return stbi__err("outofmem", "Out of memory");
   4610 
   4611    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
   4612    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   4613    img_len = (img_width_bytes + 1) * y;
   4614 
   4615    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
   4616    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
   4617    // so just check for raw_len < img_len always.
   4618    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   4619 
   4620    for (j=0; j < y; ++j) {
   4621       stbi_uc *cur = a->out + stride*j;
   4622       stbi_uc *prior;
   4623       int filter = *raw++;
   4624 
   4625       if (filter > 4)
   4626          return stbi__err("invalid filter","Corrupt PNG");
   4627 
   4628       if (depth < 8) {
   4629          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
   4630          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
   4631          filter_bytes = 1;
   4632          width = img_width_bytes;
   4633       }
   4634       prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
   4635 
   4636       // if first row, use special filter that doesn't sample previous row
   4637       if (j == 0) filter = first_row_filter[filter];
   4638 
   4639       // handle first byte explicitly
   4640       for (k=0; k < filter_bytes; ++k) {
   4641          switch (filter) {
   4642             case STBI__F_none       : cur[k] = raw[k]; break;
   4643             case STBI__F_sub        : cur[k] = raw[k]; break;
   4644             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
   4645             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
   4646             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
   4647             case STBI__F_avg_first  : cur[k] = raw[k]; break;
   4648             case STBI__F_paeth_first: cur[k] = raw[k]; break;
   4649          }
   4650       }
   4651 
   4652       if (depth == 8) {
   4653          if (img_n != out_n)
   4654             cur[img_n] = 255; // first pixel
   4655          raw += img_n;
   4656          cur += out_n;
   4657          prior += out_n;
   4658       } else if (depth == 16) {
   4659          if (img_n != out_n) {
   4660             cur[filter_bytes]   = 255; // first pixel top byte
   4661             cur[filter_bytes+1] = 255; // first pixel bottom byte
   4662          }
   4663          raw += filter_bytes;
   4664          cur += output_bytes;
   4665          prior += output_bytes;
   4666       } else {
   4667          raw += 1;
   4668          cur += 1;
   4669          prior += 1;
   4670       }
   4671 
   4672       // this is a little gross, so that we don't switch per-pixel or per-component
   4673       if (depth < 8 || img_n == out_n) {
   4674          int nk = (width - 1)*filter_bytes;
   4675          #define STBI__CASE(f) \
   4676              case f:     \
   4677                 for (k=0; k < nk; ++k)
   4678          switch (filter) {
   4679             // "none" filter turns into a memcpy here; make that explicit.
   4680             case STBI__F_none:         memcpy(cur, raw, nk); break;
   4681             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
   4682             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4683             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
   4684             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
   4685             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
   4686             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
   4687          }
   4688          #undef STBI__CASE
   4689          raw += nk;
   4690       } else {
   4691          STBI_ASSERT(img_n+1 == out_n);
   4692          #define STBI__CASE(f) \
   4693              case f:     \
   4694                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
   4695                    for (k=0; k < filter_bytes; ++k)
   4696          switch (filter) {
   4697             STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
   4698             STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
   4699             STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
   4700             STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
   4701             STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
   4702             STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
   4703             STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
   4704          }
   4705          #undef STBI__CASE
   4706 
   4707          // the loop above sets the high byte of the pixels' alpha, but for
   4708          // 16 bit png files we also need the low byte set. we'll do that here.
   4709          if (depth == 16) {
   4710             cur = a->out + stride*j; // start at the beginning of the row again
   4711             for (i=0; i < x; ++i,cur+=output_bytes) {
   4712                cur[filter_bytes+1] = 255;
   4713             }
   4714          }
   4715       }
   4716    }
   4717 
   4718    // we make a separate pass to expand bits to pixels; for performance,
   4719    // this could run two scanlines behind the above code, so it won't
   4720    // intefere with filtering but will still be in the cache.
   4721    if (depth < 8) {
   4722       for (j=0; j < y; ++j) {
   4723          stbi_uc *cur = a->out + stride*j;
   4724          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
   4725          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
   4726          // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
   4727          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
   4728 
   4729          // note that the final byte might overshoot and write more data than desired.
   4730          // we can allocate enough data that this never writes out of memory, but it
   4731          // could also overwrite the next scanline. can it overwrite non-empty data
   4732          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
   4733          // so we need to explicitly clamp the final ones
   4734 
   4735          if (depth == 4) {
   4736             for (k=x*img_n; k >= 2; k-=2, ++in) {
   4737                *cur++ = scale * ((*in >> 4)       );
   4738                *cur++ = scale * ((*in     ) & 0x0f);
   4739             }
   4740             if (k > 0) *cur++ = scale * ((*in >> 4)       );
   4741          } else if (depth == 2) {
   4742             for (k=x*img_n; k >= 4; k-=4, ++in) {
   4743                *cur++ = scale * ((*in >> 6)       );
   4744                *cur++ = scale * ((*in >> 4) & 0x03);
   4745                *cur++ = scale * ((*in >> 2) & 0x03);
   4746                *cur++ = scale * ((*in     ) & 0x03);
   4747             }
   4748             if (k > 0) *cur++ = scale * ((*in >> 6)       );
   4749             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
   4750             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
   4751          } else if (depth == 1) {
   4752             for (k=x*img_n; k >= 8; k-=8, ++in) {
   4753                *cur++ = scale * ((*in >> 7)       );
   4754                *cur++ = scale * ((*in >> 6) & 0x01);
   4755                *cur++ = scale * ((*in >> 5) & 0x01);
   4756                *cur++ = scale * ((*in >> 4) & 0x01);
   4757                *cur++ = scale * ((*in >> 3) & 0x01);
   4758                *cur++ = scale * ((*in >> 2) & 0x01);
   4759                *cur++ = scale * ((*in >> 1) & 0x01);
   4760                *cur++ = scale * ((*in     ) & 0x01);
   4761             }
   4762             if (k > 0) *cur++ = scale * ((*in >> 7)       );
   4763             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
   4764             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
   4765             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
   4766             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
   4767             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
   4768             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
   4769          }
   4770          if (img_n != out_n) {
   4771             int q;
   4772             // insert alpha = 255
   4773             cur = a->out + stride*j;
   4774             if (img_n == 1) {
   4775                for (q=x-1; q >= 0; --q) {
   4776                   cur[q*2+1] = 255;
   4777                   cur[q*2+0] = cur[q];
   4778                }
   4779             } else {
   4780                STBI_ASSERT(img_n == 3);
   4781                for (q=x-1; q >= 0; --q) {
   4782                   cur[q*4+3] = 255;
   4783                   cur[q*4+2] = cur[q*3+2];
   4784                   cur[q*4+1] = cur[q*3+1];
   4785                   cur[q*4+0] = cur[q*3+0];
   4786                }
   4787             }
   4788          }
   4789       }
   4790    } else if (depth == 16) {
   4791       // force the image data from big-endian to platform-native.
   4792       // this is done in a separate pass due to the decoding relying
   4793       // on the data being untouched, but could probably be done
   4794       // per-line during decode if care is taken.
   4795       stbi_uc *cur = a->out;
   4796       stbi__uint16 *cur16 = (stbi__uint16*)cur;
   4797 
   4798       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
   4799          *cur16 = (cur[0] << 8) | cur[1];
   4800       }
   4801    }
   4802 
   4803    return 1;
   4804 }
   4805 
   4806 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
   4807 {
   4808    int bytes = (depth == 16 ? 2 : 1);
   4809    int out_bytes = out_n * bytes;
   4810    stbi_uc *final;
   4811    int p;
   4812    if (!interlaced)
   4813       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
   4814 
   4815    // de-interlacing
   4816    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
   4817    if (!final) return stbi__err("outofmem", "Out of memory");
   4818    for (p=0; p < 7; ++p) {
   4819       int xorig[] = { 0,4,0,2,0,1,0 };
   4820       int yorig[] = { 0,0,4,0,2,0,1 };
   4821       int xspc[]  = { 8,8,4,4,2,2,1 };
   4822       int yspc[]  = { 8,8,8,4,4,2,2 };
   4823       int i,j,x,y;
   4824       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
   4825       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
   4826       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
   4827       if (x && y) {
   4828          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
   4829          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
   4830             STBI_FREE(final);
   4831             return 0;
   4832          }
   4833          for (j=0; j < y; ++j) {
   4834             for (i=0; i < x; ++i) {
   4835                int out_y = j*yspc[p]+yorig[p];
   4836                int out_x = i*xspc[p]+xorig[p];
   4837                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
   4838                       a->out + (j*x+i)*out_bytes, out_bytes);
   4839             }
   4840          }
   4841          STBI_FREE(a->out);
   4842          image_data += img_len;
   4843          image_data_len -= img_len;
   4844       }
   4845    }
   4846    a->out = final;
   4847 
   4848    return 1;
   4849 }
   4850 
   4851 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
   4852 {
   4853    stbi__context *s = z->s;
   4854    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4855    stbi_uc *p = z->out;
   4856 
   4857    // compute color-based transparency, assuming we've
   4858    // already got 255 as the alpha value in the output
   4859    STBI_ASSERT(out_n == 2 || out_n == 4);
   4860 
   4861    if (out_n == 2) {
   4862       for (i=0; i < pixel_count; ++i) {
   4863          p[1] = (p[0] == tc[0] ? 0 : 255);
   4864          p += 2;
   4865       }
   4866    } else {
   4867       for (i=0; i < pixel_count; ++i) {
   4868          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4869             p[3] = 0;
   4870          p += 4;
   4871       }
   4872    }
   4873    return 1;
   4874 }
   4875 
   4876 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
   4877 {
   4878    stbi__context *s = z->s;
   4879    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4880    stbi__uint16 *p = (stbi__uint16*) z->out;
   4881 
   4882    // compute color-based transparency, assuming we've
   4883    // already got 65535 as the alpha value in the output
   4884    STBI_ASSERT(out_n == 2 || out_n == 4);
   4885 
   4886    if (out_n == 2) {
   4887       for (i = 0; i < pixel_count; ++i) {
   4888          p[1] = (p[0] == tc[0] ? 0 : 65535);
   4889          p += 2;
   4890       }
   4891    } else {
   4892       for (i = 0; i < pixel_count; ++i) {
   4893          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
   4894             p[3] = 0;
   4895          p += 4;
   4896       }
   4897    }
   4898    return 1;
   4899 }
   4900 
   4901 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
   4902 {
   4903    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   4904    stbi_uc *p, *temp_out, *orig = a->out;
   4905 
   4906    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
   4907    if (p == NULL) return stbi__err("outofmem", "Out of memory");
   4908 
   4909    // between here and free(out) below, exitting would leak
   4910    temp_out = p;
   4911 
   4912    if (pal_img_n == 3) {
   4913       for (i=0; i < pixel_count; ++i) {
   4914          int n = orig[i]*4;
   4915          p[0] = palette[n  ];
   4916          p[1] = palette[n+1];
   4917          p[2] = palette[n+2];
   4918          p += 3;
   4919       }
   4920    } else {
   4921       for (i=0; i < pixel_count; ++i) {
   4922          int n = orig[i]*4;
   4923          p[0] = palette[n  ];
   4924          p[1] = palette[n+1];
   4925          p[2] = palette[n+2];
   4926          p[3] = palette[n+3];
   4927          p += 4;
   4928       }
   4929    }
   4930    STBI_FREE(a->out);
   4931    a->out = temp_out;
   4932 
   4933    STBI_NOTUSED(len);
   4934 
   4935    return 1;
   4936 }
   4937 
   4938 static int stbi__unpremultiply_on_load_global = 0;
   4939 static int stbi__de_iphone_flag_global = 0;
   4940 
   4941 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
   4942 {
   4943    stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
   4944 }
   4945 
   4946 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
   4947 {
   4948    stbi__de_iphone_flag_global = flag_true_if_should_convert;
   4949 }
   4950 
   4951 #ifndef STBI_THREAD_LOCAL
   4952 #define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
   4953 #define stbi__de_iphone_flag  stbi__de_iphone_flag_global
   4954 #else
   4955 static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
   4956 static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
   4957 
   4958 STBIDEF void stbi__unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
   4959 {
   4960    stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
   4961    stbi__unpremultiply_on_load_set = 1;
   4962 }
   4963 
   4964 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
   4965 {
   4966    stbi__de_iphone_flag_local = flag_true_if_should_convert;
   4967    stbi__de_iphone_flag_set = 1;
   4968 }
   4969 
   4970 #define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
   4971                                        ? stbi__unpremultiply_on_load_local      \
   4972                                        : stbi__unpremultiply_on_load_global)
   4973 #define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
   4974                                 ? stbi__de_iphone_flag_local                    \
   4975                                 : stbi__de_iphone_flag_global)
   4976 #endif // STBI_THREAD_LOCAL
   4977 
   4978 static void stbi__de_iphone(stbi__png *z)
   4979 {
   4980    stbi__context *s = z->s;
   4981    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   4982    stbi_uc *p = z->out;
   4983 
   4984    if (s->img_out_n == 3) {  // convert bgr to rgb
   4985       for (i=0; i < pixel_count; ++i) {
   4986          stbi_uc t = p[0];
   4987          p[0] = p[2];
   4988          p[2] = t;
   4989          p += 3;
   4990       }
   4991    } else {
   4992       STBI_ASSERT(s->img_out_n == 4);
   4993       if (stbi__unpremultiply_on_load) {
   4994          // convert bgr to rgb and unpremultiply
   4995          for (i=0; i < pixel_count; ++i) {
   4996             stbi_uc a = p[3];
   4997             stbi_uc t = p[0];
   4998             if (a) {
   4999                stbi_uc half = a / 2;
   5000                p[0] = (p[2] * 255 + half) / a;
   5001                p[1] = (p[1] * 255 + half) / a;
   5002                p[2] = ( t   * 255 + half) / a;
   5003             } else {
   5004                p[0] = p[2];
   5005                p[2] = t;
   5006             }
   5007             p += 4;
   5008          }
   5009       } else {
   5010          // convert bgr to rgb
   5011          for (i=0; i < pixel_count; ++i) {
   5012             stbi_uc t = p[0];
   5013             p[0] = p[2];
   5014             p[2] = t;
   5015             p += 4;
   5016          }
   5017       }
   5018    }
   5019 }
   5020 
   5021 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
   5022 
   5023 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
   5024 {
   5025    stbi_uc palette[1024], pal_img_n=0;
   5026    stbi_uc has_trans=0, tc[3]={0};
   5027    stbi__uint16 tc16[3];
   5028    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   5029    int first=1,k,interlace=0, color=0, is_iphone=0;
   5030    stbi__context *s = z->s;
   5031 
   5032    z->expanded = NULL;
   5033    z->idata = NULL;
   5034    z->out = NULL;
   5035 
   5036    if (!stbi__check_png_header(s)) return 0;
   5037 
   5038    if (scan == STBI__SCAN_type) return 1;
   5039 
   5040    for (;;) {
   5041       stbi__pngchunk c = stbi__get_chunk_header(s);
   5042       switch (c.type) {
   5043          case STBI__PNG_TYPE('C','g','B','I'):
   5044             is_iphone = 1;
   5045             stbi__skip(s, c.length);
   5046             break;
   5047          case STBI__PNG_TYPE('I','H','D','R'): {
   5048             int comp,filter;
   5049             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
   5050             first = 0;
   5051             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
   5052             s->img_x = stbi__get32be(s);
   5053             s->img_y = stbi__get32be(s);
   5054             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5055             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   5056             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
   5057             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
   5058             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
   5059             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
   5060             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
   5061             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
   5062             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
   5063             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
   5064             if (!pal_img_n) {
   5065                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
   5066                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
   5067                if (scan == STBI__SCAN_header) return 1;
   5068             } else {
   5069                // if paletted, then pal_n is our final components, and
   5070                // img_n is # components to decompress/filter.
   5071                s->img_n = 1;
   5072                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
   5073                // if SCAN_header, have to scan to see if we have a tRNS
   5074             }
   5075             break;
   5076          }
   5077 
   5078          case STBI__PNG_TYPE('P','L','T','E'):  {
   5079             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5080             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
   5081             pal_len = c.length / 3;
   5082             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
   5083             for (i=0; i < pal_len; ++i) {
   5084                palette[i*4+0] = stbi__get8(s);
   5085                palette[i*4+1] = stbi__get8(s);
   5086                palette[i*4+2] = stbi__get8(s);
   5087                palette[i*4+3] = 255;
   5088             }
   5089             break;
   5090          }
   5091 
   5092          case STBI__PNG_TYPE('t','R','N','S'): {
   5093             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5094             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
   5095             if (pal_img_n) {
   5096                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
   5097                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
   5098                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
   5099                pal_img_n = 4;
   5100                for (i=0; i < c.length; ++i)
   5101                   palette[i*4+3] = stbi__get8(s);
   5102             } else {
   5103                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
   5104                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
   5105                has_trans = 1;
   5106                if (z->depth == 16) {
   5107                   for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
   5108                } else {
   5109                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
   5110                }
   5111             }
   5112             break;
   5113          }
   5114 
   5115          case STBI__PNG_TYPE('I','D','A','T'): {
   5116             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5117             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
   5118             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
   5119             if ((int)(ioff + c.length) < (int)ioff) return 0;
   5120             if (ioff + c.length > idata_limit) {
   5121                stbi__uint32 idata_limit_old = idata_limit;
   5122                stbi_uc *p;
   5123                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
   5124                while (ioff + c.length > idata_limit)
   5125                   idata_limit *= 2;
   5126                STBI_NOTUSED(idata_limit_old);
   5127                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
   5128                z->idata = p;
   5129             }
   5130             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
   5131             ioff += c.length;
   5132             break;
   5133          }
   5134 
   5135          case STBI__PNG_TYPE('I','E','N','D'): {
   5136             stbi__uint32 raw_len, bpl;
   5137             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5138             if (scan != STBI__SCAN_load) return 1;
   5139             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
   5140             // initial guess for decoded data size to avoid unnecessary reallocs
   5141             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
   5142             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
   5143             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
   5144             if (z->expanded == NULL) return 0; // zlib should set error
   5145             STBI_FREE(z->idata); z->idata = NULL;
   5146             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
   5147                s->img_out_n = s->img_n+1;
   5148             else
   5149                s->img_out_n = s->img_n;
   5150             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
   5151             if (has_trans) {
   5152                if (z->depth == 16) {
   5153                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
   5154                } else {
   5155                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
   5156                }
   5157             }
   5158             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
   5159                stbi__de_iphone(z);
   5160             if (pal_img_n) {
   5161                // pal_img_n == 3 or 4
   5162                s->img_n = pal_img_n; // record the actual colors we had
   5163                s->img_out_n = pal_img_n;
   5164                if (req_comp >= 3) s->img_out_n = req_comp;
   5165                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
   5166                   return 0;
   5167             } else if (has_trans) {
   5168                // non-paletted image with tRNS -> source image has (constant) alpha
   5169                ++s->img_n;
   5170             }
   5171             STBI_FREE(z->expanded); z->expanded = NULL;
   5172             // end of PNG chunk, read and skip CRC
   5173             stbi__get32be(s);
   5174             return 1;
   5175          }
   5176 
   5177          default:
   5178             // if critical, fail
   5179             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
   5180             if ((c.type & (1 << 29)) == 0) {
   5181                #ifndef STBI_NO_FAILURE_STRINGS
   5182                // not threadsafe
   5183                static char invalid_chunk[] = "XXXX PNG chunk not known";
   5184                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
   5185                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
   5186                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
   5187                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
   5188                #endif
   5189                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
   5190             }
   5191             stbi__skip(s, c.length);
   5192             break;
   5193       }
   5194       // end of PNG chunk, read and skip CRC
   5195       stbi__get32be(s);
   5196    }
   5197 }
   5198 
   5199 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
   5200 {
   5201    void *result=NULL;
   5202    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   5203    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
   5204       if (p->depth <= 8)
   5205          ri->bits_per_channel = 8;
   5206       else if (p->depth == 16)
   5207          ri->bits_per_channel = 16;
   5208       else
   5209          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
   5210       result = p->out;
   5211       p->out = NULL;
   5212       if (req_comp && req_comp != p->s->img_out_n) {
   5213          if (ri->bits_per_channel == 8)
   5214             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5215          else
   5216             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
   5217          p->s->img_out_n = req_comp;
   5218          if (result == NULL) return result;
   5219       }
   5220       *x = p->s->img_x;
   5221       *y = p->s->img_y;
   5222       if (n) *n = p->s->img_n;
   5223    }
   5224    STBI_FREE(p->out);      p->out      = NULL;
   5225    STBI_FREE(p->expanded); p->expanded = NULL;
   5226    STBI_FREE(p->idata);    p->idata    = NULL;
   5227 
   5228    return result;
   5229 }
   5230 
   5231 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5232 {
   5233    stbi__png p;
   5234    p.s = s;
   5235    return stbi__do_png(&p, x,y,comp,req_comp, ri);
   5236 }
   5237 
   5238 static int stbi__png_test(stbi__context *s)
   5239 {
   5240    int r;
   5241    r = stbi__check_png_header(s);
   5242    stbi__rewind(s);
   5243    return r;
   5244 }
   5245 
   5246 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
   5247 {
   5248    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
   5249       stbi__rewind( p->s );
   5250       return 0;
   5251    }
   5252    if (x) *x = p->s->img_x;
   5253    if (y) *y = p->s->img_y;
   5254    if (comp) *comp = p->s->img_n;
   5255    return 1;
   5256 }
   5257 
   5258 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
   5259 {
   5260    stbi__png p;
   5261    p.s = s;
   5262    return stbi__png_info_raw(&p, x, y, comp);
   5263 }
   5264 
   5265 static int stbi__png_is16(stbi__context *s)
   5266 {
   5267    stbi__png p;
   5268    p.s = s;
   5269    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
   5270 	   return 0;
   5271    if (p.depth != 16) {
   5272       stbi__rewind(p.s);
   5273       return 0;
   5274    }
   5275    return 1;
   5276 }
   5277 #endif
   5278 
   5279 // Microsoft/Windows BMP image
   5280 
   5281 #ifndef STBI_NO_BMP
   5282 static int stbi__bmp_test_raw(stbi__context *s)
   5283 {
   5284    int r;
   5285    int sz;
   5286    if (stbi__get8(s) != 'B') return 0;
   5287    if (stbi__get8(s) != 'M') return 0;
   5288    stbi__get32le(s); // discard filesize
   5289    stbi__get16le(s); // discard reserved
   5290    stbi__get16le(s); // discard reserved
   5291    stbi__get32le(s); // discard data offset
   5292    sz = stbi__get32le(s);
   5293    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   5294    return r;
   5295 }
   5296 
   5297 static int stbi__bmp_test(stbi__context *s)
   5298 {
   5299    int r = stbi__bmp_test_raw(s);
   5300    stbi__rewind(s);
   5301    return r;
   5302 }
   5303 
   5304 
   5305 // returns 0..31 for the highest set bit
   5306 static int stbi__high_bit(unsigned int z)
   5307 {
   5308    int n=0;
   5309    if (z == 0) return -1;
   5310    if (z >= 0x10000) { n += 16; z >>= 16; }
   5311    if (z >= 0x00100) { n +=  8; z >>=  8; }
   5312    if (z >= 0x00010) { n +=  4; z >>=  4; }
   5313    if (z >= 0x00004) { n +=  2; z >>=  2; }
   5314    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
   5315    return n;
   5316 }
   5317 
   5318 static int stbi__bitcount(unsigned int a)
   5319 {
   5320    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   5321    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   5322    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   5323    a = (a + (a >> 8)); // max 16 per 8 bits
   5324    a = (a + (a >> 16)); // max 32 per 8 bits
   5325    return a & 0xff;
   5326 }
   5327 
   5328 // extract an arbitrarily-aligned N-bit value (N=bits)
   5329 // from v, and then make it 8-bits long and fractionally
   5330 // extend it to full full range.
   5331 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
   5332 {
   5333    static unsigned int mul_table[9] = {
   5334       0,
   5335       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
   5336       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
   5337    };
   5338    static unsigned int shift_table[9] = {
   5339       0, 0,0,1,0,2,4,6,0,
   5340    };
   5341    if (shift < 0)
   5342       v <<= -shift;
   5343    else
   5344       v >>= shift;
   5345    STBI_ASSERT(v < 256);
   5346    v >>= (8-bits);
   5347    STBI_ASSERT(bits >= 0 && bits <= 8);
   5348    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
   5349 }
   5350 
   5351 typedef struct
   5352 {
   5353    int bpp, offset, hsz;
   5354    unsigned int mr,mg,mb,ma, all_a;
   5355    int extra_read;
   5356 } stbi__bmp_data;
   5357 
   5358 static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
   5359 {
   5360    // BI_BITFIELDS specifies masks explicitly, don't override
   5361    if (compress == 3)
   5362       return 1;
   5363 
   5364    if (compress == 0) {
   5365       if (info->bpp == 16) {
   5366          info->mr = 31u << 10;
   5367          info->mg = 31u <<  5;
   5368          info->mb = 31u <<  0;
   5369       } else if (info->bpp == 32) {
   5370          info->mr = 0xffu << 16;
   5371          info->mg = 0xffu <<  8;
   5372          info->mb = 0xffu <<  0;
   5373          info->ma = 0xffu << 24;
   5374          info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
   5375       } else {
   5376          // otherwise, use defaults, which is all-0
   5377          info->mr = info->mg = info->mb = info->ma = 0;
   5378       }
   5379       return 1;
   5380    }
   5381    return 0; // error
   5382 }
   5383 
   5384 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
   5385 {
   5386    int hsz;
   5387    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   5388    stbi__get32le(s); // discard filesize
   5389    stbi__get16le(s); // discard reserved
   5390    stbi__get16le(s); // discard reserved
   5391    info->offset = stbi__get32le(s);
   5392    info->hsz = hsz = stbi__get32le(s);
   5393    info->mr = info->mg = info->mb = info->ma = 0;
   5394    info->extra_read = 14;
   5395 
   5396    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
   5397 
   5398    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   5399    if (hsz == 12) {
   5400       s->img_x = stbi__get16le(s);
   5401       s->img_y = stbi__get16le(s);
   5402    } else {
   5403       s->img_x = stbi__get32le(s);
   5404       s->img_y = stbi__get32le(s);
   5405    }
   5406    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   5407    info->bpp = stbi__get16le(s);
   5408    if (hsz != 12) {
   5409       int compress = stbi__get32le(s);
   5410       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
   5411       if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
   5412       if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
   5413       stbi__get32le(s); // discard sizeof
   5414       stbi__get32le(s); // discard hres
   5415       stbi__get32le(s); // discard vres
   5416       stbi__get32le(s); // discard colorsused
   5417       stbi__get32le(s); // discard max important
   5418       if (hsz == 40 || hsz == 56) {
   5419          if (hsz == 56) {
   5420             stbi__get32le(s);
   5421             stbi__get32le(s);
   5422             stbi__get32le(s);
   5423             stbi__get32le(s);
   5424          }
   5425          if (info->bpp == 16 || info->bpp == 32) {
   5426             if (compress == 0) {
   5427                stbi__bmp_set_mask_defaults(info, compress);
   5428             } else if (compress == 3) {
   5429                info->mr = stbi__get32le(s);
   5430                info->mg = stbi__get32le(s);
   5431                info->mb = stbi__get32le(s);
   5432                info->extra_read += 12;
   5433                // not documented, but generated by photoshop and handled by mspaint
   5434                if (info->mr == info->mg && info->mg == info->mb) {
   5435                   // ?!?!?
   5436                   return stbi__errpuc("bad BMP", "bad BMP");
   5437                }
   5438             } else
   5439                return stbi__errpuc("bad BMP", "bad BMP");
   5440          }
   5441       } else {
   5442          // V4/V5 header
   5443          int i;
   5444          if (hsz != 108 && hsz != 124)
   5445             return stbi__errpuc("bad BMP", "bad BMP");
   5446          info->mr = stbi__get32le(s);
   5447          info->mg = stbi__get32le(s);
   5448          info->mb = stbi__get32le(s);
   5449          info->ma = stbi__get32le(s);
   5450          if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
   5451             stbi__bmp_set_mask_defaults(info, compress);
   5452          stbi__get32le(s); // discard color space
   5453          for (i=0; i < 12; ++i)
   5454             stbi__get32le(s); // discard color space parameters
   5455          if (hsz == 124) {
   5456             stbi__get32le(s); // discard rendering intent
   5457             stbi__get32le(s); // discard offset of profile data
   5458             stbi__get32le(s); // discard size of profile data
   5459             stbi__get32le(s); // discard reserved
   5460          }
   5461       }
   5462    }
   5463    return (void *) 1;
   5464 }
   5465 
   5466 
   5467 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5468 {
   5469    stbi_uc *out;
   5470    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
   5471    stbi_uc pal[256][4];
   5472    int psize=0,i,j,width;
   5473    int flip_vertically, pad, target;
   5474    stbi__bmp_data info;
   5475    STBI_NOTUSED(ri);
   5476 
   5477    info.all_a = 255;
   5478    if (stbi__bmp_parse_header(s, &info) == NULL)
   5479       return NULL; // error code already set
   5480 
   5481    flip_vertically = ((int) s->img_y) > 0;
   5482    s->img_y = abs((int) s->img_y);
   5483 
   5484    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5485    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5486 
   5487    mr = info.mr;
   5488    mg = info.mg;
   5489    mb = info.mb;
   5490    ma = info.ma;
   5491    all_a = info.all_a;
   5492 
   5493    if (info.hsz == 12) {
   5494       if (info.bpp < 24)
   5495          psize = (info.offset - info.extra_read - 24) / 3;
   5496    } else {
   5497       if (info.bpp < 16)
   5498          psize = (info.offset - info.extra_read - info.hsz) >> 2;
   5499    }
   5500    if (psize == 0) {
   5501       if (info.offset != s->callback_already_read + (s->img_buffer - s->img_buffer_original)) {
   5502         return stbi__errpuc("bad offset", "Corrupt BMP");
   5503       }
   5504    }
   5505 
   5506    if (info.bpp == 24 && ma == 0xff000000)
   5507       s->img_n = 3;
   5508    else
   5509       s->img_n = ma ? 4 : 3;
   5510    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
   5511       target = req_comp;
   5512    else
   5513       target = s->img_n; // if they want monochrome, we'll post-convert
   5514 
   5515    // sanity-check size
   5516    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
   5517       return stbi__errpuc("too large", "Corrupt BMP");
   5518 
   5519    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
   5520    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   5521    if (info.bpp < 16) {
   5522       int z=0;
   5523       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
   5524       for (i=0; i < psize; ++i) {
   5525          pal[i][2] = stbi__get8(s);
   5526          pal[i][1] = stbi__get8(s);
   5527          pal[i][0] = stbi__get8(s);
   5528          if (info.hsz != 12) stbi__get8(s);
   5529          pal[i][3] = 255;
   5530       }
   5531       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
   5532       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
   5533       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
   5534       else if (info.bpp == 8) width = s->img_x;
   5535       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
   5536       pad = (-width)&3;
   5537       if (info.bpp == 1) {
   5538          for (j=0; j < (int) s->img_y; ++j) {
   5539             int bit_offset = 7, v = stbi__get8(s);
   5540             for (i=0; i < (int) s->img_x; ++i) {
   5541                int color = (v>>bit_offset)&0x1;
   5542                out[z++] = pal[color][0];
   5543                out[z++] = pal[color][1];
   5544                out[z++] = pal[color][2];
   5545                if (target == 4) out[z++] = 255;
   5546                if (i+1 == (int) s->img_x) break;
   5547                if((--bit_offset) < 0) {
   5548                   bit_offset = 7;
   5549                   v = stbi__get8(s);
   5550                }
   5551             }
   5552             stbi__skip(s, pad);
   5553          }
   5554       } else {
   5555          for (j=0; j < (int) s->img_y; ++j) {
   5556             for (i=0; i < (int) s->img_x; i += 2) {
   5557                int v=stbi__get8(s),v2=0;
   5558                if (info.bpp == 4) {
   5559                   v2 = v & 15;
   5560                   v >>= 4;
   5561                }
   5562                out[z++] = pal[v][0];
   5563                out[z++] = pal[v][1];
   5564                out[z++] = pal[v][2];
   5565                if (target == 4) out[z++] = 255;
   5566                if (i+1 == (int) s->img_x) break;
   5567                v = (info.bpp == 8) ? stbi__get8(s) : v2;
   5568                out[z++] = pal[v][0];
   5569                out[z++] = pal[v][1];
   5570                out[z++] = pal[v][2];
   5571                if (target == 4) out[z++] = 255;
   5572             }
   5573             stbi__skip(s, pad);
   5574          }
   5575       }
   5576    } else {
   5577       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
   5578       int z = 0;
   5579       int easy=0;
   5580       stbi__skip(s, info.offset - info.extra_read - info.hsz);
   5581       if (info.bpp == 24) width = 3 * s->img_x;
   5582       else if (info.bpp == 16) width = 2*s->img_x;
   5583       else /* bpp = 32 and pad = 0 */ width=0;
   5584       pad = (-width) & 3;
   5585       if (info.bpp == 24) {
   5586          easy = 1;
   5587       } else if (info.bpp == 32) {
   5588          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
   5589             easy = 2;
   5590       }
   5591       if (!easy) {
   5592          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5593          // right shift amt to put high bit in position #7
   5594          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
   5595          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
   5596          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
   5597          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
   5598          if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
   5599       }
   5600       for (j=0; j < (int) s->img_y; ++j) {
   5601          if (easy) {
   5602             for (i=0; i < (int) s->img_x; ++i) {
   5603                unsigned char a;
   5604                out[z+2] = stbi__get8(s);
   5605                out[z+1] = stbi__get8(s);
   5606                out[z+0] = stbi__get8(s);
   5607                z += 3;
   5608                a = (easy == 2 ? stbi__get8(s) : 255);
   5609                all_a |= a;
   5610                if (target == 4) out[z++] = a;
   5611             }
   5612          } else {
   5613             int bpp = info.bpp;
   5614             for (i=0; i < (int) s->img_x; ++i) {
   5615                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
   5616                unsigned int a;
   5617                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
   5618                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
   5619                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
   5620                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
   5621                all_a |= a;
   5622                if (target == 4) out[z++] = STBI__BYTECAST(a);
   5623             }
   5624          }
   5625          stbi__skip(s, pad);
   5626       }
   5627    }
   5628 
   5629    // if alpha channel is all 0s, replace with all 255s
   5630    if (target == 4 && all_a == 0)
   5631       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
   5632          out[i] = 255;
   5633 
   5634    if (flip_vertically) {
   5635       stbi_uc t;
   5636       for (j=0; j < (int) s->img_y>>1; ++j) {
   5637          stbi_uc *p1 = out +      j     *s->img_x*target;
   5638          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
   5639          for (i=0; i < (int) s->img_x*target; ++i) {
   5640             t = p1[i]; p1[i] = p2[i]; p2[i] = t;
   5641          }
   5642       }
   5643    }
   5644 
   5645    if (req_comp && req_comp != target) {
   5646       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
   5647       if (out == NULL) return out; // stbi__convert_format frees input on failure
   5648    }
   5649 
   5650    *x = s->img_x;
   5651    *y = s->img_y;
   5652    if (comp) *comp = s->img_n;
   5653    return out;
   5654 }
   5655 #endif
   5656 
   5657 // Targa Truevision - TGA
   5658 // by Jonathan Dummer
   5659 #ifndef STBI_NO_TGA
   5660 // returns STBI_rgb or whatever, 0 on error
   5661 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
   5662 {
   5663    // only RGB or RGBA (incl. 16bit) or grey allowed
   5664    if (is_rgb16) *is_rgb16 = 0;
   5665    switch(bits_per_pixel) {
   5666       case 8:  return STBI_grey;
   5667       case 16: if(is_grey) return STBI_grey_alpha;
   5668                // fallthrough
   5669       case 15: if(is_rgb16) *is_rgb16 = 1;
   5670                return STBI_rgb;
   5671       case 24: // fallthrough
   5672       case 32: return bits_per_pixel/8;
   5673       default: return 0;
   5674    }
   5675 }
   5676 
   5677 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
   5678 {
   5679     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
   5680     int sz, tga_colormap_type;
   5681     stbi__get8(s);                   // discard Offset
   5682     tga_colormap_type = stbi__get8(s); // colormap type
   5683     if( tga_colormap_type > 1 ) {
   5684         stbi__rewind(s);
   5685         return 0;      // only RGB or indexed allowed
   5686     }
   5687     tga_image_type = stbi__get8(s); // image type
   5688     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
   5689         if (tga_image_type != 1 && tga_image_type != 9) {
   5690             stbi__rewind(s);
   5691             return 0;
   5692         }
   5693         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5694         sz = stbi__get8(s);    //   check bits per palette color entry
   5695         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
   5696             stbi__rewind(s);
   5697             return 0;
   5698         }
   5699         stbi__skip(s,4);       // skip image x and y origin
   5700         tga_colormap_bpp = sz;
   5701     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
   5702         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
   5703             stbi__rewind(s);
   5704             return 0; // only RGB or grey allowed, +/- RLE
   5705         }
   5706         stbi__skip(s,9); // skip colormap specification and image x/y origin
   5707         tga_colormap_bpp = 0;
   5708     }
   5709     tga_w = stbi__get16le(s);
   5710     if( tga_w < 1 ) {
   5711         stbi__rewind(s);
   5712         return 0;   // test width
   5713     }
   5714     tga_h = stbi__get16le(s);
   5715     if( tga_h < 1 ) {
   5716         stbi__rewind(s);
   5717         return 0;   // test height
   5718     }
   5719     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
   5720     stbi__get8(s); // ignore alpha bits
   5721     if (tga_colormap_bpp != 0) {
   5722         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
   5723             // when using a colormap, tga_bits_per_pixel is the size of the indexes
   5724             // I don't think anything but 8 or 16bit indexes makes sense
   5725             stbi__rewind(s);
   5726             return 0;
   5727         }
   5728         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
   5729     } else {
   5730         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
   5731     }
   5732     if(!tga_comp) {
   5733       stbi__rewind(s);
   5734       return 0;
   5735     }
   5736     if (x) *x = tga_w;
   5737     if (y) *y = tga_h;
   5738     if (comp) *comp = tga_comp;
   5739     return 1;                   // seems to have passed everything
   5740 }
   5741 
   5742 static int stbi__tga_test(stbi__context *s)
   5743 {
   5744    int res = 0;
   5745    int sz, tga_color_type;
   5746    stbi__get8(s);      //   discard Offset
   5747    tga_color_type = stbi__get8(s);   //   color type
   5748    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   5749    sz = stbi__get8(s);   //   image type
   5750    if ( tga_color_type == 1 ) { // colormapped (paletted) image
   5751       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
   5752       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
   5753       sz = stbi__get8(s);    //   check bits per palette color entry
   5754       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5755       stbi__skip(s,4);       // skip image x and y origin
   5756    } else { // "normal" image w/o colormap
   5757       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
   5758       stbi__skip(s,9); // skip colormap specification and image x/y origin
   5759    }
   5760    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   5761    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   5762    sz = stbi__get8(s);   //   bits per pixel
   5763    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
   5764    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
   5765 
   5766    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
   5767 
   5768 errorEnd:
   5769    stbi__rewind(s);
   5770    return res;
   5771 }
   5772 
   5773 // read 16bit value and convert to 24bit RGB
   5774 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
   5775 {
   5776    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
   5777    stbi__uint16 fiveBitMask = 31;
   5778    // we have 3 channels with 5bits each
   5779    int r = (px >> 10) & fiveBitMask;
   5780    int g = (px >> 5) & fiveBitMask;
   5781    int b = px & fiveBitMask;
   5782    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
   5783    out[0] = (stbi_uc)((r * 255)/31);
   5784    out[1] = (stbi_uc)((g * 255)/31);
   5785    out[2] = (stbi_uc)((b * 255)/31);
   5786 
   5787    // some people claim that the most significant bit might be used for alpha
   5788    // (possibly if an alpha-bit is set in the "image descriptor byte")
   5789    // but that only made 16bit test images completely translucent..
   5790    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
   5791 }
   5792 
   5793 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   5794 {
   5795    //   read in the TGA header stuff
   5796    int tga_offset = stbi__get8(s);
   5797    int tga_indexed = stbi__get8(s);
   5798    int tga_image_type = stbi__get8(s);
   5799    int tga_is_RLE = 0;
   5800    int tga_palette_start = stbi__get16le(s);
   5801    int tga_palette_len = stbi__get16le(s);
   5802    int tga_palette_bits = stbi__get8(s);
   5803    int tga_x_origin = stbi__get16le(s);
   5804    int tga_y_origin = stbi__get16le(s);
   5805    int tga_width = stbi__get16le(s);
   5806    int tga_height = stbi__get16le(s);
   5807    int tga_bits_per_pixel = stbi__get8(s);
   5808    int tga_comp, tga_rgb16=0;
   5809    int tga_inverted = stbi__get8(s);
   5810    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
   5811    //   image data
   5812    unsigned char *tga_data;
   5813    unsigned char *tga_palette = NULL;
   5814    int i, j;
   5815    unsigned char raw_data[4] = {0};
   5816    int RLE_count = 0;
   5817    int RLE_repeating = 0;
   5818    int read_next_pixel = 1;
   5819    STBI_NOTUSED(ri);
   5820    STBI_NOTUSED(tga_x_origin); // @TODO
   5821    STBI_NOTUSED(tga_y_origin); // @TODO
   5822 
   5823    if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5824    if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   5825 
   5826    //   do a tiny bit of precessing
   5827    if ( tga_image_type >= 8 )
   5828    {
   5829       tga_image_type -= 8;
   5830       tga_is_RLE = 1;
   5831    }
   5832    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
   5833 
   5834    //   If I'm paletted, then I'll use the number of bits from the palette
   5835    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
   5836    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
   5837 
   5838    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
   5839       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
   5840 
   5841    //   tga info
   5842    *x = tga_width;
   5843    *y = tga_height;
   5844    if (comp) *comp = tga_comp;
   5845 
   5846    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
   5847       return stbi__errpuc("too large", "Corrupt TGA");
   5848 
   5849    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
   5850    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
   5851 
   5852    // skip to the data's starting position (offset usually = 0)
   5853    stbi__skip(s, tga_offset );
   5854 
   5855    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
   5856       for (i=0; i < tga_height; ++i) {
   5857          int row = tga_inverted ? tga_height -i - 1 : i;
   5858          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
   5859          stbi__getn(s, tga_row, tga_width * tga_comp);
   5860       }
   5861    } else  {
   5862       //   do I need to load a palette?
   5863       if ( tga_indexed)
   5864       {
   5865          if (tga_palette_len == 0) {  /* you have to have at least one entry! */
   5866             STBI_FREE(tga_data);
   5867             return stbi__errpuc("bad palette", "Corrupt TGA");
   5868          }
   5869 
   5870          //   any data to skip? (offset usually = 0)
   5871          stbi__skip(s, tga_palette_start );
   5872          //   load the palette
   5873          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
   5874          if (!tga_palette) {
   5875             STBI_FREE(tga_data);
   5876             return stbi__errpuc("outofmem", "Out of memory");
   5877          }
   5878          if (tga_rgb16) {
   5879             stbi_uc *pal_entry = tga_palette;
   5880             STBI_ASSERT(tga_comp == STBI_rgb);
   5881             for (i=0; i < tga_palette_len; ++i) {
   5882                stbi__tga_read_rgb16(s, pal_entry);
   5883                pal_entry += tga_comp;
   5884             }
   5885          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
   5886                STBI_FREE(tga_data);
   5887                STBI_FREE(tga_palette);
   5888                return stbi__errpuc("bad palette", "Corrupt TGA");
   5889          }
   5890       }
   5891       //   load the data
   5892       for (i=0; i < tga_width * tga_height; ++i)
   5893       {
   5894          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
   5895          if ( tga_is_RLE )
   5896          {
   5897             if ( RLE_count == 0 )
   5898             {
   5899                //   yep, get the next byte as a RLE command
   5900                int RLE_cmd = stbi__get8(s);
   5901                RLE_count = 1 + (RLE_cmd & 127);
   5902                RLE_repeating = RLE_cmd >> 7;
   5903                read_next_pixel = 1;
   5904             } else if ( !RLE_repeating )
   5905             {
   5906                read_next_pixel = 1;
   5907             }
   5908          } else
   5909          {
   5910             read_next_pixel = 1;
   5911          }
   5912          //   OK, if I need to read a pixel, do it now
   5913          if ( read_next_pixel )
   5914          {
   5915             //   load however much data we did have
   5916             if ( tga_indexed )
   5917             {
   5918                // read in index, then perform the lookup
   5919                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
   5920                if ( pal_idx >= tga_palette_len ) {
   5921                   // invalid index
   5922                   pal_idx = 0;
   5923                }
   5924                pal_idx *= tga_comp;
   5925                for (j = 0; j < tga_comp; ++j) {
   5926                   raw_data[j] = tga_palette[pal_idx+j];
   5927                }
   5928             } else if(tga_rgb16) {
   5929                STBI_ASSERT(tga_comp == STBI_rgb);
   5930                stbi__tga_read_rgb16(s, raw_data);
   5931             } else {
   5932                //   read in the data raw
   5933                for (j = 0; j < tga_comp; ++j) {
   5934                   raw_data[j] = stbi__get8(s);
   5935                }
   5936             }
   5937             //   clear the reading flag for the next pixel
   5938             read_next_pixel = 0;
   5939          } // end of reading a pixel
   5940 
   5941          // copy data
   5942          for (j = 0; j < tga_comp; ++j)
   5943            tga_data[i*tga_comp+j] = raw_data[j];
   5944 
   5945          //   in case we're in RLE mode, keep counting down
   5946          --RLE_count;
   5947       }
   5948       //   do I need to invert the image?
   5949       if ( tga_inverted )
   5950       {
   5951          for (j = 0; j*2 < tga_height; ++j)
   5952          {
   5953             int index1 = j * tga_width * tga_comp;
   5954             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
   5955             for (i = tga_width * tga_comp; i > 0; --i)
   5956             {
   5957                unsigned char temp = tga_data[index1];
   5958                tga_data[index1] = tga_data[index2];
   5959                tga_data[index2] = temp;
   5960                ++index1;
   5961                ++index2;
   5962             }
   5963          }
   5964       }
   5965       //   clear my palette, if I had one
   5966       if ( tga_palette != NULL )
   5967       {
   5968          STBI_FREE( tga_palette );
   5969       }
   5970    }
   5971 
   5972    // swap RGB - if the source data was RGB16, it already is in the right order
   5973    if (tga_comp >= 3 && !tga_rgb16)
   5974    {
   5975       unsigned char* tga_pixel = tga_data;
   5976       for (i=0; i < tga_width * tga_height; ++i)
   5977       {
   5978          unsigned char temp = tga_pixel[0];
   5979          tga_pixel[0] = tga_pixel[2];
   5980          tga_pixel[2] = temp;
   5981          tga_pixel += tga_comp;
   5982       }
   5983    }
   5984 
   5985    // convert to target component count
   5986    if (req_comp && req_comp != tga_comp)
   5987       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
   5988 
   5989    //   the things I do to get rid of an error message, and yet keep
   5990    //   Microsoft's C compilers happy... [8^(
   5991    tga_palette_start = tga_palette_len = tga_palette_bits =
   5992          tga_x_origin = tga_y_origin = 0;
   5993    STBI_NOTUSED(tga_palette_start);
   5994    //   OK, done
   5995    return tga_data;
   5996 }
   5997 #endif
   5998 
   5999 // *************************************************************************************************
   6000 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
   6001 
   6002 #ifndef STBI_NO_PSD
   6003 static int stbi__psd_test(stbi__context *s)
   6004 {
   6005    int r = (stbi__get32be(s) == 0x38425053);
   6006    stbi__rewind(s);
   6007    return r;
   6008 }
   6009 
   6010 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
   6011 {
   6012    int count, nleft, len;
   6013 
   6014    count = 0;
   6015    while ((nleft = pixelCount - count) > 0) {
   6016       len = stbi__get8(s);
   6017       if (len == 128) {
   6018          // No-op.
   6019       } else if (len < 128) {
   6020          // Copy next len+1 bytes literally.
   6021          len++;
   6022          if (len > nleft) return 0; // corrupt data
   6023          count += len;
   6024          while (len) {
   6025             *p = stbi__get8(s);
   6026             p += 4;
   6027             len--;
   6028          }
   6029       } else if (len > 128) {
   6030          stbi_uc   val;
   6031          // Next -len+1 bytes in the dest are replicated from next source byte.
   6032          // (Interpret len as a negative 8-bit int.)
   6033          len = 257 - len;
   6034          if (len > nleft) return 0; // corrupt data
   6035          val = stbi__get8(s);
   6036          count += len;
   6037          while (len) {
   6038             *p = val;
   6039             p += 4;
   6040             len--;
   6041          }
   6042       }
   6043    }
   6044 
   6045    return 1;
   6046 }
   6047 
   6048 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
   6049 {
   6050    int pixelCount;
   6051    int channelCount, compression;
   6052    int channel, i;
   6053    int bitdepth;
   6054    int w,h;
   6055    stbi_uc *out;
   6056    STBI_NOTUSED(ri);
   6057 
   6058    // Check identifier
   6059    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
   6060       return stbi__errpuc("not PSD", "Corrupt PSD image");
   6061 
   6062    // Check file type version.
   6063    if (stbi__get16be(s) != 1)
   6064       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
   6065 
   6066    // Skip 6 reserved bytes.
   6067    stbi__skip(s, 6 );
   6068 
   6069    // Read the number of channels (R, G, B, A, etc).
   6070    channelCount = stbi__get16be(s);
   6071    if (channelCount < 0 || channelCount > 16)
   6072       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
   6073 
   6074    // Read the rows and columns of the image.
   6075    h = stbi__get32be(s);
   6076    w = stbi__get32be(s);
   6077 
   6078    if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6079    if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6080 
   6081    // Make sure the depth is 8 bits.
   6082    bitdepth = stbi__get16be(s);
   6083    if (bitdepth != 8 && bitdepth != 16)
   6084       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
   6085 
   6086    // Make sure the color mode is RGB.
   6087    // Valid options are:
   6088    //   0: Bitmap
   6089    //   1: Grayscale
   6090    //   2: Indexed color
   6091    //   3: RGB color
   6092    //   4: CMYK color
   6093    //   7: Multichannel
   6094    //   8: Duotone
   6095    //   9: Lab color
   6096    if (stbi__get16be(s) != 3)
   6097       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
   6098 
   6099    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   6100    stbi__skip(s,stbi__get32be(s) );
   6101 
   6102    // Skip the image resources.  (resolution, pen tool paths, etc)
   6103    stbi__skip(s, stbi__get32be(s) );
   6104 
   6105    // Skip the reserved data.
   6106    stbi__skip(s, stbi__get32be(s) );
   6107 
   6108    // Find out if the data is compressed.
   6109    // Known values:
   6110    //   0: no compression
   6111    //   1: RLE compressed
   6112    compression = stbi__get16be(s);
   6113    if (compression > 1)
   6114       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
   6115 
   6116    // Check size
   6117    if (!stbi__mad3sizes_valid(4, w, h, 0))
   6118       return stbi__errpuc("too large", "Corrupt PSD");
   6119 
   6120    // Create the destination image.
   6121 
   6122    if (!compression && bitdepth == 16 && bpc == 16) {
   6123       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
   6124       ri->bits_per_channel = 16;
   6125    } else
   6126       out = (stbi_uc *) stbi__malloc(4 * w*h);
   6127 
   6128    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   6129    pixelCount = w*h;
   6130 
   6131    // Initialize the data to zero.
   6132    //memset( out, 0, pixelCount * 4 );
   6133 
   6134    // Finally, the image data.
   6135    if (compression) {
   6136       // RLE as used by .PSD and .TIFF
   6137       // Loop until you get the number of unpacked bytes you are expecting:
   6138       //     Read the next source byte into n.
   6139       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
   6140       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
   6141       //     Else if n is 128, noop.
   6142       // Endloop
   6143 
   6144       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
   6145       // which we're going to just skip.
   6146       stbi__skip(s, h * channelCount * 2 );
   6147 
   6148       // Read the RLE data by channel.
   6149       for (channel = 0; channel < 4; channel++) {
   6150          stbi_uc *p;
   6151 
   6152          p = out+channel;
   6153          if (channel >= channelCount) {
   6154             // Fill this channel with default data.
   6155             for (i = 0; i < pixelCount; i++, p += 4)
   6156                *p = (channel == 3 ? 255 : 0);
   6157          } else {
   6158             // Read the RLE data.
   6159             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
   6160                STBI_FREE(out);
   6161                return stbi__errpuc("corrupt", "bad RLE data");
   6162             }
   6163          }
   6164       }
   6165 
   6166    } else {
   6167       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
   6168       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
   6169 
   6170       // Read the data by channel.
   6171       for (channel = 0; channel < 4; channel++) {
   6172          if (channel >= channelCount) {
   6173             // Fill this channel with default data.
   6174             if (bitdepth == 16 && bpc == 16) {
   6175                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6176                stbi__uint16 val = channel == 3 ? 65535 : 0;
   6177                for (i = 0; i < pixelCount; i++, q += 4)
   6178                   *q = val;
   6179             } else {
   6180                stbi_uc *p = out+channel;
   6181                stbi_uc val = channel == 3 ? 255 : 0;
   6182                for (i = 0; i < pixelCount; i++, p += 4)
   6183                   *p = val;
   6184             }
   6185          } else {
   6186             if (ri->bits_per_channel == 16) {    // output bpc
   6187                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
   6188                for (i = 0; i < pixelCount; i++, q += 4)
   6189                   *q = (stbi__uint16) stbi__get16be(s);
   6190             } else {
   6191                stbi_uc *p = out+channel;
   6192                if (bitdepth == 16) {  // input bpc
   6193                   for (i = 0; i < pixelCount; i++, p += 4)
   6194                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
   6195                } else {
   6196                   for (i = 0; i < pixelCount; i++, p += 4)
   6197                      *p = stbi__get8(s);
   6198                }
   6199             }
   6200          }
   6201       }
   6202    }
   6203 
   6204    // remove weird white matte from PSD
   6205    if (channelCount >= 4) {
   6206       if (ri->bits_per_channel == 16) {
   6207          for (i=0; i < w*h; ++i) {
   6208             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
   6209             if (pixel[3] != 0 && pixel[3] != 65535) {
   6210                float a = pixel[3] / 65535.0f;
   6211                float ra = 1.0f / a;
   6212                float inv_a = 65535.0f * (1 - ra);
   6213                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
   6214                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
   6215                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
   6216             }
   6217          }
   6218       } else {
   6219          for (i=0; i < w*h; ++i) {
   6220             unsigned char *pixel = out + 4*i;
   6221             if (pixel[3] != 0 && pixel[3] != 255) {
   6222                float a = pixel[3] / 255.0f;
   6223                float ra = 1.0f / a;
   6224                float inv_a = 255.0f * (1 - ra);
   6225                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
   6226                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
   6227                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
   6228             }
   6229          }
   6230       }
   6231    }
   6232 
   6233    // convert to desired output format
   6234    if (req_comp && req_comp != 4) {
   6235       if (ri->bits_per_channel == 16)
   6236          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
   6237       else
   6238          out = stbi__convert_format(out, 4, req_comp, w, h);
   6239       if (out == NULL) return out; // stbi__convert_format frees input on failure
   6240    }
   6241 
   6242    if (comp) *comp = 4;
   6243    *y = h;
   6244    *x = w;
   6245 
   6246    return out;
   6247 }
   6248 #endif
   6249 
   6250 // *************************************************************************************************
   6251 // Softimage PIC loader
   6252 // by Tom Seddon
   6253 //
   6254 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
   6255 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
   6256 
   6257 #ifndef STBI_NO_PIC
   6258 static int stbi__pic_is4(stbi__context *s,const char *str)
   6259 {
   6260    int i;
   6261    for (i=0; i<4; ++i)
   6262       if (stbi__get8(s) != (stbi_uc)str[i])
   6263          return 0;
   6264 
   6265    return 1;
   6266 }
   6267 
   6268 static int stbi__pic_test_core(stbi__context *s)
   6269 {
   6270    int i;
   6271 
   6272    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
   6273       return 0;
   6274 
   6275    for(i=0;i<84;++i)
   6276       stbi__get8(s);
   6277 
   6278    if (!stbi__pic_is4(s,"PICT"))
   6279       return 0;
   6280 
   6281    return 1;
   6282 }
   6283 
   6284 typedef struct
   6285 {
   6286    stbi_uc size,type,channel;
   6287 } stbi__pic_packet;
   6288 
   6289 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
   6290 {
   6291    int mask=0x80, i;
   6292 
   6293    for (i=0; i<4; ++i, mask>>=1) {
   6294       if (channel & mask) {
   6295          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
   6296          dest[i]=stbi__get8(s);
   6297       }
   6298    }
   6299 
   6300    return dest;
   6301 }
   6302 
   6303 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
   6304 {
   6305    int mask=0x80,i;
   6306 
   6307    for (i=0;i<4; ++i, mask>>=1)
   6308       if (channel&mask)
   6309          dest[i]=src[i];
   6310 }
   6311 
   6312 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
   6313 {
   6314    int act_comp=0,num_packets=0,y,chained;
   6315    stbi__pic_packet packets[10];
   6316 
   6317    // this will (should...) cater for even some bizarre stuff like having data
   6318     // for the same channel in multiple packets.
   6319    do {
   6320       stbi__pic_packet *packet;
   6321 
   6322       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   6323          return stbi__errpuc("bad format","too many packets");
   6324 
   6325       packet = &packets[num_packets++];
   6326 
   6327       chained = stbi__get8(s);
   6328       packet->size    = stbi__get8(s);
   6329       packet->type    = stbi__get8(s);
   6330       packet->channel = stbi__get8(s);
   6331 
   6332       act_comp |= packet->channel;
   6333 
   6334       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
   6335       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   6336    } while (chained);
   6337 
   6338    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
   6339 
   6340    for(y=0; y<height; ++y) {
   6341       int packet_idx;
   6342 
   6343       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
   6344          stbi__pic_packet *packet = &packets[packet_idx];
   6345          stbi_uc *dest = result+y*width*4;
   6346 
   6347          switch (packet->type) {
   6348             default:
   6349                return stbi__errpuc("bad format","packet has bad compression type");
   6350 
   6351             case 0: {//uncompressed
   6352                int x;
   6353 
   6354                for(x=0;x<width;++x, dest+=4)
   6355                   if (!stbi__readval(s,packet->channel,dest))
   6356                      return 0;
   6357                break;
   6358             }
   6359 
   6360             case 1://Pure RLE
   6361                {
   6362                   int left=width, i;
   6363 
   6364                   while (left>0) {
   6365                      stbi_uc count,value[4];
   6366 
   6367                      count=stbi__get8(s);
   6368                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
   6369 
   6370                      if (count > left)
   6371                         count = (stbi_uc) left;
   6372 
   6373                      if (!stbi__readval(s,packet->channel,value))  return 0;
   6374 
   6375                      for(i=0; i<count; ++i,dest+=4)
   6376                         stbi__copyval(packet->channel,dest,value);
   6377                      left -= count;
   6378                   }
   6379                }
   6380                break;
   6381 
   6382             case 2: {//Mixed RLE
   6383                int left=width;
   6384                while (left>0) {
   6385                   int count = stbi__get8(s), i;
   6386                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
   6387 
   6388                   if (count >= 128) { // Repeated
   6389                      stbi_uc value[4];
   6390 
   6391                      if (count==128)
   6392                         count = stbi__get16be(s);
   6393                      else
   6394                         count -= 127;
   6395                      if (count > left)
   6396                         return stbi__errpuc("bad file","scanline overrun");
   6397 
   6398                      if (!stbi__readval(s,packet->channel,value))
   6399                         return 0;
   6400 
   6401                      for(i=0;i<count;++i, dest += 4)
   6402                         stbi__copyval(packet->channel,dest,value);
   6403                   } else { // Raw
   6404                      ++count;
   6405                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
   6406 
   6407                      for(i=0;i<count;++i, dest+=4)
   6408                         if (!stbi__readval(s,packet->channel,dest))
   6409                            return 0;
   6410                   }
   6411                   left-=count;
   6412                }
   6413                break;
   6414             }
   6415          }
   6416       }
   6417    }
   6418 
   6419    return result;
   6420 }
   6421 
   6422 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
   6423 {
   6424    stbi_uc *result;
   6425    int i, x,y, internal_comp;
   6426    STBI_NOTUSED(ri);
   6427 
   6428    if (!comp) comp = &internal_comp;
   6429 
   6430    for (i=0; i<92; ++i)
   6431       stbi__get8(s);
   6432 
   6433    x = stbi__get16be(s);
   6434    y = stbi__get16be(s);
   6435 
   6436    if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6437    if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   6438 
   6439    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   6440    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
   6441 
   6442    stbi__get32be(s); //skip `ratio'
   6443    stbi__get16be(s); //skip `fields'
   6444    stbi__get16be(s); //skip `pad'
   6445 
   6446    // intermediate buffer is RGBA
   6447    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
   6448    if (!result) return stbi__errpuc("outofmem", "Out of memory");
   6449    memset(result, 0xff, x*y*4);
   6450 
   6451    if (!stbi__pic_load_core(s,x,y,comp, result)) {
   6452       STBI_FREE(result);
   6453       result=0;
   6454    }
   6455    *px = x;
   6456    *py = y;
   6457    if (req_comp == 0) req_comp = *comp;
   6458    result=stbi__convert_format(result,4,req_comp,x,y);
   6459 
   6460    return result;
   6461 }
   6462 
   6463 static int stbi__pic_test(stbi__context *s)
   6464 {
   6465    int r = stbi__pic_test_core(s);
   6466    stbi__rewind(s);
   6467    return r;
   6468 }
   6469 #endif
   6470 
   6471 // *************************************************************************************************
   6472 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
   6473 
   6474 #ifndef STBI_NO_GIF
   6475 typedef struct
   6476 {
   6477    stbi__int16 prefix;
   6478    stbi_uc first;
   6479    stbi_uc suffix;
   6480 } stbi__gif_lzw;
   6481 
   6482 typedef struct
   6483 {
   6484    int w,h;
   6485    stbi_uc *out;                 // output buffer (always 4 components)
   6486    stbi_uc *background;          // The current "background" as far as a gif is concerned
   6487    stbi_uc *history;
   6488    int flags, bgindex, ratio, transparent, eflags;
   6489    stbi_uc  pal[256][4];
   6490    stbi_uc lpal[256][4];
   6491    stbi__gif_lzw codes[8192];
   6492    stbi_uc *color_table;
   6493    int parse, step;
   6494    int lflags;
   6495    int start_x, start_y;
   6496    int max_x, max_y;
   6497    int cur_x, cur_y;
   6498    int line_size;
   6499    int delay;
   6500 } stbi__gif;
   6501 
   6502 static int stbi__gif_test_raw(stbi__context *s)
   6503 {
   6504    int sz;
   6505    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   6506    sz = stbi__get8(s);
   6507    if (sz != '9' && sz != '7') return 0;
   6508    if (stbi__get8(s) != 'a') return 0;
   6509    return 1;
   6510 }
   6511 
   6512 static int stbi__gif_test(stbi__context *s)
   6513 {
   6514    int r = stbi__gif_test_raw(s);
   6515    stbi__rewind(s);
   6516    return r;
   6517 }
   6518 
   6519 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
   6520 {
   6521    int i;
   6522    for (i=0; i < num_entries; ++i) {
   6523       pal[i][2] = stbi__get8(s);
   6524       pal[i][1] = stbi__get8(s);
   6525       pal[i][0] = stbi__get8(s);
   6526       pal[i][3] = transp == i ? 0 : 255;
   6527    }
   6528 }
   6529 
   6530 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
   6531 {
   6532    stbi_uc version;
   6533    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
   6534       return stbi__err("not GIF", "Corrupt GIF");
   6535 
   6536    version = stbi__get8(s);
   6537    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   6538    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
   6539 
   6540    stbi__g_failure_reason = "";
   6541    g->w = stbi__get16le(s);
   6542    g->h = stbi__get16le(s);
   6543    g->flags = stbi__get8(s);
   6544    g->bgindex = stbi__get8(s);
   6545    g->ratio = stbi__get8(s);
   6546    g->transparent = -1;
   6547 
   6548    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6549    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
   6550 
   6551    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
   6552 
   6553    if (is_info) return 1;
   6554 
   6555    if (g->flags & 0x80)
   6556       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
   6557 
   6558    return 1;
   6559 }
   6560 
   6561 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
   6562 {
   6563    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
   6564    if (!g) return stbi__err("outofmem", "Out of memory");
   6565    if (!stbi__gif_header(s, g, comp, 1)) {
   6566       STBI_FREE(g);
   6567       stbi__rewind( s );
   6568       return 0;
   6569    }
   6570    if (x) *x = g->w;
   6571    if (y) *y = g->h;
   6572    STBI_FREE(g);
   6573    return 1;
   6574 }
   6575 
   6576 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
   6577 {
   6578    stbi_uc *p, *c;
   6579    int idx;
   6580 
   6581    // recurse to decode the prefixes, since the linked-list is backwards,
   6582    // and working backwards through an interleaved image would be nasty
   6583    if (g->codes[code].prefix >= 0)
   6584       stbi__out_gif_code(g, g->codes[code].prefix);
   6585 
   6586    if (g->cur_y >= g->max_y) return;
   6587 
   6588    idx = g->cur_x + g->cur_y;
   6589    p = &g->out[idx];
   6590    g->history[idx / 4] = 1;
   6591 
   6592    c = &g->color_table[g->codes[code].suffix * 4];
   6593    if (c[3] > 128) { // don't render transparent pixels;
   6594       p[0] = c[2];
   6595       p[1] = c[1];
   6596       p[2] = c[0];
   6597       p[3] = c[3];
   6598    }
   6599    g->cur_x += 4;
   6600 
   6601    if (g->cur_x >= g->max_x) {
   6602       g->cur_x = g->start_x;
   6603       g->cur_y += g->step;
   6604 
   6605       while (g->cur_y >= g->max_y && g->parse > 0) {
   6606          g->step = (1 << g->parse) * g->line_size;
   6607          g->cur_y = g->start_y + (g->step >> 1);
   6608          --g->parse;
   6609       }
   6610    }
   6611 }
   6612 
   6613 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
   6614 {
   6615    stbi_uc lzw_cs;
   6616    stbi__int32 len, init_code;
   6617    stbi__uint32 first;
   6618    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   6619    stbi__gif_lzw *p;
   6620 
   6621    lzw_cs = stbi__get8(s);
   6622    if (lzw_cs > 12) return NULL;
   6623    clear = 1 << lzw_cs;
   6624    first = 1;
   6625    codesize = lzw_cs + 1;
   6626    codemask = (1 << codesize) - 1;
   6627    bits = 0;
   6628    valid_bits = 0;
   6629    for (init_code = 0; init_code < clear; init_code++) {
   6630       g->codes[init_code].prefix = -1;
   6631       g->codes[init_code].first = (stbi_uc) init_code;
   6632       g->codes[init_code].suffix = (stbi_uc) init_code;
   6633    }
   6634 
   6635    // support no starting clear code
   6636    avail = clear+2;
   6637    oldcode = -1;
   6638 
   6639    len = 0;
   6640    for(;;) {
   6641       if (valid_bits < codesize) {
   6642          if (len == 0) {
   6643             len = stbi__get8(s); // start new block
   6644             if (len == 0)
   6645                return g->out;
   6646          }
   6647          --len;
   6648          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
   6649          valid_bits += 8;
   6650       } else {
   6651          stbi__int32 code = bits & codemask;
   6652          bits >>= codesize;
   6653          valid_bits -= codesize;
   6654          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
   6655          if (code == clear) {  // clear code
   6656             codesize = lzw_cs + 1;
   6657             codemask = (1 << codesize) - 1;
   6658             avail = clear + 2;
   6659             oldcode = -1;
   6660             first = 0;
   6661          } else if (code == clear + 1) { // end of stream code
   6662             stbi__skip(s, len);
   6663             while ((len = stbi__get8(s)) > 0)
   6664                stbi__skip(s,len);
   6665             return g->out;
   6666          } else if (code <= avail) {
   6667             if (first) {
   6668                return stbi__errpuc("no clear code", "Corrupt GIF");
   6669             }
   6670 
   6671             if (oldcode >= 0) {
   6672                p = &g->codes[avail++];
   6673                if (avail > 8192) {
   6674                   return stbi__errpuc("too many codes", "Corrupt GIF");
   6675                }
   6676 
   6677                p->prefix = (stbi__int16) oldcode;
   6678                p->first = g->codes[oldcode].first;
   6679                p->suffix = (code == avail) ? p->first : g->codes[code].first;
   6680             } else if (code == avail)
   6681                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6682 
   6683             stbi__out_gif_code(g, (stbi__uint16) code);
   6684 
   6685             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
   6686                codesize++;
   6687                codemask = (1 << codesize) - 1;
   6688             }
   6689 
   6690             oldcode = code;
   6691          } else {
   6692             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
   6693          }
   6694       }
   6695    }
   6696 }
   6697 
   6698 // this function is designed to support animated gifs, although stb_image doesn't support it
   6699 // two back is the image from two frames ago, used for a very specific disposal format
   6700 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
   6701 {
   6702    int dispose;
   6703    int first_frame;
   6704    int pi;
   6705    int pcount;
   6706    STBI_NOTUSED(req_comp);
   6707 
   6708    // on first frame, any non-written pixels get the background colour (non-transparent)
   6709    first_frame = 0;
   6710    if (g->out == 0) {
   6711       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
   6712       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
   6713          return stbi__errpuc("too large", "GIF image is too large");
   6714       pcount = g->w * g->h;
   6715       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
   6716       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
   6717       g->history = (stbi_uc *) stbi__malloc(pcount);
   6718       if (!g->out || !g->background || !g->history)
   6719          return stbi__errpuc("outofmem", "Out of memory");
   6720 
   6721       // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
   6722       // background colour is only used for pixels that are not rendered first frame, after that "background"
   6723       // color refers to the color that was there the previous frame.
   6724       memset(g->out, 0x00, 4 * pcount);
   6725       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
   6726       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
   6727       first_frame = 1;
   6728    } else {
   6729       // second frame - how do we dispose of the previous one?
   6730       dispose = (g->eflags & 0x1C) >> 2;
   6731       pcount = g->w * g->h;
   6732 
   6733       if ((dispose == 3) && (two_back == 0)) {
   6734          dispose = 2; // if I don't have an image to revert back to, default to the old background
   6735       }
   6736 
   6737       if (dispose == 3) { // use previous graphic
   6738          for (pi = 0; pi < pcount; ++pi) {
   6739             if (g->history[pi]) {
   6740                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
   6741             }
   6742          }
   6743       } else if (dispose == 2) {
   6744          // restore what was changed last frame to background before that frame;
   6745          for (pi = 0; pi < pcount; ++pi) {
   6746             if (g->history[pi]) {
   6747                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
   6748             }
   6749          }
   6750       } else {
   6751          // This is a non-disposal case eithe way, so just
   6752          // leave the pixels as is, and they will become the new background
   6753          // 1: do not dispose
   6754          // 0:  not specified.
   6755       }
   6756 
   6757       // background is what out is after the undoing of the previou frame;
   6758       memcpy( g->background, g->out, 4 * g->w * g->h );
   6759    }
   6760 
   6761    // clear my history;
   6762    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
   6763 
   6764    for (;;) {
   6765       int tag = stbi__get8(s);
   6766       switch (tag) {
   6767          case 0x2C: /* Image Descriptor */
   6768          {
   6769             stbi__int32 x, y, w, h;
   6770             stbi_uc *o;
   6771 
   6772             x = stbi__get16le(s);
   6773             y = stbi__get16le(s);
   6774             w = stbi__get16le(s);
   6775             h = stbi__get16le(s);
   6776             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
   6777                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
   6778 
   6779             g->line_size = g->w * 4;
   6780             g->start_x = x * 4;
   6781             g->start_y = y * g->line_size;
   6782             g->max_x   = g->start_x + w * 4;
   6783             g->max_y   = g->start_y + h * g->line_size;
   6784             g->cur_x   = g->start_x;
   6785             g->cur_y   = g->start_y;
   6786 
   6787             // if the width of the specified rectangle is 0, that means
   6788             // we may not see *any* pixels or the image is malformed;
   6789             // to make sure this is caught, move the current y down to
   6790             // max_y (which is what out_gif_code checks).
   6791             if (w == 0)
   6792                g->cur_y = g->max_y;
   6793 
   6794             g->lflags = stbi__get8(s);
   6795 
   6796             if (g->lflags & 0x40) {
   6797                g->step = 8 * g->line_size; // first interlaced spacing
   6798                g->parse = 3;
   6799             } else {
   6800                g->step = g->line_size;
   6801                g->parse = 0;
   6802             }
   6803 
   6804             if (g->lflags & 0x80) {
   6805                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
   6806                g->color_table = (stbi_uc *) g->lpal;
   6807             } else if (g->flags & 0x80) {
   6808                g->color_table = (stbi_uc *) g->pal;
   6809             } else
   6810                return stbi__errpuc("missing color table", "Corrupt GIF");
   6811 
   6812             o = stbi__process_gif_raster(s, g);
   6813             if (!o) return NULL;
   6814 
   6815             // if this was the first frame,
   6816             pcount = g->w * g->h;
   6817             if (first_frame && (g->bgindex > 0)) {
   6818                // if first frame, any pixel not drawn to gets the background color
   6819                for (pi = 0; pi < pcount; ++pi) {
   6820                   if (g->history[pi] == 0) {
   6821                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
   6822                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
   6823                   }
   6824                }
   6825             }
   6826 
   6827             return o;
   6828          }
   6829 
   6830          case 0x21: // Comment Extension.
   6831          {
   6832             int len;
   6833             int ext = stbi__get8(s);
   6834             if (ext == 0xF9) { // Graphic Control Extension.
   6835                len = stbi__get8(s);
   6836                if (len == 4) {
   6837                   g->eflags = stbi__get8(s);
   6838                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
   6839 
   6840                   // unset old transparent
   6841                   if (g->transparent >= 0) {
   6842                      g->pal[g->transparent][3] = 255;
   6843                   }
   6844                   if (g->eflags & 0x01) {
   6845                      g->transparent = stbi__get8(s);
   6846                      if (g->transparent >= 0) {
   6847                         g->pal[g->transparent][3] = 0;
   6848                      }
   6849                   } else {
   6850                      // don't need transparent
   6851                      stbi__skip(s, 1);
   6852                      g->transparent = -1;
   6853                   }
   6854                } else {
   6855                   stbi__skip(s, len);
   6856                   break;
   6857                }
   6858             }
   6859             while ((len = stbi__get8(s)) != 0) {
   6860                stbi__skip(s, len);
   6861             }
   6862             break;
   6863          }
   6864 
   6865          case 0x3B: // gif stream termination code
   6866             return (stbi_uc *) s; // using '1' causes warning on some compilers
   6867 
   6868          default:
   6869             return stbi__errpuc("unknown code", "Corrupt GIF");
   6870       }
   6871    }
   6872 }
   6873 
   6874 static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
   6875 {
   6876    STBI_FREE(g->out);
   6877    STBI_FREE(g->history);
   6878    STBI_FREE(g->background);
   6879 
   6880    if (out) STBI_FREE(out);
   6881    if (delays && *delays) STBI_FREE(*delays);
   6882    return stbi__errpuc("outofmem", "Out of memory");
   6883 }
   6884 
   6885 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
   6886 {
   6887    if (stbi__gif_test(s)) {
   6888       int layers = 0;
   6889       stbi_uc *u = 0;
   6890       stbi_uc *out = 0;
   6891       stbi_uc *two_back = 0;
   6892       stbi__gif g;
   6893       int stride;
   6894       int out_size = 0;
   6895       int delays_size = 0;
   6896 
   6897       STBI_NOTUSED(out_size);
   6898       STBI_NOTUSED(delays_size);
   6899 
   6900       memset(&g, 0, sizeof(g));
   6901       if (delays) {
   6902          *delays = 0;
   6903       }
   6904 
   6905       do {
   6906          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
   6907          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6908 
   6909          if (u) {
   6910             *x = g.w;
   6911             *y = g.h;
   6912             ++layers;
   6913             stride = g.w * g.h * 4;
   6914 
   6915             if (out) {
   6916                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
   6917                if (!tmp)
   6918                   return stbi__load_gif_main_outofmem(&g, out, delays);
   6919                else {
   6920                    out = (stbi_uc*) tmp;
   6921                    out_size = layers * stride;
   6922                }
   6923 
   6924                if (delays) {
   6925                   int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
   6926                   if (!new_delays)
   6927                      return stbi__load_gif_main_outofmem(&g, out, delays);
   6928                   *delays = new_delays;
   6929                   delays_size = layers * sizeof(int);
   6930                }
   6931             } else {
   6932                out = (stbi_uc*)stbi__malloc( layers * stride );
   6933                if (!out)
   6934                   return stbi__load_gif_main_outofmem(&g, out, delays);
   6935                out_size = layers * stride;
   6936                if (delays) {
   6937                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
   6938                   if (!*delays)
   6939                      return stbi__load_gif_main_outofmem(&g, out, delays);
   6940                   delays_size = layers * sizeof(int);
   6941                }
   6942             }
   6943             memcpy( out + ((layers - 1) * stride), u, stride );
   6944             if (layers >= 2) {
   6945                two_back = out - 2 * stride;
   6946             }
   6947 
   6948             if (delays) {
   6949                (*delays)[layers - 1U] = g.delay;
   6950             }
   6951          }
   6952       } while (u != 0);
   6953 
   6954       // free temp buffer;
   6955       STBI_FREE(g.out);
   6956       STBI_FREE(g.history);
   6957       STBI_FREE(g.background);
   6958 
   6959       // do the final conversion after loading everything;
   6960       if (req_comp && req_comp != 4)
   6961          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
   6962 
   6963       *z = layers;
   6964       return out;
   6965    } else {
   6966       return stbi__errpuc("not GIF", "Image was not as a gif type.");
   6967    }
   6968 }
   6969 
   6970 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   6971 {
   6972    stbi_uc *u = 0;
   6973    stbi__gif g;
   6974    memset(&g, 0, sizeof(g));
   6975    STBI_NOTUSED(ri);
   6976 
   6977    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
   6978    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   6979    if (u) {
   6980       *x = g.w;
   6981       *y = g.h;
   6982 
   6983       // moved conversion to after successful load so that the same
   6984       // can be done for multiple frames.
   6985       if (req_comp && req_comp != 4)
   6986          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
   6987    } else if (g.out) {
   6988       // if there was an error and we allocated an image buffer, free it!
   6989       STBI_FREE(g.out);
   6990    }
   6991 
   6992    // free buffers needed for multiple frame loading;
   6993    STBI_FREE(g.history);
   6994    STBI_FREE(g.background);
   6995 
   6996    return u;
   6997 }
   6998 
   6999 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
   7000 {
   7001    return stbi__gif_info_raw(s,x,y,comp);
   7002 }
   7003 #endif
   7004 
   7005 // *************************************************************************************************
   7006 // Radiance RGBE HDR loader
   7007 // originally by Nicolas Schulz
   7008 #ifndef STBI_NO_HDR
   7009 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
   7010 {
   7011    int i;
   7012    for (i=0; signature[i]; ++i)
   7013       if (stbi__get8(s) != signature[i])
   7014           return 0;
   7015    stbi__rewind(s);
   7016    return 1;
   7017 }
   7018 
   7019 static int stbi__hdr_test(stbi__context* s)
   7020 {
   7021    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
   7022    stbi__rewind(s);
   7023    if(!r) {
   7024        r = stbi__hdr_test_core(s, "#?RGBE\n");
   7025        stbi__rewind(s);
   7026    }
   7027    return r;
   7028 }
   7029 
   7030 #define STBI__HDR_BUFLEN  1024
   7031 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
   7032 {
   7033    int len=0;
   7034    char c = '\0';
   7035 
   7036    c = (char) stbi__get8(z);
   7037 
   7038    while (!stbi__at_eof(z) && c != '\n') {
   7039       buffer[len++] = c;
   7040       if (len == STBI__HDR_BUFLEN-1) {
   7041          // flush to end of line
   7042          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
   7043             ;
   7044          break;
   7045       }
   7046       c = (char) stbi__get8(z);
   7047    }
   7048 
   7049    buffer[len] = 0;
   7050    return buffer;
   7051 }
   7052 
   7053 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
   7054 {
   7055    if ( input[3] != 0 ) {
   7056       float f1;
   7057       // Exponent
   7058       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
   7059       if (req_comp <= 2)
   7060          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
   7061       else {
   7062          output[0] = input[0] * f1;
   7063          output[1] = input[1] * f1;
   7064          output[2] = input[2] * f1;
   7065       }
   7066       if (req_comp == 2) output[1] = 1;
   7067       if (req_comp == 4) output[3] = 1;
   7068    } else {
   7069       switch (req_comp) {
   7070          case 4: output[3] = 1; /* fallthrough */
   7071          case 3: output[0] = output[1] = output[2] = 0;
   7072                  break;
   7073          case 2: output[1] = 1; /* fallthrough */
   7074          case 1: output[0] = 0;
   7075                  break;
   7076       }
   7077    }
   7078 }
   7079 
   7080 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7081 {
   7082    char buffer[STBI__HDR_BUFLEN];
   7083    char *token;
   7084    int valid = 0;
   7085    int width, height;
   7086    stbi_uc *scanline;
   7087    float *hdr_data;
   7088    int len;
   7089    unsigned char count, value;
   7090    int i, j, k, c1,c2, z;
   7091    const char *headerToken;
   7092    STBI_NOTUSED(ri);
   7093 
   7094    // Check identifier
   7095    headerToken = stbi__hdr_gettoken(s,buffer);
   7096    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
   7097       return stbi__errpf("not HDR", "Corrupt HDR image");
   7098 
   7099    // Parse header
   7100    for(;;) {
   7101       token = stbi__hdr_gettoken(s,buffer);
   7102       if (token[0] == 0) break;
   7103       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7104    }
   7105 
   7106    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
   7107 
   7108    // Parse width and height
   7109    // can't use sscanf() if we're not using stdio!
   7110    token = stbi__hdr_gettoken(s,buffer);
   7111    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7112    token += 3;
   7113    height = (int) strtol(token, &token, 10);
   7114    while (*token == ' ') ++token;
   7115    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   7116    token += 3;
   7117    width = (int) strtol(token, NULL, 10);
   7118 
   7119    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7120    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
   7121 
   7122    *x = width;
   7123    *y = height;
   7124 
   7125    if (comp) *comp = 3;
   7126    if (req_comp == 0) req_comp = 3;
   7127 
   7128    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
   7129       return stbi__errpf("too large", "HDR image is too large");
   7130 
   7131    // Read data
   7132    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
   7133    if (!hdr_data)
   7134       return stbi__errpf("outofmem", "Out of memory");
   7135 
   7136    // Load image data
   7137    // image data is stored as some number of sca
   7138    if ( width < 8 || width >= 32768) {
   7139       // Read flat data
   7140       for (j=0; j < height; ++j) {
   7141          for (i=0; i < width; ++i) {
   7142             stbi_uc rgbe[4];
   7143            main_decode_loop:
   7144             stbi__getn(s, rgbe, 4);
   7145             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
   7146          }
   7147       }
   7148    } else {
   7149       // Read RLE-encoded data
   7150       scanline = NULL;
   7151 
   7152       for (j = 0; j < height; ++j) {
   7153          c1 = stbi__get8(s);
   7154          c2 = stbi__get8(s);
   7155          len = stbi__get8(s);
   7156          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
   7157             // not run-length encoded, so we have to actually use THIS data as a decoded
   7158             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
   7159             stbi_uc rgbe[4];
   7160             rgbe[0] = (stbi_uc) c1;
   7161             rgbe[1] = (stbi_uc) c2;
   7162             rgbe[2] = (stbi_uc) len;
   7163             rgbe[3] = (stbi_uc) stbi__get8(s);
   7164             stbi__hdr_convert(hdr_data, rgbe, req_comp);
   7165             i = 1;
   7166             j = 0;
   7167             STBI_FREE(scanline);
   7168             goto main_decode_loop; // yes, this makes no sense
   7169          }
   7170          len <<= 8;
   7171          len |= stbi__get8(s);
   7172          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
   7173          if (scanline == NULL) {
   7174             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
   7175             if (!scanline) {
   7176                STBI_FREE(hdr_data);
   7177                return stbi__errpf("outofmem", "Out of memory");
   7178             }
   7179          }
   7180 
   7181          for (k = 0; k < 4; ++k) {
   7182             int nleft;
   7183             i = 0;
   7184             while ((nleft = width - i) > 0) {
   7185                count = stbi__get8(s);
   7186                if (count > 128) {
   7187                   // Run
   7188                   value = stbi__get8(s);
   7189                   count -= 128;
   7190                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7191                   for (z = 0; z < count; ++z)
   7192                      scanline[i++ * 4 + k] = value;
   7193                } else {
   7194                   // Dump
   7195                   if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
   7196                   for (z = 0; z < count; ++z)
   7197                      scanline[i++ * 4 + k] = stbi__get8(s);
   7198                }
   7199             }
   7200          }
   7201          for (i=0; i < width; ++i)
   7202             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
   7203       }
   7204       if (scanline)
   7205          STBI_FREE(scanline);
   7206    }
   7207 
   7208    return hdr_data;
   7209 }
   7210 
   7211 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
   7212 {
   7213    char buffer[STBI__HDR_BUFLEN];
   7214    char *token;
   7215    int valid = 0;
   7216    int dummy;
   7217 
   7218    if (!x) x = &dummy;
   7219    if (!y) y = &dummy;
   7220    if (!comp) comp = &dummy;
   7221 
   7222    if (stbi__hdr_test(s) == 0) {
   7223        stbi__rewind( s );
   7224        return 0;
   7225    }
   7226 
   7227    for(;;) {
   7228       token = stbi__hdr_gettoken(s,buffer);
   7229       if (token[0] == 0) break;
   7230       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   7231    }
   7232 
   7233    if (!valid) {
   7234        stbi__rewind( s );
   7235        return 0;
   7236    }
   7237    token = stbi__hdr_gettoken(s,buffer);
   7238    if (strncmp(token, "-Y ", 3)) {
   7239        stbi__rewind( s );
   7240        return 0;
   7241    }
   7242    token += 3;
   7243    *y = (int) strtol(token, &token, 10);
   7244    while (*token == ' ') ++token;
   7245    if (strncmp(token, "+X ", 3)) {
   7246        stbi__rewind( s );
   7247        return 0;
   7248    }
   7249    token += 3;
   7250    *x = (int) strtol(token, NULL, 10);
   7251    *comp = 3;
   7252    return 1;
   7253 }
   7254 #endif // STBI_NO_HDR
   7255 
   7256 #ifndef STBI_NO_BMP
   7257 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
   7258 {
   7259    void *p;
   7260    stbi__bmp_data info;
   7261 
   7262    info.all_a = 255;
   7263    p = stbi__bmp_parse_header(s, &info);
   7264    if (p == NULL) {
   7265       stbi__rewind( s );
   7266       return 0;
   7267    }
   7268    if (x) *x = s->img_x;
   7269    if (y) *y = s->img_y;
   7270    if (comp) {
   7271       if (info.bpp == 24 && info.ma == 0xff000000)
   7272          *comp = 3;
   7273       else
   7274          *comp = info.ma ? 4 : 3;
   7275    }
   7276    return 1;
   7277 }
   7278 #endif
   7279 
   7280 #ifndef STBI_NO_PSD
   7281 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
   7282 {
   7283    int channelCount, dummy, depth;
   7284    if (!x) x = &dummy;
   7285    if (!y) y = &dummy;
   7286    if (!comp) comp = &dummy;
   7287    if (stbi__get32be(s) != 0x38425053) {
   7288        stbi__rewind( s );
   7289        return 0;
   7290    }
   7291    if (stbi__get16be(s) != 1) {
   7292        stbi__rewind( s );
   7293        return 0;
   7294    }
   7295    stbi__skip(s, 6);
   7296    channelCount = stbi__get16be(s);
   7297    if (channelCount < 0 || channelCount > 16) {
   7298        stbi__rewind( s );
   7299        return 0;
   7300    }
   7301    *y = stbi__get32be(s);
   7302    *x = stbi__get32be(s);
   7303    depth = stbi__get16be(s);
   7304    if (depth != 8 && depth != 16) {
   7305        stbi__rewind( s );
   7306        return 0;
   7307    }
   7308    if (stbi__get16be(s) != 3) {
   7309        stbi__rewind( s );
   7310        return 0;
   7311    }
   7312    *comp = 4;
   7313    return 1;
   7314 }
   7315 
   7316 static int stbi__psd_is16(stbi__context *s)
   7317 {
   7318    int channelCount, depth;
   7319    if (stbi__get32be(s) != 0x38425053) {
   7320        stbi__rewind( s );
   7321        return 0;
   7322    }
   7323    if (stbi__get16be(s) != 1) {
   7324        stbi__rewind( s );
   7325        return 0;
   7326    }
   7327    stbi__skip(s, 6);
   7328    channelCount = stbi__get16be(s);
   7329    if (channelCount < 0 || channelCount > 16) {
   7330        stbi__rewind( s );
   7331        return 0;
   7332    }
   7333    STBI_NOTUSED(stbi__get32be(s));
   7334    STBI_NOTUSED(stbi__get32be(s));
   7335    depth = stbi__get16be(s);
   7336    if (depth != 16) {
   7337        stbi__rewind( s );
   7338        return 0;
   7339    }
   7340    return 1;
   7341 }
   7342 #endif
   7343 
   7344 #ifndef STBI_NO_PIC
   7345 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
   7346 {
   7347    int act_comp=0,num_packets=0,chained,dummy;
   7348    stbi__pic_packet packets[10];
   7349 
   7350    if (!x) x = &dummy;
   7351    if (!y) y = &dummy;
   7352    if (!comp) comp = &dummy;
   7353 
   7354    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
   7355       stbi__rewind(s);
   7356       return 0;
   7357    }
   7358 
   7359    stbi__skip(s, 88);
   7360 
   7361    *x = stbi__get16be(s);
   7362    *y = stbi__get16be(s);
   7363    if (stbi__at_eof(s)) {
   7364       stbi__rewind( s);
   7365       return 0;
   7366    }
   7367    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
   7368       stbi__rewind( s );
   7369       return 0;
   7370    }
   7371 
   7372    stbi__skip(s, 8);
   7373 
   7374    do {
   7375       stbi__pic_packet *packet;
   7376 
   7377       if (num_packets==sizeof(packets)/sizeof(packets[0]))
   7378          return 0;
   7379 
   7380       packet = &packets[num_packets++];
   7381       chained = stbi__get8(s);
   7382       packet->size    = stbi__get8(s);
   7383       packet->type    = stbi__get8(s);
   7384       packet->channel = stbi__get8(s);
   7385       act_comp |= packet->channel;
   7386 
   7387       if (stbi__at_eof(s)) {
   7388           stbi__rewind( s );
   7389           return 0;
   7390       }
   7391       if (packet->size != 8) {
   7392           stbi__rewind( s );
   7393           return 0;
   7394       }
   7395    } while (chained);
   7396 
   7397    *comp = (act_comp & 0x10 ? 4 : 3);
   7398 
   7399    return 1;
   7400 }
   7401 #endif
   7402 
   7403 // *************************************************************************************************
   7404 // Portable Gray Map and Portable Pixel Map loader
   7405 // by Ken Miller
   7406 //
   7407 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
   7408 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
   7409 //
   7410 // Known limitations:
   7411 //    Does not support comments in the header section
   7412 //    Does not support ASCII image data (formats P2 and P3)
   7413 
   7414 #ifndef STBI_NO_PNM
   7415 
   7416 static int      stbi__pnm_test(stbi__context *s)
   7417 {
   7418    char p, t;
   7419    p = (char) stbi__get8(s);
   7420    t = (char) stbi__get8(s);
   7421    if (p != 'P' || (t != '5' && t != '6')) {
   7422        stbi__rewind( s );
   7423        return 0;
   7424    }
   7425    return 1;
   7426 }
   7427 
   7428 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
   7429 {
   7430    stbi_uc *out;
   7431    STBI_NOTUSED(ri);
   7432 
   7433    ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
   7434    if (ri->bits_per_channel == 0)
   7435       return 0;
   7436 
   7437    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7438    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
   7439 
   7440    *x = s->img_x;
   7441    *y = s->img_y;
   7442    if (comp) *comp = s->img_n;
   7443 
   7444    if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
   7445       return stbi__errpuc("too large", "PNM too large");
   7446 
   7447    out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
   7448    if (!out) return stbi__errpuc("outofmem", "Out of memory");
   7449    stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8));
   7450 
   7451    if (req_comp && req_comp != s->img_n) {
   7452       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
   7453       if (out == NULL) return out; // stbi__convert_format frees input on failure
   7454    }
   7455    return out;
   7456 }
   7457 
   7458 static int      stbi__pnm_isspace(char c)
   7459 {
   7460    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
   7461 }
   7462 
   7463 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
   7464 {
   7465    for (;;) {
   7466       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
   7467          *c = (char) stbi__get8(s);
   7468 
   7469       if (stbi__at_eof(s) || *c != '#')
   7470          break;
   7471 
   7472       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
   7473          *c = (char) stbi__get8(s);
   7474    }
   7475 }
   7476 
   7477 static int      stbi__pnm_isdigit(char c)
   7478 {
   7479    return c >= '0' && c <= '9';
   7480 }
   7481 
   7482 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
   7483 {
   7484    int value = 0;
   7485 
   7486    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
   7487       value = value*10 + (*c - '0');
   7488       *c = (char) stbi__get8(s);
   7489    }
   7490 
   7491    return value;
   7492 }
   7493 
   7494 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
   7495 {
   7496    int maxv, dummy;
   7497    char c, p, t;
   7498 
   7499    if (!x) x = &dummy;
   7500    if (!y) y = &dummy;
   7501    if (!comp) comp = &dummy;
   7502 
   7503    stbi__rewind(s);
   7504 
   7505    // Get identifier
   7506    p = (char) stbi__get8(s);
   7507    t = (char) stbi__get8(s);
   7508    if (p != 'P' || (t != '5' && t != '6')) {
   7509        stbi__rewind(s);
   7510        return 0;
   7511    }
   7512 
   7513    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
   7514 
   7515    c = (char) stbi__get8(s);
   7516    stbi__pnm_skip_whitespace(s, &c);
   7517 
   7518    *x = stbi__pnm_getinteger(s, &c); // read width
   7519    stbi__pnm_skip_whitespace(s, &c);
   7520 
   7521    *y = stbi__pnm_getinteger(s, &c); // read height
   7522    stbi__pnm_skip_whitespace(s, &c);
   7523 
   7524    maxv = stbi__pnm_getinteger(s, &c);  // read max value
   7525    if (maxv > 65535)
   7526       return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
   7527    else if (maxv > 255)
   7528       return 16;
   7529    else
   7530       return 8;
   7531 }
   7532 
   7533 static int stbi__pnm_is16(stbi__context *s)
   7534 {
   7535    if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
   7536 	   return 1;
   7537    return 0;
   7538 }
   7539 #endif
   7540 
   7541 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
   7542 {
   7543    #ifndef STBI_NO_JPEG
   7544    if (stbi__jpeg_info(s, x, y, comp)) return 1;
   7545    #endif
   7546 
   7547    #ifndef STBI_NO_PNG
   7548    if (stbi__png_info(s, x, y, comp))  return 1;
   7549    #endif
   7550 
   7551    #ifndef STBI_NO_GIF
   7552    if (stbi__gif_info(s, x, y, comp))  return 1;
   7553    #endif
   7554 
   7555    #ifndef STBI_NO_BMP
   7556    if (stbi__bmp_info(s, x, y, comp))  return 1;
   7557    #endif
   7558 
   7559    #ifndef STBI_NO_PSD
   7560    if (stbi__psd_info(s, x, y, comp))  return 1;
   7561    #endif
   7562 
   7563    #ifndef STBI_NO_PIC
   7564    if (stbi__pic_info(s, x, y, comp))  return 1;
   7565    #endif
   7566 
   7567    #ifndef STBI_NO_PNM
   7568    if (stbi__pnm_info(s, x, y, comp))  return 1;
   7569    #endif
   7570 
   7571    #ifndef STBI_NO_HDR
   7572    if (stbi__hdr_info(s, x, y, comp))  return 1;
   7573    #endif
   7574 
   7575    // test tga last because it's a crappy test!
   7576    #ifndef STBI_NO_TGA
   7577    if (stbi__tga_info(s, x, y, comp))
   7578        return 1;
   7579    #endif
   7580    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
   7581 }
   7582 
   7583 static int stbi__is_16_main(stbi__context *s)
   7584 {
   7585    #ifndef STBI_NO_PNG
   7586    if (stbi__png_is16(s))  return 1;
   7587    #endif
   7588 
   7589    #ifndef STBI_NO_PSD
   7590    if (stbi__psd_is16(s))  return 1;
   7591    #endif
   7592 
   7593    #ifndef STBI_NO_PNM
   7594    if (stbi__pnm_is16(s))  return 1;
   7595    #endif
   7596    return 0;
   7597 }
   7598 
   7599 #ifndef STBI_NO_STDIO
   7600 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
   7601 {
   7602     FILE *f = stbi__fopen(filename, "rb");
   7603     int result;
   7604     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7605     result = stbi_info_from_file(f, x, y, comp);
   7606     fclose(f);
   7607     return result;
   7608 }
   7609 
   7610 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
   7611 {
   7612    int r;
   7613    stbi__context s;
   7614    long pos = ftell(f);
   7615    stbi__start_file(&s, f);
   7616    r = stbi__info_main(&s,x,y,comp);
   7617    fseek(f,pos,SEEK_SET);
   7618    return r;
   7619 }
   7620 
   7621 STBIDEF int stbi_is_16_bit(char const *filename)
   7622 {
   7623     FILE *f = stbi__fopen(filename, "rb");
   7624     int result;
   7625     if (!f) return stbi__err("can't fopen", "Unable to open file");
   7626     result = stbi_is_16_bit_from_file(f);
   7627     fclose(f);
   7628     return result;
   7629 }
   7630 
   7631 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
   7632 {
   7633    int r;
   7634    stbi__context s;
   7635    long pos = ftell(f);
   7636    stbi__start_file(&s, f);
   7637    r = stbi__is_16_main(&s);
   7638    fseek(f,pos,SEEK_SET);
   7639    return r;
   7640 }
   7641 #endif // !STBI_NO_STDIO
   7642 
   7643 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
   7644 {
   7645    stbi__context s;
   7646    stbi__start_mem(&s,buffer,len);
   7647    return stbi__info_main(&s,x,y,comp);
   7648 }
   7649 
   7650 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
   7651 {
   7652    stbi__context s;
   7653    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7654    return stbi__info_main(&s,x,y,comp);
   7655 }
   7656 
   7657 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
   7658 {
   7659    stbi__context s;
   7660    stbi__start_mem(&s,buffer,len);
   7661    return stbi__is_16_main(&s);
   7662 }
   7663 
   7664 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
   7665 {
   7666    stbi__context s;
   7667    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   7668    return stbi__is_16_main(&s);
   7669 }
   7670 
   7671 #endif // STB_IMAGE_IMPLEMENTATION
   7672 
   7673 /*
   7674    revision history:
   7675       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
   7676       2.19  (2018-02-11) fix warning
   7677       2.18  (2018-01-30) fix warnings
   7678       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
   7679                          1-bit BMP
   7680                          *_is_16_bit api
   7681                          avoid warnings
   7682       2.16  (2017-07-23) all functions have 16-bit variants;
   7683                          STBI_NO_STDIO works again;
   7684                          compilation fixes;
   7685                          fix rounding in unpremultiply;
   7686                          optimize vertical flip;
   7687                          disable raw_len validation;
   7688                          documentation fixes
   7689       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
   7690                          warning fixes; disable run-time SSE detection on gcc;
   7691                          uniform handling of optional "return" values;
   7692                          thread-safe initialization of zlib tables
   7693       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
   7694       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
   7695       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
   7696       2.11  (2016-04-02) allocate large structures on the stack
   7697                          remove white matting for transparent PSD
   7698                          fix reported channel count for PNG & BMP
   7699                          re-enable SSE2 in non-gcc 64-bit
   7700                          support RGB-formatted JPEG
   7701                          read 16-bit PNGs (only as 8-bit)
   7702       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
   7703       2.09  (2016-01-16) allow comments in PNM files
   7704                          16-bit-per-pixel TGA (not bit-per-component)
   7705                          info() for TGA could break due to .hdr handling
   7706                          info() for BMP to shares code instead of sloppy parse
   7707                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
   7708                          code cleanup
   7709       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
   7710       2.07  (2015-09-13) fix compiler warnings
   7711                          partial animated GIF support
   7712                          limited 16-bpc PSD support
   7713                          #ifdef unused functions
   7714                          bug with < 92 byte PIC,PNM,HDR,TGA
   7715       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
   7716       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
   7717       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
   7718       2.03  (2015-04-12) extra corruption checking (mmozeiko)
   7719                          stbi_set_flip_vertically_on_load (nguillemot)
   7720                          fix NEON support; fix mingw support
   7721       2.02  (2015-01-19) fix incorrect assert, fix warning
   7722       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
   7723       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
   7724       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
   7725                          progressive JPEG (stb)
   7726                          PGM/PPM support (Ken Miller)
   7727                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
   7728                          GIF bugfix -- seemingly never worked
   7729                          STBI_NO_*, STBI_ONLY_*
   7730       1.48  (2014-12-14) fix incorrectly-named assert()
   7731       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
   7732                          optimize PNG (ryg)
   7733                          fix bug in interlaced PNG with user-specified channel count (stb)
   7734       1.46  (2014-08-26)
   7735               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
   7736       1.45  (2014-08-16)
   7737               fix MSVC-ARM internal compiler error by wrapping malloc
   7738       1.44  (2014-08-07)
   7739               various warning fixes from Ronny Chevalier
   7740       1.43  (2014-07-15)
   7741               fix MSVC-only compiler problem in code changed in 1.42
   7742       1.42  (2014-07-09)
   7743               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
   7744               fixes to stbi__cleanup_jpeg path
   7745               added STBI_ASSERT to avoid requiring assert.h
   7746       1.41  (2014-06-25)
   7747               fix search&replace from 1.36 that messed up comments/error messages
   7748       1.40  (2014-06-22)
   7749               fix gcc struct-initialization warning
   7750       1.39  (2014-06-15)
   7751               fix to TGA optimization when req_comp != number of components in TGA;
   7752               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
   7753               add support for BMP version 5 (more ignored fields)
   7754       1.38  (2014-06-06)
   7755               suppress MSVC warnings on integer casts truncating values
   7756               fix accidental rename of 'skip' field of I/O
   7757       1.37  (2014-06-04)
   7758               remove duplicate typedef
   7759       1.36  (2014-06-03)
   7760               convert to header file single-file library
   7761               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
   7762       1.35  (2014-05-27)
   7763               various warnings
   7764               fix broken STBI_SIMD path
   7765               fix bug where stbi_load_from_file no longer left file pointer in correct place
   7766               fix broken non-easy path for 32-bit BMP (possibly never used)
   7767               TGA optimization by Arseny Kapoulkine
   7768       1.34  (unknown)
   7769               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
   7770       1.33  (2011-07-14)
   7771               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
   7772       1.32  (2011-07-13)
   7773               support for "info" function for all supported filetypes (SpartanJ)
   7774       1.31  (2011-06-20)
   7775               a few more leak fixes, bug in PNG handling (SpartanJ)
   7776       1.30  (2011-06-11)
   7777               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
   7778               removed deprecated format-specific test/load functions
   7779               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
   7780               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
   7781               fix inefficiency in decoding 32-bit BMP (David Woo)
   7782       1.29  (2010-08-16)
   7783               various warning fixes from Aurelien Pocheville
   7784       1.28  (2010-08-01)
   7785               fix bug in GIF palette transparency (SpartanJ)
   7786       1.27  (2010-08-01)
   7787               cast-to-stbi_uc to fix warnings
   7788       1.26  (2010-07-24)
   7789               fix bug in file buffering for PNG reported by SpartanJ
   7790       1.25  (2010-07-17)
   7791               refix trans_data warning (Won Chun)
   7792       1.24  (2010-07-12)
   7793               perf improvements reading from files on platforms with lock-heavy fgetc()
   7794               minor perf improvements for jpeg
   7795               deprecated type-specific functions so we'll get feedback if they're needed
   7796               attempt to fix trans_data warning (Won Chun)
   7797       1.23    fixed bug in iPhone support
   7798       1.22  (2010-07-10)
   7799               removed image *writing* support
   7800               stbi_info support from Jetro Lauha
   7801               GIF support from Jean-Marc Lienher
   7802               iPhone PNG-extensions from James Brown
   7803               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
   7804       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
   7805       1.20    added support for Softimage PIC, by Tom Seddon
   7806       1.19    bug in interlaced PNG corruption check (found by ryg)
   7807       1.18  (2008-08-02)
   7808               fix a threading bug (local mutable static)
   7809       1.17    support interlaced PNG
   7810       1.16    major bugfix - stbi__convert_format converted one too many pixels
   7811       1.15    initialize some fields for thread safety
   7812       1.14    fix threadsafe conversion bug
   7813               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
   7814       1.13    threadsafe
   7815       1.12    const qualifiers in the API
   7816       1.11    Support installable IDCT, colorspace conversion routines
   7817       1.10    Fixes for 64-bit (don't use "unsigned long")
   7818               optimized upsampling by Fabian "ryg" Giesen
   7819       1.09    Fix format-conversion for PSD code (bad global variables!)
   7820       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
   7821       1.07    attempt to fix C++ warning/errors again
   7822       1.06    attempt to fix C++ warning/errors again
   7823       1.05    fix TGA loading to return correct *comp and use good luminance calc
   7824       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
   7825       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
   7826       1.02    support for (subset of) HDR files, float interface for preferred access to them
   7827       1.01    fix bug: possible bug in handling right-side up bmps... not sure
   7828               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
   7829       1.00    interface to zlib that skips zlib header
   7830       0.99    correct handling of alpha in palette
   7831       0.98    TGA loader by lonesock; dynamically add loaders (untested)
   7832       0.97    jpeg errors on too large a file; also catch another malloc failure
   7833       0.96    fix detection of invalid v value - particleman@mollyrocket forum
   7834       0.95    during header scan, seek to markers in case of padding
   7835       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
   7836       0.93    handle jpegtran output; verbose errors
   7837       0.92    read 4,8,16,24,32-bit BMP files of several formats
   7838       0.91    output 24-bit Windows 3.0 BMP files
   7839       0.90    fix a few more warnings; bump version number to approach 1.0
   7840       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
   7841       0.60    fix compiling as c++
   7842       0.59    fix warnings: merge Dave Moore's -Wall fixes
   7843       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
   7844       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
   7845       0.56    fix bug: zlib uncompressed mode len vs. nlen
   7846       0.55    fix bug: restart_interval not initialized to 0
   7847       0.54    allow NULL for 'int *comp'
   7848       0.53    fix bug in png 3->4; speedup png decoding
   7849       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
   7850       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
   7851               on 'test' only check type, not whether we support this variant
   7852       0.50  (2006-11-19)
   7853               first released version
   7854 */
   7855 
   7856 
   7857 /*
   7858 ------------------------------------------------------------------------------
   7859 This software is available under 2 licenses -- choose whichever you prefer.
   7860 ------------------------------------------------------------------------------
   7861 ALTERNATIVE A - MIT License
   7862 Copyright (c) 2017 Sean Barrett
   7863 Permission is hereby granted, free of charge, to any person obtaining a copy of
   7864 this software and associated documentation files (the "Software"), to deal in
   7865 the Software without restriction, including without limitation the rights to
   7866 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
   7867 of the Software, and to permit persons to whom the Software is furnished to do
   7868 so, subject to the following conditions:
   7869 The above copyright notice and this permission notice shall be included in all
   7870 copies or substantial portions of the Software.
   7871 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7872 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7873 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7874 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
   7875 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
   7876 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   7877 SOFTWARE.
   7878 ------------------------------------------------------------------------------
   7879 ALTERNATIVE B - Public Domain (www.unlicense.org)
   7880 This is free and unencumbered software released into the public domain.
   7881 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
   7882 software, either in source code form or as a compiled binary, for any purpose,
   7883 commercial or non-commercial, and by any means.
   7884 In jurisdictions that recognize copyright laws, the author or authors of this
   7885 software dedicate any and all copyright interest in the software to the public
   7886 domain. We make this dedication for the benefit of the public at large and to
   7887 the detriment of our heirs and successors. We intend this dedication to be an
   7888 overt act of relinquishment in perpetuity of all present and future rights to
   7889 this software under copyright law.
   7890 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
   7891 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
   7892 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
   7893 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   7894 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
   7895 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
   7896 ------------------------------------------------------------------------------
   7897 */