// Needs Hadoop on the classpath: org.apache.hadoop.io.compress.*,
// org.apache.hadoop.io.compress.zlib.ZlibCompressor,
// org.apache.hadoop.mapred.JobConf, org.apache.hadoop.util.ReflectionUtils.
public static void main(String[] args) throws IOException {
    if (args.length < 1) {
        System.out.println("Usage: CompressionVerifier <filename>");
        return; // without this, args[0] below throws ArrayIndexOutOfBoundsException
    }

    // Read the whole input file into memory.
    byte[] onebyte = new byte[1];
    String filename = args[0];
    FileInputStream fis = new FileInputStream(new File(filename));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while (fis.read(onebyte) != -1) {
        baos.write(onebyte);
    }
    fis.close();
    byte[] data = baos.toByteArray();

    System.out.println("Compressing file: " + filename);

    // Now compress data.
    JobConf conf = new JobConf();
    DefaultCodec cc = ReflectionUtils.newInstance(DefaultCodec.class, conf);
    cc.setConf(conf);
    Compressor zcom = new ZlibCompressor(
            ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION,
            ZlibCompressor.CompressionStrategy.FIXED, // Causes error
            //ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY, // Works fine
            ZlibCompressor.CompressionHeader.DEFAULT_HEADER,
            64 * 1024);

    baos.reset();
    CompressionOutputStream compressedStream = cc.createOutputStream(baos, zcom);
    compressedStream.write(data);
    compressedStream.close();
    baos.close();
    byte[] compressedData = baos.toByteArray();
    System.out.println("Finished compressing");

    // Decompress with a fresh codec instance; the exception is thrown here.
    DefaultCodec c2 = ReflectionUtils.newInstance(DefaultCodec.class, conf);
    c2.setConf(conf);
    CompressionInputStream inpStr = c2.createInputStream(new ByteArrayInputStream(compressedData));

    System.out.println("Starting decompression");
    while (inpStr.available() > 0) {
        inpStr.read();
    }
    System.out.println("Verified File!");
}
On most inputs it works fine, and the execution is uneventful. On certain inputs, however, decompression fails with an IOException reporting an "invalid distance code":
Exception in thread "main" java.io.IOException: invalid distance code
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
    at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:80)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:62)
    at com.ibm.utils.CompressionVerifier.main(CompressionVerifier.java:65)
Has anyone else run into this? Switching to CompressionStrategy.DEFAULT_STRATEGY fixes the problem, so I assume it is specific to the Z_FIXED strategy. If you know the zlib codebase and care to help verify or fix the problem, let me know, and I can send you the input file that triggers it.
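For anyone who wants to help narrow this down, a control experiment that takes Hadoop out of the picture may be useful. The sketch below round-trips a file through the JDK's own zlib binding (java.util.zip) and compares the result byte for byte with the input. It is only a sketch under a few assumptions: the class name RoundTripCheck and its helper methods are made up for illustration, and as far as I know java.util.zip.Deflater does not expose Z_FIXED, so this can only confirm that the input survives the strategies the JDK does offer. It cannot reproduce the failure itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of Hadoop or of the program above.
public class RoundTripCheck {

    // Compress with the JDK's zlib binding using the given Deflater strategy constant.
    static byte[] compress(byte[] data, int strategy) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(strategy);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a complete zlib stream; throws DataFormatException on corrupt input.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Guard against spinning forever on a truncated stream.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-dependent stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // True if compressing and then decompressing returns exactly the original bytes.
    static boolean roundTrips(byte[] data, int strategy) throws DataFormatException {
        return Arrays.equals(data, decompress(compress(data, strategy)));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("Usage: RoundTripCheck <filename>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("DEFAULT_STRATEGY ok: " + roundTrips(data, Deflater.DEFAULT_STRATEGY));
        System.out.println("HUFFMAN_ONLY ok:     " + roundTrips(data, Deflater.HUFFMAN_ONLY));
        System.out.println("FILTERED ok:         " + roundTrips(data, Deflater.FILTERED));
    }
}
```

If the problem file passes all three strategies here, that would be consistent with the issue being tied to Z_FIXED (or to how the Hadoop native code drives it) rather than to the input data alone. Unlike the loop in the program above, this check also compares the decompressed bytes against the original instead of only reading them.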