Hive’s LOAD DATA fails when importing many files, with an exception in org.apache.hadoop.hive.ql.exec.CopyTask

Here is an interesting issue I came across recently while loading a large set of files exported from Localytics into Hive through Hive’s command line interface. The loading script contained something like 60k LOAD DATA statements, one per file, each loading that file into a table. Everything ran smoothly on Elastic MapReduce until a seemingly random exception caused it to fail:

Failed with exception null
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.CopyTask

After some investigation I noticed that the number of open files kept growing throughout the run until it hit the limit, which appeared to be the root cause of the exception.
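To watch the open-file count climb toward the limit on a Linux node (such as an Elastic MapReduce master), something along these lines could work. It is a minimal sketch that assumes Linux `/proc` and that you run it as the same user as the Hive CLI process; the helper names are hypothetical, and you would find the Hive CLI's pid yourself, e.g. with `pgrep`.

```python
import os
import sys
import time


def open_fd_count(pid: int) -> int:
    """Count the file descriptors `pid` currently has open (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid: int) -> str:
    """Return the soft 'Max open files' limit of `pid` from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units
                return line.split()[3]
    raise RuntimeError("'Max open files' not found in limits file")


def watch(pid: int, interval: float = 5.0) -> None:
    """Print the open-file count against the soft limit until the process exits."""
    limit = max_open_files(pid)
    while os.path.exists(f"/proc/{pid}"):
        try:
            print(f"{open_fd_count(pid)} / {limit} open files", flush=True)
        except FileNotFoundError:
            break  # the process exited between the check and the listing
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: python watch_fds.py <pid of the Hive CLI process>
    watch(int(sys.argv[1]))
```

A count that only ever goes up while LOAD DATA statements execute, rather than levelling off, is the pattern described above.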

Some more googling led me to this unresolved bug report: https://issues.apache.org/jira/browse/HIVE-2485

I guess it is not the most critical issue for the Hive folks, but it is still not a pleasant one. My workaround was to split the .q files into chunks of no more than 28000 statements and run them one after another.
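A minimal Python sketch of that workaround follows. It assumes the statements in the .q file are separated by semicolons, with no semicolon inside a quoted path or a comment, and that the `hive` CLI is on the PATH. The chunk size mirrors the 28000 above; the helper names and output file naming are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

MAX_STATEMENTS = 28000  # chunk size used in the post


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script into statements on ';'.

    Assumes no statement contains a semicolon inside a quoted string,
    which holds for simple LOAD DATA ... INTO TABLE lines.
    """
    return [s.strip() + ";" for s in script.split(";") if s.strip()]


def chunk(statements: List[str], size: int) -> List[List[str]]:
    """Group statements into consecutive chunks of at most `size` each."""
    return [statements[i:i + size] for i in range(0, len(statements), size)]


def write_chunks(src: Path, out_dir: Path, size: int = MAX_STATEMENTS) -> List[Path]:
    """Write each chunk to its own .q file and return the paths in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(chunk(split_statements(src.read_text()), size)):
        path = out_dir / f"{src.stem}_part{n:03d}.q"
        path.write_text("\n".join(group) + "\n")
        paths.append(path)
    return paths


def run_chunks(paths: List[Path]) -> None:
    """Run each chunk in a fresh Hive CLI process; stop on the first failure."""
    for path in paths:
        subprocess.run(["hive", "-f", str(path)], check=True)


if __name__ == "__main__":
    # Hypothetical usage: python split_q.py load_all.q chunks/
    run_chunks(write_chunks(Path(sys.argv[1]), Path(sys.argv[2])))
```

Because each chunk runs in its own `hive -f` process, any files one run leaves open are released when that process exits, before the next one starts. Session-level statements at the top of the original script, such as USE or SET, would end up only in the first chunk and would need to be repeated in each one.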