Error reading a CSV file in a Java MapReduce program

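The code below is the Mapper class of a MapReduce job. I am trying to read a CSV file and, for each row, store two columns in a HashMap: column 1, which holds the userId, and column 6, which holds the book's CheckoutDateTime. I suspect the getMapFromCSV function in the StubMapper class is wrong. Can anyone point me in the right direction? The error output is at the bottom. Thanks for any help and suggestions.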

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;
import java.util.HashMap;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StubMapper extends Mapper<LongWritable, Text, Text, MinMaxCountTuple> {
    private Text outUserId = new Text();
    private MinMaxCountTuple outTuple = new MinMaxCountTuple();
    private final static SimpleDateFormat frmt = 
            new SimpleDateFormat("yyyy-MM--dd'T'HH:mm:ss.SSS");
    public static HashMap<String, String> getMapFromCSV(String filePath) throws IOException
    {
        HashMap<String, String> words = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(filePath));
        String line;
        //= in.readLine())
        while ((line = in.readLine()) != null) {
            String columns[] = line.split("t");
            if (!words.containsKey(columns[1])) {
                words.put(columns[1], columns[6]);
            }
        }
        //in.close();
        return words;

    }
@Override
  public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

      HashMap<String, String> parsed = getMapFromCSV(value.toString());
      //String columns[] = value.toString().split("t");
      String strDate = parsed.get("CheckoutDateTime");
      //String userId = columns[1];
      //String strDate = columns[6];
      String userId = parsed.get("BibNumber");
      try {
        Date creationDate = frmt.parse(strDate);
        outTuple.setMin(creationDate);
        outTuple.setMax(creationDate);
        outTuple.setCount(1);
        outUserId.set(userId);
        context.write(outUserId, outTuple);
      } catch (ParseException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

  }
}

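The following error is displayed and I cannot figure it out. I think the problem occurs in the getMapFromCSV function of the StubMapper class. The argument passed to that function carries the CSV data. What I want to store in the HashMap is key/value pairs, but I do not know how to change the code. Please let me know if you know how to fix it.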

java.io.FileNotFoundException: Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:120)
    at java.io.FileInputStream.<init>(FileInputStream.java:79)
    at java.io.FileReader.<init>(FileReader.java:41)
    at StubMapper.getMapFromCSV(StubMapper.java:27)
    at StubMapper.map(StubMapper.java:50)
    at StubMapper.map(StubMapper.java:14)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

You are missing an important concept in MapReduce. The problem is in the following line:

public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
// Below is the problematic line
      HashMap<String, String> parsed = getMapFromCSV(value.toString());

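Perhaps you assumed that Text value is the CSV file name, and so you tried to read values from that file.

It does not work that way. The Text value passed to the mapper is a single line of the CSV file. That is why the FileNotFoundException message contains the CSV header line: the header text was handed to FileReader as if it were a path.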
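To make this concrete, here is a minimal plain-Java sketch (no Hadoop dependency) of how line-oriented text input hands records to map(): the key is the byte offset of the line within the file and the value is the line itself. LineRecordSketch and its Record type are hypothetical names used only for illustration, and Unix line endings are assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class LineRecordSketch {

    // One (key, value) pair as a mapper would see it.
    static final class Record {
        final long offset;  // plays the role of the LongWritable key
        final String line;  // plays the role of the Text value

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Splits file contents into one record per line (assumes "\n" line endings).
    static List<Record> toRecords(String contents) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        for (String line : contents.split("\n")) {
            records.add(new Record(offset, line));
            // Advance past the line's bytes plus the newline character.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "Code,Description\n111,sample description\n";
        for (Record r : toRecords(csv)) {
            // Each iteration corresponds to one call of map(key, value, context).
            System.out.println(r.offset + " -> " + r.line);
        }
    }
}
```

The point of the sketch is that map() never sees a file name; it is called once per record with the line's content.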

Suppose your CSV looks like this:

Code,Description,Code Type,Format Group,Format Subgroup,Category Group,Category Subgroup
111,sample description,codeType1,IN,....

Your code should look like this:

@Override
  public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
  if (value.toString().startsWith("Code,Description")) {
      // Skip header line (first line) of CSV
      return;
  }
  // A limit of -1 keeps trailing empty fields instead of dropping them
  String data[] = value.toString().split(",", -1);
  String code = data[0];
  String codeType = data[2];
  // ....
  // ....
  // and so on
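Applied to the question's goal, the per-line logic could be sketched as a small Hadoop-free helper that extracts the user id and checkout timestamp from one line. Everything here is an assumption for illustration: the class name CheckoutLineParser, the column positions 1 and 6 (taken from the indexes used in the question), and the timestamp pattern, which is the question's pattern with "MM--dd" reduced to "MM-dd" on the guess that the double hyphen is a typo.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; a mapper could call parseLine(value.toString()).
class CheckoutLineParser {
    // Assumed column positions, copied from the indexes in the question.
    static final int USER_ID_COLUMN = 1;
    static final int CHECKOUT_COLUMN = 6;
    // Assumed timestamp pattern (single hyphen between month and day).
    static final String DATE_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSS";

    static final class Checkout {
        final String userId;
        final Date when;

        Checkout(String userId, Date when) {
            this.userId = userId;
            this.when = when;
        }
    }

    // Returns the parsed record, or null when the line should be skipped
    // (too few columns, or a date that does not parse, e.g. the header).
    static Checkout parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length <= CHECKOUT_COLUMN) {
            return null;
        }
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN);
        format.setLenient(false);
        try {
            Date when = format.parse(fields[CHECKOUT_COLUMN].trim());
            return new Checkout(fields[USER_ID_COLUMN].trim(), when);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Checkout c = parseLine("x,user42,a,b,c,d,2014-02-13T17:45:00.000");
        System.out.println(c == null ? "skipped" : c.userId + " @ " + c.when);
    }
}
```

Inside map(), a null result would simply mean returning without writing to the context. Note that a plain split on commas does not handle quoted fields that themselves contain commas; data like that would need a real CSV parser.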

The error occurs on this line:

BufferedReader in = new BufferedReader(new FileReader(filePath));

Check the value of filePath.
Check that the file actually exists at filePath.
Check that the contents of the file are valid.
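The first two checks can be sketched with java.nio.file before a reader is opened. This is only an illustration for the local file system (it does not consult HDFS), and FilePathCheck and describe are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper for local paths only.
class FilePathCheck {

    // Returns a short description of why the path can or cannot be read.
    static String describe(String filePath) {
        Path path = Paths.get(filePath);
        if (!Files.exists(path)) {
            return "missing";
        }
        if (!Files.isRegularFile(path)) {
            return "not a regular file";
        }
        if (!Files.isReadable(path)) {
            return "not readable";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // In the question, filePath held a CSV header line rather than a path.
        String filePath = "Code,Description,Code Type";
        System.out.println(filePath + " -> " + describe(filePath));
    }
}
```

Logging the value this way would have shown immediately that filePath contained row data, not a file name.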
