The Internets Make Programming Easy

Created 2009-11-15 / Edited 2010-07-07

Tags: C++, Programming, UIUC, Data Mining, Homework

In the class I'm taking (Intro to Data Mining) we get to implement a classifier. Alas... we must use either C++ or Java. I haven't programmed in either for quite a few years, and even after pressing the point I'm not allowed to use OCaml.

That's OK! Best to stretch my mind out a bit. I'm afraid I'm getting rusty at being multilingual in my old age anyway, so this is good exercise. First things first: I have to read in the training data. But best to start with the basic Hello World even before that. One Google search later, I have:

#include <iostream>
using namespace std;

int main() {
  cout << "Hello World!" << endl;
  return 0;
}

I don't even remember doing that std namespace thing last time I programmed in C++. Oh well. It runs! So from there I just keep searching and adding...

... and just like magic I am now reading the file, parsing it, and counting the frequency of items. Ahh the hive mind! Here's the result... it's messy and unstructured but damn... it works!

// Compile: g++ -o demo main.cpp
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <map>

using namespace std;

typedef map<string, int> ItemSet;

int main() {
  cout << "Hello World!" << endl;
  ifstream fin("data/train1.txt");
  // ifstream fin("data/small_train.txt");
  string s;
  vector< vector<string> > d;

  ItemSet itemset;

  while(getline(fin,s)) {
    vector<string> row;
    // cout << "Read from file: " << s << endl;
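    // Split the line on whitespace. Adjacent separators yield empty tokens
    // unless boost::token_compress_on is passed as a fourth argument.
    boost::split(row, s, boost::is_space());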
    d.push_back(row);
  }

  vector< vector<string> >::iterator d_iter;
  vector<string>::iterator row_iter;

  for(d_iter = d.begin(); d_iter != d.end(); d_iter++) {
    cout << "[ ";
    for(row_iter = d_iter->begin(); row_iter != d_iter->end(); row_iter++) {
      cout << *row_iter << " ";
      itemset[*row_iter]++;
    }
    cout << "]" << endl;
  }

  cout << "Itemset:" << endl;

  ItemSet::iterator itemset_iter;
  for(itemset_iter = itemset.begin(); itemset_iter != itemset.end(); itemset_iter++) {
    cout << itemset_iter->first << " : " << itemset_iter->second << endl;
  }

  return 0;
}

I'll keep going like this for a bit, but then I'll start organizing into actual objects and such. There is a good chance that I'll switch data structures, or perhaps not even bother keeping all this in memory. Most likely I'm also violating some other C++ socio-political norms. Fun!
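
For the "not even bother keeping all this in memory" idea, here is a rough sketch of what a streaming version might look like. It counts items one line at a time and never stores the rows. The function name count_items is made up for illustration, and it swaps boost::split for the standard library's stream extraction, so treat it as one possible direction rather than where the homework will actually end up.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the program above): counts how often each
// whitespace-separated item appears, reading one line at a time and keeping
// only the running counts in memory.
std::map<std::string, int> count_items(std::istream& in) {
  std::map<std::string, int> counts;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream tokens(line);
    std::string item;
    // operator>> skips runs of whitespace, so no empty tokens are counted.
    while (tokens >> item) {
      ++counts[item];
    }
  }
  return counts;
}
```

Handing it an ifstream opened on data/train1.txt and looping over the returned map should print the same kind of frequency table as before. Taking a std::istream& rather than a filename also means it can be fed from a string while testing.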