Tags: C++, Programming, UIUC, Data Mining, Homework
In the class I'm taking (Intro to Data Mining) we get to implement a classifier. Alas... we must use either C++ or Java. I haven't programmed in either for quite a few years, and even with further pursuit I'm not allowed to use OCaml.
That's OK! Best to stretch my mind out a bit. I'm afraid I'm getting rusty at being multi-lingual in my old age anyway, so this is good exercise. First thing first, I have to read in the training data. But best start with the basic Hello World even before that. Google... I figure I'll start with the basics, and now I have:
#include <iostream>
using namespace std;
int main() {
cout << "Hello World!" << endl;
return 0;
}
I don't even remember doing that std namespace thing last time I programmed in C++. Oh well. It runs! So from there I just keep searching and adding...
... and just like magic I am now reading the file, parsing it, and counting the frequency of items. Ahh the hive mind! Here's the result... it's messy and unstructured but damn... it works!
// Compile: g++ -o demo main.cpp
#include <iostream>
#include <fstream>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <map>
using namespace std;
typedef map<string, int> ItemSet;
int main() {
cout << "Hello World!" << endl;
ifstream fin("data/train1.txt");
// ifstream fin("data/small_train.txt");
string s;
vector< vector<string> > d;
ItemSet itemset;
while(getline(fin,s)) {
vector<string> row;
// cout << "Read from file: " << s << endl;
boost::split(row, s, boost::is_space());
d.push_back(row);
}
vector< vector<string> >::iterator d_iter;
vector<string>::iterator row_iter;
for(d_iter = d.begin(); d_iter != d.end(); d_iter++) {
cout << "[ ";
for(row_iter = d_iter->.begin(); row_iter != d_iter->end(); row_iter++) {
cout << *row_iter << " ";
itemset[*row_iter]++;
}
cout << "]" << endl;
}
cout << "Itemset:" << endl;
ItemSet::iterator itemset_iter;
for(itemset_iter = itemset.begin(); itemset_iter != itemset.end(); itemset_iter++) {
cout << itemset_iter->first << " : " << itemset_iter->second << endl;
}
return 0;
}
I'll keep going like this for a bit, but then I'll start organizing into actual objects and such. There is a good chance that I'll switch data structures, or perhaps not even bother keeping all this in memory. Most likely I'm also violating some other C++ socio-political norms. Fun!