This is how I encountered the beauty of data science and began my career as a data scientist. Part 1: from my childhood to college.

Like every kid of my generation, I loved playing with LEGO, the KING of toys in my childhood. In the world of LEGO, I was a young and passionate architect, civil engineer, and rocket scientist :) I loved the moment of a new invention and the feeling of immersion. I spent three years in my small lab, researching and building my own world. Yeah, LEGO was the thing that let me imagine and build what I dreamed of.

One day, my mom brought me a box full of used toys from my older cousin. In the box, I found an interesting toy: a Mini 4WD, a miniature model car. It had exotic pieces that I hadn’t encountered before, and it was way more complicated than my own miniature car built from LEGO blocks. The new toy meant a lot to me because it was a paradigm shift from static LEGO blocks to moving parts powered by a motor and batteries. I soon dove into the dynamic world of mechanics. I put a lot of effort into R&D to make the car faster. Eventually I succeeded and became the supreme leader among the boys in my town. The small Mini 4WD gave me a feeling of achievement as well as fame. I decided to become an engineer of some sort.

Four years later, my secondary school held a science fair. Every kid was given a choice of mini-projects related to science and engineering. In the list of options, I encountered a line tracer robot for the first time. I felt the nostalgia. “Oh! It’s like the Mini 4WD I played with when I was young! Wait. I have to write a program to operate it? How?” Yes, it was again a mind-blowing moment for me. I obviously chose it as my science fair mini-project, and my passion for line tracer robots and coding grew from there. One day, I received an invitation to a gifted children program, where I met amazing engineering teachers and robotics researchers. I learned programming in C++ and participated in the national line tracer robot competition.

During the competition, I had to not only excel at programming a robot but also write an essay about the future of self-driving cars, which I thought was very weird. For the first time, I pondered the implications of self-driving cars and their ramifications for the future. My memory of that challenge is still vivid and thrilling, not only because I won the gold medal (yay!) but also because I began dreaming of self-driving cars in my future.

Time passed. I got accepted to UC Berkeley and studied computer science and statistics for fun. To me, writing code was like building my world with LEGO blocks. By assembling it piece by piece, I created new things. Only the medium had changed. Why statistics? I just had a gut feeling that I could take advantage of it one day. It did not take long. I watched a film about the DARPA Urban Challenge, a driverless car competition. “What the hell! Self-driving cars are getting close to reality! How do they work?” I researched the mechanisms behind self-driving cars. SVM? Kalman filter? SLAM? LiDAR? Machine learning? Under the hood, it was a mixture of computer science and statistics. “Thank God. I will study it, master it and build my own self-driving car.”

I found a book called “The Elements of Statistical Learning”, which was very tough and theory-heavy for me at that time. From then on, I set a goal to read the book and grasp as much as I could, so I planned my undergraduate coursework around the classes that would help me understand the theories in it. As I grasped the building blocks of machine learning, I became deeply absorbed in it. I especially liked PCA because its concept felt quite involved at first. Whenever I met my geeky friends, I explained the meaning of eigenvectors, eigenvalues, the first principal component and the importance of orthogonality. Later, I started looking at Random Forest, SVM and various optimization methods. After a year, I wanted to put what I had learned to use.
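For anyone who has not met PCA yet, here is a toy sketch of the idea in plain Python. It is only an illustration under my own assumptions: the five 2-D points are made up, and the helper names (`covariance_matrix`, `pca_2d`) are hypothetical, not taken from the book or from any library. The eigenvectors of the covariance matrix are the principal directions, the eigenvalues say how much variance lies along each one, and the two directions come out orthogonal.

```python
import math

# Made-up 2-D points that roughly follow the line y = x (illustration only).
points = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 1.0), (2.6, 2.9)]


def covariance_matrix(pts):
    """Return (sxx, sxy, syy), the entries of the 2x2 sample covariance matrix."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return sxx, sxy, syy


def pca_2d(pts):
    """Return [(eigenvalue, unit eigenvector), ...], largest eigenvalue first."""
    a, b, c = covariance_matrix(pts)  # matrix is [[a, b], [b, c]]
    if abs(b) < 1e-12:
        # Already diagonal: the coordinate axes are the eigenvectors.
        if a >= c:
            return [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return [(c, (0.0, 1.0)), (a, (1.0, 0.0))]
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    components = []
    for lam in (mean + spread, mean - spread):
        # (b, lam - a) solves (A - lam * I) v = 0 when b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm)))
    return components


(lam1, pc1), (lam2, pc2) = pca_2d(points)
explained = lam1 / (lam1 + lam2)          # share of variance on the first PC
dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]   # close to 0: the PCs are orthogonal
print("first principal component:", pc1)
print("explained variance ratio:", round(explained, 3))
print("dot product of the two components:", round(dot, 12))
```

Because the points hug a diagonal line, almost all of the variance ends up on the first principal component, which is what makes PCA useful for compressing correlated features.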

With perfect timing, I found a website called Kaggle. Let’s apply what I learned! I tackled the Titanic challenge, in which participants predict who survived using historical passenger records. First, I applied all of the statistical models I knew at that time. However, I soon realized that throwing data into machine learning algorithms and hoping for a good result was silly and not fun at all. I used to watch an animation called Detective Conan when I was young. You know, detectives look at the evidence, make assumptions and test them against all the evidence or data they have in hand to solve the problem. They gather clues and gradually unveil the facts. I realized that tackling a data science problem was exactly like how detectives solve cases, and machine learning was one of the skills in my toolbox. From then on, I spent more time on exploratory data analysis to find the insights hidden in the data. The end result was amazingly fulfilling, and I learned enormously from the process of trial and error.

To me, data science was the combination of engineering and detective work. The discipline of statistics helped me understand when and how to apply certain statistical methods. The algorithmic thinking I gained from computer science helped me implement solutions in code and scale them up to tackle huge, complicated problems.

That is the story of how I encountered the beauty of data science, from my childhood to college. In the next part, I will write about how I began my career as a data scientist. Stay tuned!