Where are we headed?
Overview. Data management skills are enormously valuable in the modern world. We're going to give you those skills, show you how to apply them to economic and financial data, and maybe tell a few bad jokes along the way. Join us!
Buzzwords. Python, code, Google fu.
This book -- and the course we developed it for -- is about data. It's also about tools for working with data, which in this case means Python and its data-related tools. Our focus is economic and financial data, which is what we know best, but the same tools can be applied to any data. By the end of the course, you will have a better idea where to find data that's useful to you, and you will have command over tools you can use to do something interesting with it. We think your life will be more interesting, too, but maybe that's just us.
Answers to common questions
Why should I do this? It's an investment in your future. You will learn how to process data and communicate its content effectively and efficiently. You will have more fun. And you will be more valuable to current and future employers.
Can't I do what I need in Excel? Excel is a great program, but once you have a little programming experience it will remind you of doing arithmetic on your fingers. With Python, you will be able to do routine tasks more efficiently ("automate the boring stuff," as one guide suggests), handle larger datasets, rearrange datasets at will, and generally do things that spreadsheet programs can't do.
What are the prerequisites? There are none. We start at the very beginning and go from there. What you will need is the courage to take on a challenge and the patience to debug programs that don’t quite work -- and they never work the first time, and often not the second or third time either. Don't panic. Ask for help and remind yourself that patience is a virtue.
What if my quant skills are weak or nonexistent? Then this is the course for you! We do our best to make the material accessible. We're looking beyond quants to marketing, management, and humanities majors. One of our design team was an English major.
Will this turn me into a programmer? You will come out of the course somewhere between Brad Pitt and Jonah Hill in "Moneyball," with a solid foundation for dealing with whatever data comes your way. You will not be ready for a career as a programmer, but you will be able to work effectively with people who know more and be able to do things yourself that Excel users can only dream about.
Will this turn me into a data scientist? Sadly, no. But you will have a solid foundation for pursuing the many technical topics that fall under the rubrics of data science and machine learning. See, for example, the extensive collection of courses on business analytics and data science offered by our IOMS and CS groups.
Should I take this course if I already know how to code? You're welcome to, and you will learn a lot about data and the data components of Python. But please don't scare the other students.
Is there anything Python can't do? Well, it can't swallow a porcupine. Someone is working on pretty much everything else.
Why data?
We're living in a world of data: data about the economy, data about financial markets, data about your business. Data doesn't solve all our problems, but it's a valuable input to better decisions. For example, how to choose a college major.
Many of our former students tell us that data skills keep them in business. One of our alums analyzes television viewer data for a network. The datasets are too large for Excel, so he uses Python. Another manages attendance data for a major league baseball team. A third works for a quantitative hedge fund, where Python is the tool of choice. A fourth is worried that you won't need him after this course. Even students with non-technical backgrounds tell us that basic data and programming skills are, if not required, at least very useful in their jobs. One of our marketing majors, for example, needs to interface with her company's SQL database to get the data she needs to do her job.
Why Python?
Python is a popular general-purpose programming language that has been used for a broad range of applications. Google uses it. So do Instagram and Netflix. Dropbox is written in Python.
We think Python is the language of choice right now if you want a user-friendly introduction to programming and a useful tool for day-to-day data work. It's a high-level language, which means the language does a lot of the work. It has a broad range of applications and an enormous community of users. You'll come to appreciate both. And it's free and open source. Free means you pay nothing. Open source means you can look at the code if you want to see how something works.
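To give a flavor of what "the language does a lot of the work" can mean in practice, here is a minimal sketch. The growth numbers are made up for illustration, and the variable names are our own; the point is only that built-in tools like `sum`, `len`, and `max` handle the bookkeeping for you.

```python
# Hypothetical quarterly growth rates (percent), invented for illustration.
growth = [2.1, 3.0, -0.5, 1.8]

# Built-in functions do the looping and counting for us.
average = sum(growth) / len(growth)
best = max(growth)

# A list comprehension picks out the quarters with negative growth.
negative_quarters = [g for g in growth if g < 0]

print(f"Average growth: {average:.2f}")
print(f"Best quarter: {best}")
print(f"Quarters with negative growth: {len(negative_quarters)}")
```

Don't worry if the details look unfamiliar; nothing here needs to make sense yet.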
Everyone likes Python
Python isn't just a useful language; it's one people like. We're talking about programmers here, for the most part, but even we novices find that its casual simplicity makes coding fun.
Writer and programmer Paul Ford puts it this way:
People love [Python] and want it to work everywhere and do everything. They’ve spent tens of thousands of hours making that possible and then given the fruit of their labor away. That’s a powerful indicator. A huge amount of effort has gone into making Python practical as well as pleasurable to use.
He's alluding here to the vast community of users who are developing tools that allow Python to do all kinds of things. Python's data tools are an example: they're not part of the core language; they're add-ons developed by users. He adds: "Python people are pretty cool," so there's that, too.
Our approach
There is a method to our madness. We will start quickly so that we can reach more interesting topics sooner. On the bright side, this means that our workload will be heaviest when the workload of your other classes is lightest. It's a bit tough up front, but we promise that it gets easier. Here are some of the guiding principles behind our approach.
One step at a time. You can't do this in a day. It pays to work on the fundamentals before moving on to advanced data tools. Our friend Tom Sargent preaches, "Don't skip steps." We try to follow this principle.
Stress the basics, ignore the rest. We think once you understand the basics, you'll be in a good position to work out special cases on your own. That allows us to strip out a bunch of confusing detail, which we think is good for everyone.
Learn to teach yourself. The best way to learn new things is to teach yourself: Google your problem and find the resources you need. We build that attitude in from the start, suggesting ways in which you can solve problems yourself. But remember: it also helps to be in a supportive environment, where you can ask for help when you need it.
Online book preferred. We sometimes print out the PDF ourselves, but the online version comes with live links. We'll update it frequently as new ideas come to mind. We think it's a superior user experience.
Code and applications. We attack data applications and programming together. After covering Python basics, we generally organize things around specific applications, covering the relevant aspects of Python along the way. We think it helps to have a context for what we're learning, but the downside is that it's somewhat harder to use the book as a programming reference. We still think it makes sense. Our goal isn't to produce programmers, but people who know enough about programming to offer an interesting perspective on data (often through a graphic).
Work habits
There are no shortcuts in learning how to code. You simply need to spend hours doing it. Progress will seem slow at first, but if you stick with it things will start to look familiar, and even make sense. You may even start to think of projects as fun, and revel in your new-found power over data.
As you work your way up the learning curve, keep this advice in mind:
Don't panic. Learning a new language takes time; it won't happen in a week.
Stick with it. The secret is to keep working. Trust us, things will start to make sense in a couple weeks. Here's a great example. How can you not love someone who writes: "How I learned to stop worrying and love the code"?
Practice, practice, practice. Any time you have something to do with data, try it out in Python. Play around, try new things, have fun. As you gain experience, you'll find that Python becomes easier.
Ask for help. If you get stuck, ask for help -- from friends, from your Bootcamp classmates (open a discussion on our Google group), or from us (the teachers of the course). We love this one: "I failed my first computer class miserably. ... The next time something clicked -- I made the decision to raise my hand in class and admit publicly that I was completely lost. To my surprise, I found that not only the teacher, but also other students in class were eager to help."
Make friends. Coding is hard to learn on your own. A second pair of eyes is indispensable. So work with friends, and make new friends who know how to code. Intense coding sessions are a great way to develop relationships.
Work on your Google fu. With a little help from Google, you will find that many of your questions have been asked before. Even better, they have been answered. One way to find them: Google something like "python [problem]." Don't forget the problem; without it, you get pages and pages of snakes.
Be patient. We know, it's easier to say than do, but it pays to take your time. Coding in a hurry is a recipe for frustration and failure.
Wordplay
Python is named for Monty Python, a group of comedians whose humor appeals to the tech crowd. IDLE, a well-known Python editor, is a reference to Python member Eric Idle. The Python Package Index, a repository of Python packages, is commonly known as the Cheese Shop, a reference to a famous Monty Python skit. The Anaconda distribution (next chapter) is a play on the word python.
Resources
The resources section at the end of each chapter is a collection of (mostly) links to things we've found useful. They're more than you need, but they give you some recommended options if you want to follow up on a specific topic.
Here we'll say simply that all of the materials for this book and the associated course are posted online:
- Website. Everything is posted on our class website.
- Book. It's hosted by GitBook.
- Code. We give links to the relevant code at the start of each chapter, but if you want them all, look in the Code directory of the GitHub repo. If you save them, remember to click on the Raw button in the upper right. (This is an oddity of GitHub, which distinguishes between a display of the file and the file itself.)
- Other materials. Pretty much everything else is available within a repository in our GitHub organization.