Be it big data or small, Pandas is my go-to library for data cleansing, enrichment and analysis. It is powered by NumPy arrays, one of the oldest and most powerful numerical Python libraries.
In almost every app I'm dealing with data of some kind. Using Pandas allows me to interact with the data in its own functional idiom rather than through a general-purpose programming language.
Dealing with Big Data. It's amazing how much data can fit into an ordinary computer's memory today if it is used efficiently. Pandas/NumPy allow data to be stored in efficient data types (e.g. if an 8-bit integer is enough, use it instead of the default 64-bit). If the problem benefits from a distributed approach, I've successfully used Hadoop with Pandas to deal with massive datasets.
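As a rough illustration of the dtype point (the data here is made up), the same million small integers take about an eighth of the memory when stored as 8-bit instead of the default 64-bit:

```python
import numpy as np
import pandas as pd

# A million small integers (0-99), so they comfortably fit in int8 (-128..127).
values = np.random.randint(0, 100, size=1_000_000)

default = pd.Series(values, dtype="int64")  # pandas' usual default integer type
compact = pd.Series(values, dtype="int8")   # same values, 1 byte each

print(default.memory_usage(deep=True))  # roughly 8 MB of data
print(compact.memory_usage(deep=True))  # roughly 1 MB of data
```

`Series.astype("int8")` does the same conversion on an existing column; just make sure the values actually fit in the smaller type first.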
Pandas & NumPy
Use Cases (continued):
Data Engineering: Often the source data is not at the right level of granularity for analysis, or is not all in one place. Pandas is great for a variety of these tasks:
Merge different datasets to combine all required data into one set.
Pivot data to improve its shape and make it easier to analyze.
Group By to create hierarchical data from flat data.
Modify data selectively or derive other features from existing data. Example: derive day of week from date.
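The tasks above can be sketched in a few lines. The orders/customers data here is a made-up toy example; only the operations (merge, derive, group by, pivot) are the point:

```python
import pandas as pd

# Toy data, purely illustrative.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 10, 20],
    "date": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-07"]),
    "amount": [100.0, 50.0, 75.0],
})
customers = pd.DataFrame({"customer_id": [10, 20], "region": ["East", "West"]})

# Merge: combine the required data into one set.
combined = orders.merge(customers, on="customer_id", how="left")

# Derive a feature: day of week from the date.
combined["day_of_week"] = combined["date"].dt.day_name()

# Group by: roll flat rows up to the region level.
by_region = combined.groupby("region")["amount"].sum()

# Pivot: reshape to regions x weekdays for easier comparison.
pivoted = combined.pivot_table(index="region", columns="day_of_week",
                               values="amount", aggfunc="sum")
```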
Data Analysis: Extracting meaningful information from the data:
Find holes in the data, understand and fill them.
Group By to aggregate data up to higher levels.
Search within the data, slice and dice it.
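A small sketch of these analysis steps, again on made-up data: find the holes, fill them with a calculated value (a per-group mean here), aggregate up, and slice:

```python
import numpy as np
import pandas as pd

# Toy dataset with one missing reading, purely illustrative.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "SF", "SF"],
    "temp": [30.0, np.nan, 60.0, 62.0],
})

# Find the holes in the data.
missing = df["temp"].isna().sum()  # one missing value

# Fill them with a calculated value: the mean for that city.
df["temp"] = df.groupby("city")["temp"].transform(lambda s: s.fillna(s.mean()))

# Group by to aggregate up to the city level.
means = df.groupby("city")["temp"].mean()

# Slice and dice: keep only the warm readings.
warm = df[df["temp"] > 50]
```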
Pandas is also easily sped up by writing computationally intensive code in C that operates on NumPy arrays, compiled with Cython.
I use scikit-learn for solving regression & classification predictive-analysis problems. The library works great with Pandas.
A typical workflow:
Read the data into memory (Pandas dataframe) and ensure data types are as they should be: time element if any, strings, integers, floats...
Find any holes in the data and figure out how to fill them (delete them, replace with defaults, replace with calculated values - averages, medians etc.)
Analyze the distribution of the data using histograms, boxplots, scatter plots, line & bar plots.
Convert categorical data appropriately, typically to dummy variables using get_dummies(data, columns=[...])
Choose a regression algorithm for a continuous target variable, or a classification algorithm for a multi-class variable, and fit the data
Measure accuracy of fit and guard against overfitting using K-fold cross-validation. Constantly plot to ensure reasonable results.
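The workflow above, condensed into a sketch. The dataset and column names are invented for illustration; the steps (fill holes, encode categoricals with get_dummies, fit, cross-validate with K-fold) are the point:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical toy dataset standing in for data read from disk.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(50, 10, 200),
    "color": rng.choice(["red", "green", "blue"], 200),
    "label": rng.integers(0, 2, 200),
})

# Fill any holes in numeric columns (none here, but defensively).
df = df.fillna(df.median(numeric_only=True))

# Convert categorical data to dummy variables.
X = pd.get_dummies(df.drop(columns="label"), columns=["color"])
y = df["label"]

# Fit a classifier and measure accuracy with 5-fold cross-validation.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
```

A large gap between training accuracy and these cross-validated scores is the overfitting signal to watch for.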
A lot of my photos taken over the years are hosted here.
I started photography in 2000 with my first film SLR camera and my passion at that time was shooting B&W film and developing at home. It was very exciting, a lot of learning, often satisfying but also frustrating at times.
Photography meant a lot of travelling; I did many road trips around the US.
Over the years I moved on to the digital world, starting with a Sony 0.5MP camera, moving on to an Olympus 1.2MP camera, and finally to DSLRs.
My first SLR was a Canon and I've stuck with it to reuse the lenses as much as possible.
My equipment today consists of a 50D body with a 50mm F1.8 lens, 70-200 F4 L lens and a Panasonic Lumix Point & Shoot.
A lot of my early work appears to have been lost to the many moves and to negligence. I've mostly shot landscapes & some portraits. Low-light photography has also been of particular interest. After my daughter was born, lugging a tripod along was no longer an attractive option, and my passion for photography has since waned.
Still, here are some lessons I can share:
When buying equipment, give priority to light-gathering (lower f-stops) rather than resolution.
Higher resolution isn't better if it means more noise due to physically smaller sensors.
Use zoom to change perspective rather than to frame; walk forwards or backwards for better framing instead. Fixed focal-length lenses are more fun, higher quality & lighter than zoom lenses.
Use post-processing to improve dynamic range & color correct. Shoot RAW in challenging conditions.
My passion for electronics began when I started exploring microcontrollers. I began with some pre-built Arduinos but soon got into the detailed datasheets for the Atmel chips and began programming them at the lowest level.
I also delved deeper into sensors, displays, relays, chips & power supplies.
Soon I was designing circuits, testing them on breadboards and soldering them onto boards to complete projects.
Arduino is a great project that makes programming microcontrollers very easy, particularly the Atmel ATmega328 and my favourite, the ATtiny85.