Data Hack Tuesday

Pandas

Still working on datasets with Excel? Consider giving Python a chance. Many data scientists have moved on to Python to manipulate, engineer, and analyze large datasets. Use the Python library, Pandas, to retain that familiar Excel format but with...
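
As a rough illustration of how spreadsheet habits carry over, here is a minimal sketch using a Pandas DataFrame. The column names and figures are made up for the example, and it assumes the pandas package is installed.

```python
import pandas as pd

# Hypothetical sales records; the column names are illustrative only.
df = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West"],
        "units": [10, 4, 7, 9],
        "price": [2.5, 3.0, 2.5, 3.0],
    }
)

# Add a computed column, much like filling a formula down a spreadsheet column.
df["revenue"] = df["units"] * df["price"]

# Keep only the larger orders, as you would with a spreadsheet filter.
big_orders = df[df["units"] >= 7]

# Total revenue per region, similar to a simple pivot table.
revenue_by_region = df.groupby("region")["revenue"].sum()
```

The table keeps its rows-and-columns shape throughout, so each step can be inspected with `print(df)` just as you would glance at a worksheet.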

Data Profiling for Business Analytics

Quite simply, data profiling is an analysis of the content of a data source. This process allows you to assess the quality of data which may ultimately have an effect on your business analytics. There are three main types of profiling to keep in mind: 1) Content...
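
To make the idea of analyzing a source's content concrete, here is a minimal sketch of a per-column content check in plain Python. The records, field names, and the `profile_column` helper are hypothetical, and the sketch covers only a basic content summary, not the full set of profiling types mentioned above.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
rows = [
    {"customer": "A01", "country": "US", "age": 34},
    {"customer": "A02", "country": "us", "age": None},
    {"customer": "A03", "country": "CA", "age": 51},
    {"customer": "A03", "country": None, "age": 29},
]

def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }

profile = {col: profile_column(rows, col) for col in ("customer", "country", "age")}
```

Even this small summary surfaces quality issues: the repeated `A03` customer key and the inconsistent `US`/`us` spelling both show up in the counts.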

Keep Track of Your Data Lineage

Tracking data lineage is as simple as indexing items with keys. This way, the data is tagged with an identifier that will follow it through every process. Examples of this type of identifying information include: names of authors, copyright, date of the original...
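
One way to picture a key that follows the data through every process is the sketch below. The `Tagged` class, the `tag` and `apply_step` helpers, and the author name are all hypothetical; this is an illustration of the idea, not a prescribed lineage system.

```python
import uuid
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Tagged:
    value: float
    lineage_id: str      # key assigned once, when the data is first captured
    author: str          # example of identifying information
    history: tuple = ()  # names of the processing steps applied so far

def tag(value, author):
    """Attach a fresh lineage key and author to a raw value."""
    return Tagged(value=value, lineage_id=uuid.uuid4().hex, author=author)

def apply_step(record, step_name, func):
    """Transform the value while carrying the key along and logging the step."""
    return replace(
        record,
        value=func(record.value),
        history=record.history + (step_name,),
    )

raw = tag(12.4, author="j.doe")
cleaned = apply_step(raw, "round", round)
scaled = apply_step(cleaned, "to_fraction", lambda v: v / 100)
```

Because every step returns a new record with the same `lineage_id`, any downstream value can be traced back to its origin and to the list of steps that produced it.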

Avoid These Time Killers

Have your projects been going over your scheduled time lately? A lot of time can slip away just collecting data and preparing it for use. Here are a few time killers to watch out for:
- Redundancies in checking the data.
- The inability to license data.
- Trouble finding...

Develop a Rigid Workflow

The best way to ensure that your data is as clean as possible is to develop a rigid workflow. Most data scientists rely on the following eight steps:
1) Start by gathering and storing your data.
2) Verify the integrity of the data.
3) Clean and format the data.
4) Analyze it for any...

Keep an Internal Inventory of Data Sources

It may seem like an obvious point, but it is hard to overemphasize the importance of keeping your own inventory of internal data sources. This will be more than just accounting systems or your web servers' analytic files. Think carefully about the many different types...

Think about Design First

Today's Data Hack Tuesday tip may seem obvious, but is too often overlooked. Before analyzing data for any project, it is important to consider the design first. This will mean asking yourself a few questions: 1) What metrics apply to this project? 2) Is this the...

Cleaning Data in Excel

We all know the old saying: data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it. In that case, we could all use a few tricks for speeding up that process. Here are today's pointers for cleaning data in...

Cross-validation

We have a simple tip for today to help evaluate models and avoid overfitting. Start by separating your data into two sets: the training set and the testing set. The next step is to engage in cross-validation to analyze numerical data without overfitting. This...
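
The split-then-cross-validate idea can be sketched in plain Python as follows. This is one common variant, k-fold cross-validation, and the sample count, test size, number of folds, and the `k_fold_indices` helper are all assumptions chosen for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Stand-in for 20 samples; real data would be rows of features and labels.
data = list(range(20))

# Step 1: hold out a testing set that is never used while tuning the model.
shuffled = data[:]
random.Random(42).shuffle(shuffled)
test_size = 4
test_set, training_set = shuffled[:test_size], shuffled[test_size:]

# Step 2: cross-validate within the training set only.
splits = list(k_fold_indices(len(training_set), k=4))
```

Each pair in `splits` gives the positions within `training_set` to fit on and to validate on. Averaging a model's score over the folds gives a steadier estimate than a single split, and the held-out testing set remains available for a final check.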
