Data Hack Tuesday

DSUM of all things

September 15, 2020

Today’s Data Hack takes you back to basics to make the most of Excel. You don’t need to be a data scientist to sort and analyze data at a basic level. Check out this short overview of database functions that will help you pull information from lists:

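DCOUNT – Counts the cells containing numbers in a field, for records that match your criteria

DMAX – Finds the largest value in a field, for records that match your criteria

DMIN – Finds the smallest value in a field, for records that match your criteria

DSUM – Calculates the sum of the values in a field, for records that match your criteria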

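These functions (along with others such as DAVERAGE) all use the same three-argument syntax: the database range (including its header row), the field to evaluate, and the range holding your criteria. Here is an example: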

=DAVERAGE(database,field,criteria)
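To make the three arguments concrete, here is a rough Python analogue. It is our own sketch, not how Excel implements these functions, and it assumes a simplified criteria format: a dictionary of exact matches that must all hold. Real Excel criteria ranges also support comparisons such as >100 and OR conditions across rows. As in Excel, the first row of the database holds the field labels, and only numeric values are counted, summed, or compared. The sample table and helper names are hypothetical.

```python
# Hypothetical sales table; the first row holds the field labels, as in an Excel range
database = [
    ["Region", "Rep", "Sales"],
    ["East", "Kim", 500],
    ["West", "Lee", 300],
    ["East", "Ray", 200],
    ["East", "Sam", "n/a"],  # text entry: ignored by the numeric functions
]


def matching_numbers(database, field, criteria):
    """Return the numeric `field` values from records that meet every criterion.

    Simplification for this sketch: criteria is a dict of {label: required value},
    and all entries must match (AND).
    """
    header, *records = database
    column = header.index(field)
    numbers = []
    for record in records:
        row = dict(zip(header, record))
        if all(row[label] == wanted for label, wanted in criteria.items()):
            value = record[column]
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numbers.append(value)
    return numbers


def dsum(database, field, criteria):
    return sum(matching_numbers(database, field, criteria))


def dcount(database, field, criteria):
    return len(matching_numbers(database, field, criteria))


def dmax(database, field, criteria):
    # None when nothing matches (a choice made for this sketch)
    return max(matching_numbers(database, field, criteria), default=None)


def dmin(database, field, criteria):
    return min(matching_numbers(database, field, criteria), default=None)


def daverage(database, field, criteria):
    numbers = matching_numbers(database, field, criteria)
    return sum(numbers) / len(numbers) if numbers else None


east = {"Region": "East"}
print(dsum(database, "Sales", east))      # 700
print(daverage(database, "Sales", east))  # 350.0
```

The "n/a" entry for the East region matches the criteria but is skipped because it is not a number, which is why the average is taken over two values rather than three.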

Method or Function?

September 8, 2020

Today’s Data Hack is a helpful reminder of the difference between methods and functions in Python and when to use them. Let’s begin with methods:

A method is called by its name, but it is associated with an object (it is dependent). It may or may not return data. Another important feature is that a method can operate on data contained in its corresponding class. Here is an example:

Basic Python

class class_name:
    def method_name(self):
        # method body
        ...

Method class

class Meth:
    def method_meth(self):
        print("This is a method_meth of Meth class.")

class_ref = Meth()  # object of Meth class
class_ref.method_meth()  # prints: This is a method_meth of Meth class.

This differs from functions, which are blocks of code called by their name alone (they are independent). A function can take any number of parameters, and the data it works on is passed in explicitly through them. Unlike a method, a function is not tied to a class. Here is an example:

def function_name(arg1, arg2):  # add as many parameters as needed
    # function body
    ...

In other words, methods and functions look similar, but the central difference is the class and its object. A function is defined independently and called by its name alone. A method, by contrast, is defined inside a class and must be called through a reference to an object of that class (as in class_ref.method_meth()).
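To see the difference side by side, here is a minimal sketch using a hypothetical greet function and Greeter class (names invented for illustration). The function receives all of its data through its parameter, while the method reads data stored on the object through self.

```python
def greet(name):
    """A standalone function: all of its data arrives through its parameters."""
    return f"Hello, {name}!"


class Greeter:
    def __init__(self, name):
        self.name = name  # data stored on the object

    def greet(self):
        """A method: reads data from the object it is called on."""
        return f"Hello, {self.name}!"


print(greet("Ada"))      # function: called by its name
g = Greeter("Ada")
print(g.greet())         # method: called through an object
print(Greeter.greet(g))  # same method, with the object passed explicitly as self
```

The last line shows why a method is "dependent": calling it through the class still requires an object to fill the self parameter.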

Style with Conditional Formatting

September 1, 2020

If you are looking to adjust the visual styling of a DataFrame, try applying conditional formatting with the pandas DataFrame.style property. This property returns a Styler object, which is useful for formatting and displaying DataFrames. The styling itself is done with CSS: you write “style functions” that take a scalar, Series, or DataFrame and return CSS “attribute: value” strings of matching shape (a single string for a scalar, a like-indexed Series or DataFrame otherwise). We recommend looking more deeply into Styler to learn more about applying styling functions to a DataFrame in different ways.
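Here is a minimal sketch, assuming pandas with jinja2 installed (the Styler needs jinja2 to render). The sample data and the style function names color_negative, highlight_max, and style_report are hypothetical. Each style function takes one column as a Series and returns a list of CSS strings, one per cell, which is a shape Styler.apply accepts when applied column-wise.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({"region": ["North", "South", "East"], "profit": [120, -35, 80]})


def color_negative(column):
    """Style function: red text for negative values, no style otherwise."""
    return ["color: red" if value < 0 else "" for value in column]


def highlight_max(column):
    """Style function: yellow background on the column's largest value."""
    is_max = column == column.max()
    return ["background-color: yellow" if flag else "" for flag in is_max]


def style_report(df):
    """Return a Styler with both rules applied to the profit column only."""
    return (
        df.style
        .apply(color_negative, subset=["profit"])  # axis=0: one column at a time
        .apply(highlight_max, subset=["profit"])
    )
```

In a Jupyter notebook, a cell ending in style_report(sales) displays the table with the negative profit in red and the largest profit highlighted. Restricting both rules with subset=["profit"] keeps them away from the text column, where a numeric comparison would fail.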

How to Add a Table of Contents

August 25, 2020

Bring order to your life by learning how to add a Table of Contents with internal section links to Markdown documents. This can be done with a simple Python command-line script, markdown_toclify.py.

Just copy the standalone script ./markdown_toclify/markdown_toclify.py to a local directory on your computer. From there, all you need to do is supply a Markdown-formatted input file, and the modified Markdown contents will be printed to standard output.
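To show the underlying idea, here is a minimal sketch of our own; it is not markdown_toclify.py's actual code or interface. It assumes #-style headings and approximates GitHub-style anchor links (lowercase, punctuation dropped, spaces turned into hyphens). Other Markdown renderers may generate anchors differently, and this sketch does not skip "#" lines inside fenced code blocks. The function names slugify and build_toc are hypothetical.

```python
import re


def slugify(heading):
    """Approximate a GitHub-style anchor: lowercase, strip punctuation, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # keep letters, digits, underscores, hyphens, spaces
    return slug.replace(" ", "-")


def build_toc(markdown_text):
    """Return a nested Markdown bullet list linking to each #-style heading."""
    toc_lines = []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.+)$", line)
        if match:
            level = len(match.group(1))  # number of '#' characters
            title = match.group(2).strip()
            indent = "  " * (level - 1)  # two spaces per nesting level
            toc_lines.append(f"{indent}- [{title}](#{slugify(title)})")
    return "\n".join(toc_lines)


doc = "# Intro\n\nSome text.\n\n## Set Up the Data\n\n## Results & Discussion\n"
print(build_toc(doc))
```

Each heading becomes a bullet whose link target is the anchor the renderer is assumed to create, so pasting the output at the top of the document gives clickable section links.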

Investigating Unexpected Data Sources

August 18, 2020

As we all know, data sources are plentiful. But are you finding every possible source of data to learn more about your operations and optimize efficiency? Put on your sleuthing hat as we investigate a few creative sources you may have overlooked (some obvious; some less so):

  • Social media feeds
  • Sales data
  • Website traffic logs
  • Customer service inquiries (questions, complaints)
  • Internal data (ex. exit interview records)
  • Metadata (ex. demographics)
  • Publicly available data (ex. info collected by the Bureau of Economic Analysis)

Outline Your Code

August 11, 2020

Quick tip on accelerating your coding process: outline your code first. A common issue many coders experience is writing out a program without a clear direction. This can lead to backtracking and bugs, which take up plenty of time. Having an outline of what your code will do and how it will get there can significantly cut your programming time.
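As a small illustration, assuming a hypothetical task of totaling quantities from a few records: the outline is written first as comments, and each step then becomes one function, so main() reads like the plan and each piece can be built and checked in turn.

```python
# Outline (written first, before any real code):
#   1. Load raw records
#   2. Drop records with missing quantities
#   3. Summarize what is left


def load_records():
    """Step 1: hypothetical in-memory data standing in for a real source."""
    return [{"item": "a", "qty": 3}, {"item": "b", "qty": None}, {"item": "c", "qty": 5}]


def drop_missing(records):
    """Step 2: keep only records whose quantity is present."""
    return [r for r in records if r["qty"] is not None]


def summarize(records):
    """Step 3: total quantity across the cleaned records."""
    return sum(r["qty"] for r in records)


def main():
    # main() mirrors the outline one step per call
    return summarize(drop_missing(load_records()))


print(main())  # 8
```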

How to Deal with Empty Values

August 4, 2020

When working with large data sets, you are likely to come across null or empty values. Here are a few ways to deal with them:

  1. Delete them – Remove the whole data point, as long as it is not too important to keep.
  2. Replace them – Fill them in with the mean, median, or mode of the data set (or another value entirely).
  3. Keep them – Some machine learning models can handle null values natively.
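A minimal pandas sketch of the three options on a small made-up DataFrame. The median (for a numeric column) and the mode (for a text column) are just two of the possible replacement choices listed above.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column
df = pd.DataFrame({
    "age": [34, np.nan, 29, 41],
    "city": ["Austin", "Denver", None, "Austin"],
})

# 1. Delete them: drop every row that contains at least one null value
dropped = df.dropna()

# 2. Replace them: median for the numeric column, mode for the text column
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# 3. Keep them: leave df as-is; isna() reports how many nulls each column has
print(df.isna().sum())
```

Deleting cost half of this tiny table, which is why replacing values is often preferred when data is scarce.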

Plotting a DataFrame

July 28, 2020

Quick tip on data visualization in pandas: if you want to quickly display your DataFrame as a chart, call its plot() method (pd.DataFrame.plot()). By default, it plots your data as a line chart; if you need another chart type, pass it through the kind argument, for example kind="bar" or kind="hist". Use this command when you need a visual representation without writing code against a separate plotting library yourself (by default pandas draws the chart with Matplotlib behind the scenes, so Matplotlib still needs to be installed).
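A minimal sketch with made-up monthly sales figures. The matplotlib.use("Agg") line only selects a headless backend so the example runs as a plain script; omit it in a Jupyter notebook, where the charts display inline.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for running as a script; omit in Jupyter
import pandas as pd

# Made-up data, indexed by month so the months label the x-axis
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "sales": [120, 150, 90]}).set_index("month")

line_ax = df.plot()                          # default: line chart
bar_ax = df.plot(kind="bar", title="Sales")  # same data as a bar chart
hist_ax = df.plot.hist(bins=5)               # accessor form, equivalent to kind="hist"
```

Each call returns a Matplotlib Axes object, so you can still fine-tune a chart afterwards (for example, bar_ax.set_ylabel("Units")) if the defaults are not enough. Note that a positional string such as df.plot("bar") is not read as the chart type; use kind= or the df.plot.bar() accessor.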

Jupyter Notebook

July 21, 2020

A common tool many data scientists find themselves using is the Jupyter Notebook. We’re not saying everyone should switch to Jupyter notebooks, but consider giving one a shot. Its features make it easy for developers to display and explain their code. Because any cell can be switched to a Markdown cell, developers can present their code alongside headings, titles, and hyperlinks!