Data Hack Tuesday

Encoding Text
October 27, 2020
Here’s a great way to encode 1GB of text in under 20 sec using HuggingFace Tokenization shorturl.at/eixZ4

Generic codebase
October 20, 2020
Improve the speed of your data hacks by creating a reproducible generic codebase. shorturl.at/gxIR6

Columns
October 13, 2020
Apply separate transformations to different columns or groups of columns using Sklearn’s ColumnTransformer https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html
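As a rough illustration of the idea, the sketch below scales numeric columns and one-hot encodes a categorical column in a single step. The DataFrame, its column names, and the choice of StandardScaler and OneHotEncoder are assumptions made for this example, not part of the original tip.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40000, 52000, 88000, 61000],
    "city": ["Paris", "Lima", "Paris", "Oslo"],
})

# Each entry is (name, transformer, columns the transformer applies to).
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)

# Two scaled numeric columns followed by one indicator column per city.
features = preprocessor.fit_transform(df)
print(features.shape)
```

Columns that are not listed in any transformer are dropped by default; passing remainder="passthrough" keeps them unchanged.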

Intro to linear regression
October 6, 2020
Generate synthetic data for linear regression with the make_regression() function from Python’s indispensable sklearn library https://scikit-learn.org/stable/index.html. Now, you can completely control your data’s behavior, whether you want a mini random data set or need to debug your algorithm.
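A minimal sketch of how this can be used for debugging: make_regression() can return the coefficients it used to build the data, so a fitted model can be checked against them. The sample size, feature count, and the use of LinearRegression here are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Generate a small, noise-free data set and ask for the true coefficients.
X, y, true_coef = make_regression(
    n_samples=200,
    n_features=3,
    n_informative=3,
    noise=0.0,        # no noise, so the linear relationship is exact
    coef=True,        # also return the coefficients used to build y
    random_state=0,   # fixed seed for reproducibility
)

# Fit an ordinary least squares model on the synthetic data.
model = LinearRegression().fit(X, y)

# With no noise, the fitted coefficients should match the true ones.
print(np.allclose(model.coef_, true_coef))
```

Raising the noise parameter makes the problem harder in a controlled way, which is handy when you want to see how an algorithm degrades.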

Descriptive, prescriptive, or predictive?
September 29, 2020
What type of analytics makes the most sense for your organization? Naturally, this depends on the project and on your goals. The three main types of analytics that you will want to consider are:
- Descriptive
- Predictive
- Prescriptive
Here’s a handy reminder of how to apply each of these.
Descriptive analytics is the method for understanding your current situation or problem and reviewing how it looked in the past. It does not answer a specific question; instead, it presents a clear overview of the problem. Predictive analytics involves making predictions about future behaviors or results based on current data and trends. Prescriptive analytics goes a step further and helps you answer the question of what your organization should do.

Data Security
September 22, 2020
Thanks to the sudden increase in remote work, companies must now confront a new set of security challenges. One small way to help protect company data is to avoid downloading data to spreadsheets and Excel files.
While this practice is common, downloading data to a spreadsheet can cause a few problems:
- It is no longer possible to control how the data is used or with whom it is shared.
- The files could be exploited.
- The risk of exposing confidential information is greatly increased.

DSUM of all things
September 15, 2020
Today’s Data Hack takes you back to basics to make the most out of Excel. You don’t need to be a scientist in order to sort and analyze data in a basic way. Check out this short overview of database functions that will help you parse information from lists:
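DCOUNT – Counts the cells containing numbers in a field for records that match the criteria
DMAX – Finds the largest value in a field among records that match the criteria
DMIN – Finds the smallest value in a field among records that match the criteria
DSUM – Calculates the sum of the values in a field for records that match the criteria
These functions all use the same three-argument syntax: the database (the list range, including its column headers), the field (the column to operate on), and the criteria (the range that holds the conditions). Here is an example using DAVERAGE, which averages the matching values: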
=DAVERAGE(database,field,criteria)

Method or Function?
September 8, 2020
Today’s Data Hack is a helpful reminder of the difference between methods and functions in Python and when to use them. Let’s begin with methods:
A method is called by its name, but it is always associated with an object (it is dependent on that object). It may or may not return data. Another important feature is that a method can operate on the data contained in the corresponding class. Here is an example:
Basic Python method definition:
class ClassName:
    def method_name(self):
        # method body
        ...

Example with a Meth class:
class Meth:
    def method_meth(self):
        print("This is a method_meth of Meth class.")

class_ref = Meth()  # object of Meth class
class_ref.method_meth()  # call the method through the object
This differs from functions, which are blocks of code called by their name alone (they are independent). A function can take any number of parameters, or none at all, and any data it needs is passed to it explicitly as arguments. A function is not associated with a class or an object. Here is an example:
def function_name(arg1, arg2):  # any number of parameters, or none
    # function body
    ...
In other words, methods and functions look similar, but the central difference lies in the class and its object. A function is defined independently and is called by its name alone. A method is defined inside a class and must be called through an object of that class (or through the class itself).
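To make the contrast concrete, here is a small sketch that computes the same total once with a method and once with a function. The Basket class and the total_price function are hypothetical names invented for this illustration.

```python
class Basket:
    """Hypothetical class used only for illustration."""

    def __init__(self, prices):
        # Data stored on the object itself.
        self.prices = list(prices)

    def total(self):
        # Method: reads the data held by the instance through `self`.
        return sum(self.prices)


def total_price(prices):
    # Function: works only on the arguments passed to it explicitly.
    return sum(prices)


basket = Basket([2.5, 4.0, 1.5])

# The method is called through the object that owns the data.
print(basket.total())

# The function is called by name and must be handed the data.
print(total_price([2.5, 4.0, 1.5]))

# A method call is shorthand for calling the class attribute with the object.
print(Basket.total(basket))
```

A reasonable rule of thumb is to use a method when the behavior depends on an object's own state, and a function when the logic stands on its own.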

Style with Conditional Formatting
September 1, 2020
If you are looking to adjust the visual styling of a pandas DataFrame, then you should definitely try applying conditional formatting. This can be done using the DataFrame.style property, which returns a Styler object that is useful for formatting and displaying DataFrames. The styling itself is done with CSS: you write “style functions” that take scalars, Series, or DataFrames and return like-indexed Series or DataFrames whose values are CSS “attribute: value” pairs. We recommend looking more deeply into Styler to learn more about applying styling functions to a DataFrame in different ways.
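The following is a minimal sketch of two such style functions, one working column by column and one working on the whole DataFrame. The function names and sample data are made up for this example, and it assumes pandas 1.3 or later (for Styler.to_html) with the optional jinja2 dependency installed.

```python
import numpy as np
import pandas as pd


def color_negative(column):
    # Column-wise style function: one CSS string per element of the Series.
    return ["color: red" if value < 0 else "" for value in column]


def bold_overall_max(frame):
    # Table-wise style function: a like-indexed DataFrame of CSS strings.
    is_max = frame == frame.max().max()
    css = np.where(is_max, "font-weight: bold", "")
    return pd.DataFrame(css, index=frame.index, columns=frame.columns)


def build_styler(frame):
    # axis=0 passes each column as a Series; axis=None passes the whole frame.
    return (
        frame.style
        .apply(color_negative, axis=0)
        .apply(bold_overall_max, axis=None)
    )


# Hypothetical sample data.
df = pd.DataFrame({"q1": [1.5, -2.0, 3.0], "q2": [-0.5, 4.0, 2.5]})

if __name__ == "__main__":
    # Render the styled table as an HTML string (needs jinja2).
    html = build_styler(df).to_html()
    print(html[:80])
```

In a notebook, returning build_styler(df) as the last expression of a cell displays the styled table directly.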