Year over year posts

2020

Python development environment 2020 edition

3 minute read

It’s 2020 and I’m still using Python for most of my stuff. I’ve been keeping an eye on Julia because of its roots in numerical computing, but Python is still...

Free Springer books to download

52 minute read

I came across a large list of free-to-download books from Springer. It’s awesome! There are tons of good books there.

Encrypted Backup Redux

less than 1 minute read

gpgzip seems to be deprecated, and the recommended tool to replace it is called gpgtar. It’s easy to use, but the documentation doesn’t make it clear that...

Applying Kanban for Data Science

2 minute read

I’ve been experimenting with how to manage the Data Science part of a larger project. One of the biggest issues is that Agile sprints are a poor fit to data ...

2019

New Ubuntu install

3 minute read

Yeah, I tried. Again. For reasons that are not important, I had to install Windows again on my personal laptop. And this time I said, yeah, let’s try it and ...

Anomaly detection - review

12 minute read

This is an introduction to anomaly detection and possible approaches for time series. This article is heavily based on the paper “Anomaly Detection: a Survey...

Docker all the things (at least skype)

less than 1 minute read

I hate using Skype these days. It’s a mess, the interface is all over the place, it requires you to install a ton of things on the system, it crashes, and so...

Postmortem of a data analysis mistake

2 minute read

Making mistakes is something that will happen, no matter what, no matter how many guarantees, no matter how well prepared or designed a plan or a system can ...

A walk in creating a new Python project

2 minute read

Let’s say that you’re worried that your home internet is not working the way it should. Let’s say that you think that the latency should be lower than it is,...

Bullet Journal in Data Science Projects

2 minute read

During this long holiday, the Brazilian Carnival, I started doing a bit of a retrospective, thinking about how I’ve been doing my work and organizing it, a...

Chronicles of a Data Scientist

1 minute read

As my readers (if zero can be considered a number…) can see on this blog, my posts follow the same recipe that is common to other blogs: register of things ...

2018

Can I replace NaN with zeros?

2 minute read

This is a common question when working with data. There are a lot of situations where we get a dataset with NaN (Not a Number) values in it. We want ...

Secure Jupyter Notebook configuration

less than 1 minute read

The first thing that I do after installing my Python environment is to configure Jupyter Lab to run under HTTPS. This is important for several reasons.

Python versions with Pyenv

1 minute read

Python’s great ease of use can be hindered by its own success. Python is a cornerstone of Linux, as many things depend on a running Python interpreter. ...

2017

Address Normalization with Python and NLTK

4 minute read

Addresses in databases, especially those entered by human operators, come in a wide range of forms and are prone to errors. To be able to correctly identify...
