Protecting Data Science Projects From Ransomware 

James Phoenix

Ransomware is an extremely common cybercrime threat that targets individuals, small and large teams, and enterprise-level businesses alike. 

Ransomware is a form of malware that deliberately encrypts files to prevent access to them. Once a computer is infected, the victim receives a ransom note demanding payment, often in cryptocurrency, in exchange for decryption.

Goldeneye Ransomware

Research by Sophos indicates that half of surveyed organisations were hit by ransomware in 2019. In approximately 75% of those cases, the attackers succeeded in encrypting data. Most affected organisations got their data back, primarily by restoring from backups, which cost less than paying the ransom. 

When ransomware targets an individual, it encrypts their files, locks them out of their PC and threatens to delete the files unless a ransom is paid. Well-known strains include CryptoLocker, CryptoWall, Locky, WannaCry and TeslaCrypt. 

Once a user lets the ransomware in, for example by downloading a malicious file, it can spread quickly. Ransomware attacks are therefore not necessarily confined to a single device: they can reach an entire connected network. 

This is a guide to protecting data-centric projects, businesses and freelance operations from ransomware.


Ransomware Examples

Here’s a typical example of ransomware targeting a business:

A business with around 200 employees is hit by ransomware delivered through a fake email from a “customer”. 

The fake email contains a link to a file. An employee downloads the file, which encrypts their data while searching for connected networks. It finds numerous business-critical files on those networks and encrypts them too. 

Among the encrypted data are accounting and customer data containing sensitive personally identifiable information (PII). As a result, the business suffers financially and reputationally. 

WannaCry Ransomware

Here’s another example of ransomware in a small team/consultancy setting: 

A small team handles business data for multiple clients. Employees have cloud management apps installed on their PCs, and these apps link to the same cloud account that holds sensitive client information and data science projects. 

Once a PC is infected via the typical route, the ransomware spreads to the team’s cloud database and encrypts that too. While cloud storage is often seen as a safeguard against ransomware, Dropbox has highlighted concerns here. Restoring cloud backups may be simple, but that doesn’t rule out data loss and damage. 

Here’s a variation of a ransomware attack: 

SaaS tools are ubiquitous in data science, data engineering, data analytics, DevOps, etc. On average, businesses use some 40 to 60 SaaS tools, and over half of those tools are poorly managed. 

Now, if each team member has separate login credentials for each tool, the number of potential vulnerabilities multiplies. Moreover, a single login can grant access to both the company’s data and its clients’ data. 

Access to one tool leads to another, so one successful brute-force attack may give hackers access to a vast quantity of data. Employees carry much of the responsibility here: juggling multiple accounts and logins encourages poor password habits, especially when employees aren’t careful with sensitive data. 

How the attackers use that data is up to them. They could hold the affected businesses to ransom, sell the data, leak it, commit other forms of fraud, or any combination of these. While login security such as two-factor authentication (2FA) helps prevent account takeover (ATO) in this setting, it’s down to businesses and team members to actually enable the security features their tools offer. This is seldom the case.


Ransomware and Data Science

Ransomware is particularly problematic for those who store critical data on their computers or networks. As a result, data scientists, engineers, programmers and developers are prime targets for ransomware attackers, especially if the criminals know which people or businesses they’re linked to. 

In many cases, it might be easier to target the data scientist or engineer than the business.

Ransomware attacks typically target the underlying asset of a business or project: the data itself. Anyone who works with valuable, business-critical data is a potential target.

So, if you work with sensitive data, either alone or as part of a team, you need to take steps to protect that data from ransomware. 

Here are some data science-specific security considerations and solutions:


1: Don’t Store Credentials Inside GitHub/GitLab Repositories

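Applications require access to various database passwords, authorisation tokens, encryption keys, API keys, certificates, etc. These are often called “secrets”. Storing secrets with code and committing them to Git is a bad idea: anyone with access to the repository can read them, and they remain in the commit history even after you delete them from the latest version.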

This includes secrets hardcoded into your application (e.g. a database password in the source code) and configuration files containing secrets that sit alongside the source code.

Even if you’re the sole developer, this is good practice that protects your credentials in both the present and future. Keeping secrets away from Git futureproofs your work and enables you to invite collaboration anytime without worrying about what’s in your code.

Instead, use external configuration files and environment variables. Environment variables are set in the environment an app runs in and are much more secure than storing credentials in or alongside code.
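As a minimal sketch of the environment-variable approach in Python, the snippet below reads database credentials at startup and fails loudly if one is missing. The variable names `DB_HOST`, `DB_USER` and `DB_PASSWORD`, and the helper names, are hypothetical; use whatever your deployment defines.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required environment variable is not set."""


def get_required_env(name: str) -> str:
    # Read the secret from the process environment, never from source code.
    value = os.environ.get(name)
    if not value:
        # Fail at startup rather than limping on without credentials.
        raise MissingSecretError(f"Environment variable {name!r} is not set")
    return value


def load_db_config() -> dict:
    # Hypothetical variable names; only the password and user are mandatory.
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "user": get_required_env("DB_USER"),
        "password": get_required_env("DB_PASSWORD"),
    }
```

The variables would then be set in your shell, CI system or hosting platform (for example `export DB_PASSWORD=...`) rather than in any file that Git tracks.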

Additionally, use a .gitignore file to tell Git which files, such as local .env or credential files, it should never track.
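A .gitignore file only stops Git from tracking the files it lists; it does nothing about a password typed straight into a notebook or script. As a rough sketch, a few regular expressions can flag the most obvious hardcoded secrets before you commit. The patterns and function names below are illustrative assumptions and far from exhaustive, so treat this as a safety net rather than a guarantee.

```python
import re
from pathlib import Path

SECRET_PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 upper-case letters/digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A secret-sounding name assigned a quoted literal, e.g. db_password = "hunter22".
    "assigned_secret": re.compile(
        r"(?i)(password|passwd|secret_key|secret|api_key|token)['\"]?\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}

SCANNED_SUFFIXES = {".py", ".ipynb", ".sql", ".yaml", ".yml", ".json", ".toml", ".cfg"}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


def scan_repo(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan likely source/config files under root, skipping Git's own metadata."""
    results = {}
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        # Path(".env").suffix is "", so dotenv files are matched by name instead.
        if path.suffix in SCANNED_SUFFIXES or path.name == ".env":
            findings = scan_text(path.read_text(errors="ignore"))
            if findings:
                results[str(path)] = findings
    return results
```

Reading a value with `os.environ[...]` is not flagged, because the assignment pattern only matches a quoted literal.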


2: Use Secrets Managers For Deployments

In addition to the above, it’s sensible to use a secrets manager for deployments, such as AWS Secrets Manager or Azure Key Vault. These let you store passwords, API keys, TLS certificates, etc. in a central vault with a single, tightly controlled point of access.

When your app starts, it fetches the secrets it needs from the vault. This way, you don’t need to deploy config or env files. It also simplifies rotating secrets and makes it easier to scale apps horizontally, since every instance reads from the same vault.
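As a hedged sketch of that start-up step, the Python below fetches a JSON secret through any client exposing a boto3-style `get_secret_value(SecretId=...)` method, which is how AWS Secrets Manager is called from boto3. Passing the client in keeps the function testable without AWS. The secret name in the comment is hypothetical, and Azure Key Vault would need its own SDK instead.

```python
import json
from typing import Any, Protocol


class SecretsClient(Protocol):
    """Anything with a boto3-style get_secret_value(SecretId=...) method."""

    def get_secret_value(self, *, SecretId: str) -> dict[str, Any]: ...


def load_app_secrets(client: SecretsClient, secret_id: str) -> dict[str, Any]:
    """Fetch a JSON secret once at startup and return it as a dict."""
    response = client.get_secret_value(SecretId=secret_id)
    # Secrets Manager returns text secrets under the "SecretString" key.
    return json.loads(response["SecretString"])


# Usage with AWS (requires boto3 and credentials; the secret name is hypothetical):
#   import boto3
#   secrets = load_app_secrets(boto3.client("secretsmanager"), "prod/churn-model/db")
```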


3: Limit User Permissions

When you deploy a data science project to the cloud, grant the cloud service only the permissions it actually needs (the principle of least privilege).

Create a new IAM role for each specific piece of technology. That way, if it is compromised, the attacker gains only that role’s limited permissions.
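To make “minimum permissions” concrete, here is a hedged Python sketch that builds an AWS IAM policy document allowing a service to read objects under one S3 prefix and nothing else, plus a crude check for wildcard grants. The bucket and prefix names are hypothetical, and in practice you would attach such a policy to the dedicated role through your infrastructure tooling.

```python
def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing s3:GetObject on one bucket prefix and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


def broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that grant wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    flagged = []
    for statement in statements:
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Action and Resource may each be a string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        wildcard = any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources
        if statement.get("Effect") == "Allow" and wildcard:
            flagged.append(statement)
    return flagged
```

Serialising the result with `json.dumps(..., indent=2)` gives a document you can review before attaching it to the role.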


4: Create a New Google Cloud Platform Project or AWS Account For Each Project

Create a fresh cloud project (or account) each time you start a new piece of work. That way, if one project is compromised, the attacker can reach only its resources.

Keep your workflow well-organised to avoid creating a single access point for someone to gain control of all your resources. Keeping different passwords for different accounts, tools and projects is an important part of that.
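In practice a password manager generates and stores these for you, but the idea is easy to sketch with Python’s standard `secrets` module, which uses a cryptographically secure source of randomness. The project names and the 12-character minimum below are assumptions chosen for illustration.

```python
import secrets
import string

# Letters, digits and punctuation; some sites restrict symbols, so adjust if needed.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Random password from the OS's secure generator (not the `random` module)."""
    if length < 12:  # an arbitrary floor chosen for this sketch
        raise ValueError("length should be at least 12")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# One distinct password per account or project; these names are hypothetical.
# Store the results in a password manager, never in code or in Git.
projects = ["gcp-churn-model", "aws-demand-forecast", "client-a-warehouse"]
credentials = {name: generate_password() for name in projects}
```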


How to Protect Your Freelance Business From Ransomware

Freelance data scientists, engineers and developers should maintain rigorous security for themselves and their clients. Hackers are clever, and being a one-person band doesn’t mean you’ll fly under the radar. 

Hackers and cybercriminals have been known to check social media to find who individuals work for and then target them rather than the business itself. An individual’s systems are likely not as robust as the business’s, providing criminals with an easier route to ransom, fraud and theft. 

Moreover, while an employee who follows protocol is unlikely to be held liable if their systems are hacked, the same cannot be said for a freelancer. If you work with sensitive client data and fail to protect it, you’re taking a big risk. 


1: Be Diligent With What Data You Collect

Firstly, be diligent about the data you collect from clients. A client might give you free rein over their systems, for example, in which case you face an ethical question about what you download and use. Compliance is an issue here too, as you may not have a legal right to some data, even if you’re technically able to access it. 

When dealing with sensitive data, ask yourself what data you really need. Don’t download data for your models or projects if you don’t need it. If you don’t use some data, don’t leave it knocking about on your hard drive – securely delete it instead. 

In general, storing unencrypted sensitive data on your hard drive is a bad idea. Instead, consider using encrypted private clouds and encrypted hard drives to protect data at rest. 
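Full-disk encryption is usually handled by the operating system (for example BitLocker or FileVault). For an individual sensitive extract you need to keep on disk, one option is symmetric encryption with the third-party `cryptography` package’s Fernet recipe, sketched below. It assumes the package is installed and that the key is stored somewhere other than next to the data, such as a secrets manager; the function and file names are hypothetical.

```python
from pathlib import Path

from cryptography.fernet import Fernet  # third-party: pip install cryptography


def encrypt_file(path: Path, key: bytes) -> Path:
    """Write an encrypted copy of `path` next to it with an extra .enc suffix."""
    token = Fernet(key).encrypt(path.read_bytes())
    encrypted = path.with_name(path.name + ".enc")
    encrypted.write_bytes(token)
    return encrypted


def decrypt_file(path: Path, key: bytes) -> bytes:
    """Return the plaintext; raises cryptography.fernet.InvalidToken if tampered with."""
    return Fernet(key).decrypt(path.read_bytes())


# key = Fernet.generate_key()  # generate once, keep it in a secrets manager
```

Fernet reads the whole file into memory, so this suits modest extracts rather than large datasets, and the plaintext original still has to be removed afterwards.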


2: Secure Your Clouds

Cloud has overtaken on-premises infrastructure, but safety is not guaranteed. Storing data in the cloud doesn’t automatically make it secure, especially when it’s accessed through relatively unprotected cloud management apps on insecure operating systems. 

Teams should benchmark the configuration of their cloud VMs, storage, databases, management consoles and containers to catch security issues. CIS Benchmarks cover many major cloud-based tools, like Google Workspace. 
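As one small example of a benchmark-style configuration check, the sketch below examines an S3 bucket’s Block Public Access settings, a common focus of AWS hardening guidance. It takes the dictionary that boto3’s `get_public_access_block()` returns under `PublicAccessBlockConfiguration`, so it can be tested without AWS access. The bucket names are hypothetical, and a real audit would cover far more than this one control.

```python
# The four settings S3 Block Public Access exposes for each bucket.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def public_access_gaps(config: dict) -> list[str]:
    """Return settings that are disabled or missing (missing counts as a gap)."""
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]


def audit_buckets(configs: dict[str, dict]) -> dict[str, list[str]]:
    """Map each bucket with at least one gap to the settings it is missing."""
    report = {}
    for bucket, config in configs.items():
        gaps = public_access_gaps(config)
        if gaps:
            report[bucket] = gaps
    return report


# With AWS credentials and boto3 installed (bucket name is hypothetical):
#   import boto3
#   response = boto3.client("s3").get_public_access_block(Bucket="my-team-bucket")
#   gaps = public_access_gaps(response["PublicAccessBlockConfiguration"])
# get_public_access_block raises a ClientError if the bucket has no configuration at all.
```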


3: Focus on the Basics

Cybersecurity basics include firewalls, antivirus, antimalware, two-factor authentication (2FA), web application firewalls (WAF) and secure email gateways. Invest in these products and keep your personal and team systems rigorously up to date. Discuss data breaches and cybercrime regularly to keep the team vigilant. 

Cybersecurity Basics
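Authenticator apps, one common form of 2FA, typically implement time-based one-time passwords (TOTP, RFC 6238), which are built on HMAC-based one-time passwords (HOTP, RFC 4226). The standard-library sketch below shows how the six-digit codes are derived from a shared secret and the current time. It is for understanding only; production systems should rely on a vetted library.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(base32_secret: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over elapsed 30-second steps."""
    # Authenticator secrets are often shared without base32 padding; restore it.
    padded = base32_secret + "=" * (-len(base32_secret) % 8)
    key = base64.b32decode(padded, casefold=True)
    timestamp = time.time() if at is None else at
    return hotp(key, int(timestamp // step), digits)
```

The server and the authenticator app share the secret once (usually via a QR code); afterwards both compute the same code independently, which is why a stolen password alone is not enough to log in.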

4: Beware Public WiFi And Man-in-the-Middle Attacks 

Instead of infecting cloud storage with ransomware, attackers can simply steal login credentials through a man-in-the-middle attack, change the password and encrypt the files themselves.

That makes it harder to restore your files from your cloud backup. And even if restoring your cloud files from a backup is simple, can you have peace of mind knowing the attacker holds a copy of your data? That alone is enough to hold many businesses to ransom for fear of a leak. 

Refer to this post on staying anonymous online to discover techniques to protect yourself from man-in-the-middle attacks. Key techniques include: 

  • Avoid unencrypted URLs (HTTP rather than HTTPS). Enable your browser’s ‘always use HTTPS’ option
  • Avoid exchanging or using sensitive data when signed into public networks. 
  • Use a good VPN, but beware that it’s not always sufficient
  • Forget the network on your device when you stop using it
  • Exercise caution when surfing sites with certificate errors
  • Turn off all file sharing
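The first and fifth points can also be enforced in code when scripts pull data over the internet. The sketch below, using only Python’s standard library, refuses plain-HTTP URLs and relies on `ssl.create_default_context()`, which verifies server certificates and hostnames by default, so a certificate error stops the download instead of being ignored. The function names are hypothetical.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Raise if url is not HTTPS, so data never travels unencrypted."""
    if urlparse(url).scheme.lower() != "https":
        raise ValueError(f"Refusing non-HTTPS URL: {url}")
    return url


def fetch_securely(url: str, timeout: float = 10.0) -> bytes:
    """Download over HTTPS with certificate and hostname verification enabled."""
    # The default context requires a valid certificate matching the hostname;
    # a certificate problem makes urlopen raise rather than silently continue.
    context = ssl.create_default_context()
    with urllib.request.urlopen(require_https(url), context=context, timeout=timeout) as response:
        # urlopen follows redirects; refuse the result if one downgraded us to HTTP.
        require_https(response.url)
        return response.read()
```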

5: Keep Track of Your Systems and Users

Once you add tools and systems to your repertoire, you may lose track of where your data is. In fact, you might not even realise when your data is breached! So, keep track of your data silos and SaaS tools and who’s using them at all times.

For example, you may temporarily hand over logins to someone to collaborate on a project. But what if you forget and add some sensitive data to the tool that you believe you have sole access to? Poor tool and systems management creates these types of issues, so keep a tight record of what data is where and what access you’re granting. 
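Even a simple, structured access register helps with the scenario above. The sketch below is a hypothetical Python version: every grant records the tool, the person, the data it exposes and an expiry date, so you can list access that should already have been revoked and check who can see a given dataset before adding anything sensitive. The field names, users and dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessGrant:
    tool: str         # SaaS tool, bucket, warehouse, etc.
    user: str
    data: str         # which sensitive dataset the tool exposes
    expires_on: date  # every temporary grant gets an end date


def overdue_revocations(grants: list[AccessGrant], today: date) -> list[AccessGrant]:
    """Grants past their expiry date: access that should already have been revoked."""
    return [g for g in grants if g.expires_on < today]


def who_has_access(grants: list[AccessGrant], data: str) -> set[str]:
    """Everyone on record with access to the data, including overdue grants."""
    return {g.user for g in grants if g.data == data}


register = [
    AccessGrant("notebook-workspace", "collaborator@example.com", "client-a-sales", date(2024, 3, 31)),
    AccessGrant("notebook-workspace", "me@example.com", "client-a-sales", date(2099, 12, 31)),
]
```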


6: Backups are Essential, But Not Everything 

Keep both cloud and physical backups. There are numerous backup solutions for cloud and on-prem infrastructure. Keep robust, regularly-scheduled backups of everything regardless of your sector or discipline. 

While backups prevent data loss, cybercriminals can still use stolen data to hold businesses to ransom. Backups are certainly not a panacea against ransomware. 
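Backups also need checking: a backup that was silently corrupted, or encrypted along with everything else, is no help in a crisis. A simple approach is to record SHA-256 checksums when the backup is taken and compare them later, as sketched below with Python’s standard library. The function names are hypothetical, and the manifest should live somewhere separate from the backup itself (ideally read-only or offline).

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """Record a checksum for every file in the backup at backup time."""
    hashes = {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(hashes, indent=2))


def verify_backup(backup_dir: Path, manifest: Path) -> list[str]:
    """Return files that are missing or whose contents changed since the backup."""
    expected = json.loads(manifest.read_text())
    problems = []
    for rel, digest in expected.items():
        p = backup_dir / rel
        if not p.is_file() or sha256_of(p) != digest:
            problems.append(rel)
    return problems
```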


7: Encrypt Data in Motion 

Data engineers and scientists must learn to encrypt data in motion as it flows through pipelines. 

It’s essential to get comfortable with secure transfer protocols such as SFTP. This is a vital skill on the data engineer’s roadmap.
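The same principle applies to database connections inside a pipeline. As a hedged example, the sketch below builds a PostgreSQL (libpq) connection string with `sslmode=verify-full`, which encrypts the connection and also verifies the server’s certificate and hostname. The host, database, user and certificate path are hypothetical. The password is left out deliberately: libpq can pick it up from the `PGPASSWORD` environment variable or a `~/.pgpass` file.

```python
def libpq_quote(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    # libpq rule: wrap in single quotes and backslash-escape \ and '.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def secure_dsn(host: str, dbname: str, user: str, root_cert: str, port: int = 5432) -> str:
    """Build a connection string that requires an encrypted, verified connection."""
    params = {
        "host": host,
        "port": str(port),
        "dbname": dbname,
        "user": user,
        # verify-full: use TLS, check the certificate chain against root_cert,
        # and check that the certificate matches the host name.
        "sslmode": "verify-full",
        "sslrootcert": root_cert,
    }
    # No password here: libpq can read PGPASSWORD or ~/.pgpass instead.
    return " ".join(f"{key}={libpq_quote(value)}" for key, value in params.items())
```

Drivers built on libpq, such as psycopg, accept a string like this directly, e.g. `psycopg.connect(dsn)`.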


Scaling Up Efforts For Medium-to-Large Businesses

Medium-to-large businesses and SMEs should take a multi-layered approach to preventing ransomware attacks and stopping them from spreading across networks. Containing an attack locally, before it infects the whole network, is critical. 

The bigger the business, the more important user behaviour, platform management and IT hygiene become. As SaaS tools proliferate, every extra tool and login adds to the attack surface. It’s extremely important to build policies around cybersecurity and foster a culture of protection and responsibility. 

While cybersecurity may come more naturally to technologically adept teams, that doesn’t rule out the possibility of human error. So don’t lull yourself and your team into a false sense of security just because you’re technologically literate! 


Business cybersecurity solutions provide some of the following features for protecting against ransomware: 

  • User behaviour, permissions and rules: It’s essential to prevent access to sensitive data except for those who require it as sanctioned by someone qualified to make that decision. User behaviour analytics is important for preventing insider threats. 
  • Pre-download controls: Prevent malware and ransomware from reaching the endpoint. Pre-download controls should be made available across the entire business network. 
  • Pre-execution controls: Platforms like Cynet enable ML-based analysis to find ransomware patterns in binary files before the user executes them. 
  • Runtime protection: Kills processes that are detected as ransomware, preventing the encryption of files and installation of supportive files. 
  • Sandboxing: Runs files in sandboxes to block and contain ransomware behaviour. 
  • Decoying: Plants decoy files in ransomware’s path; when the ransomware tampers with them, it is detected, analysed and removed before it can spread. 
  • Propagation detection: Detects network activity that indicates ransomware spreading and isolates the affected hosts from the network to contain it. 
  • Threat intelligence: Cybersecurity firms employ threat intelligence to constantly update their systems. 
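The decoying idea can be illustrated with a simplified, hypothetical sketch: plant canary files, record their hashes, and treat any change to them as a sign that something is encrypting files. Commercial products hook into file-system events and respond automatically; this standard-library version just polls and reports, and the decoy file names are made up.

```python
import hashlib
import secrets
from pathlib import Path


def plant_canaries(directory: Path, names: list[str]) -> dict[Path, str]:
    """Create decoy files with random content and remember their SHA-256 hashes."""
    baseline = {}
    for name in names:
        path = directory / name
        path.write_bytes(secrets.token_bytes(256))
        baseline[path] = hashlib.sha256(path.read_bytes()).hexdigest()
    return baseline


def tampered_canaries(baseline: dict[Path, str]) -> list[Path]:
    """Return decoys that were modified, renamed or deleted: a likely ransomware sign."""
    tampered = []
    for path, digest in baseline.items():
        if not path.exists() or hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            tampered.append(path)
    return tampered


# A scheduler or loop would call tampered_canaries() periodically and raise an
# alert (or isolate the machine) as soon as the list is non-empty.
```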

Businesses should also regularly audit their cybersecurity tools, user permissions, etc. Building a crisis plan is also essential, especially when personally identifiable information (PII) is at stake. Certain businesses also have a legal obligation to report breaches to regulators or law enforcement, and those who fail to meet expected security standards can face external investigations and fines. 


Enterprise-Wide Cybersecurity 

Cybersecurity for larger businesses becomes considerably tougher when protecting multiple systems across a huge range of user permission tiers spread over different jurisdictions and geographical areas. In addition, business initiatives such as work-from-home, remote hiring and data democratisation further complicate cybersecurity. 

The risks posed by enterprise-level cloud architecture, hybrid architecture and SaaS tool usage haven’t gone unnoticed by cybersecurity providers. Companies like Rubrik provide advanced tools for network-wide monitoring, threat detection, user management, backups, etc. 

Rubrik provides ransom-proof backups and instant ransomware recovery. In addition, multi-factor user authentication, zero-trust cluster design and retention lock support help secure the architecture, while network-wide anomaly detection helps identify ransomware propagation and lock out affected hosts. In the event of an attack, Rubrik also provides full audits and impact reports. 

The zero-trust cybersecurity model, which trusts no user or device by default, is particularly important for enterprise ransomware protection. 

By monitoring and verifying all activity, enterprises can reduce the risk of ransomware emanating from inside or outside of the business. 
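One ingredient of zero trust, verifying every request explicitly instead of trusting anything that is already inside the network, can be sketched with short-lived, resource-scoped signatures. The Python below is a hypothetical, simplified illustration using HMAC; real deployments use identity providers, device checks and standard token formats rather than hand-rolled signing.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical key: in practice it would come from a secrets manager, never from code.
SIGNING_KEY = b"example-key-for-illustration-only"


def sign(user: str, resource: str, expires_at: int) -> str:
    """Issue a short-lived signature scoped to one user and one resource."""
    # JSON encoding keeps the fields unambiguous (no separator collisions).
    message = json.dumps([user, resource, expires_at]).encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def authorize(user: str, resource: str, expires_at: int, signature: str,
              now: Optional[float] = None) -> bool:
    """Check every request on its own merits; network location earns no trust."""
    current = time.time() if now is None else now
    if current >= expires_at:
        return False  # expired credentials are rejected even for known users
    expected = sign(user, resource, expires_at)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```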

Rubrik also integrates with popular data science and engineering tools, such as:

  • Cassandra
  • Microsoft SQL Server
  • MongoDB
  • NAS
  • Oracle
  • SAP HANA
  • VM Backup

And cloud services, such as:

While immutable backups are excellent for avoiding data loss, the threat of ransom persists when leaked data is sensitive. Relying on any single tool or platform is insufficient. Instead, enterprises need to assess their security infrastructure and architecture as a whole, while continually reviewing user permissions and behaviour. 


Summary: Protecting Data Science Projects From Ransomware

Data science is especially exposed to ransomware because data is the main target of a ransomware attack. 

Data is extremely valuable or even priceless in some situations. Moreover, breaches involving personally identifiable information (PII) can even expose businesses to civil and criminal proceedings. 

Protecting projects and data from ransomware is essential and should always be at the forefront of decisions involving sensitive data. 

At the large-business and enterprise level, ransomware protection is critical and often involves security clouds and other network-wide cybersecurity products.

What is ransomware?

Ransomware is a form of malware that encrypts a target’s files and threatens to delete or publish them unless a ransom is paid. It often arrives as a trojan, such as an infected email attachment, but can also be downloaded from compromised or malicious websites.

How to stop ransomware?

Firstly, practise good email security and never download unsolicited or suspicious attachments. Use good antivirus software, browse safely and keep passwords safe. In the event of an attack, never disclose personal information.

How can businesses prevent ransomware?

Education is important, as many ransomware attacks result from poor IT hygiene. Make sure access to sensitive data is allocated appropriately, too. A solid IT policy and a secure architecture are essential.

