Data Engineer Intern (Web Scraping)

Job description

About us

Dashmote is an AI technology scale-up headquartered in Amsterdam, the Netherlands. We connect the offline and online worlds by decoding the digital footprint of locations, helping companies, including those in the food and beverage (F&B) industry, understand the on-trade market and make smarter, data-driven decisions.

Today, our company has offices in Amsterdam, Shanghai, Vienna, and New York. Over the past few years, our teams have solved a wide variety of cases, such as analyzing beer-drinking and hairstyle trends with our Visual Recognition Tools and identifying prospective leads by generating intelligence dashboards.

Role Description

1. Gather and process raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, etc.).

2. Process unstructured data into a form suitable for analysis.

3. Monitor performance and recommend any necessary infrastructure changes.

4. Test and validate data to ensure the accuracy of the data transformations and data verification used in machine learning models.

5. Develop a set of processes for data mining, data modeling, and data production.

6. Collaborate closely with teammates to ensure consistency and maximize the use of data.

Beyond these responsibilities, we offer the opportunity to develop your area of interest, and we will assign work that supports your personal development in that area.

Job requirements

1. Bachelor's Degree or above in Computer Science, Engineering or a related field;

2. Familiarity with Python; experience with SQL and NoSQL technologies is preferred;

3. Familiarity with web crawling principles, the HTTP protocol, and common anti-crawling techniques;

4. Proficiency in data scraping with tools such as requests, BS4 (Beautiful Soup 4), XPath, and regex;

5. Familiarity with data cleaning and the ability to process data with Pandas;

6. Solid understanding of version control tools (e.g. Git) and database management systems (e.g. MySQL, PostgreSQL);

7. Experience with building web spiders with Scrapy is a plus;

8. Familiarity with working in a software or ETL development environment is a plus;

9. Proficiency in English is highly preferred;

10. A fast learner who is open to new technologies and methodologies, and a team player.

What’s in it for you

  • An internship salary of 180-250 yuan per day (depending on your background, skill set, and interview performance);

  • A full-time offer after your internship, provided the whole team recognizes your ability;

  • An office in central Shanghai (near Jing’an Temple, with convenient transportation);

  • A growing company full of opportunities, recognized by Google, McKinsey, and Rocket Internet as the best B2B startup in Europe;

  • Working within an international team that genuinely values your contribution;

  • An awesome culture of responsibility and the freedom to turn your ambition into reality - regardless of your role and level;

  • You'll have the freedom to really own your role and implement your own ideas;

  • Flexibility to work both from home and on-site;

  • Exciting work atmosphere with no shortage of fun team events, gatherings, and snacks at the office.

If this sounds like a match, we would love to hear from you!
