Advice and Resources for New Machine Learning Researchers

Below is a collection of methods and tools to make your life as a machine learning scientist easier and more productive. They have been gathered over time from many sources (blog posts, articles, personal experience, discussions with colleagues, etc.).

🧠 Mindset

  • Always be open to new knowledge (see P. Kidger’s detailed blog post).
  • Make contacts and don’t be shy to interact with anyone.
  • Practice critical thinking and stay humble.



🔬 Research

Approach

  • During investigations, reduce as much uncertainty as possible to extract clear signals.

  • Read papers in full. Doing so makes you feel you are doing proper, meticulous, and thorough research. It also helps you better understand the research methodology (experiments, metrics, etc.). Moreover, choose landmark papers in your field to avoid losing yourself in the vast amount of literature.

  • Read paper reviews. This is useful to understand what the community expects from a good paper. This highlights weaknesses of contributions that can be addressed in future/new works. OpenReview is a great platform to access reviews of papers from major ML conferences.

  • Beware of HARKing (Hypothesizing After the Results are Known). This is a common pitfall in research that can lead to misleading conclusions. (It happened to me!) There is a nice paper on the topic in deep learning.

  • Know the reviewer instructions and criteria of the conferences/journals you are submitting to. This helps you understand what is expected from your work. ICML reviewer guidelines here.

  • Review papers for journals or conferences

  • Try to keep a research journal to document your ideas, experiments, and findings. Be verbose! This provides context for humans and AIs. For now, the main vector of information with AI assistants is natural language (text).

  • Follow the Observation - Question/Problem - Hypothesis - Method (to test the hypothesis) - Experiment - Analysis - Conclusion framework to structure your research process. This discipline helps maintain clarity and rigor throughout your research journey, and it helps prevent HARKing and other biases. Here is a short example of this approach for the Policy Gradient paper. Another, more recent example is extracted from this paper on flow matching (state-of-the-art Gen AI), where the contribution lies in the rejection of a hypothesis.
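
One possible way to combine the research journal with the framework above is a small entry template. This is only a sketch: the class name `JournalEntry`, its `render` method, and the "TODO" defaults are illustrative assumptions, not part of any existing tool. The point is that the hypothesis and method are written down before the experiment fields are filled in, which makes HARKing harder.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class JournalEntry:
    """One research journal entry (hypothetical template).

    The fields follow the steps of the framework: the first four are
    required up front, the last three are filled in after the experiment.
    """
    observation: str
    question: str
    hypothesis: str
    method: str
    experiment: str = "TODO"
    analysis: str = "TODO"
    conclusion: str = "TODO"
    day: date = field(default_factory=date.today)

    def render(self):
        """Return the entry as plain text, one labelled line per step."""
        lines = [f"Journal entry, {self.day.isoformat()}"]
        for f in fields(self):
            if f.name != "day":
                lines.append(f"{f.name.capitalize()}: {getattr(self, f.name)}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder content, purely illustrative.
    entry = JournalEntry(
        observation="(what you noticed)",
        question="(what you want to know)",
        hypothesis="(your prediction, written before running anything)",
        method="(how you will test the prediction)",
    )
    print(entry.render())
```

Plain text is used on purpose, since natural language is what both colleagues and AI assistants read most easily.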

Bibliography

  • Store your bibliography files and construct your OWN library.
  • Use Zotero or similar tools (Google Drive) to construct this library.
  • I personally use Google Drive synchronised with PDF Expert, which lets me annotate documents with automatic synchronisation. The latter tool greatly simplifies scanning and attaching handwritten notes to articles or books.

Writing

  • Microsoft Research (S. P. Jones) presentation on how to write a great paper.
  • Ask your local academic writing center for help.
  • Write before using AI tools (to maintain your writing skills). Grammarly Authorship detects text generated by AI or taken from other sources.
  • Linking words sheet

Software

  • LaTeX
  • Typst (promising tool)

Talks, presentations and posters

  • Follow the one-minute-per-slide rule. Target a duration of 90% of the total time for your talk, which leaves time for unexpected events. For a 20-minute slot, that means about 18 minutes of speaking and roughly 18 slides.
  • Don’t try to put the same level of detail in a talk as in a paper. Try to be clear and transmit the essential ideas.
  • If you are an intern, consider yourself a researcher and play this role in your presentations.
  • Mention your sponsors and collaborators.
  • Add a QR code to your slides/poster to link to your paper or website.

Software for presentations

  • Beamer
  • Keynote (Apple)
  • Google Slides
  • Microsoft PowerPoint

Software for posters

Communicating your research and building your profile

  • Social Media & Blogs
  • Conferences & Workshops
  • Create a personal website

Online social media about AI/ML

Your presence on social media can help you stay updated with the latest research, demonstrate your expertise and interests, connect with other researchers, and share your own work. Here are some popular platforms:

Blog articles

Personal blogs and articles are always a great source of inspiration and knowledge. Some notable mentions are:



🧑‍💻 Programming practices and software tools

Read the official tutorials for the tools you use (from Vim, VSCode, and Git to PyTorch or Docker). These tutorials are often advanced resources that introduce important terminology.

Python

Following these principles eliminates many bugs and thus reduces uncertainty about your results.

Reproducibility

  • Fix the seed!
  • Use configuration files (YAML, JSON, TOML, etc.). Possibly use schemas to validate them.
  • Add timestamps everywhere!
  • Use experiment trackers and configuration managers (MLflow, Weights & Biases, Hydra).
  • Create a repository containing all your meeting reports and other documents you produce.
  • Monitoring is key: visualise your results as much as possible.
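
The first three points of this list can be sketched together using only the Python standard library. Everything named here is an assumption made for illustration: the `RunConfig` fields, the validation bounds, the `runs` directory, and the helper names `load_config`, `set_seed`, and `start_run`. A real project would more likely rely on a schema library and an experiment tracker.

```python
import json
import random
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical experiment configuration; field names are illustrative."""
    seed: int
    learning_rate: float
    epochs: int

    def __post_init__(self):
        # Minimal hand-written checks standing in for a real schema.
        if self.seed < 0:
            raise ValueError("seed must be non-negative")
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError("learning_rate must be in (0, 1)")
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")


def load_config(path):
    """Read a JSON config file and validate it by building a RunConfig."""
    with open(path, encoding="utf-8") as f:
        return RunConfig(**json.load(f))


def set_seed(seed):
    """Seed Python's built-in random number generator.

    NumPy and PyTorch keep their own generators and need their own
    calls (for example numpy.random.seed and torch.manual_seed).
    """
    random.seed(seed)


def start_run(config, root="runs"):
    """Create a timestamped run directory, save the config, fix the seed."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = Path(root) / f"{stamp}_seed{config.seed}"
    run_dir.mkdir(parents=True, exist_ok=True)
    config_text = json.dumps(asdict(config), indent=2)
    (run_dir / "config.json").write_text(config_text, encoding="utf-8")
    set_seed(config.seed)
    return run_dir
```

Calling `start_run(RunConfig(seed=0, learning_rate=0.01, epochs=3))` creates a directory whose name starts with a UTC timestamp and which contains the exact configuration used, so a result can later be traced back to its settings and seed.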

Software Engineering



🧑‍🏫 Teaching

  • Prepare your lectures in advance.
  • Take training courses on how to teach effectively.
  • Use your website to share materials.